[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' #96561

jhuber6 · 2024-06-24T21:39:26Z

Summary:
The clang-nvlink-wrapper is a utility that I removed awhile back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard .cubin, and rejects link
jobs that other linkers would be fine with (i.e empty). I have spent a
great deal of time hacking around this in the GPU libc implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-intorudce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.

<install>/lib/nvptx64-nvidia-cuda/libc.a
<install>/lib/nvptx64-nvidia-cuda/libc++.a
<install>/lib/nvptx64-nvidia-cuda/libomp.a
<install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a

Linking in these libraries will then simply require passing -lc like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient nvlink linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
ld.lld, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.
This also copies some amount of logic from the clang-linker-wrapper,
but not enough for it to be worthwhile to merge them I feel. In the
future it may be possible to delete that handling from there entirely.

llvmbot · 2024-06-24T21:40:01Z

@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)

Changes

Summary:
The clang-nvlink-wrapper is a utility that I removed awhile back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard .cubin, and rejects link
jobs that other linkers would be fine with (i.e empty). I have spent a
great deal of time hacking around this in the GPU libc implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-intorudce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.

&lt;install&gt;/lib/nvptx64-nvidia-cuda/libc.a
&lt;install&gt;/lib/nvptx64-nvidia-cuda/libc++.a
&lt;install&gt;/lib/nvptx64-nvidia-cuda/libomp.a
&lt;install&gt;/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a

Linking in these libraries will then simply require passing -lc like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient nvlink linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
ld.lld, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.

Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96561.diff

8 Files Affected:

(modified) clang/lib/Driver/ToolChains/Cuda.cpp (+8-53)
(modified) clang/lib/Driver/ToolChains/Cuda.h (+3)
(modified) clang/test/Driver/cuda-cross-compiling.c (+4-4)
(added) clang/test/Driver/nvlink-wrapper.c (+64)
(modified) clang/tools/CMakeLists.txt (+1)
(added) clang/tools/clang-nvlink-wrapper/CMakeLists.txt (+44)
(added) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+671)
(added) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+68)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if (llvm::sys::path::extension(InputFile) != ".cubin") {
-        // If there are no actions above this one then this is direct input and
-        // we can copy it. Otherwise the input is internal so a `.cubin` file
-        // should exist.
-        if (II.getAction() && II.getAction()->getInputs().size() == 0) {
-          const char *CubinF =
-              Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
-                  llvm::sys::path::stem(InputFile), "cubin"));
-          if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
-            continue;
-
-          CmdArgs.push_back(CubinF);
-        } else {
-          SmallString<256> Filename(InputFile);
-          llvm::sys::path::replace_extension(Filename, "cubin");
-          CmdArgs.push_back(Args.MakeArgString(Filename));
-        }
-      } else {
-        CmdArgs.push_back(Args.MakeArgString(InputFile));
-      }
-    } else if (!II.isNothing()) {
-      II.getInputArg().renderAsInput(Args, CmdArgs);
-    }
-  }
-
   C.addCommand(std::make_unique<Command>(
       JA, *this,
       ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
                           "--options-file"},
-      Args.MakeArgString(getToolChain().GetProgramPath("nvlink")), CmdArgs,
-      Inputs, Output));
+      Args.MakeArgString(getToolChain().GetProgramPath("clang-nvlink-wrapper")),
+      CmdArgs, Inputs, Output));
 }
 
 void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
@@ -949,11 +908,7 @@ std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
   if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())
     return ToolChain::getInputFilename(Input);
 
-  // Replace extension for object files with cubin because nvlink relies on
-  // these particular file names.
-  SmallString<256> Filename(ToolChain::getInputFilename(Input));
-  llvm::sys::path::replace_extension(Filename, "cubin");
-  return std::string(Filename);
+  return ToolChain::getInputFilename(Input);
 }
 
 llvm::opt::DerivedArgList *
diff --git a/clang/lib/Driver/ToolChains/Cuda.h b/clang/lib/Driver/ToolChains/Cuda.h
index 43c17ba7c0ba0..0735c36f116bc 100644
--- a/clang/lib/Driver/ToolChains/Cuda.h
+++ b/clang/lib/Driver/ToolChains/Cuda.h
@@ -155,6 +155,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
   bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
     return false;
   }
+  bool HasNativeLLVMSupport() const override { return true; }
   bool isPICDefaultForced() const override { return false; }
   bool SupportsProfiling() const override { return false; }
 
@@ -192,6 +193,8 @@ class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
     return &HostTC.getTriple();
   }
 
+  bool HasNativeLLVMSupport() const override { return false; }
+
   std::string getInputFilename(const InputInfo &Input) const override;
 
   llvm::opt::DerivedArgList *
diff --git a/clang/test/Driver/cuda-cross-compiling.c b/clang/test/Driver/cuda-cross-compiling.c
index 1dc4520f485db..f839c36d23e51 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -32,8 +32,8 @@
 // RUN:   | FileCheck -check-prefix=ARGS %s
 
 //      ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].o"
 
 //
 // Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
@@ -55,7 +55,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
 // RUN:   | FileCheck -check-prefix=LINK %s
 
-// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+// LINK: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.o"
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
@@ -72,7 +72,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -march=sm_52 -### %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=LINKER-ARGS %s
 
-// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
+// LINKER-ARGS: clang-nvlink-wrapper{{.*}}"-v"{{.*}}"a" "b"
 
 // Tests for handling a missing architecture.
 //
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
new file mode 100644
index 0000000000000..488bcd062467f
--- /dev/null
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -0,0 +1,64 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+#if defined(X)
+extern int y;
+int foo() { return y; }
+
+int x = 0;
+#elif defined(Y)
+int y = 42;
+#elif defined(Z)
+int z = 42;
+#elif defined(W)
+int w = 42;
+#elif defined(U)
+extern int x;
+extern int y;
+extern int __attribute__((weak)) w;
+
+int bar() {
+  return x + y + w;
+}
+#else
+int __attribute__((visibility("hidden"))) x = 999;
+#endif
+
+// Create various inputs to test basic linking and LTO capabilities. Creating a
+// CUDA binary requires access to the `ptxas` executable, so we just use x64.
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DX -o %t-x.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DY -o %t-y.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DZ -o %t-z.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DW -o %t-w.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DU -o %t-u.o
+// RUN: llvm-ar rcs %t-x.a %t-x.o
+// RUN: llvm-ar rcs %t-y.a %t-y.o
+// RUN: llvm-ar rcs %t-z.a %t-z.o
+// RUN: llvm-ar rcs %t-w.a %t-w.o
+
+//
+// Check that we forward any unrecognized argument to 'nvlink'.
+//
+// RUN: clang-nvlink-wrapper --dry-run -arch sm_52 %t-u.o -foo -o a.out 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=ARGS
+// ARGS: nvlink{{.*}} -arch sm_52 -foo -o a.out [[INPUT:.+]].cubin
+
+//
+// Check the symbol resolution for static archives. We expect to only link
+// `libx.a` and `liby.a` because extern weak symbols do not extract and `libz.a`
+// is not used at all.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t-x.a %t-u.o %t-y.a %t-z.a %t-w.a \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LINK
+// LINK: nvlink{{.*}} -arch sm_52 -o a.out [[INPUT:.+]].cubin {{.*}}-x-{{.*}}.cubin{{.*}}-y-{{.*}}.cubin
+
+// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.o
+
+//
+// Check that the LTO interface works and properly preserves symbols used in a
+// regular object file.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
+// LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
+// LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
diff --git a/clang/tools/CMakeLists.txt b/clang/tools/CMakeLists.txt
index bdd8004be3e02..4885afc1584d0 100644
--- a/clang/tools/CMakeLists.txt
+++ b/clang/tools/CMakeLists.txt
@@ -9,6 +9,7 @@ add_clang_subdirectory(clang-format-vs)
 add_clang_subdirectory(clang-fuzzer)
 add_clang_subdirectory(clang-import-test)
 add_clang_subdirectory(clang-linker-wrapper)
+add_clang_subdirectory(clang-nvlink-wrapper)
 add_clang_subdirectory(clang-offload-packager)
 add_clang_subdirectory(clang-offload-bundler)
 add_clang_subdirectory(clang-scan-deps)
diff --git a/clang/tools/clang-nvlink-wrapper/CMakeLists.txt b/clang/tools/clang-nvlink-wrapper/CMakeLists.txt
new file mode 100644
index 0000000000000..d46f66994cf39
--- /dev/null
+++ b/clang/tools/clang-nvlink-wrapper/CMakeLists.txt
@@ -0,0 +1,44 @@
+set(LLVM_LINK_COMPONENTS
+  ${LLVM_TARGETS_TO_BUILD}
+  BitWriter
+  Core
+  BinaryFormat
+  MC
+  Target
+  TransformUtils
+  Analysis
+  Passes
+  IRReader
+  Object
+  Option
+  Support
+  TargetParser
+  CodeGen
+  LTO
+  )
+
+set(LLVM_TARGET_DEFINITIONS NVLinkOpts.td)
+tablegen(LLVM NVLinkOpts.inc -gen-opt-parser-defs)
+add_public_tablegen_target(NVLinkWrapperOpts)
+
+if(NOT CLANG_BUILT_STANDALONE)
+  set(tablegen_deps intrinsics_gen NVLinkWrapperOpts)
+endif()
+
+add_clang_tool(clang-nvlink-wrapper
+  ClangNVLinkWrapper.cpp
+
+  DEPENDS
+  ${tablegen_deps}
+  )
+
+set(CLANG_NVLINK_WRAPPER_LIB_DEPS
+  clangBasic
+  )
+
+target_compile_options(clang-nvlink-wrapper PRIVATE "-g" "-O0")
+
+target_link_libraries(clang-nvlink-wrapper
+  PRIVATE
+  ${CLANG_NVLINK_WRAPPER_LIB_DEPS}
+  )
diff --git a/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp b/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
new file mode 100644
index 0000000000000..d48bf4e37ecfe
--- /dev/null
+++ b/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
@@ -0,0 +1,671 @@
+//===-- clang-nvlink-wrapper/ClangNVLinkWrapper.cpp - NVIDIA linker util --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---------------------------------------------------------------------===//
+//
+// This tool wraps around the NVIDIA linker called 'nvlink'. The NVIDIA linker
+// is required to create NVPTX applications, but does not support common
+// features like LTO or archives. This utility wraps around the tool to cover
+// its deficiencies. This tool can be removed once NVIDIA improves their linker
+// or ports it to `ld.lld`.
+//
+//===---------------------------------------------------------------------===//
+
+#include "clang/Basic/Version.h"
+
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/BinaryFormat/Magic.h"
+#include "llvm/CodeGen/CommandFlags.h"
+#include "llvm/IR/DiagnosticPrinter.h"
+#include "llvm/LTO/LTO.h"
+#include "llvm/Object/Archive.h"
+#include "llvm/Object/ArchiveWriter.h"
+#include "llvm/Object/Binary.h"
+#include "llvm/Object/ELFObjectFile.h"
+#include "llvm/Object/IRObjectFile.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Object/OffloadBinary.h"
+#include "llvm/Option/ArgList.h"
+#include "llvm/Option/OptTable.h"
+#include "llvm/Option/Option.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileOutputBuffer.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/InitLLVM.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/Program.h"
+#include "llvm/Support/Signals.h"
+#include "llvm/Support/StringSaver.h"
+#include "llvm/Support/TargetSelect.h"
+#include "llvm/Support/WithColor.h"
+
+using namespace llvm;
+using namespace llvm::opt;
+using namespace llvm::object;
+
+static void printVersion(raw_ostream &OS) {
+  OS << clang::getClangToolFullVersion("clang-nvlink-wrapper") << '\n';
+}
+
+/// The value of `argv[0]` when run.
+static const char *Executable;
+
+/// Temporary files to be cleaned up.
+static SmallVector<SmallString<128>> TempFiles;
+
+/// Codegen flags for LTO backend.
+static codegen::RegisterCodeGenFlags CodeGenFlags;
+
+namespace {
+/// Must not overlap with llvm::opt::DriverFlag.
+enum WrapperFlags {
+  WrapperOnlyOption = (1 << 4), // Options only used by the linker wrapper.
+  DeviceOnlyOption = (1 << 5),  // Options only used for device linking.
+};
+
+enum ID {
+  OPT_INVALID = 0, // This is not an option ID.
+#define OPTION(...) LLVM_MAKE_OPT_ID(__VA_ARGS__),
+#include "NVLinkOpts.inc"
+  LastOption
+#undef OPTION
+};
+
+#define PREFIX(NAME, VALUE)                                                    \
+  static constexpr StringLiteral NAME##_init[] = VALUE;                        \
+  static constexpr ArrayRef<StringLiteral> NAME(NAME##_init,                   \
+                                                std::size(NAME##_init) - 1);
+#include "NVLinkOpts.inc"
+#undef PREFIX
+
+static constexpr OptTable::Info InfoTable[] = {
+#define OPTION(...) LLVM_CONSTRUCT_OPT_INFO(__VA_ARGS__),
+#include "NVLinkOpts.inc"
+#undef OPTION
+};
+
+class WrapperOptTable : public opt::GenericOptTable {
+public:
+  WrapperOptTable() : opt::GenericOptTable(InfoTable) {}
+};
+
+const OptTable &getOptTable() {
+  static const WrapperOptTable *Table = []() {
+    auto Result = std::make_unique<WrapperOptTable>();
+    return Result.release();
+  }();
+  return *Table;
+}
+
+[[noreturn]] void reportError(Error E) {
+  outs().flush();
+  logAllUnhandledErrors(std::move(E), WithColor::error(errs(), Executable));
+  exit(EXIT_FAILURE);
+}
+
+void diagnosticHandler(const DiagnosticInfo &DI) {
+  std::string ErrStorage;
+  raw_string_ostream OS(ErrStorage);
+  DiagnosticPrinterRawOStream DP(OS);
+  DI.print(DP);
+
+  switch (DI.getSeverity()) {
+  case DS_Error:
+    WithColor::error(errs(), Executable) << ErrStorage << "\n";
+    break;
+  case DS_Warning:
+    WithColor::warning(errs(), Executable) << ErrStorage << "\n";
+    break;
+  case DS_Note:
+    WithColor::note(errs(), Executable) << ErrStorage << "\n";
+    break;
+  case DS_Remark:
+    WithColor::remark(errs()) << ErrStorage << "\n";
+    break;
+  }
+}
+
+Expected<StringRef> createTempFile(const ArgList &Args, const Twine &Prefix,
+                                   StringRef Extension) {
+  SmallString<128> OutputFile;
+  if (Args.hasArg(OPT_save_temps)) {
+    (Prefix + "." + Extension).toNullTerminatedStringRef(OutputFile);
+  } else {
+    if (std::error_code EC =
+            sys::fs::createTemporaryFile(Prefix, Extension, OutputFile))
+      return createFileError(OutputFile, EC);
+  }
+
+  TempFiles.emplace_back(std::move(OutputFile));
+  return TempFiles.back();
+}
+
+Expected<std::string> findProgram(StringRef Name, ArrayRef<StringRef> Paths) {
+  ErrorOr<std::string> Path = sys::findProgramByName(Name, Paths);
+  if (!Path)
+    Path = sys::findProgramByName(Name);
+  if (!Path)
+    return createStringError(Path.getError(),
+                             "Unable to find '" + Name + "' in path");
+  return *Path;
+}
+
+std::optional<std::string> findFile(StringRef Dir, StringRef Root,
+                                    const Twine &Name) {
+  SmallString<128> Path;
+  if (Dir.starts_with("="))
+    sys::path::append(Path, Root, Dir.substr(1), Name);
+  else
+    sys::path::append(Path, Dir, Name);
+
+  if (sys::fs::exists(Path))
+    return static_cast<std::string>(Path);
+  return std::nullopt;
+}
+
+std::optional<std::string>
+findFromSearchPaths(StringRef Name, StringRef Root,
+                    ArrayRef<StringRef> SearchPaths) {
+  for (StringRef Dir : SearchPaths)
+    if (std::optional<std::string> File = findFile(Dir, Root, Name))
+      return File;
+  return std::nullopt;
+}
+
+std::optional<std::string>
+searchLibraryBaseName(StringRef Name, StringRef Root,
+                      ArrayRef<StringRef> SearchPaths) {
+  for (StringRef Dir : SearchPaths)
+    if (std::optional<std::string> File =
+            findFile(Dir, Root, "lib" + Name + ".a"))
+      return File;
+  return std::nullopt;
+}
+
+/// Search for static libraries in the linker's library path given input like
+/// `-lfoo` or `-l:libfoo.a`.
+std::optional<std::string> searchLibrary(StringRef Input, StringRef Root,
+                                         ArrayRef<StringRef> SearchPaths) {
+  if (Input.starts_with(":") || Input.ends_with(".lib"))
+    return findFromSearchPaths(Input.drop_front(), Root, SearchPaths);
+  return searchLibraryBaseName(Input, Root, SearchPaths);
+}
+
+void printCommands(ArrayRef<StringRef> CmdArgs) {
+  if (CmdArgs.empty())
+    return;
+
+  llvm::errs() << " \"" << CmdArgs.front() << "\" ";
+  for (auto IC = std::next(CmdArgs.begin()), IE = CmdArgs.end(); IC != IE; ++IC)
+    llvm::errs() << *IC << (std::next(IC) != IE ? " " : "\n");
+}
+
+/// A minimum symbol interface that provides the necessary information to
+/// extract archive members and resolve LTO symbols.
+struct Symbol {
+  enum Flags {
+    None = 0,
+    Undefined = 1 << 0,
+    Weak = 1 << 1,
+  };
+
+  Symbol()
+      : File(), Flags(Undefined), Name(), UsedInRegularObj(false), Lazy(false) {
+  }
+
+  Symbol(MemoryBufferRef File, const irsymtab::Reader::SymbolRef Sym, bool Lazy)
+      : File(File), Flags(0), UsedInRegularObj(false), Lazy(Lazy) {
+    if (Sym.isUndefined())
+      Flags |= Undefined;
+    if (Sym.isWeak())
+      Flags |= Weak;
+    Name = Sym.getName();
+  }
+
+  Symbol(MemoryBufferRef File, const SymbolRef Sym, bool Lazy)
+      : File(File), Flags(0), UsedInRegularObj(false), Lazy(Lazy) {
+    auto FlagsOrErr = Sym.getFlags();
+    if (!FlagsOrErr)
+      reportError(FlagsOrErr.takeError());
+    if (*FlagsOrErr & SymbolRef::SF_Undefined)
+      Flags |= Undefined;
+    if (*FlagsOrErr & SymbolRef::SF_Weak)
+      Flags |= Weak;
+
+    auto NameOrErr = Sym.getName();
+    if (!NameOrErr)
+      reportError(NameOrErr.takeError());
+    Name = *NameOrErr;
+  }
+
+  Symbol Resolve(Symbol Other) {
+    if (File.getBuffer().empty())
+      return Other.Lazy ? *this : Other;
+    if (Other.isUndefined())
+      return *this;
+    if (isWeak() && isUndefined() && Other.Lazy)
+      return *this;
+    if (isWeak() && !Other.isWeak())
+      return Other;
+    if (isUndefined() && !Other.isUndefined())
+      return Other;
+    return *this;
+  }
+
+  bool isWeak() const { return Flags & Weak; }
+  bool isUndefined() const { return Flags & Undefined; }
+
+  MemoryBufferRef File;
+  uint32_t Flags;
+  StringRef Name;
+  bool UsedInRegularObj;
+  bool Lazy;
+};
+
+Expected<StringRef> runPTXAs(StringRef File, const ArgList &Args) {
+  std::string CudaPath = Args.getLastArgValue(OPT_cuda_path_EQ).str();
+  Expected<std::string> PTXAsPath = findProgram("ptxas", {CudaPath + "/bin"});
+  if (!PTXAsPath)
+    return PTXAsPath.takeError();
+
+  auto TempFileOrErr =
+      creat...
[truncated]

llvmbot · 2024-06-24T21:40:01Z

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The clang-nvlink-wrapper is a utility that I removed awhile back
during the transition to the new driver. This patch adds back in a new,
upgraded version that does LTO + archive linking. It's not an easy
choice to reintroduce something I happily deleted, but this is the only
way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called 'nvlink', its main interface is
very difficult to work with. It does not provide LTO, or static linking,
requires all files to be named a non-standard .cubin, and rejects link
jobs that other linkers would be fine with (i.e empty). I have spent a
great deal of time hacking around this in the GPU libc implementation,
where I deliberately avoid LTO and static linking and have about 100
lines of hacky CMake dedicated to storing these files in a format that
the clang-linker-wrapper accepts to avoid this limitation.

The main reason I want to re-intorudce this tool is because I am
planning on creating a more standard C/C++ toolchain for GPUs to use.
This will install files like the following.

&lt;install&gt;/lib/nvptx64-nvidia-cuda/libc.a
&lt;install&gt;/lib/nvptx64-nvidia-cuda/libc++.a
&lt;install&gt;/lib/nvptx64-nvidia-cuda/libomp.a
&lt;install&gt;/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a

Linking in these libraries will then simply require passing -lc like
is already done for non-GPU toolchains. However, this doesn't work with
the currently deficient nvlink linker, so I consider this a blocking
issue to massively improving the state of building GPU libraries.

In the future we may be able to convince NVIDIA to port their linker to
ld.lld, but for now this is the only workable solution that allows us
to hack around the weird behavior of their closed-source software.

Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96561.diff

8 Files Affected:

(modified) clang/lib/Driver/ToolChains/Cuda.cpp (+8-53)
(modified) clang/lib/Driver/ToolChains/Cuda.h (+3)
(modified) clang/test/Driver/cuda-cross-compiling.c (+4-4)
(added) clang/test/Driver/nvlink-wrapper.c (+64)
(modified) clang/tools/CMakeLists.txt (+1)
(added) clang/tools/clang-nvlink-wrapper/CMakeLists.txt (+44)
(added) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+671)
(added) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+68)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if (llvm::sys::path::extension(InputFile) != ".cubin") {
-        // If there are no actions above this one then this is direct input and
-        // we can copy it. Otherwise the input is internal so a `.cubin` file
-        // should exist.
-        if (II.getAction() && II.getAction()->getInputs().size() == 0) {
-          const char *CubinF =
-              Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
-                  llvm::sys::path::stem(InputFile), "cubin"));
-          if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
-            continue;
-
-          CmdArgs.push_back(CubinF);
-        } else {
-          SmallString<256> Filename(InputFile);
-          llvm::sys::path::replace_extension(Filename, "cubin");
-          CmdArgs.push_back(Args.MakeArgString(Filename));
-        }
-      } else {
-        CmdArgs.push_back(Args.MakeArgString(InputFile));
-      }
-    } else if (!II.isNothing()) {
-      II.getInputArg().renderAsInput(Args, CmdArgs);
-    }
-  }
-
   C.addCommand(std::make_unique<Command>(
       JA, *this,
       ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
                           "--options-file"},
-      Args.MakeArgString(getToolChain().GetProgramPath("nvlink")), CmdArgs,
-      Inputs, Output));
+      Args.MakeArgString(getToolChain().GetProgramPath("clang-nvlink-wrapper")),
+      CmdArgs, Inputs, Output));
 }
 
 void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
@@ -949,11 +908,7 @@ std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
   if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())
     return ToolChain::getInputFilename(Input);
 
-  // Replace extension for object files with cubin because nvlink relies on
-  // these particular file names.
-  SmallString<256> Filename(ToolChain::getInputFilename(Input));
-  llvm::sys::path::replace_extension(Filename, "cubin");
-  return std::string(Filename);
+  return ToolChain::getInputFilename(Input);
 }
 
 llvm::opt::DerivedArgList *
diff --git a/clang/lib/Driver/ToolChains/Cuda.h b/clang/lib/Driver/ToolChains/Cuda.h
index 43c17ba7c0ba0..0735c36f116bc 100644
--- a/clang/lib/Driver/ToolChains/Cuda.h
+++ b/clang/lib/Driver/ToolChains/Cuda.h
@@ -155,6 +155,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
   bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
     return false;
   }
+  bool HasNativeLLVMSupport() const override { return true; }
   bool isPICDefaultForced() const override { return false; }
   bool SupportsProfiling() const override { return false; }
 
@@ -192,6 +193,8 @@ class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
     return &HostTC.getTriple();
   }
 
+  bool HasNativeLLVMSupport() const override { return false; }
+
   std::string getInputFilename(const InputInfo &Input) const override;
 
   llvm::opt::DerivedArgList *
diff --git a/clang/test/Driver/cuda-cross-compiling.c b/clang/test/Driver/cuda-cross-compiling.c
index 1dc4520f485db..f839c36d23e51 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -32,8 +32,8 @@
 // RUN:   | FileCheck -check-prefix=ARGS %s
 
 //      ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].o"
 
 //
 // Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
@@ -55,7 +55,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
 // RUN:   | FileCheck -check-prefix=LINK %s
 
-// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+// LINK: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.o"
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
@@ -72,7 +72,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -march=sm_52 -### %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=LINKER-ARGS %s
 
-// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
+// LINKER-ARGS: clang-nvlink-wrapper{{.*}}"-v"{{.*}}"a" "b"
 
 // Tests for handling a missing architecture.
 //
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
new file mode 100644
index 0000000000000..488bcd062467f
--- /dev/null
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -0,0 +1,64 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+#if defined(X)
+extern int y;
+int foo() { return y; }
+
+int x = 0;
+#elif defined(Y)
+int y = 42;
+#elif defined(Z)
+int z = 42;
+#elif defined(W)
+int w = 42;
+#elif defined(U)
+extern int x;
+extern int y;
+extern int __attribute__((weak)) w;
+
+int bar() {
+  return x + y + w;
+}
+#else
+int __attribute__((visibility("hidden"))) x = 999;
+#endif
+
+// Create various inputs to test basic linking and LTO capabilities. Creating a
+// CUDA binary requires access to the `ptxas` executable, so we just use x64.
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DX -o %t-x.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DY -o %t-y.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DZ -o %t-z.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DW -o %t-w.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DU -o %t-u.o
+// RUN: llvm-ar rcs %t-x.a %t-x.o
+// RUN: llvm-ar rcs %t-y.a %t-y.o
+// RUN: llvm-ar rcs %t-z.a %t-z.o
+// RUN: llvm-ar rcs %t-w.a %t-w.o
+
+//
+// Check that we forward any unrecognized argument to 'nvlink'.
+//
+// RUN: clang-nvlink-wrapper --dry-run -arch sm_52 %t-u.o -foo -o a.out 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=ARGS
+// ARGS: nvlink{{.*}} -arch sm_52 -foo -o a.out [[INPUT:.+]].cubin
+
+//
+// Check the symbol resolution for static archives. We expect to only link
+// `libx.a` and `liby.a` because extern weak symbols do not extract and `libz.a`
+// is not used at all.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t-x.a %t-u.o %t-y.a %t-z.a %t-w.a \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LINK
+// LINK: nvlink{{.*}} -arch sm_52 -o a.out [[INPUT:.+]].cubin {{.*}}-x-{{.*}}.cubin{{.*}}-y-{{.*}}.cubin
+
+// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.o
+
+//
+// Check that the LTO interface works and properly preserves symbols used in a
+// regular object file.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
+// LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
+// LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
diff --git a/clang/tools/CMakeLists.txt b/clang/tools/CMakeLists.txt
index bdd8004be3e02..4885afc1584d0 100644
--- a/clang/tools/CMakeLists.txt
+++ b/clang/tools/CMakeLists.txt
@@ -9,6 +9,7 @@ add_clang_subdirectory(clang-format-vs)
 add_clang_subdirectory(clang-fuzzer)
 add_clang_subdirectory(clang-import-test)
 add_clang_subdirectory(clang-linker-wrapper)
+add_clang_subdirectory(clang-nvlink-wrapper)
 add_clang_subdirectory(clang-offload-packager)
 add_clang_subdirectory(clang-offload-bundler)
 add_clang_subdirectory(clang-scan-deps)
diff --git a/clang/tools/clang-nvlink-wrapper/CMakeLists.txt b/clang/tools/clang-nvlink-wrapper/CMakeLists.txt
new file mode 100644
index 0000000000000..d46f66994cf39
--- /dev/null
+++ b/clang/tools/clang-nvlink-wrapper/CMakeLists.txt
@@ -0,0 +1,44 @@
+set(LLVM_LINK_COMPONENTS
+  ${LLVM_TARGETS_TO_BUILD}
+  BitWriter
+  Core
+  BinaryFormat
+  MC
+  Target
+  TransformUtils
+  Analysis
+  Passes
+  IRReader
+  Object
+  Option
+  Support
+  TargetParser
+  CodeGen
+  LTO
+  )
+
+set(LLVM_TARGET_DEFINITIONS NVLinkOpts.td)
+tablegen(LLVM NVLinkOpts.inc -gen-opt-parser-defs)
+add_public_tablegen_target(NVLinkWrapperOpts)
+
+if(NOT CLANG_BUILT_STANDALONE)
+  set(tablegen_deps intrinsics_gen NVLinkWrapperOpts)
+endif()
+
+add_clang_tool(clang-nvlink-wrapper
+  ClangNVLinkWrapper.cpp
+
+  DEPENDS
+  ${tablegen_deps}
+  )
+
+set(CLANG_NVLINK_WRAPPER_LIB_DEPS
+  clangBasic
+  )
+
+target_compile_options(clang-nvlink-wrapper PRIVATE "-g" "-O0")
+
+target_link_libraries(clang-nvlink-wrapper
+  PRIVATE
+  ${CLANG_NVLINK_WRAPPER_LIB_DEPS}
+  )
diff --git a/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp b/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
new file mode 100644
index 0000000000000..d48bf4e37ecfe
--- /dev/null
+++ b/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
@@ -0,0 +1,671 @@
+//===-- clang-nvlink-wrapper/ClangNVLinkWrapper.cpp - NVIDIA linker util --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---------------------------------------------------------------------===//
+//
+// This tool wraps around the NVIDIA linker called 'nvlink'. The NVIDIA linker
+// is required to create NVPTX applications, but does not support common
+// features like LTO or archives. This utility wraps around the tool to cover
+// its deficiencies. This tool can be removed once NVIDIA improves their linker
+// or ports it to `ld.lld`.
+//
+//===---------------------------------------------------------------------===//
+
+#include "clang/Basic/Version.h"
+
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/BinaryFormat/Magic.h"
+#include "llvm/CodeGen/CommandFlags.h"
+#include "llvm/IR/DiagnosticPrinter.h"
+#include "llvm/LTO/LTO.h"
+#include "llvm/Object/Archive.h"
+#include "llvm/Object/ArchiveWriter.h"
+#include "llvm/Object/Binary.h"
+#include "llvm/Object/ELFObjectFile.h"
+#include "llvm/Object/IRObjectFile.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Object/OffloadBinary.h"
+#include "llvm/Option/ArgList.h"
+#include "llvm/Option/OptTable.h"
+#include "llvm/Option/Option.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileOutputBuffer.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/InitLLVM.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/Program.h"
+#include "llvm/Support/Signals.h"
+#include "llvm/Support/StringSaver.h"
+#include "llvm/Support/TargetSelect.h"
+#include "llvm/Support/WithColor.h"
+
+using namespace llvm;
+using namespace llvm::opt;
+using namespace llvm::object;
+
+static void printVersion(raw_ostream &OS) {
+  OS << clang::getClangToolFullVersion("clang-nvlink-wrapper") << '\n';
+}
+
+/// The value of `argv[0]` when run.
+static const char *Executable;
+
+/// Temporary files to be cleaned up.
+static SmallVector<SmallString<128>> TempFiles;
+
+/// Codegen flags for LTO backend.
+static codegen::RegisterCodeGenFlags CodeGenFlags;
+
+namespace {
+/// Must not overlap with llvm::opt::DriverFlag.
+enum WrapperFlags {
+  WrapperOnlyOption = (1 << 4), // Options only used by the linker wrapper.
+  DeviceOnlyOption = (1 << 5),  // Options only used for device linking.
+};
+
+enum ID {
+  OPT_INVALID = 0, // This is not an option ID.
+#define OPTION(...) LLVM_MAKE_OPT_ID(__VA_ARGS__),
+#include "NVLinkOpts.inc"
+  LastOption
+#undef OPTION
+};
+
+#define PREFIX(NAME, VALUE)                                                    \
+  static constexpr StringLiteral NAME##_init[] = VALUE;                        \
+  static constexpr ArrayRef<StringLiteral> NAME(NAME##_init,                   \
+                                                std::size(NAME##_init) - 1);
+#include "NVLinkOpts.inc"
+#undef PREFIX
+
+static constexpr OptTable::Info InfoTable[] = {
+#define OPTION(...) LLVM_CONSTRUCT_OPT_INFO(__VA_ARGS__),
+#include "NVLinkOpts.inc"
+#undef OPTION
+};
+
+class WrapperOptTable : public opt::GenericOptTable {
+public:
+  WrapperOptTable() : opt::GenericOptTable(InfoTable) {}
+};
+
+const OptTable &getOptTable() {
+  static const WrapperOptTable *Table = []() {
+    auto Result = std::make_unique<WrapperOptTable>();
+    return Result.release();
+  }();
+  return *Table;
+}
+
+[[noreturn]] void reportError(Error E) {
+  outs().flush();
+  logAllUnhandledErrors(std::move(E), WithColor::error(errs(), Executable));
+  exit(EXIT_FAILURE);
+}
+
+void diagnosticHandler(const DiagnosticInfo &DI) {
+  std::string ErrStorage;
+  raw_string_ostream OS(ErrStorage);
+  DiagnosticPrinterRawOStream DP(OS);
+  DI.print(DP);
+
+  switch (DI.getSeverity()) {
+  case DS_Error:
+    WithColor::error(errs(), Executable) << ErrStorage << "\n";
+    break;
+  case DS_Warning:
+    WithColor::warning(errs(), Executable) << ErrStorage << "\n";
+    break;
+  case DS_Note:
+    WithColor::note(errs(), Executable) << ErrStorage << "\n";
+    break;
+  case DS_Remark:
+    WithColor::remark(errs()) << ErrStorage << "\n";
+    break;
+  }
+}
+
+Expected<StringRef> createTempFile(const ArgList &Args, const Twine &Prefix,
+                                   StringRef Extension) {
+  SmallString<128> OutputFile;
+  if (Args.hasArg(OPT_save_temps)) {
+    (Prefix + "." + Extension).toNullTerminatedStringRef(OutputFile);
+  } else {
+    if (std::error_code EC =
+            sys::fs::createTemporaryFile(Prefix, Extension, OutputFile))
+      return createFileError(OutputFile, EC);
+  }
+
+  TempFiles.emplace_back(std::move(OutputFile));
+  return TempFiles.back();
+}
+
+Expected<std::string> findProgram(StringRef Name, ArrayRef<StringRef> Paths) {
+  ErrorOr<std::string> Path = sys::findProgramByName(Name, Paths);
+  if (!Path)
+    Path = sys::findProgramByName(Name);
+  if (!Path)
+    return createStringError(Path.getError(),
+                             "Unable to find '" + Name + "' in path");
+  return *Path;
+}
+
+std::optional<std::string> findFile(StringRef Dir, StringRef Root,
+                                    const Twine &Name) {
+  SmallString<128> Path;
+  if (Dir.starts_with("="))
+    sys::path::append(Path, Root, Dir.substr(1), Name);
+  else
+    sys::path::append(Path, Dir, Name);
+
+  if (sys::fs::exists(Path))
+    return static_cast<std::string>(Path);
+  return std::nullopt;
+}
+
+std::optional<std::string>
+findFromSearchPaths(StringRef Name, StringRef Root,
+                    ArrayRef<StringRef> SearchPaths) {
+  for (StringRef Dir : SearchPaths)
+    if (std::optional<std::string> File = findFile(Dir, Root, Name))
+      return File;
+  return std::nullopt;
+}
+
+std::optional<std::string>
+searchLibraryBaseName(StringRef Name, StringRef Root,
+                      ArrayRef<StringRef> SearchPaths) {
+  for (StringRef Dir : SearchPaths)
+    if (std::optional<std::string> File =
+            findFile(Dir, Root, "lib" + Name + ".a"))
+      return File;
+  return std::nullopt;
+}
+
+/// Search for static libraries in the linker's library path given input like
+/// `-lfoo` or `-l:libfoo.a`.
+std::optional<std::string> searchLibrary(StringRef Input, StringRef Root,
+                                         ArrayRef<StringRef> SearchPaths) {
+  if (Input.starts_with(":") || Input.ends_with(".lib"))
+    return findFromSearchPaths(Input.drop_front(), Root, SearchPaths);
+  return searchLibraryBaseName(Input, Root, SearchPaths);
+}
+
+void printCommands(ArrayRef<StringRef> CmdArgs) {
+  if (CmdArgs.empty())
+    return;
+
+  llvm::errs() << " \"" << CmdArgs.front() << "\" ";
+  for (auto IC = std::next(CmdArgs.begin()), IE = CmdArgs.end(); IC != IE; ++IC)
+    llvm::errs() << *IC << (std::next(IC) != IE ? " " : "\n");
+}
+
+/// A minimum symbol interface that provides the necessary information to
+/// extract archive members and resolve LTO symbols.
+struct Symbol {
+  enum Flags {
+    None = 0,
+    Undefined = 1 << 0,
+    Weak = 1 << 1,
+  };
+
+  Symbol()
+      : File(), Flags(Undefined), Name(), UsedInRegularObj(false), Lazy(false) {
+  }
+
+  Symbol(MemoryBufferRef File, const irsymtab::Reader::SymbolRef Sym, bool Lazy)
+      : File(File), Flags(0), UsedInRegularObj(false), Lazy(Lazy) {
+    if (Sym.isUndefined())
+      Flags |= Undefined;
+    if (Sym.isWeak())
+      Flags |= Weak;
+    Name = Sym.getName();
+  }
+
+  Symbol(MemoryBufferRef File, const SymbolRef Sym, bool Lazy)
+      : File(File), Flags(0), UsedInRegularObj(false), Lazy(Lazy) {
+    auto FlagsOrErr = Sym.getFlags();
+    if (!FlagsOrErr)
+      reportError(FlagsOrErr.takeError());
+    if (*FlagsOrErr & SymbolRef::SF_Undefined)
+      Flags |= Undefined;
+    if (*FlagsOrErr & SymbolRef::SF_Weak)
+      Flags |= Weak;
+
+    auto NameOrErr = Sym.getName();
+    if (!NameOrErr)
+      reportError(NameOrErr.takeError());
+    Name = *NameOrErr;
+  }
+
+  Symbol Resolve(Symbol Other) {
+    if (File.getBuffer().empty())
+      return Other.Lazy ? *this : Other;
+    if (Other.isUndefined())
+      return *this;
+    if (isWeak() && isUndefined() && Other.Lazy)
+      return *this;
+    if (isWeak() && !Other.isWeak())
+      return Other;
+    if (isUndefined() && !Other.isUndefined())
+      return Other;
+    return *this;
+  }
+
+  bool isWeak() const { return Flags & Weak; }
+  bool isUndefined() const { return Flags & Undefined; }
+
+  MemoryBufferRef File;
+  uint32_t Flags;
+  StringRef Name;
+  bool UsedInRegularObj;
+  bool Lazy;
+};
+
+Expected<StringRef> runPTXAs(StringRef File, const ArgList &Args) {
+  std::string CudaPath = Args.getLastArgValue(OPT_cuda_path_EQ).str();
+  Expected<std::string> PTXAsPath = findProgram("ptxas", {CudaPath + "/bin"});
+  if (!PTXAsPath)
+    return PTXAsPath.takeError();
+
+  auto TempFileOrErr =
+      creat...
[truncated]

jlebar · 2024-06-25T02:24:02Z

@Artem-B asked me to review nvptx patches while he's OOO, but this one is pretty far outside my depth. Are you OK waiting until he's back? I don't know exactly when that will be, but based on his IMs to me, he should be back early July.

jhuber6 · 2024-06-25T02:30:46Z

@Artem-B asked me to review nvptx patches while he's OOO, but this one is pretty far outside my depth. Are you OK waiting until he's back? I don't know exactly when that will be, but based on his IMs to me, he should be back early July.

No problem, I knew that it would probably take awhile to get reviewed given the size. I believe he said he'd be back early July as well, so maybe next week? It'd probably require his input, along with some of the other interested parties in clang to see how they feel about reviving one of these old tools.

(However if you know anything about the NVPTX varargs API I think #96015 is mostly just waiting for someone to say that it's a mostly correct lowering)

jhuber6 · 2024-06-25T12:26:01Z

@MaskRay So, I think my symbol resolution is (unsurprisingly) subtly broken. Is there a canonical way to handle this? I first thought that we could simply perform the symbol resolutions as normal for every file, but keep track of which symbols were "lazy". However, I couldn't figure out how to then tell if a lazy symbol should be extracted or not because there's no information on which files use which symbols. Maybe I just scan all the files and see if they reference a symbol that's marked defined and lazy?

Summary: Currently we have several hacks to work around the fact that the NVPTX linker, 'nvlink', does not support static libraries or LTO linking. The patch in llvm#96561 introduces a wrapper in the toolchain that allows us to use a standard `ld.lld` like interface. This means all the divergence with this target can be removed. Depends on llvm#96561

jhuber6 · 2024-06-27T21:58:36Z

Re-did it and tested it against libc in #96972 so it will have a CI running it one that lands. it works for other cases I've tested, but let me know if something else should be added.

Summary: The linker wrapper's job is to extract embedded device code from fat binaries and create linked images that can then be embedded and executed. In order to support LTO, we originally reinvented all of the LTO handling that `ld.lld` normally does. Primarily, this was because `nvlink` didn't support this at all, and we have special hacks required for offloading languages interacting with archive libraries. Now since I wrote llvm#96561 we should be able to pass all the inputs to the device linker transparently. This has the advantage of allowing the `clang` Driver to do its own handling. Primarily, this will be used to implicitly pass libraries to the device link job to make it more consistent with other toolchains. The JIT support is a notable departure, however there is an option called `--lto-emit-llvm` that performs the exact function where we want the final link job to output LLVM-IR that we can then embed instead. This patch does not fully delete the LTO handling, primarily because I think the SPIR-V people might want it. To see only the relevant patches, ignore the first commit of the nvlink-wrapper. Depends on llvm#96561.

Summary: The `clang-nvlink-wrapper` is a utility that I removed awhile back during the transition to the new driver. This patch adds back in a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM. While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO, or static linking, requires all files to be named a non-standard `.cubin`, and rejects link jobs that other linkers would be fine with (i.e empty). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation. The main reason I want to re-intorudce this tool is because I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following. ``` <install>/lib/nvptx64-nvidia-cuda/libc.a <install>/lib/nvptx64-nvidia-cuda/libc++.a <install>/lib/nvptx64-nvidia-cuda/libomp.a <install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a ``` Linking in these libraries will then simply require passing `-lc` like is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue to massively improving the state of building GPU libraries. In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software. Address comments

dyung · 2024-07-22T23:44:45Z

@jhuber6 this change seems to be causing build failures on some bots:

Can you take a look?

dyung · 2024-07-22T23:46:15Z

@jhuber6 this change seems to be causing build failures on some bots:

https://lab.llvm.org/buildbot/#/builders/174/builds/2129

https://lab.llvm.org/buildbot/#/builders/190/builds/2383

Can you take a look?

Nevermind, I just noticed you already fixed it. Sorry for the noise!

jhuber6 · 2024-07-22T23:46:30Z

@jhuber6 this change seems to be causing build failures on some bots:
* https://lab.llvm.org/buildbot/#/builders/174/builds/2129

* https://lab.llvm.org/buildbot/#/builders/190/builds/2383
Can you take a look?

Gah, sorry forgot to suppress that error when it's just running tests. I'll have it fixed in a few minutes or revert it if I can't fix it fast enough.

dyung · 2024-07-22T23:50:33Z

https://lab.llvm.org/buildbot/#/builders/174/builds/2130

Actually I noticed (and I think you already did as well), that there seems to be a test failure of nvwrapper.c after you fixed the build error. Just wanted to let you know in case you were not already aware.

jhuber6 · 2024-07-22T23:55:18Z

https://lab.llvm.org/buildbot/#/builders/174/builds/2130

Actually I noticed (and I think you already did as well), that there seems to be a test failure of nvwrapper.c after you fixed the build error. Just wanted to let you know in case you were not already aware.

Yep, it's caused by the program lookup not finding nvlink because I sometimes forget that not everything has CUDA installed.

MaskRay · 2024-07-22T23:56:03Z

clang/test/CMakeLists.txt CLANG_TEST_DEPS also needs an update. `

The test might fail due to /tmp/Rel/bin/clang-nvlink-wrapper: error: Unable to find 'nvlink' in path

jhuber6 · 2024-07-22T23:58:56Z

clang/test/CMakeLists.txt CLANG_TEST_DEPS also needs an update. `

The test might fail due to /tmp/Rel/bin/clang-nvlink-wrapper: error: Unable to find 'nvlink' in path

Also noticed that one, just pushed a fix a minute ago. Sorry for the mess.

dyung · 2024-07-23T00:27:47Z

@jhuber6 it looks like there still might be build failures on Windows:
https://lab.llvm.org/buildbot/#/builders/46/builds/2077

FAILED: tools/clang/tools/clang-nvlink-wrapper/CMakeFiles/clang-nvlink-wrapper.dir/ClangNVLinkWrapper.cpp.obj 
C:\bin\ccache.exe C:\PROGRA~2\MICROS~1\2019\BUILDT~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\cl.exe  /nologo /TP -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools\clang\tools\clang-nvlink-wrapper -IZ:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\tools\clang-nvlink-wrapper -IZ:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\include -Itools\clang\include -Iinclude -IZ:\b\llvm-clang-x86_64-sie-win\llvm-project\llvm\include /DWIN32 /D_WINDOWS   /Zc:inline /Zc:preprocessor /Zc:__cplusplus /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd5105 -wd4324 -w14062 -we4238 /Gw /O2 /Ob2  -MD  /EHs-c- /GR- -UNDEBUG -g -O0 -std:c++17 /showIncludes /Fotools\clang\tools\clang-nvlink-wrapper\CMakeFiles\clang-nvlink-wrapper.dir\ClangNVLinkWrapper.cpp.obj /Fdtools\clang\tools\clang-nvlink-wrapper\CMakeFiles\clang-nvlink-wrapper.dir\ /FS -c Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\tools\clang-nvlink-wrapper\ClangNVLinkWrapper.cpp
cl : Command line warning D9002 : ignoring unknown option '-g'
cl : Command line warning D9002 : ignoring unknown option '-O0'
Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\tools\clang-nvlink-wrapper\ClangNVLinkWrapper.cpp(429): error C2440: 'initializing': cannot convert from 'const ValueTy' to '_Ty2 &&'
        with
        [
            ValueTy=`anonymous-namespace'::Symbol
        ]
        and
        [
            _Ty2=`anonymous-namespace'::Symbol
        ]
Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\tools\clang-nvlink-wrapper\ClangNVLinkWrapper.cpp(429): note: Conversion loses qualifiers
Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\tools\clang-nvlink-wrapper\ClangNVLinkWrapper.cpp(459): error C2440: 'initializing': cannot convert from 'const ValueTy' to '_Ty2 &&'
        with
        [
            ValueTy=`anonymous-namespace'::Symbol
        ]
        and
        [
            _Ty2=`anonymous-namespace'::Symbol
        ]
Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\tools\clang-nvlink-wrapper\ClangNVLinkWrapper.cpp(459): note: Conversion loses qualifiers

bogner · 2024-07-23T00:58:28Z

clang/tools/clang-nvlink-wrapper/CMakeLists.txt

+  clangBasic
+  )
+
+target_compile_options(clang-nvlink-wrapper PRIVATE "-g" "-O0")


This looks suspicious. Do we actually want to build this with -g -O0 all the time or was this left in from debugging or something like that? In the unlikely event that we do want this for some reason, it won't work as is on windows anyway.

Whoops, thanks for pointing that out.

nico · 2024-07-23T02:09:44Z

Why do we need a new binary for this, instead of having something like clang -cc1_nvlink that calls a custom mode within clang?

And if there's a good reason for that, could clang-linker-wrapper and clang-nvlink-wrapper at least be the same binary?

Summary: Currently we have several hacks to work around the fact that the NVPTX linker, 'nvlink', does not support static libraries or LTO linking. The patch in llvm#96561 introduces a wrapper in the toolchain that allows us to use a standard `ld.lld` like interface. This means all the divergence with this target can be removed. Depends on llvm#96561

Summary: Currently we have several hacks to work around the fact that the NVPTX linker, 'nvlink', does not support static libraries or LTO linking. The patch in #96561 introduces a wrapper in the toolchain that allows us to use a standard `ld.lld` like interface. This means all the divergence with this target can be removed. Depends on #96561

vvereschaka · 2024-07-23T03:18:37Z

@jhuber6 ,
there is still few failed bot because of these changes

would you fix the problem or revert the changes?

jhuber6 · 2024-07-23T03:22:26Z

@jhuber6 , there is still few failed bot because of these changes
* https://lab.llvm.org/buildbot/#/builders/193/builds/1224

* https://lab.llvm.org/buildbot/#/builders/2/builds/2790
would you fix the problem or revert the changes?

Looks like it's complaining about the const-ness of the value decomposition? It's fine one my end so I don't know if it's a Windows thing, I'll just try putting const there and seeing if that fixes it.

vvereschaka · 2024-07-23T03:38:30Z

Thank you @jhuber6
both of those failed builders got fixed.

Summary: The linker wrapper's job is to extract embedded device code from fat binaries and create linked images that can then be embedded and executed. In order to support LTO, we originally reinvented all of the LTO handling that `ld.lld` normally does. Primarily, this was because `nvlink` didn't support this at all, and we have special hacks required for offloading languages interacting with archive libraries. Now since I wrote llvm#96561 we should be able to pass all the inputs to the device linker transparently. This has the advantage of allowing the `clang` Driver to do its own handling. Primarily, this will be used to implicitly pass libraries to the device link job to make it more consistent with other toolchains. The JIT support is a notable departure, however there is an option called `--lto-emit-llvm` that performs the exact function where we want the final link job to output LLVM-IR that we can then embed instead. This patch does not fully delete the LTO handling, primarily because I think the SPIR-V people might want it. To see only the relevant patches, ignore the first commit of the nvlink-wrapper. Depends on llvm#96561.

Summary: The linker wrapper's job is to extract embedded device code from fat binaries and create linked images that can then be embedded and executed. In order to support LTO, we originally reinvented all of the LTO handling that `ld.lld` normally does. Primarily, this was because `nvlink` didn't support this at all, and we have special hacks required for offloading languages interacting with archive libraries. Now since I wrote #96561 we should be able to pass all the inputs to the device linker transparently. This has the advantage of allowing the `clang` Driver to do its own handling. Primarily, this will be used to implicitly pass libraries to the device link job to make it more consistent with other toolchains. The JIT support is a notable departure, however there is an option called `--lto-emit-llvm` that performs the exact function where we want the final link job to output LLVM-IR that we can then embed instead. This patch does not fully delete the LTO handling, primarily because I think the SPIR-V people might want it. To see only the relevant patches, ignore the first commit of the nvlink-wrapper. Depends on #96561.

Artem-B · 2024-07-23T18:54:49Z

@nico:

Why do we need a new binary for this, instead of having something like clang -cc1_nvlink that calls a custom mode within clang?

Do we have existing precedents for such built-in tools, other than cc1 itself? If the linker wrapper can be part of clang itself, it would make some things easier logistically.

MaskRay · 2024-07-23T18:59:48Z

clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp

+
+// Various tools (e.g., llc and opt) duplicate this series of declarations for
+// options related to passes and remarks.
+static cl::opt<bool> RemarksWithHotness(


For new user-facing tools, ideally use OptTable instead of cl::opt. The latter has lots of minor issues, e.g. not differentiating single/double-dash options.

So, most everything uses the OptTable interface, but there's a few options like these which each tool individually seems to define that interface w/ LLVM. It's pretty much required to give the linker similar semantics to opt when you pass -mllvm to it I believe.

jhuber6 · 2024-07-23T19:32:28Z

Why do we need a new binary for this, instead of having something like clang -cc1_nvlink that calls a custom mode within clang?

And if there's a good reason for that, could clang-linker-wrapper and clang-nvlink-wrapper at least be the same binary?

Sorry, missed this in the deluge of build failures. I don't think this really fits with a different -cc1 mode because it's just a linker step that we augment. To me, this is like suggesting we merge ld.lld into clang so we can do everything in a single invocation. It's certainly possible, but I think this fits alright into the workflow.

) Summary: The `clang-nvlink-wrapper` is a utility that I removed awhile back during the transition to the new driver. This patch adds back in a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM. While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO, or static linking, requires all files to be named a non-standard `.cubin`, and rejects link jobs that other linkers would be fine with (i.e empty). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation. The main reason I want to re-intorudce this tool is because I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following. ``` <install>/lib/nvptx64-nvidia-cuda/libc.a <install>/lib/nvptx64-nvidia-cuda/libc++.a <install>/lib/nvptx64-nvidia-cuda/libomp.a <install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a ``` Linking in these libraries will then simply require passing `-lc` like is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue to massively improving the state of building GPU libraries. In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software. This also copies some amount of logic from the clang-linker-wrapper, but not enough for it to be worthwhile to merge them I feel. In the future it may be possible to delete that handling from there entirely. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251377

Summary: Currently we have several hacks to work around the fact that the NVPTX linker, 'nvlink', does not support static libraries or LTO linking. The patch in #96561 introduces a wrapper in the toolchain that allows us to use a standard `ld.lld` like interface. This means all the divergence with this target can be removed. Depends on #96561

Summary: The linker wrapper's job is to extract embedded device code from fat binaries and create linked images that can then be embedded and executed. In order to support LTO, we originally reinvented all of the LTO handling that `ld.lld` normally does. Primarily, this was because `nvlink` didn't support this at all, and we have special hacks required for offloading languages interacting with archive libraries. Now since I wrote #96561 we should be able to pass all the inputs to the device linker transparently. This has the advantage of allowing the `clang` Driver to do its own handling. Primarily, this will be used to implicitly pass libraries to the device link job to make it more consistent with other toolchains. The JIT support is a notable departure, however there is an option called `--lto-emit-llvm` that performs the exact function where we want the final link job to output LLVM-IR that we can then embed instead. This patch does not fully delete the LTO handling, primarily because I think the SPIR-V people might want it. To see only the relevant patches, ignore the first commit of the nvlink-wrapper. Depends on #96561.

Introduced in llvm/llvm-project#96561. I've added the file to clang-tools-extra, as we also put many similar binaries like clang-linker-wrapper in there.

jhuber6 requested review from AlexMaclean, Artem-B, jdoerfert, jlebar, JonChesterfield, MaskRay, shiltian and yxsamliu June 24, 2024 21:39

llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Jun 24, 2024

jhuber6 force-pushed the NvlinkWrapper branch from d48deac to 8a52bec Compare June 24, 2024 23:08

jhuber6 force-pushed the NvlinkWrapper branch from 8a52bec to 5edeeb9 Compare June 25, 2024 02:24

jhuber6 force-pushed the NvlinkWrapper branch from 5edeeb9 to 6c70e54 Compare June 25, 2024 12:19

jhuber6 force-pushed the NvlinkWrapper branch 2 times, most recently from 859f6a7 to 849c8da Compare June 27, 2024 21:55

jhuber6 mentioned this pull request Jun 27, 2024

[libc] Remove workarounds for lack of functional NVPTX linker #96972

Merged

jdoerfert changed the title ~~[Clang] Introduce 'clang-nvlink-wrappaer' to work around 'nvlink'~~ [Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' Jun 28, 2024

jhuber6 force-pushed the NvlinkWrapper branch 4 times, most recently from 2bb5bd0 to 2d3957a Compare July 3, 2024 01:49

jhuber6 mentioned this pull request Jul 3, 2024

[LinkerWrapper] Pass all files to the device linker #97573

Merged

jhuber6 force-pushed the NvlinkWrapper branch from 455513d to 9cd5a58 Compare July 22, 2024 22:49

jhuber6 merged commit 37d0568 into llvm:main Jul 22, 2024
5 of 7 checks passed

bogner reviewed Jul 23, 2024

View reviewed changes

MaskRay reviewed Jul 23, 2024

View reviewed changes

[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' #96561

[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' #96561

Uh oh!

Conversation

jhuber6 commented Jun 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jun 24, 2024

Uh oh!

llvmbot commented Jun 24, 2024

Uh oh!

jlebar commented Jun 25, 2024

Uh oh!

jhuber6 commented Jun 25, 2024

Uh oh!

jhuber6 commented Jun 25, 2024

Uh oh!

jhuber6 commented Jun 27, 2024

Uh oh!

Uh oh!

dyung commented Jul 22, 2024

Uh oh!

dyung commented Jul 22, 2024

Uh oh!

jhuber6 commented Jul 22, 2024

Uh oh!

dyung commented Jul 22, 2024

Uh oh!

jhuber6 commented Jul 22, 2024

Uh oh!

MaskRay commented Jul 22, 2024

Uh oh!

jhuber6 commented Jul 22, 2024

Uh oh!

dyung commented Jul 23, 2024

Uh oh!

bogner Jul 23, 2024

Choose a reason for hiding this comment

Uh oh!

jhuber6 Jul 23, 2024

Choose a reason for hiding this comment

Uh oh!

nico commented Jul 23, 2024

Uh oh!

vvereschaka commented Jul 23, 2024

Uh oh!

jhuber6 commented Jul 23, 2024

Uh oh!

vvereschaka commented Jul 23, 2024

Uh oh!

Artem-B commented Jul 23, 2024

Uh oh!

MaskRay Jul 23, 2024

Choose a reason for hiding this comment

Uh oh!

jhuber6 Jul 23, 2024

Choose a reason for hiding this comment

Uh oh!

jhuber6 commented Jul 23, 2024

Uh oh!

Uh oh!

jhuber6 commented Jun 24, 2024 •

edited

Loading