[Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` #98170

jhuber6 · 2024-07-09T15:27:07Z

Summary:
The previous patches (The other commits in this chain) allow the
offloading toolchain to directly invoke the device linker. Because of
this, we can now just have the toolchain implicitly include -lc and
-lm like a standard target does. This removes the old handling that
went through the fat binary -lcgpu.

llvmbot · 2024-07-09T15:27:39Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The previous patches (The other commits in this chain) allow the
offloading toolchain to directly invoke the device linker. Because of
this, we can now just have the toolchain implicitly include -lc and
-lm like a standard target does. This removes the old handling that
went through the fat binary -lcgpu.

Patch is 64.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98170.diff

16 Files Affected:

(added) clang/docs/ClangNVLinkWrapper.rst (+64)
(modified) clang/docs/index.rst (+1)
(modified) clang/lib/Driver/ToolChains/AMDGPU.cpp (+11)
(modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1)
(modified) clang/lib/Driver/ToolChains/Cuda.cpp (+18-53)
(modified) clang/lib/Driver/ToolChains/Cuda.h (+3)
(modified) clang/test/Driver/cuda-cross-compiling.c (+4-4)
(modified) clang/test/Driver/linker-wrapper-libs.c (+6-6)
(modified) clang/test/Driver/linker-wrapper.c (+2-2)
(added) clang/test/Driver/nvlink-wrapper.c (+65)
(modified) clang/test/lit.cfg.py (+1)
(modified) clang/tools/CMakeLists.txt (+1)
(modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+60-38)
(added) clang/tools/clang-nvlink-wrapper/CMakeLists.txt (+44)
(added) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+776)
(added) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+88)

diff --git a/clang/docs/ClangNVLinkWrapper.rst b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0000000000000..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+====================
+Clang nvlink Wrapper
+====================
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+============
+
+This tools works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose 
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker 
+while still relying on NVIDIA's proprietary linker to produce the final output. 
+Features include, static archive (.a) linking, LTO, and accepting files ending 
+in ``.o`` without error.
+
+Usage
+=====
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] <options to passed to nvlink>
+
+  OPTIONS:
+    --arch <value>       Specify the 'sm_' name of the target architecture.
+    --cuda-path=<dir>    Set the system CUDA path
+    --dry-run            Print generated commands without running.
+    --feature <value>    Specify the '+ptx' freature to use for LTO.
+    -g                   Specify that this was a debug compile.
+    -help-hidden         Display all available options
+    -help                Display available options (--help-hidden for more)
+    -L <dir>             Add <dir> to the library search path
+    -l <libname>         Search for library <libname>
+    -mllvm <arg>         Arguments passed to LLVM, including Clang invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' for a list of options.
+    -o <path>            Path to file to write output
+    --plugin-opt=jobs=<value>
+                         Number of LTO codegen partitions
+    --plugin-opt=lto-partitions=<value>
+                         Number of LTO codegen partitions
+    --plugin-opt=O<O0, O1, O2, or O3>
+                         Optimization level for LTO
+    --plugin-opt=thinlto<value>
+                         Enable the thin-lto backend
+    --plugin-opt=<value> Options passed to LLVM, not including the Clang invocation. Use '--plugin-opt=--help' for a list of options.
+    --save-temps         Save intermediate results
+    --version            Display the version number and exit
+    -v                   Print verbose information
+
+Example
+=======
+
+This tool is intended to be invoked when targeting the NVPTX toolchain directly. 
+
+.. code-block:: console
+
+  clang --target=nvptx64-nvidia-cuda -march=native -flto=full input.c
diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index a35a867b96bd7..9bae0bd83243b 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -92,6 +92,7 @@ Using Clang Tools
    ClangFormatStyleOptions
    ClangFormattedStatus
    ClangLinkerWrapper
+   ClangNVLinkWrapper
    ClangOffloadBundler
    ClangOffloadPackager
    ClangRepl
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 453daed7cc7d5..56d06a6f3ecad 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -633,6 +633,17 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   else if (Args.hasArg(options::OPT_mcpu_EQ))
     CmdArgs.push_back(Args.MakeArgString(
         "-plugin-opt=mcpu=" + Args.getLastArgValue(options::OPT_mcpu_EQ)));
+
+  // If the user's toolchain has the 'include/amdgcn-amd-amdhsa/` path, we
+  // assume it supports the standard C libraries for the GPU and include them.
+  bool HasLibC = getToolChain().getStdlibIncludePath().has_value();
+  if (!Args.hasArg(options::OPT_nogpulib) &&
+      !Args.hasArg(options::OPT_nolibc) &&
+      Args.hasFlag(options::OPT_gpulibc, options::OPT_nogpulibc, HasLibC)) {
+    CmdArgs.push_back("-lc");
+    CmdArgs.push_back("-lm");
+  }
+    
   CmdArgs.push_back("-o");
   CmdArgs.push_back(Output.getFilename());
   C.addCommand(std::make_unique<Command>(
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index ab1590104b790..42dde72f13907 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1257,6 +1257,7 @@ bool tools::addOpenMPRuntime(const Compilation &C, ArgStringList &CmdArgs,
     addOpenMPDeviceLibC(C, Args, CmdArgs);
 
   addArchSpecificRPath(TC, Args, CmdArgs);
+
   addOpenMPRuntimeLibraryPath(TC, Args, CmdArgs);
 
   return true;
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 08a4633902654..d5094568eadd3 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,21 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
+
+  // If the user's toolchain has the 'include/nvptx64-nvidia-cuda/` path, we
+  // assume it supports the standard C libraries for the GPU and include them.
+  bool HasLibC = getToolChain().getStdlibIncludePath().has_value();
+  if (!Args.hasArg(options::OPT_nogpulib) &&
+      !Args.hasArg(options::OPT_nolibc) &&
+      Args.hasFlag(options::OPT_gpulibc, options::OPT_nogpulibc, HasLibC)) {
+    CmdArgs.push_back("-lc");
+    CmdArgs.push_back("-lm");
+  }
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +633,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if (llvm::sys::path::extension(InputFile) != ".cubin") {
-        // If there are no actions above this one then this is direct input and
-        // we can copy it. Otherwise the input is internal so a `.cubin` file
-        // should exist.
-        if (II.getAction() && II.getAction()->getInputs().size() == 0) {
-          const char *CubinF =
-              Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
-                  llvm::sys::path::stem(InputFile), "cubin"));
-          if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
-            continue;
-
-          CmdArgs.push_back(CubinF);
-        } else {
-          SmallString<256> Filename(InputFile);
-          llvm::sys::path::replace_extension(Filename, "cubin");
-          CmdArgs.push_back(Args.MakeArgString(Filename));
-        }
-      } else {
-        CmdArgs.push_back(Args.MakeArgString(InputFile));
-      }
-    } else if (!II.isNothing()) {
-      II.getInputArg().renderAsInput(Args, CmdArgs);
-    }
-  }
-
   C.addCommand(std::make_unique<Command>(
       JA, *this,
       ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
                           "--options-file"},
-      Args.MakeArgString(getToolChain().GetProgramPath("nvlink")), CmdArgs,
-      Inputs, Output));
+      Args.MakeArgString(getToolChain().GetProgramPath("clang-nvlink-wrapper")),
+      CmdArgs, Inputs, Output));
 }
 
 void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
@@ -949,11 +918,7 @@ std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
   if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())
     return ToolChain::getInputFilename(Input);
 
-  // Replace extension for object files with cubin because nvlink relies on
-  // these particular file names.
-  SmallString<256> Filename(ToolChain::getInputFilename(Input));
-  llvm::sys::path::replace_extension(Filename, "cubin");
-  return std::string(Filename);
+  return ToolChain::getInputFilename(Input);
 }
 
 llvm::opt::DerivedArgList *
diff --git a/clang/lib/Driver/ToolChains/Cuda.h b/clang/lib/Driver/ToolChains/Cuda.h
index 7464d88cb350b..7a6a6fb209012 100644
--- a/clang/lib/Driver/ToolChains/Cuda.h
+++ b/clang/lib/Driver/ToolChains/Cuda.h
@@ -155,6 +155,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
   bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
     return false;
   }
+  bool HasNativeLLVMSupport() const override { return true; }
   bool isPICDefaultForced() const override { return false; }
   bool SupportsProfiling() const override { return false; }
 
@@ -192,6 +193,8 @@ class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
     return &HostTC.getTriple();
   }
 
+  bool HasNativeLLVMSupport() const override { return false; }
+
   std::string getInputFilename(const InputInfo &Input) const override;
 
   llvm::opt::DerivedArgList *
diff --git a/clang/test/Driver/cuda-cross-compiling.c b/clang/test/Driver/cuda-cross-compiling.c
index 1dc4520f485db..42d56cbfcc321 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -32,8 +32,8 @@
 // RUN:   | FileCheck -check-prefix=ARGS %s
 
 //      ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61"{{.*}}"[[CUBIN]].o"
 
 //
 // Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
@@ -55,7 +55,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
 // RUN:   | FileCheck -check-prefix=LINK %s
 
-// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+// LINK: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61"{{.*}}[[CUBIN:.+]].o
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
@@ -72,7 +72,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -march=sm_52 -### %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=LINKER-ARGS %s
 
-// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
+// LINKER-ARGS: clang-nvlink-wrapper{{.*}}"-v"{{.*}}"a" "b"
 
 // Tests for handling a missing architecture.
 //
diff --git a/clang/test/Driver/linker-wrapper-libs.c b/clang/test/Driver/linker-wrapper-libs.c
index 22cc24f2e258a..cb5c7c137a0ba 100644
--- a/clang/test/Driver/linker-wrapper-libs.c
+++ b/clang/test/Driver/linker-wrapper-libs.c
@@ -48,7 +48,7 @@ int bar() { return weak; }
 // RUN:   --linker-path=/usr/bin/ld %t.a %t.o -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-RESOLVES
 
-// LIBRARY-RESOLVES: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-RESOLVES: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 // LIBRARY-RESOLVES: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
 
 //
@@ -72,7 +72,7 @@ int bar() { return weak; }
 // RUN:   --linker-path=/usr/bin/ld %t.a %t.o -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-GLOBAL
 
-// LIBRARY-GLOBAL: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-GLOBAL: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 // LIBRARY-GLOBAL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
 
 //
@@ -96,7 +96,7 @@ int bar() { return weak; }
 // RUN: | FileCheck %s --check-prefix=LIBRARY-GLOBAL-NONE
 
 // LIBRARY-GLOBAL-NONE-NOT: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
-// LIBRARY-GLOBAL-NONE-NOT: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-GLOBAL-NONE-NOT: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 
 //
 // Check that we do not extract an external weak symbol.
@@ -161,7 +161,7 @@ int bar() { return weak; }
 // RUN:   --linker-path=/usr/bin/ld %t.o %t.a %t.a -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-GLOBAL-DEFINED
 
-// LIBRARY-GLOBAL-DEFINED: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-GLOBAL-DEFINED: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 // LIBRARY-GLOBAL-DEFINED-NOT: {{.*}}gfx1030{{.*}}.o
 // LIBRARY-GLOBAL-DEFINED: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
 
@@ -185,7 +185,7 @@ int bar() { return weak; }
 // RUN:   --linker-path=/usr/bin/ld %t.o --whole-archive %t.a -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-WHOLE-ARCHIVE
 
-// LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 // LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
-// LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_52 {{.*}}.s
+// LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_52 {{.*}}.o
 // LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a {{.*}}.o
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index b9fa08ace0ff7..63d43921be9ac 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -48,7 +48,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --save-temps -O2 \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU-LTO-TEMPS
 
-// AMDGPU-LTO-TEMPS: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O2 -Wl,--no-undefined {{.*}}.s -save-temps
+// AMDGPU-LTO-TEMPS: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O2 -Wl,--no-undefined {{.*}}.o -save-temps
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
@@ -147,7 +147,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --clang-backend \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CLANG-BACKEND
 
-// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.bc
+// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
new file mode 100644
index 0000000000000..fdda93f1f9cdc
--- /dev/null
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -0,0 +1,65 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+#if defined(X)
+extern int y;
+int foo() { return y; }
+
+int x = 0;
+#elif defined(Y)
+int y = 42;
+#elif defined(Z)
+int z = 42;
+#elif defined(W)
+int w = 42;
+#elif defined(U)
+extern int x;
+extern int __attribute__((weak)) w;
+
+int bar() {
+  return x + w;
+}
+#else
+extern int y;
+int __attribute__((visibility("hidden"))) x = 999;
+int baz() { return y + x; }
+#endif
+
+// Create various inputs to test basic linking and LTO capabilities. Creating a
+// CUDA binary requires access to the `ptxas` executable, so we just use x64.
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DX -o %t-x.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DY -o %t-y.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DZ -o %t-z.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DW -o %t-w.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DU -o %t-u.o
+// RUN: llvm-ar rcs %t-x.a %t-x.o
+// RUN: llvm-ar rcs %t-y.a %t-y.o
+// RUN: llvm-ar rcs %t-z.a %t-z.o
+// RUN: llvm-ar rcs %t-w.a %t-w.o
+
+//
+// Check that we forward any unrecognized argument to 'nvlink'.
+//
+// RUN: clang-nvlink-wrapper --dry-run -arch sm_52 %t-u.o -foo -o a.out 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=ARGS
+// ARGS: nvlink{{.*}} -arch sm_52 -foo -o a.out [[INPUT:.+]].cubin
+
+//
+// Check the symbol resolution for static archives. We expect to only link
+// `libx.a` and `liby.a` because extern weak symbols do not extract and `libz.a`
+// is not used at all.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t-x.a %t-u.o %t-y.a %t-z.a %t-w.a \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LINK
+// LINK: nvlink{{.*}} -arch sm_52 -o a.out [[INPUT:.+]].cubin {{.*}}-x-{{.*}}.cubin{{.*}}-y-{{.*}}.cubin
+
+// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.o
+
+//
+// Check that the LTO interface works and properly preserves symbols used in a
+// regular object file.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
+// LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
+// LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py
index e5630a07424c7..92a3361ce672e 100644
--- a/clang/test/lit.cfg.py
+++ b/clang/test/lit.cfg.py
@@ -95,6 +95,7 @@
     "llvm-ifs",
     "yaml2obj",
     "clang-linker-wrapper",
+    "clang-nvlink-wrapper",
     "llvm-lto",
     "llvm-lto2",
     "llvm-profdata",
diff --git a/clang/tools/CMakeLists.txt b/clang/tools/CMakeLists.txt
index bdd8004be3e02..4885afc1584d0 100644
--- a/clang/tools/CMakeLists.txt
+++ b/clang/tools/CMakeLists.txt
@@ -9,6 +9,7 @@ add_clang_subdirectory(clang-format-vs)
 add_clang_subdirectory(clang-fuzzer)
 add_clang_subdirectory(clang-import-test)
 add_clang_subdirectory(clang-linker-wrapper)
+add_clang_subdirectory(clang-nvlink-wrapper)
 add_clang_subdirectory(clang-offload-packager)
 add_clang_subdirectory(clang-offload-bundler)
 add_clang_subdirectory(clang-scan-deps)
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.c...
[truncated]

jhuber6 · 2024-07-09T15:30:54Z

So, one thing I've noticed is that passing -lc and -lm to the ld.lld invocation greatly increases link times for trivial applications. This is because the handling in ld.lld will intentionally extract known libcall functions from LTO static libraries. We then have handling in the LTO internalization pass that prevents these calls from being internalized so that the backend may emit calls to them. This has the result of increasing the compile time as it extracts about fifty math functions, as well as bloating the resulting binary because they do not get optimized out. The AMDGPU backend doesn't emit any of these as far as I'm aware. Right now this simply goes off of a list of all libcalls. I wonder if I can make a separate list that's per-target, just to show that the AMDGPU target doesn't emit any of these for now and should be internalized / not extracted. WDYT @yxsamliu and @arsenm ?

github-actions · 2024-07-09T15:31:00Z

✅ With the latest revision this PR passed the C/C++ code formatter.

yxsamliu · 2024-07-09T16:13:29Z

clang/lib/Driver/ToolChains/AMDGPU.cpp

+
+  // If the user's toolchain has the 'include/amdgcn-amd-amdhsa/` path, we
+  // assume it supports the standard C libraries for the GPU and include them.
+  bool HasLibC = getToolChain().getStdlibIncludePath().has_value();


maybe refactor as a member of Tool::addOffloadLibCArgs

It's not really offload in this context, since it deals with clang --target=amdgcn-amd-amdhsa. But I could see putting it in CommonArgs.

yxsamliu · 2024-07-09T16:16:55Z

clang/lib/Driver/ToolChains/AMDGPU.cpp

@@ -633,6 +633,17 @@ void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
  else if (Args.hasArg(options::OPT_mcpu_EQ))
    CmdArgs.push_back(Args.MakeArgString(
        "-plugin-opt=mcpu=" + Args.getLastArgValue(options::OPT_mcpu_EQ)));
+


HIPAMD toolchain has its lld command argument handling in AMDGCN::Linker::constructLldCommand. Also need change there.

So, I wasn't sure if I should also apply this to HIP / CUDA stuff just yet. HIP doesn't pass LTO bitcode to the linker, so we can't do a full link with the library. CUDA doesn't even invoke the linker either. I think in the future it would be nice to provide these to HIP and CUDA by default however.

MaskRay

LGTM, if @yxsamliu who has reviewed this is happy.

Summary: The previous patches (The other commits in this chain) allow the offloading toolchain to directly invoke the device linker. Because of this, we can now just have the toolchain implicitly include `-lc` and `-lm` like a standard target does. This removes the old handling that went through the fat binary `-lcgpu`.

Summary: The previous patches (The other commits in this chain) allow the offloading toolchain to directly invoke the device linker. Because of this, we can now just have the toolchain implicitly include `-lc` and `-lm` like a standard target does. This removes the old handling that went through the fat binary `-lcgpu`. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250724

jhuber6 requested review from arsenm, Artem-B, jdoerfert, JonChesterfield, MaskRay, shiltian and yxsamliu July 9, 2024 15:27

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Jul 9, 2024

jhuber6 force-pushed the LinkDevice branch from d86b888 to fcb5586 Compare July 9, 2024 15:31

yxsamliu reviewed Jul 9, 2024

View reviewed changes

MaskRay approved these changes Jul 14, 2024

View reviewed changes

jhuber6 force-pushed the LinkDevice branch 5 times, most recently from 38c79ab to b609f5e Compare July 23, 2024 20:25

jhuber6 added 2 commits July 23, 2024 18:14

Remove old method for passing C libraries.

ea463c1

jhuber6 force-pushed the LinkDevice branch from b50719b to ea463c1 Compare July 23, 2024 23:14

jhuber6 merged commit 4f516aa into llvm:main Jul 23, 2024
4 of 6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` #98170

[Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` #98170

Uh oh!

jhuber6 commented Jul 9, 2024

Uh oh!

llvmbot commented Jul 9, 2024 •

edited

Loading

Uh oh!

jhuber6 commented Jul 9, 2024

Uh oh!

github-actions bot commented Jul 9, 2024 •

edited

Loading

Uh oh!

yxsamliu Jul 9, 2024 •

edited

Loading

Uh oh!

jhuber6 Jul 9, 2024

Uh oh!

yxsamliu Jul 9, 2024

Uh oh!

jhuber6 Jul 9, 2024

Uh oh!

MaskRay left a comment

Uh oh!

Uh oh!

Uh oh!

[Clang] Make the GPU toolchains implicitly link -lm and -lc #98170

[Clang] Make the GPU toolchains implicitly link -lm and -lc #98170

Uh oh!

Conversation

jhuber6 commented Jul 9, 2024

Uh oh!

llvmbot commented Jul 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jhuber6 commented Jul 9, 2024

Uh oh!

github-actions bot commented Jul 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yxsamliu Jul 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jhuber6 Jul 9, 2024

Choose a reason for hiding this comment

Uh oh!

yxsamliu Jul 9, 2024

Choose a reason for hiding this comment

Uh oh!

jhuber6 Jul 9, 2024

Choose a reason for hiding this comment

Uh oh!

MaskRay left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

[Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` #98170

[Clang] Make the GPU toolchains implicitly link `-lm` and `-lc` #98170

llvmbot commented Jul 9, 2024 •

edited

Loading

github-actions bot commented Jul 9, 2024 •

edited

Loading

yxsamliu Jul 9, 2024 •

edited

Loading