[LinkerWrapper] Pass all files to the device linker #97573

jhuber6 · 2024-07-03T13:27:08Z

Summary:
The linker wrapper's job is to extract embedded device code from fat
binaries and create linked images that can then be embedded and
executed. In order to support LTO, we originally reimplemented all of the
LTO handling that `ld.lld` normally does. This was primarily because
`nvlink` did not support LTO at all, and because offloading languages
required special hacks to interact with archive libraries.
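The archive behavior involved can be sketched as a small model. This is a hypothetical illustration, not the wrapper's code: it assumes each input is summarized by the symbols it defines and references, and it reproduces what the new `nvlink-wrapper.c` test expects. An archive member is extracted only when it defines a symbol that is still strongly undefined, weak undefined references never pull members in, and an archive listed before the object that needs it (`x.a` before `u.o` in that test) is still extracted.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one linker input (object file or archive member).
struct Input {
  std::string Name;
  std::set<std::string> Defines;    // symbols this input defines
  std::set<std::string> StrongRefs; // non-weak undefined references
  std::set<std::string> WeakRefs;   // weak references (never extract members)
  bool FromArchive = false;
};

// Returns the names of the inputs that take part in the link, in input order.
// Regular objects are always linked; archive members are pulled in until a
// fixed point is reached, so archive position on the command line does not
// matter in this model.
std::vector<std::string> resolve(const std::vector<Input> &Inputs) {
  std::vector<bool> Linked(Inputs.size(), false);
  std::set<std::string> Defined, Undefined;

  auto Link = [&](std::size_t I) {
    Linked[I] = true;
    for (const std::string &S : Inputs[I].Defines) {
      Defined.insert(S);
      Undefined.erase(S);
    }
    for (const std::string &S : Inputs[I].StrongRefs)
      if (!Defined.count(S))
        Undefined.insert(S);
    // WeakRefs are intentionally ignored: they never trigger extraction.
  };

  for (std::size_t I = 0; I < Inputs.size(); ++I)
    if (!Inputs[I].FromArchive)
      Link(I);

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Inputs.size(); ++I) {
      if (Linked[I] || !Inputs[I].FromArchive)
        continue;
      for (const std::string &S : Inputs[I].Defines) {
        if (Undefined.count(S)) {
          Link(I);
          Changed = true;
          break;
        }
      }
    }
  }

  std::vector<std::string> Result;
  for (std::size_t I = 0; I < Inputs.size(); ++I)
    if (Linked[I])
      Result.push_back(Inputs[I].Name);
  return Result;
}
```

Fed the test's inputs (`x.a`, `u.o`, `y.a`, `z.a`, `w.a`), this model keeps the members of `x.a` and `y.a` plus `u.o`, and drops `z.a` (unused) and `w.a` (only weakly referenced), which matches the `LINK` check.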

Now that I have written #96561, we should be able to pass all the inputs
to the device linker transparently. This lets the clang Driver do its own
input handling. Primarily, this will be used to implicitly pass libraries
to the device link job, making it more consistent with other toolchains.
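As a rough illustration of passing inputs through transparently, the sketch below builds the shape of device link command that the updated `linker-wrapper-libs.c` expectations check for (`clang -o <image> --target=<triple> -march=<arch> <inputs>`, with `-mcpu=` for AMDGPU). `DeviceLinkJob` and `buildDeviceLinkCommand` are hypothetical names; the real wrapper assembles its command from an `ArgList`.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a single device link job.
struct DeviceLinkJob {
  std::string Triple;              // e.g. "nvptx64-nvidia-cuda"
  std::string Arch;                // e.g. "sm_70" or "gfx1030"
  std::string Output;              // linked image, e.g. "a.img"
  std::vector<std::string> Inputs; // objects, bitcode and archives, unmodified
};

// Builds the device link command line. Inputs are forwarded untouched: no LTO
// is run and no files are renamed here, so the clang driver (and, for NVPTX,
// clang-nvlink-wrapper) decides how each one is handled.
std::vector<std::string> buildDeviceLinkCommand(const DeviceLinkJob &Job) {
  std::vector<std::string> Cmd = {"clang", "-o", Job.Output,
                                  "--target=" + Job.Triple};
  // The test expectations use -mcpu= for AMDGPU and -march= for NVPTX.
  bool IsAMDGPU = Job.Triple.rfind("amdgcn", 0) == 0;
  Cmd.push_back((IsAMDGPU ? "-mcpu=" : "-march=") + Job.Arch);
  Cmd.insert(Cmd.end(), Job.Inputs.begin(), Job.Inputs.end());
  return Cmd;
}
```

The design point is that the function never inspects the inputs: whether a file is bitcode, an object, or an archive is left to the downstream driver.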

JIT support is a notable exception. However, the `--lto-emit-llvm` option
does exactly what we need there: it makes the final link job output
LLVM-IR, which we can then embed instead.
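A minimal sketch of the JIT path, assuming a boolean that stands in for the wrapper option behind `OPT_embed_bitcode` in the patch. The helper name `jitLinkArgs` is hypothetical; the only grounded detail is that JIT mode adds `-Wl,--lto-emit-llvm` to the device link job.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: extra arguments for the device link job. In JIT mode
// the linker is asked to output LLVM-IR instead of a device binary, and that
// IR is what gets embedded.
std::vector<std::string> jitLinkArgs(bool EmbedBitcode) {
  std::vector<std::string> Args;
  if (EmbedBitcode)
    Args.push_back("-Wl,--lto-emit-llvm");
  return Args;
}
```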

This patch does not fully delete the LTO handling, primarily because I
think the SPIR-V people might want it. To see only the changes relevant to
this patch, ignore the first commit, which adds the nvlink-wrapper.

Depends on #96561.

llvmbot · 2024-07-03T13:27:38Z

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes


Patch is 62.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97573.diff

14 Files Affected:

(added) clang/docs/ClangNVLinkWrapper.rst (+64)
(modified) clang/docs/index.rst (+1)
(modified) clang/lib/Driver/ToolChains/Cuda.cpp (+8-53)
(modified) clang/lib/Driver/ToolChains/Cuda.h (+3)
(modified) clang/test/Driver/cuda-cross-compiling.c (+4-4)
(modified) clang/test/Driver/linker-wrapper-libs.c (+6-6)
(modified) clang/test/Driver/linker-wrapper.c (+2-2)
(added) clang/test/Driver/nvlink-wrapper.c (+65)
(modified) clang/test/lit.cfg.py (+1)
(modified) clang/tools/CMakeLists.txt (+1)
(modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+60-38)
(added) clang/tools/clang-nvlink-wrapper/CMakeLists.txt (+44)
(added) clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp (+776)
(added) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+88)

diff --git a/clang/docs/ClangNVLinkWrapper.rst b/clang/docs/ClangNVLinkWrapper.rst
new file mode 100644
index 0000000000000..0a312bdbf3066
--- /dev/null
+++ b/clang/docs/ClangNVLinkWrapper.rst
@@ -0,0 +1,64 @@
+====================
+Clang nvlink Wrapper
+====================
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+============
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+Features include static archive (``.a``) linking, LTO, and accepting files
+ending in ``.o`` without error.
+
+Usage
+=====
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] <options to passed to nvlink>
+
+  OPTIONS:
+    --arch <value>       Specify the 'sm_' name of the target architecture.
+    --cuda-path=<dir>    Set the system CUDA path
+    --dry-run            Print generated commands without running.
+    --feature <value>    Specify the '+ptx' freature to use for LTO.
+    -g                   Specify that this was a debug compile.
+    -help-hidden         Display all available options
+    -help                Display available options (--help-hidden for more)
+    -L <dir>             Add <dir> to the library search path
+    -l <libname>         Search for library <libname>
+    -mllvm <arg>         Arguments passed to LLVM, including Clang invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' for a list of options.
+    -o <path>            Path to file to write output
+    --plugin-opt=jobs=<value>
+                         Number of LTO codegen partitions
+    --plugin-opt=lto-partitions=<value>
+                         Number of LTO codegen partitions
+    --plugin-opt=O<O0, O1, O2, or O3>
+                         Optimization level for LTO
+    --plugin-opt=thinlto<value>
+                         Enable the thin-lto backend
+    --plugin-opt=<value> Options passed to LLVM, not including the Clang invocation. Use '--plugin-opt=--help' for a list of options.
+    --save-temps         Save intermediate results
+    --version            Display the version number and exit
+    -v                   Print verbose information
+
+Example
+=======
+
+This tool is intended to be invoked when targeting the NVPTX toolchain directly. 
+
+.. code-block:: console
+
+  clang --target=nvptx64-nvidia-cuda -march=native -flto=full input.c
diff --git a/clang/docs/index.rst b/clang/docs/index.rst
index a35a867b96bd7..9bae0bd83243b 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -92,6 +92,7 @@ Using Clang Tools
    ClangFormatStyleOptions
    ClangFormattedStatus
    ClangLinkerWrapper
+   ClangNVLinkWrapper
    ClangOffloadBundler
    ClangOffloadPackager
    ClangRepl
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 08a4633902654..a8dd91e61f96c 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if (llvm::sys::path::extension(InputFile) != ".cubin") {
-        // If there are no actions above this one then this is direct input and
-        // we can copy it. Otherwise the input is internal so a `.cubin` file
-        // should exist.
-        if (II.getAction() && II.getAction()->getInputs().size() == 0) {
-          const char *CubinF =
-              Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
-                  llvm::sys::path::stem(InputFile), "cubin"));
-          if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
-            continue;
-
-          CmdArgs.push_back(CubinF);
-        } else {
-          SmallString<256> Filename(InputFile);
-          llvm::sys::path::replace_extension(Filename, "cubin");
-          CmdArgs.push_back(Args.MakeArgString(Filename));
-        }
-      } else {
-        CmdArgs.push_back(Args.MakeArgString(InputFile));
-      }
-    } else if (!II.isNothing()) {
-      II.getInputArg().renderAsInput(Args, CmdArgs);
-    }
-  }
-
   C.addCommand(std::make_unique<Command>(
       JA, *this,
       ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
                           "--options-file"},
-      Args.MakeArgString(getToolChain().GetProgramPath("nvlink")), CmdArgs,
-      Inputs, Output));
+      Args.MakeArgString(getToolChain().GetProgramPath("clang-nvlink-wrapper")),
+      CmdArgs, Inputs, Output));
 }
 
 void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
@@ -949,11 +908,7 @@ std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
   if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())
     return ToolChain::getInputFilename(Input);
 
-  // Replace extension for object files with cubin because nvlink relies on
-  // these particular file names.
-  SmallString<256> Filename(ToolChain::getInputFilename(Input));
-  llvm::sys::path::replace_extension(Filename, "cubin");
-  return std::string(Filename);
+  return ToolChain::getInputFilename(Input);
 }
 
 llvm::opt::DerivedArgList *
diff --git a/clang/lib/Driver/ToolChains/Cuda.h b/clang/lib/Driver/ToolChains/Cuda.h
index 7464d88cb350b..7a6a6fb209012 100644
--- a/clang/lib/Driver/ToolChains/Cuda.h
+++ b/clang/lib/Driver/ToolChains/Cuda.h
@@ -155,6 +155,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
   bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
     return false;
   }
+  bool HasNativeLLVMSupport() const override { return true; }
   bool isPICDefaultForced() const override { return false; }
   bool SupportsProfiling() const override { return false; }
 
@@ -192,6 +193,8 @@ class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
     return &HostTC.getTriple();
   }
 
+  bool HasNativeLLVMSupport() const override { return false; }
+
   std::string getInputFilename(const InputInfo &Input) const override;
 
   llvm::opt::DerivedArgList *
diff --git a/clang/test/Driver/cuda-cross-compiling.c b/clang/test/Driver/cuda-cross-compiling.c
index 1dc4520f485db..42d56cbfcc321 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -32,8 +32,8 @@
 // RUN:   | FileCheck -check-prefix=ARGS %s
 
 //      ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61"{{.*}}"[[CUBIN]].o"
 
 //
 // Test the generated arguments to the CUDA binary utils when targeting NVPTX. 
@@ -55,7 +55,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
 // RUN:   | FileCheck -check-prefix=LINK %s
 
-// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+// LINK: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61"{{.*}}[[CUBIN:.+]].o
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
@@ -72,7 +72,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -march=sm_52 -### %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=LINKER-ARGS %s
 
-// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
+// LINKER-ARGS: clang-nvlink-wrapper{{.*}}"-v"{{.*}}"a" "b"
 
 // Tests for handling a missing architecture.
 //
diff --git a/clang/test/Driver/linker-wrapper-libs.c b/clang/test/Driver/linker-wrapper-libs.c
index 22cc24f2e258a..cb5c7c137a0ba 100644
--- a/clang/test/Driver/linker-wrapper-libs.c
+++ b/clang/test/Driver/linker-wrapper-libs.c
@@ -48,7 +48,7 @@ int bar() { return weak; }
 // RUN:   --linker-path=/usr/bin/ld %t.a %t.o -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-RESOLVES
 
-// LIBRARY-RESOLVES: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-RESOLVES: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 // LIBRARY-RESOLVES: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
 
 //
@@ -72,7 +72,7 @@ int bar() { return weak; }
 // RUN:   --linker-path=/usr/bin/ld %t.a %t.o -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-GLOBAL
 
-// LIBRARY-GLOBAL: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-GLOBAL: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 // LIBRARY-GLOBAL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
 
 //
@@ -96,7 +96,7 @@ int bar() { return weak; }
 // RUN: | FileCheck %s --check-prefix=LIBRARY-GLOBAL-NONE
 
 // LIBRARY-GLOBAL-NONE-NOT: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
-// LIBRARY-GLOBAL-NONE-NOT: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-GLOBAL-NONE-NOT: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 
 //
 // Check that we do not extract an external weak symbol.
@@ -161,7 +161,7 @@ int bar() { return weak; }
 // RUN:   --linker-path=/usr/bin/ld %t.o %t.a %t.a -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-GLOBAL-DEFINED
 
-// LIBRARY-GLOBAL-DEFINED: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-GLOBAL-DEFINED: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 // LIBRARY-GLOBAL-DEFINED-NOT: {{.*}}gfx1030{{.*}}.o
 // LIBRARY-GLOBAL-DEFINED: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
 
@@ -185,7 +185,7 @@ int bar() { return weak; }
 // RUN:   --linker-path=/usr/bin/ld %t.o --whole-archive %t.a -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-WHOLE-ARCHIVE
 
-// LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.s {{.*}}.o
+// LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_70 {{.*}}.o {{.*}}.o
 // LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
-// LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_52 {{.*}}.s
+// LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=nvptx64-nvidia-cuda -march=sm_52 {{.*}}.o
 // LIBRARY-WHOLE-ARCHIVE: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a {{.*}}.o
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index b9fa08ace0ff7..63d43921be9ac 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -48,7 +48,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --save-temps -O2 \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=AMDGPU-LTO-TEMPS
 
-// AMDGPU-LTO-TEMPS: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O2 -Wl,--no-undefined {{.*}}.s -save-temps
+// AMDGPU-LTO-TEMPS: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -O2 -Wl,--no-undefined {{.*}}.o -save-temps
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=x86_64-unknown-linux-gnu \
@@ -147,7 +147,7 @@ __attribute__((visibility("protected"), used)) int x;
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run --clang-backend \
 // RUN:   --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=CLANG-BACKEND
 
-// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.bc
+// CLANG-BACKEND: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx908 -O2 -Wl,--no-undefined {{.*}}.o
 
 // RUN: clang-offload-packager -o %t.out \
 // RUN:   --image=file=%t.elf.o,kind=openmp,triple=nvptx64-nvidia-cuda,arch=sm_70
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
new file mode 100644
index 0000000000000..fdda93f1f9cdc
--- /dev/null
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -0,0 +1,65 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+#if defined(X)
+extern int y;
+int foo() { return y; }
+
+int x = 0;
+#elif defined(Y)
+int y = 42;
+#elif defined(Z)
+int z = 42;
+#elif defined(W)
+int w = 42;
+#elif defined(U)
+extern int x;
+extern int __attribute__((weak)) w;
+
+int bar() {
+  return x + w;
+}
+#else
+extern int y;
+int __attribute__((visibility("hidden"))) x = 999;
+int baz() { return y + x; }
+#endif
+
+// Create various inputs to test basic linking and LTO capabilities. Creating a
+// CUDA binary requires access to the `ptxas` executable, so we just use x64.
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DX -o %t-x.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DY -o %t-y.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DZ -o %t-z.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DW -o %t-w.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DU -o %t-u.o
+// RUN: llvm-ar rcs %t-x.a %t-x.o
+// RUN: llvm-ar rcs %t-y.a %t-y.o
+// RUN: llvm-ar rcs %t-z.a %t-z.o
+// RUN: llvm-ar rcs %t-w.a %t-w.o
+
+//
+// Check that we forward any unrecognized argument to 'nvlink'.
+//
+// RUN: clang-nvlink-wrapper --dry-run -arch sm_52 %t-u.o -foo -o a.out 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=ARGS
+// ARGS: nvlink{{.*}} -arch sm_52 -foo -o a.out [[INPUT:.+]].cubin
+
+//
+// Check the symbol resolution for static archives. We expect to only link
+// `libx.a` and `liby.a` because extern weak symbols do not extract and `libz.a`
+// is not used at all.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t-x.a %t-u.o %t-y.a %t-z.a %t-w.a \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LINK
+// LINK: nvlink{{.*}} -arch sm_52 -o a.out [[INPUT:.+]].cubin {{.*}}-x-{{.*}}.cubin{{.*}}-y-{{.*}}.cubin
+
+// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.o
+
+//
+// Check that the LTO interface works and properly preserves symbols used in a
+// regular object file.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
+// LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
+// LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py
index e5630a07424c7..92a3361ce672e 100644
--- a/clang/test/lit.cfg.py
+++ b/clang/test/lit.cfg.py
@@ -95,6 +95,7 @@
     "llvm-ifs",
     "yaml2obj",
     "clang-linker-wrapper",
+    "clang-nvlink-wrapper",
     "llvm-lto",
     "llvm-lto2",
     "llvm-profdata",
diff --git a/clang/tools/CMakeLists.txt b/clang/tools/CMakeLists.txt
index bdd8004be3e02..4885afc1584d0 100644
--- a/clang/tools/CMakeLists.txt
+++ b/clang/tools/CMakeLists.txt
@@ -9,6 +9,7 @@ add_clang_subdirectory(clang-format-vs)
 add_clang_subdirectory(clang-fuzzer)
 add_clang_subdirectory(clang-import-test)
 add_clang_subdirectory(clang-linker-wrapper)
+add_clang_subdirectory(clang-nvlink-wrapper)
 add_clang_subdirectory(clang-offload-packager)
 add_clang_subdirectory(clang-offload-bundler)
 add_clang_subdirectory(clang-scan-deps)
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 9027076119cf9..15b215bf078f8 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -242,6 +242,13 @@ Expected<std::string> findProgram(StringRef Name, ArrayRef<StringRef> Paths) {
   return *Path;
 }
 
+/// We will defer LTO to the target's linker if we are not doing JIT and it is
+/// supported by the toolchain.
+bool linkerSupportsLTO(const ArgList &Args) {
+  llvm::Triple Triple(Args.getLastArgValue(OPT_triple_EQ));
+  return Triple.isNVPTX() || Triple.isAMDGPU();
+}
+
 /// Returns the hashed value for a constant string.
 std::string getHash(StringRef Str) {
   llvm::MD5 Hasher;
@@ -504,11 +511,10 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
     llvm::copy(LinkerArgs, std::back_inserter(CmdArgs));
   }
 
-  // Pass on -mllvm options to the clang invocation.
-  for (const opt::Arg *Arg : Args.filtered(OPT_mllvm)) {
-    CmdArgs.push_back("-mllvm");
-    CmdArgs.push_back(Arg->getValue());
-  }
+  // Pass on -mllvm options to the linker invocation.
+  for (const opt::Arg *Arg : Args.filtered(OPT_mllvm))
+    CmdArgs.push_back(
+        Args.MakeArgString("-Wl,-mllvm=" + StringRef(Arg->getValue())));
 
   if (Args.hasArg(OPT_debug))
     CmdArgs.push_back("-g");
@@ -516,6 +522,12 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
   if (SaveTemps)
     CmdArgs.push_back("-save-temps");
 
+  if (SaveTemps && linkerSupportsLTO(Args))
+    CmdArgs.push_back("-Wl,--save-temps");
+
+  if (Args.hasArg(OPT_embed_bitcode))
+    CmdArgs.push_back("-Wl,--lto-emit-llvm");
+
   if (Verbose)
     CmdArgs.push_back("-v");
 
@@ -536,8 +548,8 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
                       Args.MakeArgString(Arg.split('=').second)});
   }
 
-  // The OpenMPOpt pass can introduce new calls and is expensive, we do not want
-  // this when running CodeGen through clang.
+  // The OpenMPOpt pass ca...
[truncated]
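The `ClangLinkerWrapper.cpp` hunks above change how options reach the device link: `-mllvm` values are rewritten as `-Wl,-mllvm=<value>`, and `-Wl,--save-temps` is added only when `linkerSupportsLTO()` holds. A minimal sketch of that logic follows; `WrapperOptions` and `deviceLinkArgs` are hypothetical, and the triple check only approximates `Triple.isNVPTX() || Triple.isAMDGPU()` by looking at the architecture prefix.

```cpp
#include <string>
#include <vector>

// Hypothetical, flattened view of the wrapper's parsed options.
struct WrapperOptions {
  std::string Triple;
  std::vector<std::string> MLLVM; // value of each -mllvm occurrence
  bool SaveTemps = false;
};

static bool startsWith(const std::string &S, const std::string &Prefix) {
  return S.compare(0, Prefix.size(), Prefix) == 0;
}

// Approximation of linkerSupportsLTO(): only NVPTX and AMDGPU (amdgcn) device
// linkers are treated as doing LTO themselves.
bool linkerSupportsLTO(const std::string &Triple) {
  return startsWith(Triple, "nvptx") || startsWith(Triple, "amdgcn");
}

// Extra arguments for the device link job, following the new logic in clang().
std::vector<std::string> deviceLinkArgs(const WrapperOptions &Opts) {
  std::vector<std::string> Cmd;
  // -mllvm options now go to the linker rather than the clang frontend.
  for (const std::string &V : Opts.MLLVM)
    Cmd.push_back("-Wl,-mllvm=" + V);
  if (Opts.SaveTemps)
    Cmd.push_back("-save-temps");
  // Only ask the linker for its own temporaries when it performs LTO.
  if (Opts.SaveTemps && linkerSupportsLTO(Opts.Triple))
    Cmd.push_back("-Wl,--save-temps");
  return Cmd;
}
```

For example, `-mllvm -openmp-opt-inline-device` on an NVPTX job becomes `-Wl,-mllvm=-openmp-opt-inline-device`, so the option now reaches the LTO step inside the device linker.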

saiislam

The second commit to "Pass all files to the device linker" looks good to me.

Overall, I am fine with the first commit, re-introducing clang-nvlink-wrapper, as well. Last time, we added it only to wrap static device archives (https://reviews.llvm.org/D108291).
But let's wait for @Artem-B to finish the review of #96561.

Artem-B · 2024-07-22T22:32:04Z

clang/docs/ClangNVLinkWrapper.rst

+    -help                Display available options (--help-hidden for more)
+    -L <dir>             Add <dir> to the library search path
+    -l <libname>         Search for library <libname>
+    -mllvm <arg>         Arguments passed to LLVM, including Clang invocations, for which the '-mllvm' prefix is preserved. Use '-mllvm --help' for a list of options.


Nit: Should we wrap long lines here for readability? I think code blocks in reStructuredText will scroll horizontally rather than wrap long lines.

Yep, can do that.

Artem-B · 2024-07-22T22:39:11Z

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp


  if (Args.hasArg(OPT_debug))
    CmdArgs.push_back("-g");

  if (SaveTemps)
    CmdArgs.push_back("-save-temps");

+  if (SaveTemps && linkerSupportsLTO(Args))


Is -save-temps propagated from the top-level compilation?
If so, how do we handle --save-temps=obj -o /some/other/dir/foo.o ?

In this logic we only have the link job remaining, so we pass -Wl,--save-temps to it.

My question is about the origin of that option. E.g. if the user passes --save-temps=obj -o /some/other/dir/foo.o to the top-level clang invocation, what do we get when offloading with the new driver and this patch in effect?

Will the intermediate files end up in the current directory? In the object file directory? Both? E.g. clang's files end up in the object file directory, but the linker's intermediate files end up in the current dir? Or perhaps they are not even created, if clang's --save-temps is not propagated to the linker-wrapper.


llvm-ci · 2024-07-23T15:40:14Z

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building clang at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/2083

Here is the relevant piece of the build log for the reference:

Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentDelayedCrashWithBreakpointWatchpoint.py (612 of 1994)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentDelayedCrashWithBreakpointSignal.py (613 of 1994)
PASS: lldb-api :: functionalities/progress_reporting/TestTrimmedProgressReporting.py (614 of 1994)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManyCrash.py (615 of 1994)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManyBreakpoints.py (616 of 1994)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalBreak.py (617 of 1994)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentNWatchNBreak.py (618 of 1994)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalDelayWatch.py (619 of 1994)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalDelayBreak.py (620 of 1994)
PASS: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentManySignals.py (621 of 1994)
FAIL: lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalNWatchNBreak.py (622 of 1994)
******************** TEST 'lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalNWatchNBreak.py' FAILED ********************
Script:
--
/usr/bin/python3.8 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env ARCHIVER=/usr/local/bin/llvm-ar --env OBJCOPY=/usr/bin/llvm-objcopy --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/functionalities/thread/concurrent_events -p TestConcurrentSignalNWatchNBreak.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://github.com/llvm/llvm-project.git revision 1a3cfe5b9dc9c80a375506262b54b51d929df52d)
  clang revision 1a3cfe5b9dc9c80a375506262b54b51d929df52d
  llvm revision 1a3cfe5b9dc9c80a375506262b54b51d929df52d
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

Watchpoint 1 hit:
old value: 0
new value: 1

Watchpoint 1 hit:
old value: 1
new value: 1

Watchpoint 1 hit:
old value: 1
new value: 1

Watchpoint 1 hit:
old value: 1
new value: 1

Watchpoint 1 hit:
old value: 1
new value: 1

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test (TestConcurrentSignalNWatchNBreak.ConcurrentSignalNWatchNBreak)

llvm-ci · 2024-07-23T15:40:55Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building clang at step 10 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/2459

Here is the relevant piece of the build log for the reference:

Step 10 (Add check check-offload) failure: test (failure)
...
PASS: libomptarget :: amdgcn-amd-amdhsa :: api/omp_host_pinned_memory.c (26 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: libc/assert.c (27 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: api/omp_host_pinned_memory_alloc.c (28 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: api/omp_device_managed_memory.c (29 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: api/ompx_sync.cpp (30 of 792)
UNSUPPORTED: libomptarget :: amdgcn-amd-amdhsa :: mapping/device_ptr_update.c (31 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: mapping/alloc_fail.c (32 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: api/ompx_3d.cpp (33 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: libc/global_ctor_dtor.cpp (34 of 792)
XPASS: libomptarget :: amdgcn-amd-amdhsa :: api/omp_dynamic_shared_memory_amdgpu.c (35 of 792)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: api/omp_dynamic_shared_memory_amdgpu.c' FAILED ********************
Exit Code: 0

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api/omp_dynamic_shared_memory_amdgpu.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_dynamic_shared_memory_amdgpu.c.tmp /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libcgpu-amdgpu.a /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -O1 -mllvm -openmp-opt-inline-device
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api/omp_dynamic_shared_memory_amdgpu.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_dynamic_shared_memory_amdgpu.c.tmp /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libcgpu-amdgpu.a /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -O1 -mllvm -openmp-opt-inline-device
# RUN: at line 2
env LIBOMPTARGET_SHARED_MEMORY_SIZE=256    /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_dynamic_shared_memory_amdgpu.c.tmp | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api/omp_dynamic_shared_memory_amdgpu.c
# executed command: env LIBOMPTARGET_SHARED_MEMORY_SIZE=256 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_dynamic_shared_memory_amdgpu.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api/omp_dynamic_shared_memory_amdgpu.c

--

********************
PASS: libomptarget :: amdgcn-amd-amdhsa :: api/ompx_3d.c (36 of 792)
XPASS: libomptarget :: amdgcn-amd-amdhsa :: api/omp_dynamic_shared_memory_mixed_amdgpu.c (37 of 792)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: api/omp_dynamic_shared_memory_mixed_amdgpu.c' FAILED ********************
Exit Code: 0

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api/omp_dynamic_shared_memory_mixed_amdgpu.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_dynamic_shared_memory_mixed_amdgpu.c.tmp /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libcgpu-amdgpu.a /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -O1 -mllvm -openmp-opt-inline-device -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api/omp_dynamic_shared_memory_mixed_amdgpu.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_dynamic_shared_memory_mixed_amdgpu.c.tmp /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libcgpu-amdgpu.a /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -O1 -mllvm -openmp-opt-inline-device -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api
# RUN: at line 2
env LIBOMPTARGET_NEXTGEN_PLUGINS=1    /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_dynamic_shared_memory_mixed_amdgpu.c.tmp | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api/omp_dynamic_shared_memory_mixed_amdgpu.c
# executed command: env LIBOMPTARGET_NEXTGEN_PLUGINS=1 /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/api/Output/omp_dynamic_shared_memory_mixed_amdgpu.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/offload/test/api/omp_dynamic_shared_memory_mixed_amdgpu.c

--

********************
PASS: libomptarget :: amdgcn-amd-amdhsa :: libc/host_call.c (38 of 792)
UNSUPPORTED: libomptarget :: amdgcn-amd-amdhsa :: mapping/map_both_pointer_pointee.c (39 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: api/omp_device_memory.c (40 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: jit/type_punning.c (41 of 792)
PASS: libomptarget :: amdgcn-amd-amdhsa :: api/ompx_sync.c (42 of 792)
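Both "FAILED" banners in this log report `Exit Code: 0`. The test binaries ran correctly, but the tests were marked as expected failures (XFAIL). When such a test passes, lit reports it as `XPASS`, and an `XPASS` counts as a failure of the run. The Python sketch below models that classification. It is loosely based on lit's result codes and is not lit's actual implementation. `classify` and `run_failed` are hypothetical names.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    XFAIL = "XFAIL"          # expected to fail, and did fail
    XPASS = "XPASS"          # expected to fail, but passed
    UNSUPPORTED = "UNSUPPORTED"


# Outcomes that turn the whole run red; XPASS is included on purpose.
FAILING = {Outcome.FAIL, Outcome.XPASS}


def classify(exit_code: int, expected_to_fail: bool, supported: bool = True) -> Outcome:
    """Map a test's exit code and its XFAIL marking to a lit-style outcome."""
    if not supported:
        return Outcome.UNSUPPORTED
    passed = exit_code == 0
    if expected_to_fail:
        return Outcome.XPASS if passed else Outcome.XFAIL
    return Outcome.PASS if passed else Outcome.FAIL


def run_failed(outcomes: Iterable[Outcome]) -> bool:
    """True if any individual outcome counts as a failure."""
    return any(o in FAILING for o in outcomes)
```

In this model, `classify(0, expected_to_fail=True)` yields `XPASS`, which matches the two entries above. A single `XPASS` is enough to make `run_failed` return `True`, even though every binary exited with 0.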

llvm-ci · 2024-07-23T19:12:48Z

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building clang at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/3960

Here is the relevant piece of the build log for the reference:

Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Driver/linker-wrapper-passes.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 8: mkdir -p /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp
+ mkdir -p /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp
RUN: at line 9: /build/buildbot/premerge-monolithic-linux/build/bin/clang -cc1 -emit-llvm-bc -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.bc      -disable-O0-optnone -triple=x86_64-unknown-linux-gnu /build/buildbot/premerge-monolithic-linux/llvm-project/clang/test/Driver/linker-wrapper-passes.c
+ /build/buildbot/premerge-monolithic-linux/build/bin/clang -cc1 -emit-llvm-bc -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.bc -disable-O0-optnone -triple=x86_64-unknown-linux-gnu /build/buildbot/premerge-monolithic-linux/llvm-project/clang/test/Driver/linker-wrapper-passes.c
RUN: at line 11: /build/buildbot/premerge-monolithic-linux/build/bin/clang -cc1 -emit-llvm-bc -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-amdgcn-amd-amdhsa.bc      -disable-O0-optnone -triple=amdgcn-amd-amdhsa /build/buildbot/premerge-monolithic-linux/llvm-project/clang/test/Driver/linker-wrapper-passes.c
+ /build/buildbot/premerge-monolithic-linux/build/bin/clang -cc1 -emit-llvm-bc -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-amdgcn-amd-amdhsa.bc -disable-O0-optnone -triple=amdgcn-amd-amdhsa /build/buildbot/premerge-monolithic-linux/llvm-project/clang/test/Driver/linker-wrapper-passes.c
RUN: at line 13: /build/buildbot/premerge-monolithic-linux/build/bin/opt /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-amdgcn-amd-amdhsa.bc -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-amdgcn-amd-amdhsa.bc      -passes=forceattrs -force-remove-attribute=f:noinline
+ /build/buildbot/premerge-monolithic-linux/build/bin/opt /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-amdgcn-amd-amdhsa.bc -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-amdgcn-amd-amdhsa.bc -passes=forceattrs -force-remove-attribute=f:noinline
RUN: at line 15: /build/buildbot/premerge-monolithic-linux/build/bin/clang-offload-packager -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-x86_64-unknown-linux-gnu.out      --image=file=/build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-amdgcn-amd-amdhsa.bc,triple=amdgcn-amd-amdhsa
+ /build/buildbot/premerge-monolithic-linux/build/bin/clang-offload-packager -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-x86_64-unknown-linux-gnu.out --image=file=/build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-amdgcn-amd-amdhsa.bc,triple=amdgcn-amd-amdhsa
RUN: at line 17: /build/buildbot/premerge-monolithic-linux/build/bin/clang -cc1 -S -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.s      -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa      -fembed-offload-object=/build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-x86_64-unknown-linux-gnu.out      /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.bc
+ /build/buildbot/premerge-monolithic-linux/build/bin/clang -cc1 -S -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.s -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -fembed-offload-object=/build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/openmp-x86_64-unknown-linux-gnu.out /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.bc
RUN: at line 21: /build/buildbot/premerge-monolithic-linux/build/bin/clang -cc1as -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.o      -triple x86_64-unknown-linux-gnu -filetype obj -target-cpu x86-64      /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.s
+ /build/buildbot/premerge-monolithic-linux/build/bin/clang -cc1as -o /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.o -triple x86_64-unknown-linux-gnu -filetype obj -target-cpu x86-64 /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.s
RUN: at line 26: /build/buildbot/premerge-monolithic-linux/build/bin/clang-linker-wrapper -o a.out --embed-bitcode      --linker-path=/usr/bin/true /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.o      --offload-opt=-load-pass-plugin=/build/buildbot/premerge-monolithic-linux/build/lib/Bye.so --offload-opt=-wave-goodbye      --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 |    /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -match-full-lines -check-prefixes=OUT /build/buildbot/premerge-monolithic-linux/llvm-project/clang/test/Driver/linker-wrapper-passes.c
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck -match-full-lines -check-prefixes=OUT /build/buildbot/premerge-monolithic-linux/llvm-project/clang/test/Driver/linker-wrapper-passes.c
+ /build/buildbot/premerge-monolithic-linux/build/bin/clang-linker-wrapper -o a.out --embed-bitcode --linker-path=/usr/bin/true /build/buildbot/premerge-monolithic-linux/build/tools/clang/test/Driver/Output/linker-wrapper-passes.c.tmp/host-x86_64-unknown-linux-gnu.o --offload-opt=-load-pass-plugin=/build/buildbot/premerge-monolithic-linux/build/lib/Bye.so --offload-opt=-wave-goodbye '--offload-opt=-passes=function(goodbye),module(inline)'
/build/buildbot/premerge-monolithic-linux/llvm-project/clang/test/Driver/linker-wrapper-passes.c:51:9: error: OUT: expected string not found in input
// OUT: Bye: f
        ^
<stdin>:1:1: note: scanning from here
clang: error: cannot determine amdgcn architecture: /build/buildbot/premerge-monolithic-linux/build/bin/amdgpu-arch: ; consider passing it via '-mcpu'
^
<stdin>:1:45: note: possible intended match here
clang: error: cannot determine amdgcn architecture: /build/buildbot/premerge-monolithic-linux/build/bin/amdgpu-arch: ; consider passing it via '-mcpu'
                                            ^

Input file: <stdin>
Check file: /build/buildbot/premerge-monolithic-linux/llvm-project/clang/test/Driver/linker-wrapper-passes.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: clang: error: cannot determine amdgcn architecture: /build/buildbot/premerge-monolithic-linux/build/bin/amdgpu-arch: ; consider passing it via '-mcpu' 
check:51'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
check:51'1                                                 ?                                                                                                           possible intended match
            2: /build/buildbot/premerge-monolithic-linux/build/bin/clang-linker-wrapper: error: 'clang' failed 
check:51'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************
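This premerge failure appears to come from the new device-link step. The wrapper now hands the extracted `amdgcn-amd-amdhsa` bitcode to `clang`, which needs a target CPU. The image packaged at line 15 of the test specifies only `file=` and `triple=`. `clang` therefore presumably fell back to probing with `amdgpu-arch`, which found no GPU on the build machine; that would explain the empty output before `; consider passing it via '-mcpu'`.

The Python sketch below checks an `--image=` spec for a missing architecture. Assumptions:
- `parse_image_spec` and `missing_keys` are hypothetical helpers.
- Treating `arch` as required for an AMDGPU device link is inferred from this error message.
- `gfx90a` is only an example architecture.

```python
from typing import Dict, List, Sequence


def parse_image_spec(arg: str) -> Dict[str, str]:
    """Split a '--image=key=value,key=value,...' argument into a dict."""
    prefix = "--image="
    if not arg.startswith(prefix):
        raise ValueError(f"not an --image argument: {arg!r}")
    spec: Dict[str, str] = {}
    for field in arg[len(prefix):].split(","):
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {field!r}")
        spec[key] = value
    return spec


def missing_keys(spec: Dict[str, str],
                 required: Sequence[str] = ("file", "triple", "arch")) -> List[str]:
    """Return the required keys that are absent or empty in the spec."""
    return [key for key in required if not spec.get(key)]


# Shortened form of the packager argument from the RUN line at line 15 (path abbreviated).
logged = "--image=file=openmp-amdgcn-amd-amdhsa.bc,triple=amdgcn-amd-amdhsa"
print(missing_keys(parse_image_spec(logged)))                   # ['arch']
print(missing_keys(parse_image_spec(logged + ",arch=gfx90a")))  # []
```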

Summary: The linker wrapper's job is to extract embedded device code from fat binaries and create linked images that can then be embedded and executed. In order to support LTO, we originally reinvented all of the LTO handling that `ld.lld` normally does. Primarily, this was because `nvlink` didn't support LTO at all, and because offloading languages need special hacks to interact with archive libraries.

Now that I have written #96561, we should be able to pass all the inputs to the device linker transparently. This has the advantage of allowing the `clang` Driver to do its own handling. Primarily, this will be used to implicitly pass libraries to the device link job, making it more consistent with other toolchains.

The JIT support is a notable departure. However, there is an option called `--lto-emit-llvm` that does exactly what we need: it makes the final link job output LLVM-IR that we can then embed instead.

This patch does not fully delete the LTO handling, primarily because I think the SPIR-V people might want it. To see only the relevant patches, ignore the first commit of the nvlink-wrapper.

Depends on #96561.
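The following is a minimal Python sketch of the idea described above. The wrapper hands every extracted device input, archives included, to a single `clang` device-link job instead of running LTO itself. For JIT, it adds `--lto-emit-llvm` so that the link emits LLVM-IR rather than a final image.

Only `--lto-emit-llvm` and `-mcpu` come from the text and logs on this page. The function `device_link_command`, its parameters, and the exact argument order are assumptions for illustration, not the linker wrapper's actual code.

```python
from typing import List, Sequence


def device_link_command(triple: str, arch: str, inputs: Sequence[str], output: str,
                        emit_llvm: bool = False, clang: str = "clang") -> List[str]:
    """Build a hypothetical device-link command that forwards every input to clang."""
    if not arch:
        # Mirrors the driver's complaint when no target CPU is known.
        raise ValueError("cannot determine device architecture; pass it via -mcpu")
    cmd = [clang, f"--target={triple}", f"-mcpu={arch}", "-o", output]
    if emit_llvm:
        # JIT case: ask ld.lld to emit LLVM-IR bitcode after LTO instead of a linked image.
        cmd.append("-Wl,--lto-emit-llvm")
    # Objects, bitcode, and archives are passed through untouched; the device
    # linker (and its LTO) decides what to pull in, not the wrapper.
    cmd.extend(inputs)
    return cmd


print(" ".join(device_link_command("amdgcn-amd-amdhsa", "gfx90a",
                                   ["main.o", "libdevice.a"], "image.out",
                                   emit_llvm=True)))
# clang --target=amdgcn-amd-amdhsa -mcpu=gfx90a -o image.out -Wl,--lto-emit-llvm main.o libdevice.a
```

Keeping the wrapper this thin is the point of the change as described: library selection and archive handling move to the `clang` Driver and the device linker rather than being reimplemented in the wrapper. (`gfx90a` and `libdevice.a` are example names only.)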

jhuber6 requested review from Artem-B, jdoerfert, jplehr, saiislam, shiltian and yxsamliu July 3, 2024 13:27

llvmbot added the clang and clang:driver labels Jul 3, 2024

jhuber6 force-pushed the ReworkDeviceLinking branch 2 times, most recently from 0eb160c to 256b676 July 9, 2024 14:07

saiislam approved these changes Jul 17, 2024

Artem-B reviewed Jul 22, 2024

jhuber6 force-pushed the ReworkDeviceLinking branch from 256b676 to aae059e July 23, 2024 15:06

jhuber6 merged commit 1a3cfe5 into llvm:main Jul 23, 2024
4 of 6 checks passed

This was referenced Mar 21, 2025

[mlir][vector] Allow multi dim vectors in vector.scatter #132217

Merged

[docs][GitHub] Document alternative approach to stacked PRs #132424

Merged
