
Commit 3f82335

jhuber6 authored and yuxuanchen1997 committed
[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' (#96561)
Summary:
The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back a new, upgraded version that performs LTO and archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM.

While NVIDIA provides a linker called `nvlink`, its main interface is very difficult to work with. It does not support LTO or static linking, requires all input files to use the non-standard `.cubin` extension, and rejects link jobs that other linkers would accept (e.g. empty ones). I have spent a great deal of time hacking around this in the GPU `libc` implementation. There I deliberately avoid LTO and static linking, and I maintain about 100 lines of hacky CMake that store these files in a format the `clang-linker-wrapper` accepts, just to avoid this limitation.

The main reason I want to reintroduce this tool is that I am planning to create a more standard C/C++ toolchain for GPUs. It will install files like the following:

```
<install>/lib/nvptx64-nvidia-cuda/libc.a
<install>/lib/nvptx64-nvidia-cuda/libc++.a
<install>/lib/nvptx64-nvidia-cuda/libomp.a
<install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a
```

Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider it a blocking issue for massively improving the state of building GPU libraries. In the future we may be able to convince NVIDIA to port their linker to `ld.lld`. For now, this is the only workable solution that lets us hack around the weird behavior of their closed-source software.

This also copies some logic from the `clang-linker-wrapper`, but I feel it is not enough to make merging the two worthwhile. In the future it may be possible to delete that handling from there entirely.
Differential Revision: https://phabricator.intern.facebook.com/D60251377
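The `-lc` workflow described above depends on the usual linker search rule. `-l<name>` resolves to `lib<name>.a` in the `-L` directories, searched in command-line order. Below is a minimal C++17 sketch of that lookup. It assumes only static archives are considered. `libraryCandidates` and `findLibrary` are hypothetical helpers, not code from this commit.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: the paths checked for `-l<Name>`, in search order.
// Only static archives (lib<Name>.a) are considered in this sketch.
std::vector<fs::path> libraryCandidates(const std::string &Name,
                                        const std::vector<fs::path> &SearchDirs) {
  std::vector<fs::path> Candidates;
  for (const fs::path &Dir : SearchDirs)
    Candidates.push_back(Dir / ("lib" + Name + ".a"));
  return Candidates;
}

// Hypothetical helper: return the first candidate that exists, so earlier `-L`
// directories take precedence over later ones.
std::optional<fs::path> findLibrary(const std::string &Name,
                                    const std::vector<fs::path> &SearchDirs) {
  for (const fs::path &Candidate : libraryCandidates(Name, SearchDirs)) {
    std::error_code EC;
    if (fs::is_regular_file(Candidate, EC))
      return Candidate;
  }
  return std::nullopt;
}
```

With `-L <install>/lib/nvptx64-nvidia-cuda -lc`, `findLibrary("c", ...)` returns `<install>/lib/nvptx64-nvidia-cuda/libc.a` if that archive exists.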
1 parent 2ab800d commit 3f82335

11 files changed (+1076 -57 lines)

clang/docs/ClangNVLinkWrapper.rst

Lines changed: 74 additions & 0 deletions
```diff
@@ -0,0 +1,74 @@
+====================
+Clang nvlink Wrapper
+====================
+
+.. contents::
+   :local:
+
+.. _clang-nvlink-wrapper:
+
+Introduction
+============
+
+This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
+of this wrapper is to provide an interface similar to the ``ld.lld`` linker
+while still relying on NVIDIA's proprietary linker to produce the final output.
+
+``nvlink`` has a number of known quirks that make it difficult to use in a
+unified offloading setting. For example, it does not accept ``.o`` files as they
+must be named ``.cubin``. Static archives do not work, so passing a ``.a`` will
+produce a linker error. ``nvlink`` also does not support link-time optimization
+and ignores many standard linker arguments. This tool works around these issues.
+
+Usage
+=====
+
+This tool can be used with the following options. Any arguments not intended
+only for the linker wrapper will be forwarded to ``nvlink``.
+
+.. code-block:: console
+
+  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
+  This enables static linking and LTO handling for NVPTX targets.
+
+  USAGE: clang-nvlink-wrapper [options] <options to pass to nvlink>
+
+  OPTIONS:
+    --arch <value>          Specify the 'sm_' name of the target architecture.
+    --cuda-path=<dir>       Set the system CUDA path
+    --dry-run               Print generated commands without running.
+    --feature <value>       Specify the '+ptx' feature to use for LTO.
+    -g                      Specify that this was a debug compile.
+    -help-hidden            Display all available options
+    -help                   Display available options (--help-hidden for more)
+    -L <dir>                Add <dir> to the library search path
+    -l <libname>            Search for library <libname>
+    -mllvm <arg>            Arguments passed to LLVM, including Clang invocations,
+                            for which the '-mllvm' prefix is preserved. Use '-mllvm
+                            --help' for a list of options.
+    -o <path>               Path to file to write output
+    --plugin-opt=jobs=<value>
+                            Number of LTO codegen partitions
+    --plugin-opt=lto-partitions=<value>
+                            Number of LTO codegen partitions
+    --plugin-opt=O<O0, O1, O2, or O3>
+                            Optimization level for LTO
+    --plugin-opt=thinlto<value>
+                            Enable the thin-lto backend
+    --plugin-opt=<value>    Arguments passed to LLVM, including Clang invocations,
+                            for which the '-mllvm' prefix is preserved. Use '-mllvm
+                            --help' for a list of options.
+    --save-temps            Save intermediate results
+    --version               Display the version number and exit
+    -v                      Print verbose information
+
+Example
+=======
+
+This tool is intended to be invoked when targeting the NVPTX toolchain directly
+as a cross-compiling target. This can be used to create standalone GPU
+executables with normal linking semantics similar to standard compilation.
+
+.. code-block:: console
+
+  clang --target=nvptx64-nvidia-cuda -march=native -flto=full input.c
```
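The forwarding rule in the Usage section splits the command line into three groups: options the wrapper consumes, options passed through to `nvlink`, and input files the wrapper has to process. Below is a minimal sketch of that partition. It assumes a small hypothetical option table; the real one is generated from `NVLinkOpts.td`, which is not shown in this commit view.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical split of a clang-nvlink-wrapper command line. The option sets
// below are illustrative assumptions, not the wrapper's real option table.
struct SplitArgs {
  std::vector<std::string> WrapperOnly; // consumed by the wrapper itself
  std::vector<std::string> Forwarded;   // passed through to nvlink verbatim
  std::vector<std::string> Inputs;      // files the wrapper must preprocess
};

SplitArgs splitArgs(const std::vector<std::string> &Args) {
  static const std::set<std::string> WrapperFlags = {"--dry-run",
                                                     "--save-temps"};
  static const std::set<std::string> WrapperSeparate = {"-mllvm"};
  static const std::set<std::string> ForwardedSeparate = {"-o", "-arch"};
  static const std::vector<std::string> WrapperJoined = {"--cuda-path=",
                                                         "--plugin-opt="};
  auto IsWrapperJoined = [&](const std::string &S) {
    for (const std::string &Prefix : WrapperJoined)
      if (S.rfind(Prefix, 0) == 0) // S starts with Prefix
        return true;
    return false;
  };

  SplitArgs Out;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    const std::string &A = Args[I];
    bool HasValue = I + 1 < Args.size();
    if (WrapperFlags.count(A) || IsWrapperJoined(A)) {
      Out.WrapperOnly.push_back(A);
    } else if (WrapperSeparate.count(A) && HasValue) {
      Out.WrapperOnly.push_back(A);
      Out.WrapperOnly.push_back(Args[++I]); // the option's value
    } else if (ForwardedSeparate.count(A) && HasValue) {
      Out.Forwarded.push_back(A);
      Out.Forwarded.push_back(Args[++I]);
    } else if (!A.empty() && A[0] == '-') {
      Out.Forwarded.push_back(A); // unknown option: pass it to nvlink as-is
    } else {
      Out.Inputs.push_back(A);
    }
  }
  return Out;
}
```

The `ARGS` check in `nvlink-wrapper.c` uses `--dry-run -arch sm_52 <input> -foo -o a.out`. On those arguments, the sketch forwards `-arch sm_52 -foo -o a.out`, the same order that check expects on the `nvlink` line.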

clang/docs/index.rst

Lines changed: 1 addition & 0 deletions
```diff
@@ -92,6 +92,7 @@ Using Clang Tools
    ClangFormatStyleOptions
    ClangFormattedStatus
    ClangLinkerWrapper
+   ClangNVLinkWrapper
    ClangOffloadBundler
    ClangOffloadPackager
    ClangRepl
```

clang/lib/Driver/ToolChains/Cuda.cpp

Lines changed: 12 additions & 53 deletions
```diff
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("--output-file");
   std::string OutputFileName = TC.getInputFilename(Output);
 
-  // If we are invoking `nvlink` internally we need to output a `.cubin` file.
-  // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-  if (!C.getInputArgs().getLastArg(options::OPT_c)) {
-    SmallString<256> Filename(Output.getFilename());
-    llvm::sys::path::replace_extension(Filename, "cubin");
-    OutputFileName = Filename.str();
-  }
   if (Output.isFilename() && OutputFileName != Output.getFilename())
     C.addTempFile(Args.MakeArgString(OutputFileName));
 
@@ -612,64 +605,34 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   CmdArgs.push_back("-arch");
   CmdArgs.push_back(Args.MakeArgString(GPUArch));
 
+  if (Args.hasArg(options::OPT_ptxas_path_EQ))
+    CmdArgs.push_back(Args.MakeArgString(
+        "--ptxas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ)));
+
   // Add paths specified in LIBRARY_PATH environment variable as -L options.
   addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");
 
   // Add standard library search paths passed on the command line.
   Args.AddAllArgs(CmdArgs, options::OPT_L);
   getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+  if (C.getDriver().isUsingLTO())
+    addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+                  C.getDriver().getLTOMode() == LTOK_Thin);
 
   // Add paths for the default clang library path.
   SmallString<256> DefaultLibPath =
       llvm::sys::path::parent_path(TC.getDriver().Dir);
   llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
   CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
 
-  for (const auto &II : Inputs) {
-    if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
-        II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
-      C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
-          << getToolChain().getTripleString();
-      continue;
-    }
-
-    // The 'nvlink' application performs RDC-mode linking when given a '.o'
-    // file and device linking when given a '.cubin' file. We always want to
-    // perform device linking, so just rename any '.o' files.
-    // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
-    if (II.isFilename()) {
-      auto InputFile = getToolChain().getInputFilename(II);
-      if (llvm::sys::path::extension(InputFile) != ".cubin") {
-        // If there are no actions above this one then this is direct input and
-        // we can copy it. Otherwise the input is internal so a `.cubin` file
-        // should exist.
-        if (II.getAction() && II.getAction()->getInputs().size() == 0) {
-          const char *CubinF =
-              Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
-                  llvm::sys::path::stem(InputFile), "cubin"));
-          if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
-            continue;
-
-          CmdArgs.push_back(CubinF);
-        } else {
-          SmallString<256> Filename(InputFile);
-          llvm::sys::path::replace_extension(Filename, "cubin");
-          CmdArgs.push_back(Args.MakeArgString(Filename));
-        }
-      } else {
-        CmdArgs.push_back(Args.MakeArgString(InputFile));
-      }
-    } else if (!II.isNothing()) {
-      II.getInputArg().renderAsInput(Args, CmdArgs);
-    }
-  }
-
   C.addCommand(std::make_unique<Command>(
       JA, *this,
       ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
                           "--options-file"},
-      Args.MakeArgString(getToolChain().GetProgramPath("nvlink")), CmdArgs,
-      Inputs, Output));
+      Args.MakeArgString(getToolChain().GetProgramPath("clang-nvlink-wrapper")),
+      CmdArgs, Inputs, Output));
 }
 
 void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
@@ -949,11 +912,7 @@ std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
   if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())
     return ToolChain::getInputFilename(Input);
 
-  // Replace extension for object files with cubin because nvlink relies on
-  // these particular file names.
-  SmallString<256> Filename(ToolChain::getInputFilename(Input));
-  llvm::sys::path::replace_extension(Filename, "cubin");
-  return std::string(Filename);
+  return ToolChain::getInputFilename(Input);
 }
 
 llvm::opt::DerivedArgList *
```
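The driver no longer renames objects, so the `.cubin` requirement is now handled inside the wrapper. The new `nvlink-wrapper.c` test expects `nvlink` to receive `.cubin` files whose names derive from the original inputs (for example `{{.*}}-x-{{.*}}.cubin`). Below is a minimal sketch of that staging step. It assumes a per-link temporary directory and a caller-supplied unique id. `cubinNameFor` and `stageForNvlink` are hypothetical helpers, not the wrapper's implementation.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: pick the name nvlink will see for one input. Files that
// already end in `.cubin` pass through; anything else gets a temporary
// `<stem>-<id>.cubin` name, since nvlink treats `.o` inputs as RDC objects.
fs::path cubinNameFor(const fs::path &Input, const fs::path &TempDir,
                      unsigned UniqueId) {
  if (Input.extension() == ".cubin")
    return Input;
  return TempDir /
         (Input.stem().string() + "-" + std::to_string(UniqueId) + ".cubin");
}

// Hypothetical helper: copy the input under its new name rather than renaming
// it in place, since the caller may still need the original. Returns false on
// failure.
bool stageForNvlink(const fs::path &Input, const fs::path &Dest) {
  if (Input == Dest)
    return true;
  std::error_code EC;
  fs::copy_file(Input, Dest, fs::copy_options::overwrite_existing, EC);
  return !EC;
}
```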

clang/lib/Driver/ToolChains/Cuda.h

Lines changed: 3 additions & 0 deletions
```diff
@@ -155,6 +155,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
   bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
     return false;
   }
+  bool HasNativeLLVMSupport() const override { return true; }
   bool isPICDefaultForced() const override { return false; }
   bool SupportsProfiling() const override { return false; }
 
@@ -192,6 +193,8 @@ class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
     return &HostTC.getTriple();
   }
 
+  bool HasNativeLLVMSupport() const override { return false; }
+
   std::string getInputFilename(const InputInfo &Input) const override;
 
   llvm::opt::DerivedArgList *
```
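`NVPTXToolChain` now reports native LLVM support. As a result, bitcode inputs reach the link step instead of being rejected with `err_drv_no_linker_llvm_support`, and the wrapper runs LTO itself. The new `LTO` check in `nvlink-wrapper.c` verifies that symbols used by a regular object file survive that step. Below is a simplified sketch of the rule, assuming symbols are tracked only by name. In LLVM's LTO API, the corresponding per-symbol input is roughly the `VisibleToRegularObj` bit of `lto::SymbolResolution`. `LTOPlan` and `planLTO` are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Simplified model (assumption): which symbols defined in bitcode inputs must
// survive LTO. A symbol referenced from a regular, non-bitcode object has to
// be kept; one used only inside the LTO modules may be internalized.
struct LTOPlan {
  std::vector<std::string> Preserved;
  std::vector<std::string> Internalizable;
};

LTOPlan planLTO(const std::vector<std::string> &BitcodeDefs,
                const std::set<std::string> &RegularObjectRefs) {
  LTOPlan Plan;
  for (const std::string &Sym : BitcodeDefs) {
    if (RegularObjectRefs.count(Sym))
      Plan.Preserved.push_back(Sym); // still needed by a regular object
    else
      Plan.Internalizable.push_back(Sym);
  }
  return Plan;
}
```

In that test, the bitcode file (built from the `#else` branch) defines `x` and `baz`, and `u.o` references `x` and `w`. Under this model, `x` is preserved and `baz` may be internalized.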

clang/test/Driver/cuda-cross-compiling.c

Lines changed: 4 additions & 4 deletions
```diff
@@ -32,8 +32,8 @@
 // RUN: | FileCheck -check-prefix=ARGS %s
 
 // ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61"{{.*}}"[[CUBIN]].o"
 
 //
 // Test the generated arguments to the CUDA binary utils when targeting NVPTX.
@@ -55,7 +55,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
 // RUN: | FileCheck -check-prefix=LINK %s
 
-// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+// LINK: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61"{{.*}}[[CUBIN:.+]].o
 
 //
 // Test to ensure that we enable handling global constructors in a freestanding
@@ -72,7 +72,7 @@
 // RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -march=sm_52 -### %s 2>&1 \
 // RUN: | FileCheck -check-prefix=LINKER-ARGS %s
 
-// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
+// LINKER-ARGS: clang-nvlink-wrapper{{.*}}"-v"{{.*}}"a" "b"
 
 // Tests for handling a missing architecture.
 //
```

clang/test/Driver/nvlink-wrapper.c

Lines changed: 65 additions & 0 deletions
```diff
@@ -0,0 +1,65 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+#if defined(X)
+extern int y;
+int foo() { return y; }
+
+int x = 0;
+#elif defined(Y)
+int y = 42;
+#elif defined(Z)
+int z = 42;
+#elif defined(W)
+int w = 42;
+#elif defined(U)
+extern int x;
+extern int __attribute__((weak)) w;
+
+int bar() {
+  return x + w;
+}
+#else
+extern int y;
+int __attribute__((visibility("hidden"))) x = 999;
+int baz() { return y + x; }
+#endif
+
+// Create various inputs to test basic linking and LTO capabilities. Creating a
+// CUDA binary requires access to the `ptxas` executable, so we just use x64.
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DX -o %t-x.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DY -o %t-y.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DZ -o %t-z.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DW -o %t-w.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DU -o %t-u.o
+// RUN: llvm-ar rcs %t-x.a %t-x.o
+// RUN: llvm-ar rcs %t-y.a %t-y.o
+// RUN: llvm-ar rcs %t-z.a %t-z.o
+// RUN: llvm-ar rcs %t-w.a %t-w.o
+
+//
+// Check that we forward any unrecognized argument to 'nvlink'.
+//
+// RUN: clang-nvlink-wrapper --dry-run -arch sm_52 %t-u.o -foo -o a.out 2>&1 \
+// RUN: | FileCheck %s --check-prefix=ARGS
+// ARGS: nvlink{{.*}} -arch sm_52 -foo -o a.out [[INPUT:.+]].cubin
+
+//
+// Check the symbol resolution for static archives. We expect to only link
+// `libx.a` and `liby.a` because extern weak symbols do not extract and `libz.a`
+// is not used at all.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t-x.a %t-u.o %t-y.a %t-z.a %t-w.a \
+// RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LINK
+// LINK: nvlink{{.*}} -arch sm_52 -o a.out [[INPUT:.+]].cubin {{.*}}-x-{{.*}}.cubin{{.*}}-y-{{.*}}.cubin
+
+// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.o
+
+//
+// Check that the LTO interface works and properly preserves symbols used in a
+// regular object file.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
+// LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
+// LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
```
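The `LINK` check in this test expects only the `x` and `y` archives to be extracted. `u.o` needs `x`, the member providing `x` needs `y`, the weak reference to `w` does not pull in its archive, and nothing refers to `z`. The `x` archive appears before `u.o` on the command line, so extraction cannot depend on archive order, unlike the single left-to-right pass of traditional Unix linkers such as GNU ld. Below is a minimal sketch of that resolution as a fixed-point loop. It uses a simplified per-object symbol model, which is an assumption and not the wrapper's data structures.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified per-object symbol table (an assumption for this sketch).
struct ObjectFile {
  std::string Name;
  std::set<std::string> Defined;
  std::set<std::string> Undefined;     // strong references: may extract
  std::set<std::string> WeakUndefined; // weak references: never extract
};

// Return the archive members to extract, iterating until nothing changes so
// that the position of an archive on the command line does not matter.
std::vector<std::string>
selectArchiveMembers(const std::vector<ObjectFile> &Objects,
                     const std::vector<ObjectFile> &Members) {
  std::set<std::string> Defined, Needed;
  auto Add = [&](const ObjectFile &O) {
    Defined.insert(O.Defined.begin(), O.Defined.end());
    Needed.insert(O.Undefined.begin(), O.Undefined.end());
  };
  for (const ObjectFile &O : Objects)
    Add(O);

  std::vector<std::string> Extracted;
  std::vector<bool> Taken(Members.size(), false);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Members.size(); ++I) {
      if (Taken[I])
        continue;
      for (const std::string &Sym : Members[I].Defined) {
        // Extract only to satisfy a strong reference that is still undefined.
        if (Needed.count(Sym) && !Defined.count(Sym)) {
          Taken[I] = true;
          Extracted.push_back(Members[I].Name);
          Add(Members[I]);
          Changed = true;
          break;
        }
      }
    }
  }
  return Extracted;
}
```

Run on a model of the five test inputs, this extracts the `x` and `y` members, in that order.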

clang/test/lit.cfg.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -95,6 +95,7 @@
     "llvm-ifs",
     "yaml2obj",
     "clang-linker-wrapper",
+    "clang-nvlink-wrapper",
     "llvm-lto",
     "llvm-lto2",
     "llvm-profdata",
```

clang/tools/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
```diff
@@ -9,6 +9,7 @@ add_clang_subdirectory(clang-format-vs)
 add_clang_subdirectory(clang-fuzzer)
 add_clang_subdirectory(clang-import-test)
 add_clang_subdirectory(clang-linker-wrapper)
+add_clang_subdirectory(clang-nvlink-wrapper)
 add_clang_subdirectory(clang-offload-packager)
 add_clang_subdirectory(clang-offload-bundler)
 add_clang_subdirectory(clang-scan-deps)
```
Lines changed: 44 additions & 0 deletions
```diff
@@ -0,0 +1,44 @@
+set(LLVM_LINK_COMPONENTS
+  ${LLVM_TARGETS_TO_BUILD}
+  BitWriter
+  Core
+  BinaryFormat
+  MC
+  Target
+  TransformUtils
+  Analysis
+  Passes
+  IRReader
+  Object
+  Option
+  Support
+  TargetParser
+  CodeGen
+  LTO
+)
+
+set(LLVM_TARGET_DEFINITIONS NVLinkOpts.td)
+tablegen(LLVM NVLinkOpts.inc -gen-opt-parser-defs)
+add_public_tablegen_target(NVLinkWrapperOpts)
+
+if(NOT CLANG_BUILT_STANDALONE)
+  set(tablegen_deps intrinsics_gen NVLinkWrapperOpts)
+endif()
+
+add_clang_tool(clang-nvlink-wrapper
+  ClangNVLinkWrapper.cpp
+
+  DEPENDS
+  ${tablegen_deps}
+  )
+
+set(CLANG_NVLINK_WRAPPER_LIB_DEPS
+  clangBasic
+  )
+
+target_compile_options(clang-nvlink-wrapper PRIVATE "-g" "-O0")
+
+target_link_libraries(clang-nvlink-wrapper
+  PRIVATE
+  ${CLANG_NVLINK_WRAPPER_LIB_DEPS}
+  )
```
