Skip to content

[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' #96561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 74 additions & 0 deletions clang/docs/ClangNVLinkWrapper.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
====================
Clang nvlink Wrapper
====================

.. contents::
:local:

.. _clang-nvlink-wrapper:

Introduction
============

This tools works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
of this wrapper is to provide an interface similar to the ``ld.lld`` linker
while still relying on NVIDIA's proprietary linker to produce the final output.

``nvlink`` has a number of known quirks that make it difficult to use in a
unified offloading setting. For example, it does not accept ``.o`` files as they
must be named ``.cubin``. Static archives do not work, so passing a ``.a`` will
provide a linker error. ``nvlink`` also does not support link time optimization
and ignores many standard linker arguments. This tool works around these issues.

Usage
=====

This tool can be used with the following options. Any arguments not intended
only for the linker wrapper will be forwarded to ``nvlink``.

.. code-block:: console

OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
This enables static linking and LTO handling for NVPTX targets.

USAGE: clang-nvlink-wrapper [options] <options to passed to nvlink>

OPTIONS:
--arch <value> Specify the 'sm_' name of the target architecture.
--cuda-path=<dir> Set the system CUDA path
--dry-run Print generated commands without running.
--feature <value> Specify the '+ptx' freature to use for LTO.
-g Specify that this was a debug compile.
-help-hidden Display all available options
-help Display available options (--help-hidden for more)
-L <dir> Add <dir> to the library search path
-l <libname> Search for library <libname>
-mllvm <arg> Arguments passed to LLVM, including Clang invocations,
for which the '-mllvm' prefix is preserved. Use '-mllvm
--help' for a list of options.
-o <path> Path to file to write output
--plugin-opt=jobs=<value>
Number of LTO codegen partitions
--plugin-opt=lto-partitions=<value>
Number of LTO codegen partitions
--plugin-opt=O<O0, O1, O2, or O3>
Optimization level for LTO
--plugin-opt=thinlto<value>
Enable the thin-lto backend
--plugin-opt=<value> Arguments passed to LLVM, including Clang invocations,
for which the '-mllvm' prefix is preserved. Use '-mllvm
--help' for a list of options.
--save-temps Save intermediate results
--version Display the version number and exit
-v Print verbose information

Example
=======

This tool is intended to be invoked when targeting the NVPTX toolchain directly
as a cross-compiling target. This can be used to create standalone GPU
executables with normal linking semantics similar to standard compilation.

.. code-block:: console

clang --target=nvptx64-nvidia-cuda -march=native -flto=full input.c
1 change: 1 addition & 0 deletions clang/docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@ Using Clang Tools
ClangFormatStyleOptions
ClangFormattedStatus
ClangLinkerWrapper
ClangNVLinkWrapper
ClangOffloadBundler
ClangOffloadPackager
ClangRepl
Expand Down
65 changes: 12 additions & 53 deletions clang/lib/Driver/ToolChains/Cuda.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("--output-file");
std::string OutputFileName = TC.getInputFilename(Output);

// If we are invoking `nvlink` internally we need to output a `.cubin` file.
// FIXME: This should hopefully be removed if NVIDIA updates their tooling.
if (!C.getInputArgs().getLastArg(options::OPT_c)) {
SmallString<256> Filename(Output.getFilename());
llvm::sys::path::replace_extension(Filename, "cubin");
OutputFileName = Filename.str();
}
if (Output.isFilename() && OutputFileName != Output.getFilename())
C.addTempFile(Args.MakeArgString(OutputFileName));

Expand Down Expand Up @@ -612,64 +605,34 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-arch");
CmdArgs.push_back(Args.MakeArgString(GPUArch));

if (Args.hasArg(options::OPT_ptxas_path_EQ))
CmdArgs.push_back(Args.MakeArgString(
"--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ)));

// Add paths specified in LIBRARY_PATH environment variable as -L options.
addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");

// Add standard library search paths passed on the command line.
Args.AddAllArgs(CmdArgs, options::OPT_L);
getToolChain().AddFilePathLibArgs(Args, CmdArgs);
AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);

if (C.getDriver().isUsingLTO())
addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
C.getDriver().getLTOMode() == LTOK_Thin);

// Add paths for the default clang library path.
SmallString<256> DefaultLibPath =
llvm::sys::path::parent_path(TC.getDriver().Dir);
llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));

for (const auto &II : Inputs) {
if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
<< getToolChain().getTripleString();
continue;
}

// The 'nvlink' application performs RDC-mode linking when given a '.o'
// file and device linking when given a '.cubin' file. We always want to
// perform device linking, so just rename any '.o' files.
// FIXME: This should hopefully be removed if NVIDIA updates their tooling.
if (II.isFilename()) {
auto InputFile = getToolChain().getInputFilename(II);
if (llvm::sys::path::extension(InputFile) != ".cubin") {
// If there are no actions above this one then this is direct input and
// we can copy it. Otherwise the input is internal so a `.cubin` file
// should exist.
if (II.getAction() && II.getAction()->getInputs().size() == 0) {
const char *CubinF =
Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
llvm::sys::path::stem(InputFile), "cubin"));
if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
continue;

CmdArgs.push_back(CubinF);
} else {
SmallString<256> Filename(InputFile);
llvm::sys::path::replace_extension(Filename, "cubin");
CmdArgs.push_back(Args.MakeArgString(Filename));
}
} else {
CmdArgs.push_back(Args.MakeArgString(InputFile));
}
} else if (!II.isNothing()) {
II.getInputArg().renderAsInput(Args, CmdArgs);
}
}

C.addCommand(std::make_unique<Command>(
JA, *this,
ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
"--options-file"},
Args.MakeArgString(getToolChain().GetProgramPath("nvlink")), CmdArgs,
Inputs, Output));
Args.MakeArgString(getToolChain().GetProgramPath("clang-nvlink-wrapper")),
CmdArgs, Inputs, Output));
}

void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
Expand Down Expand Up @@ -949,11 +912,7 @@ std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())
return ToolChain::getInputFilename(Input);

// Replace extension for object files with cubin because nvlink relies on
// these particular file names.
SmallString<256> Filename(ToolChain::getInputFilename(Input));
llvm::sys::path::replace_extension(Filename, "cubin");
return std::string(Filename);
return ToolChain::getInputFilename(Input);
}

llvm::opt::DerivedArgList *
Expand Down
3 changes: 3 additions & 0 deletions clang/lib/Driver/ToolChains/Cuda.h
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
return false;
}
bool HasNativeLLVMSupport() const override { return true; }
bool isPICDefaultForced() const override { return false; }
bool SupportsProfiling() const override { return false; }

Expand Down Expand Up @@ -192,6 +193,8 @@ class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
return &HostTC.getTriple();
}

bool HasNativeLLVMSupport() const override { return false; }

std::string getInputFilename(const InputInfo &Input) const override;

llvm::opt::DerivedArgList *
Expand Down
8 changes: 4 additions & 4 deletions clang/test/Driver/cuda-cross-compiling.c
Original file line number Diff line number Diff line change
Expand Up @@ -32,8 +32,8 @@
// RUN: | FileCheck -check-prefix=ARGS %s

// ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
// ARGS-NEXT: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61"{{.*}}"[[CUBIN]].o"

//
// Test the generated arguments to the CUDA binary utils when targeting NVPTX.
Expand All @@ -55,7 +55,7 @@
// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
// RUN: | FileCheck -check-prefix=LINK %s

// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
// LINK: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61"{{.*}}[[CUBIN:.+]].o

//
// Test to ensure that we enable handling global constructors in a freestanding
Expand All @@ -72,7 +72,7 @@
// RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -march=sm_52 -### %s 2>&1 \
// RUN: | FileCheck -check-prefix=LINKER-ARGS %s

// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
// LINKER-ARGS: clang-nvlink-wrapper{{.*}}"-v"{{.*}}"a" "b"

// Tests for handling a missing architecture.
//
Expand Down
65 changes: 65 additions & 0 deletions clang/test/Driver/nvlink-wrapper.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
// REQUIRES: x86-registered-target
// REQUIRES: nvptx-registered-target

#if defined(X)
extern int y;
int foo() { return y; }

int x = 0;
#elif defined(Y)
int y = 42;
#elif defined(Z)
int z = 42;
#elif defined(W)
int w = 42;
#elif defined(U)
extern int x;
extern int __attribute__((weak)) w;

int bar() {
return x + w;
}
#else
extern int y;
int __attribute__((visibility("hidden"))) x = 999;
int baz() { return y + x; }
#endif

// Create various inputs to test basic linking and LTO capabilities. Creating a
// CUDA binary requires access to the `ptxas` executable, so we just use x64.
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DX -o %t-x.o
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DY -o %t-y.o
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DZ -o %t-z.o
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DW -o %t-w.o
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DU -o %t-u.o
// RUN: llvm-ar rcs %t-x.a %t-x.o
// RUN: llvm-ar rcs %t-y.a %t-y.o
// RUN: llvm-ar rcs %t-z.a %t-z.o
// RUN: llvm-ar rcs %t-w.a %t-w.o

//
// Check that we forward any unrecognized argument to 'nvlink'.
//
// RUN: clang-nvlink-wrapper --dry-run -arch sm_52 %t-u.o -foo -o a.out 2>&1 \
// RUN: | FileCheck %s --check-prefix=ARGS
// ARGS: nvlink{{.*}} -arch sm_52 -foo -o a.out [[INPUT:.+]].cubin

//
// Check the symbol resolution for static archives. We expect to only link
// `libx.a` and `liby.a` because extern weak symbols do not extract and `libz.a`
// is not used at all.
//
// RUN: clang-nvlink-wrapper --dry-run %t-x.a %t-u.o %t-y.a %t-z.a %t-w.a \
// RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LINK
// LINK: nvlink{{.*}} -arch sm_52 -o a.out [[INPUT:.+]].cubin {{.*}}-x-{{.*}}.cubin{{.*}}-y-{{.*}}.cubin

// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.o

//
// Check that the LTO interface works and properly preserves symbols used in a
// regular object file.
//
// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
// RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
// LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
// LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
1 change: 1 addition & 0 deletions clang/test/lit.cfg.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,7 @@
"llvm-ifs",
"yaml2obj",
"clang-linker-wrapper",
"clang-nvlink-wrapper",
"llvm-lto",
"llvm-lto2",
"llvm-profdata",
Expand Down
1 change: 1 addition & 0 deletions clang/tools/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ add_clang_subdirectory(clang-format-vs)
add_clang_subdirectory(clang-fuzzer)
add_clang_subdirectory(clang-import-test)
add_clang_subdirectory(clang-linker-wrapper)
add_clang_subdirectory(clang-nvlink-wrapper)
add_clang_subdirectory(clang-offload-packager)
add_clang_subdirectory(clang-offload-bundler)
add_clang_subdirectory(clang-scan-deps)
Expand Down
44 changes: 44 additions & 0 deletions clang/tools/clang-nvlink-wrapper/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
set(LLVM_LINK_COMPONENTS
${LLVM_TARGETS_TO_BUILD}
BitWriter
Core
BinaryFormat
MC
Target
TransformUtils
Analysis
Passes
IRReader
Object
Option
Support
TargetParser
CodeGen
LTO
)

set(LLVM_TARGET_DEFINITIONS NVLinkOpts.td)
tablegen(LLVM NVLinkOpts.inc -gen-opt-parser-defs)
add_public_tablegen_target(NVLinkWrapperOpts)

if(NOT CLANG_BUILT_STANDALONE)
set(tablegen_deps intrinsics_gen NVLinkWrapperOpts)
endif()

add_clang_tool(clang-nvlink-wrapper
ClangNVLinkWrapper.cpp

DEPENDS
${tablegen_deps}
)

set(CLANG_NVLINK_WRAPPER_LIB_DEPS
clangBasic
)

target_compile_options(clang-nvlink-wrapper PRIVATE "-g" "-O0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious. Do we actually want to build this with -g -O0 all the time or was this left in from debugging or something like that? In the unlikely event that we do want this for some reason, it won't work as is on windows anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks for pointing that out.


target_link_libraries(clang-nvlink-wrapper
PRIVATE
${CLANG_NVLINK_WRAPPER_LIB_DEPS}
)
Loading
Loading