[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' #96561
Merged
@@ -0,0 +1,74 @@
====================
Clang nvlink Wrapper
====================

.. contents::
   :local:

.. _clang-nvlink-wrapper:

Introduction
============

This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
of this wrapper is to provide an interface similar to the ``ld.lld`` linker
while still relying on NVIDIA's proprietary linker to produce the final output.

``nvlink`` has a number of known quirks that make it difficult to use in a
unified offloading setting. For example, it does not accept ``.o`` files; its
inputs must be named ``.cubin``. Static archives do not work, so passing a
``.a`` file results in a linker error. ``nvlink`` also does not support
link-time optimization and ignores many standard linker arguments. This tool
works around these issues.

Usage
=====

This tool can be used with the following options. Any argument that is not a
wrapper-only option is forwarded to ``nvlink``.

.. code-block:: console

  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
  This enables static linking and LTO handling for NVPTX targets.

  USAGE: clang-nvlink-wrapper [options] <options to be passed to nvlink>

  OPTIONS:
    --arch <value>          Specify the 'sm_' name of the target architecture.
    --cuda-path=<dir>       Set the system CUDA path
    --dry-run               Print generated commands without running.
    --feature <value>       Specify the '+ptx' feature to use for LTO.
    -g                      Specify that this was a debug compile.
    -help-hidden            Display all available options
    -help                   Display available options (--help-hidden for more)
    -L <dir>                Add <dir> to the library search path
    -l <libname>            Search for library <libname>
    -mllvm <arg>            Arguments passed to LLVM, including Clang invocations,
                            for which the '-mllvm' prefix is preserved. Use '-mllvm
                            --help' for a list of options.
    -o <path>               Path to file to write output
    --plugin-opt=jobs=<value>
                            Number of LTO codegen partitions
    --plugin-opt=lto-partitions=<value>
                            Number of LTO codegen partitions
    --plugin-opt=O<O0, O1, O2, or O3>
                            Optimization level for LTO
    --plugin-opt=thinlto<value>
                            Enable the thin-lto backend
    --plugin-opt=<value>    Arguments passed to LLVM, including Clang invocations,
                            for which the '-mllvm' prefix is preserved. Use '-mllvm
                            --help' for a list of options.
    --save-temps            Save intermediate results
    --version               Display the version number and exit
    -v                      Print verbose information

Example
=======

This tool is intended to be invoked when targeting the NVPTX toolchain directly
as a cross-compilation target. It can be used to create standalone GPU
executables with normal linking semantics, similar to standard compilation.

.. code-block:: console

  clang --target=nvptx64-nvidia-cuda -march=native -flto=full input.c
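The Introduction and Usage sections above describe two of the wrapper's jobs: forwarding every argument it does not own to `nvlink`, and presenting `.o` inputs under a `.cubin` name. The C sketch below models only that argument rewriting, as a hypothetical simplification rather than the tool's implementation. The function names, the `MAX_ARGS`/`MAX_PATH_LEN` limits, the choice of `--dry-run`, `--save-temps` and `--cuda-path=` as the wrapper-only subset, and the handling of only `-o` and `-arch` as value-taking options are all assumptions; the sketch also just renames inputs, whereas the test's `[[INPUT:.+]].cubin` pattern suggests the real wrapper writes temporary `.cubin` files. Archives, `-L`/`-l` and LTO are ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 64      /* hypothetical capacity limit for this sketch */
#define MAX_PATH_LEN 256 /* hypothetical path-length limit for this sketch */

/* Illustrative subset of wrapper-only options: consumed, never forwarded. */
static int is_wrapper_only(const char *arg) {
  return strcmp(arg, "--dry-run") == 0 || strcmp(arg, "--save-temps") == 0 ||
         strncmp(arg, "--cuda-path=", strlen("--cuda-path=")) == 0;
}

/* Returns 1 if `s` ends with `suffix`. */
static int ends_with(const char *s, const char *suffix) {
  size_t n = strlen(s), m = strlen(suffix);
  return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Builds an nvlink argument vector in `out` and returns its length.
   Inputs ending in ".o" are renamed to ".cubin" (names stored in
   `cubin_names`) and appended after all forwarded options. */
int build_nvlink_args(int argc, const char **argv, const char **out,
                      char cubin_names[][MAX_PATH_LEN]) {
  /* Each argument contributes at most one entry, plus the leading "nvlink". */
  assert(argc >= 0 && argc < MAX_ARGS);
  int n = 0, num_inputs = 0;
  out[n++] = "nvlink";
  for (int i = 0; i < argc; ++i) {
    const char *arg = argv[i];
    if (is_wrapper_only(arg))
      continue;
    /* Forward value-taking options together with their value, so an output
       path such as "-o foo.o" is not mistaken for an input object. */
    if ((strcmp(arg, "-o") == 0 || strcmp(arg, "-arch") == 0) && i + 1 < argc) {
      out[n++] = arg;
      out[n++] = argv[++i];
      continue;
    }
    if (ends_with(arg, ".o")) {
      int stem_len = (int)(strlen(arg) - strlen(".o"));
      snprintf(cubin_names[num_inputs], MAX_PATH_LEN, "%.*s.cubin", stem_len,
               arg);
      ++num_inputs;
      continue;
    }
    out[n++] = arg; /* Unrecognized: forward to nvlink unchanged. */
  }
  for (int i = 0; i < num_inputs; ++i)
    out[n++] = cubin_names[i];
  return n;
}
```

Called with `--dry-run -arch sm_52 u.o -foo -o a.out`, the sketch yields `nvlink -arch sm_52 -foo -o a.out u.cubin`, which has the same shape as the `ARGS` check in the test below.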
@@ -0,0 +1,65 @@
// REQUIRES: x86-registered-target
// REQUIRES: nvptx-registered-target

#if defined(X)
extern int y;
int foo() { return y; }

int x = 0;
#elif defined(Y)
int y = 42;
#elif defined(Z)
int z = 42;
#elif defined(W)
int w = 42;
#elif defined(U)
extern int x;
extern int __attribute__((weak)) w;

int bar() {
  return x + w;
}
#else
extern int y;
int __attribute__((visibility("hidden"))) x = 999;
int baz() { return y + x; }
#endif

// Create various inputs to test basic linking and LTO capabilities. Creating a
// CUDA binary requires access to the `ptxas` executable, so we use x86_64.
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DX -o %t-x.o
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DY -o %t-y.o
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DZ -o %t-z.o
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DW -o %t-w.o
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DU -o %t-u.o
// RUN: llvm-ar rcs %t-x.a %t-x.o
// RUN: llvm-ar rcs %t-y.a %t-y.o
// RUN: llvm-ar rcs %t-z.a %t-z.o
// RUN: llvm-ar rcs %t-w.a %t-w.o

//
// Check that we forward any unrecognized argument to 'nvlink'.
//
// RUN: clang-nvlink-wrapper --dry-run -arch sm_52 %t-u.o -foo -o a.out 2>&1 \
// RUN:   | FileCheck %s --check-prefix=ARGS
// ARGS: nvlink{{.*}} -arch sm_52 -foo -o a.out [[INPUT:.+]].cubin

//
// Check the symbol resolution for static archives. We expect to only link
// `libx.a` and `liby.a` because weak undefined references do not extract
// archive members, and `libz.a` is not used at all.
//
// RUN: clang-nvlink-wrapper --dry-run %t-x.a %t-u.o %t-y.a %t-z.a %t-w.a \
// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LINK
// LINK: nvlink{{.*}} -arch sm_52 -o a.out [[INPUT:.+]].cubin {{.*}}-x-{{.*}}.cubin{{.*}}-y-{{.*}}.cubin

// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.o

//
// Check that the LTO interface works and properly preserves symbols used in a
// regular object file.
//
// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
// RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
// LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
// LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
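The `LINK` check above expects only the `x` and `y` archive members to be pulled in. The rule behind that expectation can be modelled as a fixed-point loop. This is a hypothetical, simplified C sketch, not the wrapper's code: `Input`, `MAX_SYMS` and `resolve_archives` are illustrative names, each archive member is treated as a single object, symbol names are plain strings, and weak undefined references are left out entirely because they never trigger extraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 8 /* hypothetical per-file symbol limit for this sketch */

/* A simplified, hypothetical model of one link input. */
typedef struct {
  const char *name;
  bool is_archive_member;            /* Extracted lazily if true. */
  const char *defines[MAX_SYMS];     /* Symbols this file defines. */
  int num_defines;
  const char *strong_refs[MAX_SYMS]; /* Non-weak undefined references. */
  int num_strong_refs;
  /* Weak undefined references are deliberately not modelled: they never
     cause an archive member to be extracted. */
} Input;

/* True if any selected input defines `sym`. */
static bool is_defined(const Input *in, const bool *selected, int n,
                       const char *sym) {
  for (int i = 0; i < n; ++i)
    if (selected[i])
      for (int d = 0; d < in[i].num_defines; ++d)
        if (strcmp(in[i].defines[d], sym) == 0)
          return true;
  return false;
}

/* True if any selected input holds a strong reference to `sym`. */
static bool is_strongly_referenced(const Input *in, const bool *selected,
                                   int n, const char *sym) {
  for (int i = 0; i < n; ++i)
    if (selected[i])
      for (int r = 0; r < in[i].num_strong_refs; ++r)
        if (strcmp(in[i].strong_refs[r], sym) == 0)
          return true;
  return false;
}

/* Regular objects are always linked. An archive member is extracted when it
   defines a symbol that some linked file strongly references and that no
   linked file defines yet. Looping until nothing changes means an archive
   listed before the object that needs it is still extracted. */
void resolve_archives(const Input *in, int n, bool *selected) {
  assert(in != NULL && selected != NULL);
  for (int i = 0; i < n; ++i)
    selected[i] = !in[i].is_archive_member;
  bool changed = true;
  while (changed) {
    changed = false;
    for (int i = 0; i < n; ++i) {
      if (selected[i])
        continue;
      for (int d = 0; d < in[i].num_defines; ++d) {
        const char *sym = in[i].defines[d];
        if (is_strongly_referenced(in, selected, n, sym) &&
            !is_defined(in, selected, n, sym)) {
          selected[i] = true;
          changed = true;
          break;
        }
      }
    }
  }
}
```

Encoding the test's inputs in command-line order (the `x` member, `u.o`, then the `y`, `z` and `w` members) selects `u.o` plus the `x` and `y` members and leaves `z` and `w` out, matching the `LINK` check: `u.o` needs `x`, the `x` member needs `y`, nothing needs `z`, and `w` is referenced only weakly.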
@@ -0,0 +1,44 @@
set(LLVM_LINK_COMPONENTS
  ${LLVM_TARGETS_TO_BUILD}
  BitWriter
  Core
  BinaryFormat
  MC
  Target
  TransformUtils
  Analysis
  Passes
  IRReader
  Object
  Option
  Support
  TargetParser
  CodeGen
  LTO
  )

set(LLVM_TARGET_DEFINITIONS NVLinkOpts.td)
tablegen(LLVM NVLinkOpts.inc -gen-opt-parser-defs)
add_public_tablegen_target(NVLinkWrapperOpts)

if(NOT CLANG_BUILT_STANDALONE)
  set(tablegen_deps intrinsics_gen NVLinkWrapperOpts)
endif()

add_clang_tool(clang-nvlink-wrapper
  ClangNVLinkWrapper.cpp

  DEPENDS
  ${tablegen_deps}
  )

set(CLANG_NVLINK_WRAPPER_LIB_DEPS
  clangBasic
  )

target_compile_options(clang-nvlink-wrapper PRIVATE "-g" "-O0")

target_link_libraries(clang-nvlink-wrapper
  PRIVATE
  ${CLANG_NVLINK_WRAPPER_LIB_DEPS}
  )
This looks suspicious. Do we actually want to build this with `-g -O0` all the time, or was this left in from debugging or something like that? In the unlikely event that we do want this for some reason, it won't work as is on Windows anyway.
Whoops, thanks for pointing that out.