[Clang] Introduce 'clang-nvlink-wrapper' to work around 'nvlink' #96561
@llvm/pr-subscribers-clang-driver
Author: Joseph Huber (jhuber6)
Changes
Summary: While NVIDIA provides a linker called 'nvlink', its main interface is
The main reason I want to reintroduce this tool is because I am
Linking in these libraries will then simply require passing
In the future we may be able to convince NVIDIA to port their linker to
Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96561.diff
8 Files Affected:
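For readers skimming the patch: the driver code removed in the diff notes that 'nvlink' performs RDC-mode linking when given a '.o' file and device linking when given a '.cubin' file, and the new `nvlink-wrapper.c` test expects the wrapper to hand `.cubin` names to 'nvlink'. How the wrapper produces those files is not visible in the truncated patch, so the following is only a minimal sketch of the name mapping. The helper `cubinNameFor` is hypothetical and uses plain `std::string` handling rather than LLVM's path utilities.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the patch): map an input file name to the
// '.cubin' name that 'nvlink' expects for device linking.
std::string cubinNameFor(const std::string &Input) {
  assert(!Input.empty() && "expected a file name");

  // Only treat a '.' as an extension separator if it appears after the last
  // directory separator, so "out.d/kernel" keeps its directory name intact.
  std::size_t Slash = Input.find_last_of('/');
  std::size_t Dot = Input.find_last_of('.');
  bool HasExtension = Dot != std::string::npos &&
                      (Slash == std::string::npos || Dot > Slash);

  // Drop the old extension (if any) and append '.cubin'.
  std::string Stem = HasExtension ? Input.substr(0, Dot) : Input;
  return Stem + ".cubin";
}
```

With this mapping, `kernel.o` becomes `kernel.cubin`, which matches the shape of the `[[INPUT:.+]].cubin` patterns checked by the new test.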
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 2dfc7457b0ac7..54724cc1ad08e 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -461,13 +461,6 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("--output-file");
std::string OutputFileName = TC.getInputFilename(Output);
- // If we are invoking `nvlink` internally we need to output a `.cubin` file.
- // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
- if (!C.getInputArgs().getLastArg(options::OPT_c)) {
- SmallString<256> Filename(Output.getFilename());
- llvm::sys::path::replace_extension(Filename, "cubin");
- OutputFileName = Filename.str();
- }
if (Output.isFilename() && OutputFileName != Output.getFilename())
C.addTempFile(Args.MakeArgString(OutputFileName));
@@ -618,6 +611,11 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
// Add standard library search paths passed on the command line.
Args.AddAllArgs(CmdArgs, options::OPT_L);
getToolChain().AddFilePathLibArgs(Args, CmdArgs);
+ AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs, JA);
+
+ if (C.getDriver().isUsingLTO())
+ addLTOOptions(getToolChain(), Args, CmdArgs, Output, Inputs[0],
+ C.getDriver().getLTOMode() == LTOK_Thin);
// Add paths for the default clang library path.
SmallString<256> DefaultLibPath =
@@ -625,51 +623,12 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
llvm::sys::path::append(DefaultLibPath, CLANG_INSTALL_LIBDIR_BASENAME);
CmdArgs.push_back(Args.MakeArgString(Twine("-L") + DefaultLibPath));
- for (const auto &II : Inputs) {
- if (II.getType() == types::TY_LLVM_IR || II.getType() == types::TY_LTO_IR ||
- II.getType() == types::TY_LTO_BC || II.getType() == types::TY_LLVM_BC) {
- C.getDriver().Diag(diag::err_drv_no_linker_llvm_support)
- << getToolChain().getTripleString();
- continue;
- }
-
- // The 'nvlink' application performs RDC-mode linking when given a '.o'
- // file and device linking when given a '.cubin' file. We always want to
- // perform device linking, so just rename any '.o' files.
- // FIXME: This should hopefully be removed if NVIDIA updates their tooling.
- if (II.isFilename()) {
- auto InputFile = getToolChain().getInputFilename(II);
- if (llvm::sys::path::extension(InputFile) != ".cubin") {
- // If there are no actions above this one then this is direct input and
- // we can copy it. Otherwise the input is internal so a `.cubin` file
- // should exist.
- if (II.getAction() && II.getAction()->getInputs().size() == 0) {
- const char *CubinF =
- Args.MakeArgString(getToolChain().getDriver().GetTemporaryPath(
- llvm::sys::path::stem(InputFile), "cubin"));
- if (llvm::sys::fs::copy_file(InputFile, C.addTempFile(CubinF)))
- continue;
-
- CmdArgs.push_back(CubinF);
- } else {
- SmallString<256> Filename(InputFile);
- llvm::sys::path::replace_extension(Filename, "cubin");
- CmdArgs.push_back(Args.MakeArgString(Filename));
- }
- } else {
- CmdArgs.push_back(Args.MakeArgString(InputFile));
- }
- } else if (!II.isNothing()) {
- II.getInputArg().renderAsInput(Args, CmdArgs);
- }
- }
-
C.addCommand(std::make_unique<Command>(
JA, *this,
ResponseFileSupport{ResponseFileSupport::RF_Full, llvm::sys::WEM_UTF8,
"--options-file"},
- Args.MakeArgString(getToolChain().GetProgramPath("nvlink")), CmdArgs,
- Inputs, Output));
+ Args.MakeArgString(getToolChain().GetProgramPath("clang-nvlink-wrapper")),
+ CmdArgs, Inputs, Output));
}
void NVPTX::getNVPTXTargetFeatures(const Driver &D, const llvm::Triple &Triple,
@@ -949,11 +908,7 @@ std::string CudaToolChain::getInputFilename(const InputInfo &Input) const {
if (Input.getType() != types::TY_Object || getDriver().offloadDeviceOnly())
return ToolChain::getInputFilename(Input);
- // Replace extension for object files with cubin because nvlink relies on
- // these particular file names.
- SmallString<256> Filename(ToolChain::getInputFilename(Input));
- llvm::sys::path::replace_extension(Filename, "cubin");
- return std::string(Filename);
+ return ToolChain::getInputFilename(Input);
}
llvm::opt::DerivedArgList *
diff --git a/clang/lib/Driver/ToolChains/Cuda.h b/clang/lib/Driver/ToolChains/Cuda.h
index 43c17ba7c0ba0..0735c36f116bc 100644
--- a/clang/lib/Driver/ToolChains/Cuda.h
+++ b/clang/lib/Driver/ToolChains/Cuda.h
@@ -155,6 +155,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXToolChain : public ToolChain {
bool isPIEDefault(const llvm::opt::ArgList &Args) const override {
return false;
}
+ bool HasNativeLLVMSupport() const override { return true; }
bool isPICDefaultForced() const override { return false; }
bool SupportsProfiling() const override { return false; }
@@ -192,6 +193,8 @@ class LLVM_LIBRARY_VISIBILITY CudaToolChain : public NVPTXToolChain {
return &HostTC.getTriple();
}
+ bool HasNativeLLVMSupport() const override { return false; }
+
std::string getInputFilename(const InputInfo &Input) const override;
llvm::opt::DerivedArgList *
diff --git a/clang/test/Driver/cuda-cross-compiling.c b/clang/test/Driver/cuda-cross-compiling.c
index 1dc4520f485db..f839c36d23e51 100644
--- a/clang/test/Driver/cuda-cross-compiling.c
+++ b/clang/test/Driver/cuda-cross-compiling.c
@@ -32,8 +32,8 @@
// RUN: | FileCheck -check-prefix=ARGS %s
// ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
-// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
-// ARGS-NEXT: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+// ARGS-NEXT: ptxas{{.*}}"-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].o" "[[PTX]].s" "-c"
+// ARGS-NEXT: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].o"
//
// Test the generated arguments to the CUDA binary utils when targeting NVPTX.
@@ -55,7 +55,7 @@
// RUN: %clang -target nvptx64-nvidia-cuda -march=sm_61 -### %t.o 2>&1 \
// RUN: | FileCheck -check-prefix=LINK %s
-// LINK: nvlink{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.cubin"
+// LINK: clang-nvlink-wrapper{{.*}}"-o" "a.out" "-arch" "sm_61" {{.*}} "{{.*}}.o"
//
// Test to ensure that we enable handling global constructors in a freestanding
@@ -72,7 +72,7 @@
// RUN: %clang -target nvptx64-nvidia-cuda -Wl,-v -Wl,a,b -march=sm_52 -### %s 2>&1 \
// RUN: | FileCheck -check-prefix=LINKER-ARGS %s
-// LINKER-ARGS: nvlink{{.*}}"-v"{{.*}}"a" "b"
+// LINKER-ARGS: clang-nvlink-wrapper{{.*}}"-v"{{.*}}"a" "b"
// Tests for handling a missing architecture.
//
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
new file mode 100644
index 0000000000000..488bcd062467f
--- /dev/null
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -0,0 +1,64 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+#if defined(X)
+extern int y;
+int foo() { return y; }
+
+int x = 0;
+#elif defined(Y)
+int y = 42;
+#elif defined(Z)
+int z = 42;
+#elif defined(W)
+int w = 42;
+#elif defined(U)
+extern int x;
+extern int y;
+extern int __attribute__((weak)) w;
+
+int bar() {
+ return x + y + w;
+}
+#else
+int __attribute__((visibility("hidden"))) x = 999;
+#endif
+
+// Create various inputs to test basic linking and LTO capabilities. Creating a
+// CUDA binary requires access to the `ptxas` executable, so we just use x64.
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DX -o %t-x.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DY -o %t-y.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DZ -o %t-z.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DW -o %t-w.o
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -DU -o %t-u.o
+// RUN: llvm-ar rcs %t-x.a %t-x.o
+// RUN: llvm-ar rcs %t-y.a %t-y.o
+// RUN: llvm-ar rcs %t-z.a %t-z.o
+// RUN: llvm-ar rcs %t-w.a %t-w.o
+
+//
+// Check that we forward any unrecognized argument to 'nvlink'.
+//
+// RUN: clang-nvlink-wrapper --dry-run -arch sm_52 %t-u.o -foo -o a.out 2>&1 \
+// RUN: | FileCheck %s --check-prefix=ARGS
+// ARGS: nvlink{{.*}} -arch sm_52 -foo -o a.out [[INPUT:.+]].cubin
+
+//
+// Check the symbol resolution for static archives. We expect to only link
+// `libx.a` and `liby.a` because extern weak symbols do not extract and `libz.a`
+// is not used at all.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t-x.a %t-u.o %t-y.a %t-z.a %t-w.a \
+// RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LINK
+// LINK: nvlink{{.*}} -arch sm_52 -o a.out [[INPUT:.+]].cubin {{.*}}-x-{{.*}}.cubin{{.*}}-y-{{.*}}.cubin
+
+// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.o
+
+//
+// Check that the LTO interface works and properly preserves symbols used in a
+// regular object file.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
+// LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
+// LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
diff --git a/clang/tools/CMakeLists.txt b/clang/tools/CMakeLists.txt
index bdd8004be3e02..4885afc1584d0 100644
--- a/clang/tools/CMakeLists.txt
+++ b/clang/tools/CMakeLists.txt
@@ -9,6 +9,7 @@ add_clang_subdirectory(clang-format-vs)
add_clang_subdirectory(clang-fuzzer)
add_clang_subdirectory(clang-import-test)
add_clang_subdirectory(clang-linker-wrapper)
+add_clang_subdirectory(clang-nvlink-wrapper)
add_clang_subdirectory(clang-offload-packager)
add_clang_subdirectory(clang-offload-bundler)
add_clang_subdirectory(clang-scan-deps)
diff --git a/clang/tools/clang-nvlink-wrapper/CMakeLists.txt b/clang/tools/clang-nvlink-wrapper/CMakeLists.txt
new file mode 100644
index 0000000000000..d46f66994cf39
--- /dev/null
+++ b/clang/tools/clang-nvlink-wrapper/CMakeLists.txt
@@ -0,0 +1,44 @@
+set(LLVM_LINK_COMPONENTS
+ ${LLVM_TARGETS_TO_BUILD}
+ BitWriter
+ Core
+ BinaryFormat
+ MC
+ Target
+ TransformUtils
+ Analysis
+ Passes
+ IRReader
+ Object
+ Option
+ Support
+ TargetParser
+ CodeGen
+ LTO
+ )
+
+set(LLVM_TARGET_DEFINITIONS NVLinkOpts.td)
+tablegen(LLVM NVLinkOpts.inc -gen-opt-parser-defs)
+add_public_tablegen_target(NVLinkWrapperOpts)
+
+if(NOT CLANG_BUILT_STANDALONE)
+ set(tablegen_deps intrinsics_gen NVLinkWrapperOpts)
+endif()
+
+add_clang_tool(clang-nvlink-wrapper
+ ClangNVLinkWrapper.cpp
+
+ DEPENDS
+ ${tablegen_deps}
+ )
+
+set(CLANG_NVLINK_WRAPPER_LIB_DEPS
+ clangBasic
+ )
+
+target_compile_options(clang-nvlink-wrapper PRIVATE "-g" "-O0")
+
+target_link_libraries(clang-nvlink-wrapper
+ PRIVATE
+ ${CLANG_NVLINK_WRAPPER_LIB_DEPS}
+ )
diff --git a/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp b/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
new file mode 100644
index 0000000000000..d48bf4e37ecfe
--- /dev/null
+++ b/clang/tools/clang-nvlink-wrapper/ClangNVLinkWrapper.cpp
@@ -0,0 +1,671 @@
+//===-- clang-nvlink-wrapper/ClangNVLinkWrapper.cpp - NVIDIA linker util --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===---------------------------------------------------------------------===//
+//
+// This tool wraps around the NVIDIA linker called 'nvlink'. The NVIDIA linker
+// is required to create NVPTX applications, but does not support common
+// features like LTO or archives. This utility wraps around the tool to cover
+// its deficiencies. This tool can be removed once NVIDIA improves their linker
+// or ports it to `ld.lld`.
+//
+//===---------------------------------------------------------------------===//
+
+#include "clang/Basic/Version.h"
+
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/BinaryFormat/Magic.h"
+#include "llvm/CodeGen/CommandFlags.h"
+#include "llvm/IR/DiagnosticPrinter.h"
+#include "llvm/LTO/LTO.h"
+#include "llvm/Object/Archive.h"
+#include "llvm/Object/ArchiveWriter.h"
+#include "llvm/Object/Binary.h"
+#include "llvm/Object/ELFObjectFile.h"
+#include "llvm/Object/IRObjectFile.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Object/OffloadBinary.h"
+#include "llvm/Option/ArgList.h"
+#include "llvm/Option/OptTable.h"
+#include "llvm/Option/Option.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileOutputBuffer.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/InitLLVM.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
+#include "llvm/Support/Program.h"
+#include "llvm/Support/Signals.h"
+#include "llvm/Support/StringSaver.h"
+#include "llvm/Support/TargetSelect.h"
+#include "llvm/Support/WithColor.h"
+
+using namespace llvm;
+using namespace llvm::opt;
+using namespace llvm::object;
+
+static void printVersion(raw_ostream &OS) {
+ OS << clang::getClangToolFullVersion("clang-nvlink-wrapper") << '\n';
+}
+
+/// The value of `argv[0]` when run.
+static const char *Executable;
+
+/// Temporary files to be cleaned up.
+static SmallVector<SmallString<128>> TempFiles;
+
+/// Codegen flags for LTO backend.
+static codegen::RegisterCodeGenFlags CodeGenFlags;
+
+namespace {
+/// Must not overlap with llvm::opt::DriverFlag.
+enum WrapperFlags {
+ WrapperOnlyOption = (1 << 4), // Options only used by the linker wrapper.
+ DeviceOnlyOption = (1 << 5), // Options only used for device linking.
+};
+
+enum ID {
+ OPT_INVALID = 0, // This is not an option ID.
+#define OPTION(...) LLVM_MAKE_OPT_ID(__VA_ARGS__),
+#include "NVLinkOpts.inc"
+ LastOption
+#undef OPTION
+};
+
+#define PREFIX(NAME, VALUE) \
+ static constexpr StringLiteral NAME##_init[] = VALUE; \
+ static constexpr ArrayRef<StringLiteral> NAME(NAME##_init, \
+ std::size(NAME##_init) - 1);
+#include "NVLinkOpts.inc"
+#undef PREFIX
+
+static constexpr OptTable::Info InfoTable[] = {
+#define OPTION(...) LLVM_CONSTRUCT_OPT_INFO(__VA_ARGS__),
+#include "NVLinkOpts.inc"
+#undef OPTION
+};
+
+class WrapperOptTable : public opt::GenericOptTable {
+public:
+ WrapperOptTable() : opt::GenericOptTable(InfoTable) {}
+};
+
+const OptTable &getOptTable() {
+ static const WrapperOptTable *Table = []() {
+ auto Result = std::make_unique<WrapperOptTable>();
+ return Result.release();
+ }();
+ return *Table;
+}
+
+[[noreturn]] void reportError(Error E) {
+ outs().flush();
+ logAllUnhandledErrors(std::move(E), WithColor::error(errs(), Executable));
+ exit(EXIT_FAILURE);
+}
+
+void diagnosticHandler(const DiagnosticInfo &DI) {
+ std::string ErrStorage;
+ raw_string_ostream OS(ErrStorage);
+ DiagnosticPrinterRawOStream DP(OS);
+ DI.print(DP);
+
+ switch (DI.getSeverity()) {
+ case DS_Error:
+ WithColor::error(errs(), Executable) << ErrStorage << "\n";
+ break;
+ case DS_Warning:
+ WithColor::warning(errs(), Executable) << ErrStorage << "\n";
+ break;
+ case DS_Note:
+ WithColor::note(errs(), Executable) << ErrStorage << "\n";
+ break;
+ case DS_Remark:
+ WithColor::remark(errs()) << ErrStorage << "\n";
+ break;
+ }
+}
+
+Expected<StringRef> createTempFile(const ArgList &Args, const Twine &Prefix,
+ StringRef Extension) {
+ SmallString<128> OutputFile;
+ if (Args.hasArg(OPT_save_temps)) {
+ (Prefix + "." + Extension).toNullTerminatedStringRef(OutputFile);
+ } else {
+ if (std::error_code EC =
+ sys::fs::createTemporaryFile(Prefix, Extension, OutputFile))
+ return createFileError(OutputFile, EC);
+ }
+
+ TempFiles.emplace_back(std::move(OutputFile));
+ return TempFiles.back();
+}
+
+Expected<std::string> findProgram(StringRef Name, ArrayRef<StringRef> Paths) {
+ ErrorOr<std::string> Path = sys::findProgramByName(Name, Paths);
+ if (!Path)
+ Path = sys::findProgramByName(Name);
+ if (!Path)
+ return createStringError(Path.getError(),
+ "Unable to find '" + Name + "' in path");
+ return *Path;
+}
+
+std::optional<std::string> findFile(StringRef Dir, StringRef Root,
+ const Twine &Name) {
+ SmallString<128> Path;
+ if (Dir.starts_with("="))
+ sys::path::append(Path, Root, Dir.substr(1), Name);
+ else
+ sys::path::append(Path, Dir, Name);
+
+ if (sys::fs::exists(Path))
+ return static_cast<std::string>(Path);
+ return std::nullopt;
+}
+
+std::optional<std::string>
+findFromSearchPaths(StringRef Name, StringRef Root,
+ ArrayRef<StringRef> SearchPaths) {
+ for (StringRef Dir : SearchPaths)
+ if (std::optional<std::string> File = findFile(Dir, Root, Name))
+ return File;
+ return std::nullopt;
+}
+
+std::optional<std::string>
+searchLibraryBaseName(StringRef Name, StringRef Root,
+ ArrayRef<StringRef> SearchPaths) {
+ for (StringRef Dir : SearchPaths)
+ if (std::optional<std::string> File =
+ findFile(Dir, Root, "lib" + Name + ".a"))
+ return File;
+ return std::nullopt;
+}
+
+/// Search for static libraries in the linker's library path given input like
+/// `-lfoo` or `-l:libfoo.a`.
+std::optional<std::string> searchLibrary(StringRef Input, StringRef Root,
+ ArrayRef<StringRef> SearchPaths) {
+ if (Input.starts_with(":") || Input.ends_with(".lib"))
+ return findFromSearchPaths(Input.drop_front(), Root, SearchPaths);
+ return searchLibraryBaseName(Input, Root, SearchPaths);
+}
+
+void printCommands(ArrayRef<StringRef> CmdArgs) {
+ if (CmdArgs.empty())
+ return;
+
+ llvm::errs() << " \"" << CmdArgs.front() << "\" ";
+ for (auto IC = std::next(CmdArgs.begin()), IE = CmdArgs.end(); IC != IE; ++IC)
+ llvm::errs() << *IC << (std::next(IC) != IE ? " " : "\n");
+}
+
+/// A minimum symbol interface that provides the necessary information to
+/// extract archive members and resolve LTO symbols.
+struct Symbol {
+ enum Flags {
+ None = 0,
+ Undefined = 1 << 0,
+ Weak = 1 << 1,
+ };
+
+ Symbol()
+ : File(), Flags(Undefined), Name(), UsedInRegularObj(false), Lazy(false) {
+ }
+
+ Symbol(MemoryBufferRef File, const irsymtab::Reader::SymbolRef Sym, bool Lazy)
+ : File(File), Flags(0), UsedInRegularObj(false), Lazy(Lazy) {
+ if (Sym.isUndefined())
+ Flags |= Undefined;
+ if (Sym.isWeak())
+ Flags |= Weak;
+ Name = Sym.getName();
+ }
+
+ Symbol(MemoryBufferRef File, const SymbolRef Sym, bool Lazy)
+ : File(File), Flags(0), UsedInRegularObj(false), Lazy(Lazy) {
+ auto FlagsOrErr = Sym.getFlags();
+ if (!FlagsOrErr)
+ reportError(FlagsOrErr.takeError());
+ if (*FlagsOrErr & SymbolRef::SF_Undefined)
+ Flags |= Undefined;
+ if (*FlagsOrErr & SymbolRef::SF_Weak)
+ Flags |= Weak;
+
+ auto NameOrErr = Sym.getName();
+ if (!NameOrErr)
+ reportError(NameOrErr.takeError());
+ Name = *NameOrErr;
+ }
+
+ Symbol Resolve(Symbol Other) {
+ if (File.getBuffer().empty())
+ return Other.Lazy ? *this : Other;
+ if (Other.isUndefined())
+ return *this;
+ if (isWeak() && isUndefined() && Other.Lazy)
+ return *this;
+ if (isWeak() && !Other.isWeak())
+ return Other;
+ if (isUndefined() && !Other.isUndefined())
+ return Other;
+ return *this;
+ }
+
+ bool isWeak() const { return Flags & Weak; }
+ bool isUndefined() const { return Flags & Undefined; }
+
+ MemoryBufferRef File;
+ uint32_t Flags;
+ StringRef Name;
+ bool UsedInRegularObj;
+ bool Lazy;
+};
+
+Expected<StringRef> runPTXAs(StringRef File, const ArgList &Args) {
+ std::string CudaPath = Args.getLastArgValue(OPT_cuda_path_EQ).str();
+ Expected<std::string> PTXAsPath = findProgram("ptxas", {CudaPath + "/bin"});
+ if (!PTXAsPath)
+ return PTXAsPath.takeError();
+
+ auto TempFileOrErr =
+ creat...
[truncated]
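The `Symbol` struct and the `nvlink-wrapper.c` test above describe how archive members are chosen: the test expects only the `x` and `y` archives to be linked, because extern weak symbols do not extract and the `z` archive is not used at all. The patch text is truncated before the code that drives this, so the following is only a simplified sketch of the general lazy-extraction idea, not the wrapper's implementation. The names `SymbolInfo`, `InputFile`, and `selectInputs` are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one symbol in an input file.
struct SymbolInfo {
  std::string Name;
  bool Undefined; // Referenced but not defined in this file.
  bool Weak;      // Weak binding.
};

// Hypothetical, simplified view of one link input.
struct InputFile {
  std::string Name;
  bool Lazy; // True for archive members, which are linked only on demand.
  std::vector<SymbolInfo> Symbols;
};

// Returns the names of the files that take part in the link, in input order.
std::vector<std::string> selectInputs(const std::vector<InputFile> &Files) {
  std::set<std::string> Defined; // Symbols defined by files linked so far.
  std::set<std::string> Needed;  // Strong undefined references seen so far.
  std::vector<bool> Linked(Files.size(), false);

  auto Link = [&](std::size_t I) {
    assert(I < Files.size());
    Linked[I] = true;
    for (const SymbolInfo &S : Files[I].Symbols) {
      if (!S.Undefined)
        Defined.insert(S.Name);
      else if (!S.Weak)
        Needed.insert(S.Name); // Weak references never force extraction.
    }
  };

  // Regular (non-lazy) inputs are always linked.
  for (std::size_t I = 0; I < Files.size(); ++I)
    if (!Files[I].Lazy)
      Link(I);

  // Extract lazy members until none of them defines a still-missing symbol.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Files.size(); ++I) {
      if (Linked[I])
        continue;
      for (const SymbolInfo &S : Files[I].Symbols) {
        if (!S.Undefined && Needed.count(S.Name) && !Defined.count(S.Name)) {
          Link(I);
          Changed = true;
          break;
        }
      }
    }
  }

  std::vector<std::string> Result;
  for (std::size_t I = 0; I < Files.size(); ++I)
    if (Linked[I])
      Result.push_back(Files[I].Name);
  return Result;
}
```

Modelling the test's inputs this way (`x.a` defines `foo` and `x` and references `y`; `u.o` references `x`, `y`, and weak `w`; `y.a`, `z.a`, and `w.a` each define one symbol) selects `x.a`, `u.o`, and `y.a`. The fixed-point loop is a design choice for the sketch: it keeps the result independent of where an archive appears on the command line.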
+
+namespace {
+/// Must not overlap with llvm::opt::DriverFlag.
+enum WrapperFlags {
+ WrapperOnlyOption = (1 << 4), // Options only used by the linker wrapper.
+ DeviceOnlyOption = (1 << 5), // Options only used for device linking.
+};
+
+enum ID {
+ OPT_INVALID = 0, // This is not an option ID.
+#define OPTION(...) LLVM_MAKE_OPT_ID(__VA_ARGS__),
+#include "NVLinkOpts.inc"
+ LastOption
+#undef OPTION
+};
+
+#define PREFIX(NAME, VALUE) \
+ static constexpr StringLiteral NAME##_init[] = VALUE; \
+ static constexpr ArrayRef<StringLiteral> NAME(NAME##_init, \
+ std::size(NAME##_init) - 1);
+#include "NVLinkOpts.inc"
+#undef PREFIX
+
+static constexpr OptTable::Info InfoTable[] = {
+#define OPTION(...) LLVM_CONSTRUCT_OPT_INFO(__VA_ARGS__),
+#include "NVLinkOpts.inc"
+#undef OPTION
+};
+
+class WrapperOptTable : public opt::GenericOptTable {
+public:
+ WrapperOptTable() : opt::GenericOptTable(InfoTable) {}
+};
+
+const OptTable &getOptTable() {
+ static const WrapperOptTable *Table = []() {
+ auto Result = std::make_unique<WrapperOptTable>();
+ return Result.release();
+ }();
+ return *Table;
+}
+
+[[noreturn]] void reportError(Error E) {
+ outs().flush();
+ logAllUnhandledErrors(std::move(E), WithColor::error(errs(), Executable));
+ exit(EXIT_FAILURE);
+}
+
+void diagnosticHandler(const DiagnosticInfo &DI) {
+ std::string ErrStorage;
+ raw_string_ostream OS(ErrStorage);
+ DiagnosticPrinterRawOStream DP(OS);
+ DI.print(DP);
+
+ switch (DI.getSeverity()) {
+ case DS_Error:
+ WithColor::error(errs(), Executable) << ErrStorage << "\n";
+ break;
+ case DS_Warning:
+ WithColor::warning(errs(), Executable) << ErrStorage << "\n";
+ break;
+ case DS_Note:
+ WithColor::note(errs(), Executable) << ErrStorage << "\n";
+ break;
+ case DS_Remark:
+ WithColor::remark(errs()) << ErrStorage << "\n";
+ break;
+ }
+}
+
+Expected<StringRef> createTempFile(const ArgList &Args, const Twine &Prefix,
+ StringRef Extension) {
+ SmallString<128> OutputFile;
+ if (Args.hasArg(OPT_save_temps)) {
+ (Prefix + "." + Extension).toNullTerminatedStringRef(OutputFile);
+ } else {
+ if (std::error_code EC =
+ sys::fs::createTemporaryFile(Prefix, Extension, OutputFile))
+ return createFileError(OutputFile, EC);
+ }
+
+ TempFiles.emplace_back(std::move(OutputFile));
+ return TempFiles.back();
+}
+
+Expected<std::string> findProgram(StringRef Name, ArrayRef<StringRef> Paths) {
+ ErrorOr<std::string> Path = sys::findProgramByName(Name, Paths);
+ if (!Path)
+ Path = sys::findProgramByName(Name);
+ if (!Path)
+ return createStringError(Path.getError(),
+ "Unable to find '" + Name + "' in path");
+ return *Path;
+}
+
+std::optional<std::string> findFile(StringRef Dir, StringRef Root,
+ const Twine &Name) {
+ SmallString<128> Path;
+ if (Dir.starts_with("="))
+ sys::path::append(Path, Root, Dir.substr(1), Name);
+ else
+ sys::path::append(Path, Dir, Name);
+
+ if (sys::fs::exists(Path))
+ return static_cast<std::string>(Path);
+ return std::nullopt;
+}
+
+std::optional<std::string>
+findFromSearchPaths(StringRef Name, StringRef Root,
+ ArrayRef<StringRef> SearchPaths) {
+ for (StringRef Dir : SearchPaths)
+ if (std::optional<std::string> File = findFile(Dir, Root, Name))
+ return File;
+ return std::nullopt;
+}
+
+std::optional<std::string>
+searchLibraryBaseName(StringRef Name, StringRef Root,
+ ArrayRef<StringRef> SearchPaths) {
+ for (StringRef Dir : SearchPaths)
+ if (std::optional<std::string> File =
+ findFile(Dir, Root, "lib" + Name + ".a"))
+ return File;
+ return std::nullopt;
+}
+
+/// Search for static libraries in the linker's library path given input like
+/// `-lfoo` or `-l:libfoo.a`.
+std::optional<std::string> searchLibrary(StringRef Input, StringRef Root,
+ ArrayRef<StringRef> SearchPaths) {
+ if (Input.starts_with(":") || Input.ends_with(".lib"))
+ return findFromSearchPaths(Input.drop_front(), Root, SearchPaths);
+ return searchLibraryBaseName(Input, Root, SearchPaths);
+}
+
+void printCommands(ArrayRef<StringRef> CmdArgs) {
+ if (CmdArgs.empty())
+ return;
+
+ llvm::errs() << " \"" << CmdArgs.front() << "\" ";
+ for (auto IC = std::next(CmdArgs.begin()), IE = CmdArgs.end(); IC != IE; ++IC)
+ llvm::errs() << *IC << (std::next(IC) != IE ? " " : "\n");
+}
+
+/// A minimum symbol interface that provides the necessary information to
+/// extract archive members and resolve LTO symbols.
+struct Symbol {
+ enum Flags {
+ None = 0,
+ Undefined = 1 << 0,
+ Weak = 1 << 1,
+ };
+
+ Symbol()
+ : File(), Flags(Undefined), Name(), UsedInRegularObj(false), Lazy(false) {
+ }
+
+ Symbol(MemoryBufferRef File, const irsymtab::Reader::SymbolRef Sym, bool Lazy)
+ : File(File), Flags(0), UsedInRegularObj(false), Lazy(Lazy) {
+ if (Sym.isUndefined())
+ Flags |= Undefined;
+ if (Sym.isWeak())
+ Flags |= Weak;
+ Name = Sym.getName();
+ }
+
+ Symbol(MemoryBufferRef File, const SymbolRef Sym, bool Lazy)
+ : File(File), Flags(0), UsedInRegularObj(false), Lazy(Lazy) {
+ auto FlagsOrErr = Sym.getFlags();
+ if (!FlagsOrErr)
+ reportError(FlagsOrErr.takeError());
+ if (*FlagsOrErr & SymbolRef::SF_Undefined)
+ Flags |= Undefined;
+ if (*FlagsOrErr & SymbolRef::SF_Weak)
+ Flags |= Weak;
+
+ auto NameOrErr = Sym.getName();
+ if (!NameOrErr)
+ reportError(NameOrErr.takeError());
+ Name = *NameOrErr;
+ }
+
+ Symbol Resolve(Symbol Other) {
+ if (File.getBuffer().empty())
+ return Other.Lazy ? *this : Other;
+ if (Other.isUndefined())
+ return *this;
+ if (isWeak() && isUndefined() && Other.Lazy)
+ return *this;
+ if (isWeak() && !Other.isWeak())
+ return Other;
+ if (isUndefined() && !Other.isUndefined())
+ return Other;
+ return *this;
+ }
+
+ bool isWeak() const { return Flags & Weak; }
+ bool isUndefined() const { return Flags & Undefined; }
+
+ MemoryBufferRef File;
+ uint32_t Flags;
+ StringRef Name;
+ bool UsedInRegularObj;
+ bool Lazy;
+};
+
+Expected<StringRef> runPTXAs(StringRef File, const ArgList &Args) {
+ std::string CudaPath = Args.getLastArgValue(OPT_cuda_path_EQ).str();
+ Expected<std::string> PTXAsPath = findProgram("ptxas", {CudaPath + "/bin"});
+ if (!PTXAsPath)
+ return PTXAsPath.takeError();
+
+ auto TempFileOrErr =
+ creat...
[truncated]
|
@Artem-B asked me to review nvptx patches while he's OOO, but this one is pretty far outside my depth. Are you OK waiting until he's back? I don't know exactly when that will be, but based on his IMs to me, he should be back early July. |
No problem, I knew that it would probably take a while to get reviewed given the size. I believe he said he'd be back early July as well, so maybe next week? It'd probably require his input, along with that of some of the other interested parties in clang, to see how they feel about reviving one of these old tools. (However, if you know anything about the NVPTX varargs API, I think #96015 is mostly just waiting for someone to say that it's a mostly correct lowering.) |
@MaskRay So, I think my symbol resolution is (unsurprisingly) subtly broken. Is there a canonical way to handle this? I first thought that we could simply perform the symbol resolutions as normal for every file, but keep track of which symbols were "lazy". However, I couldn't figure out how to then tell if a lazy symbol should be extracted or not because there's no information on which files use which symbols. Maybe I just scan all the files and see if they reference a symbol that's marked defined and lazy? |
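One conventional way to answer the question above, which may or may not match what the patch finally does, is to iterate to a fixed point: an archive member is pulled in only when it defines a symbol that is still undefined among the files already in the link, and every extraction can add new undefined symbols. The following is a minimal sketch under stated assumptions: `InputFile` and `selectMembers` are hypothetical names that do not appear in the patch, symbols are plain strings, and weak symbols are ignored entirely.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of a link input. This is NOT the `Symbol`
// struct from the patch; it only records which names a file defines and uses.
struct InputFile {
  std::string Name;
  std::set<std::string> Defined;
  std::set<std::string> Undefined;
  bool Lazy; // true for archive members, false for regular objects
};

// Returns the names of the files that take part in the link, in the order
// they were included. Lazy files are included only if they define a symbol
// that is still undefined at the time they are examined.
std::vector<std::string> selectMembers(const std::vector<InputFile> &Files) {
  std::vector<bool> Included(Files.size(), false);
  std::set<std::string> Defined, Undefined;
  std::vector<std::string> Order;

  auto Include = [&](std::size_t I) {
    Included[I] = true;
    Order.push_back(Files[I].Name);
    for (const std::string &S : Files[I].Defined) {
      Defined.insert(S);
      Undefined.erase(S);
    }
    for (const std::string &S : Files[I].Undefined)
      if (!Defined.count(S))
        Undefined.insert(S);
  };

  // Regular objects are always part of the link.
  for (std::size_t I = 0; I < Files.size(); ++I)
    if (!Files[I].Lazy)
      Include(I);

  // Keep scanning lazy files until a full pass extracts nothing new.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Files.size(); ++I) {
      if (Included[I])
        continue;
      bool Needed = false;
      for (const std::string &S : Files[I].Defined)
        if (Undefined.count(S)) {
          Needed = true;
          break;
        }
      if (Needed) {
        Include(I);
        Changed = true;
      }
    }
  }
  return Order;
}
```

With this scheme there is no need to know which file references which symbol: a single global set of still-undefined names is enough to decide whether a lazy member should be extracted.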
Summary: Currently we have several hacks to work around the fact that the NVPTX linker, 'nvlink', does not support static libraries or LTO linking. The patch in llvm#96561 introduces a wrapper in the toolchain that allows us to use a standard `ld.lld` like interface. This means all the divergence with this target can be removed. Depends on llvm#96561
Re-did it and tested it against |
Summary: The linker wrapper's job is to extract embedded device code from fat binaries and create linked images that can then be embedded and executed. In order to support LTO, we originally reinvented all of the LTO handling that `ld.lld` normally does. Primarily, this was because `nvlink` didn't support this at all, and we have special hacks required for offloading languages interacting with archive libraries. Now since I wrote llvm#96561 we should be able to pass all the inputs to the device linker transparently. This has the advantage of allowing the `clang` Driver to do its own handling. Primarily, this will be used to implicitly pass libraries to the device link job to make it more consistent with other toolchains. The JIT support is a notable departure, however there is an option called `--lto-emit-llvm` that performs the exact function where we want the final link job to output LLVM-IR that we can then embed instead. This patch does not fully delete the LTO handling, primarily because I think the SPIR-V people might want it. To see only the relevant patches, ignore the first commit of the nvlink-wrapper. Depends on llvm#96561.
Summary: The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back in a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM. While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO or static linking, requires all files to be named with a non-standard `.cubin` extension, and rejects link jobs that other linkers would be fine with (e.g. empty ones). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation. The main reason I want to reintroduce this tool is that I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following. ``` <install>/lib/nvptx64-nvidia-cuda/libc.a <install>/lib/nvptx64-nvidia-cuda/libc++.a <install>/lib/nvptx64-nvidia-cuda/libomp.a <install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a ``` Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue to massively improving the state of building GPU libraries. In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software. Address comments
@jhuber6 this change seems to be causing build failures on some bots:
Can you take a look? |
Nevermind, I just noticed you already fixed it. Sorry for the noise! |
Gah, sorry forgot to suppress that error when it's just running tests. I'll have it fixed in a few minutes or revert it if I can't fix it fast enough. |
https://lab.llvm.org/buildbot/#/builders/174/builds/2130 Actually I noticed (and I think you already did as well), that there seems to be a test failure of nvwrapper.c after you fixed the build error. Just wanted to let you know in case you were not already aware. |
Yep, it's caused by the program lookup not finding |
clang/test/CMakeLists.txt CLANG_TEST_DEPS also needs an update. ` The test might fail due to |
Also noticed that one, just pushed a fix a minute ago. Sorry for the mess. |
@jhuber6 it looks like there still might be build failures on Windows:
|
clangBasic
)

target_compile_options(clang-nvlink-wrapper PRIVATE "-g" "-O0")
This looks suspicious. Do we actually want to build this with `-g -O0` all the time, or was this left in from debugging or something like that? In the unlikely event that we do want this for some reason, it won't work as is on Windows anyway.
Whoops, thanks for pointing that out.
Why do we need a new binary for this, instead of having something like And if there's a good reason for that, could clang-linker-wrapper and clang-nvlink-wrapper at least be the same binary? |
@jhuber6 ,
would you fix the problem or revert the changes? |
Looks like it's complaining about the const-ness of the value decomposition? It's fine on my end, so I don't know if it's a Windows thing. I'll just try putting |
Thank you @jhuber6 |
Do we have existing precedents for such built-in tools, other than cc1 itself? If the linker wrapper can be part of clang itself, it would make some things easier logistically. |
// Various tools (e.g., llc and opt) duplicate this series of declarations for
// options related to passes and remarks.
static cl::opt<bool> RemarksWithHotness(
For new user-facing tools, ideally use OptTable instead of cl::opt. The latter has lots of minor issues, e.g. not differentiating single/double-dash options.
So, almost everything uses the OptTable interface, but there are a few options like these that each tool individually seems to define to interface with LLVM. It's pretty much required to give the linker similar semantics to `opt` when you pass `-mllvm` to it, I believe.
Sorry, missed this in the deluge of build failures. I don't think this really fits with a different |
Summary: The `clang-nvlink-wrapper` is a utility that I removed a while back during the transition to the new driver. This patch adds back in a new, upgraded version that does LTO + archive linking. It's not an easy choice to reintroduce something I happily deleted, but this is the only way to move forward with improving GPU support in LLVM. While NVIDIA provides a linker called 'nvlink', its main interface is very difficult to work with. It does not provide LTO or static linking, requires all files to be named with a non-standard `.cubin` extension, and rejects link jobs that other linkers would be fine with (e.g. empty ones). I have spent a great deal of time hacking around this in the GPU `libc` implementation, where I deliberately avoid LTO and static linking and have about 100 lines of hacky CMake dedicated to storing these files in a format that the clang-linker-wrapper accepts to avoid this limitation. The main reason I want to reintroduce this tool is that I am planning on creating a more standard C/C++ toolchain for GPUs to use. This will install files like the following. ``` <install>/lib/nvptx64-nvidia-cuda/libc.a <install>/lib/nvptx64-nvidia-cuda/libc++.a <install>/lib/nvptx64-nvidia-cuda/libomp.a <install>/lib/clang/19/lib/nvptx64-nvidia-cuda/libclang_rt.builtins.a ``` Linking in these libraries will then simply require passing `-lc`, as is already done for non-GPU toolchains. However, this doesn't work with the currently deficient `nvlink` linker, so I consider this a blocking issue to massively improving the state of building GPU libraries. In the future we may be able to convince NVIDIA to port their linker to `ld.lld`, but for now this is the only workable solution that allows us to hack around the weird behavior of their closed-source software. This also copies some amount of logic from the clang-linker-wrapper, but not enough for it to be worthwhile to merge them, I feel. In the future it may be possible to delete that handling from there entirely.
Differential Revision: https://phabricator.intern.facebook.com/D60251377
Introduced in llvm/llvm-project#96561. I've added the file to clang-tools-extra, as we also put many similar binaries like clang-linker-wrapper in there.