Skip to content

[libclc] Move __clc_ldexp to CLC library #126078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 26, 2025

Conversation

frasercrmck
Copy link
Contributor

This function was already conceptually in the CLC namespace - this just formally moves it over.

Note however that this commit marks a change in how libclc functions may be overridden by targets.

Until now we have been using a purely build-system-based approach where targets could register identically-named files which took responsibility for the implementation of the builtin in its entirety.

This system wasn't well equipped to deal with AMD's overriding of __clc_ldexp for only a subset of types, and furthermore conditionally on a pre-defined macro.

One option for handling this would be to require AMD to duplicate code for the versions of __clc_ldexp it's not interested in overriding. We could also make it easier for targets to re-define CLC functions through macros or .inc files. Both of these have obvious downsides. We could also keep AMD's overriding in the OpenCL layer and bypass CLC altogether, but this has limited use.

We could use weak linkage on the "base" implementations of CLC functions, and allow targets to opt-in to providing their own implementations on a much finer granularity. This commit supports this as a proof of concept; we could expand it to all CLC builtins if accepted.

Note that the existing filename-based "claiming" approach is still in effect, so targets have to name their overrides differently to have both files compiled. This could also be refined.

@frasercrmck frasercrmck added the libclc libclc OpenCL library label Feb 6, 2025
@frasercrmck frasercrmck requested a review from arsenm February 6, 2025 14:54
@arsenm
Copy link
Contributor

arsenm commented Feb 6, 2025

The amdgpu customization should be removed. This is legacy code from before we had the ldexp intrinsic. Regular __builtin_ldexp* should work now

@frasercrmck
Copy link
Contributor Author

The amdgpu customization should be removed. This is legacy code from before we had the ldexp intrinsic. Regular __builtin_ldexp* should work now

Good to know, thanks. No one else in libclc is using __builtin_ldexp, so in its current form it'd still have to be some sort of a customization. Everyone is opting for the software implementation.

We could have __builtin_ldexp* in the default implementation, but guard it behind a macro that only AMD defines. Would we ideally have an __builtin_elementwise_ldexp to target the vector intrinsics directly?

@arsenm
Copy link
Contributor

arsenm commented Feb 6, 2025

Good to know, thanks. No one else in libclc is using __builtin_ldexp, so in its current form it'd still have to be some sort of a customization. Everyone is opting for the software implementation.

Really we just want to get to llvm.ldexp. It's unfortunate we're stuck routing through these old GCC builtins that have any libm baggage associated with them

Would we ideally have an __builtin_elementwise_ldexp to target the vector intrinsics directly?

Yes

@frasercrmck
Copy link
Contributor Author

Would we ideally have an __builtin_elementwise_ldexp to target the vector intrinsics directly?

Yes

I'll go add that builtin and come back to this - thanks.

What do you think about libclc making use of weak linkage in principle?

@jhuber6
Copy link
Contributor

jhuber6 commented Feb 6, 2025

I'll go add that builtin and come back to this - thanks.

What do you think about libclc making use of weak linkage in principle?

Weak should be handled properly by the targets, the one downside is that it prevents optimization within a TU because the symbol's initializer could change, but LTO / noRDC-mode compilation makes that irrelevant. Also, even though NVPTX is an ELF platform it doesn't handle extern weak properly (It's supposed to be defined to zero if not defined). I think using weak is fine in the sense that it's well supported by LLVM and the GPU targets, but I don't know OpenCL's perspective.

Also completely unrelated question since you seem to know about libclc. It seems to require clang to be compiled, but can't be built through the runtimes interface. Do you know how difficult it would be to make that work? I'm somewhat interested in having it work through the compilation steps I use for other GPU libraries like compiler-rt, libc, libcxx, libcxxabi, and soon openmp.

Basically it would look like this.

-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='libc;libclc

That would create a separate CMake invocation that would run with the https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER_TARGET.html set to amdgcn-amd-amdhsa (or whatever triple you prefer.) So you'd be doing one job with only the AMD triple set. I don't know where the libraries get stored, but by default they'd go in lib/amdgcn-amd-amdhsa which clang knows where to find.

@frasercrmck
Copy link
Contributor Author

I'll go add that builtin and come back to this - thanks.
What do you think about libclc making use of weak linkage in principle?

Weak should be handled properly by the targets, the one downside is that it prevents optimization within a TU because the symbol's initializer could change, but LTO / noRDC-mode compilation makes that irrelevant. Also, even though NVPTX is an ELF platform it doesn't handle extern weak properly (It's supposed to be defined to zero if not defined). I think using weak is fine in the sense that it's well supported by LLVM and the GPU targets, but I don't know OpenCL's perspective.

Thanks for the information. I don't think we rely on intra-TU optimizations too heavily as we generally build each builtin function in isolation. As for what OpenCL thinks, I suspect it's left loose in the specification. It "works" as far as clang's concerned. The 'weak' CLC functions will generally always be inlined into a regular non-weak user-facing builtin at the interface layer.

Also completely unrelated question since you seem to know about libclc. It seems to require clang to be compiled, but can't be built through the runtimes interface. Do you know how difficult it would be to make that work? I'm somewhat interested in having it work through the compilation steps I use for other GPU libraries like compiler-rt, libc, libcxx, libcxxabi, and soon openmp.

Basically it would look like this.

-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='libc;libclc

That would create a separate CMake invocation that would run with the https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER_TARGET.html set to amdgcn-amd-amdhsa (or whatever triple you prefer.) So you'd be doing one job with only the AMD triple set. I don't know where the libraries get stored, but by default they'd go in lib/amdgcn-amd-amdhsa which clang knows where to find.

I hadn't heard of this mechanism, thanks for pointing it out. It makes complete sense to me to allow this. I will take a look at it - without knowing how it works it's hard to say whether it'll be a lot of work.

Regarding triples, libclc doesn't currently have a amdgcn-amd-amdhsa target, per se. It's got amdgcn-- or amdgcn--amdhsa. Again I don't know how the runtimes triples work and how strict the triple matching would be or if libclc would have to massage triples into its own internal set. If a triple isn't found, like x86_64-unknown-linux-gnu is it silently ignored, warned, errored?

As for where the libraries are placed, it's currently not at all integrated with clang. They're currently in, e.g., build/tools/libclc/amdgcn--amdhsa.bc. There's no in-tree use of libclc, which I'd like to change. I know that @arsenm has been proposing that the libraries are made known to clang but I've not had the chance to look into that either.

@arsenm
Copy link
Contributor

arsenm commented Feb 13, 2025

We currently don't do anything with the vendor. i.e. amdgcn-amd-amdhsa and amdgcn--amdhsa are identical. I believe spirv-amd-... does something since recently in the clang driver

@arsenm
Copy link
Contributor

arsenm commented Feb 13, 2025

As for where the libraries are placed, it's currently not at all integrated with clang.

I think the correct place is the clang resource directory

@frasercrmck
Copy link
Contributor Author

Oh, also for context, there was #124709 a few weeks ago. So it appears that there are other people interested in moving libclc to the runtimes infrastructure.

@jhuber6
Copy link
Contributor

jhuber6 commented Feb 13, 2025

I think the correct place is the clang resource directory

Tough to say, the compiler resource directory is for things the compiler emits as a part of the language, yet we don't store libomp there. I'd say it makes sense, if you had lib/clang/21/lib/amdgcn-amd-amdhsa/libclang_rt.clc.a it would be in-line with things like builtins and instrumentation.

Regarding triples, libclc doesn't currently have a amdgcn-amd-amdhsa target, per se. It's got amdgcn-- or amdgcn--amdhsa. Again I don't know how the runtimes triples work and how strict the triple matching would be or if libclc would have to massage triples into its own internal set. If a triple isn't found, like x86_64-unknown-linux-gnu is it silently ignored, warned, errored?

Yeah, amdgcn-amd-amdhsa should be roughly equivalent to amdgcn--. The one problem I see is that if we do the per-target install directories, then amdgcn-- and amdgcn-amd-amdhsa will have different directories. I wonder if we could just unify those somehow, or just accept amdgcn-amd-amdhsa in libclc as the same thing.

Oh, also for context, there was #124709 a few weeks ago. So it appears that there are other people interested in moving libclc to the runtimes infrastructure.

Yep, part of why I'm bringing it up. So, the runtimes build is basically a CMake ExternalProject that points to the project. The file ./runtimes/CMakeLists.txt is the entry point for that build. So, the llvm/runtimes/CMakeListst.txt basically creates sub-jobs by pointing them to ./runtimes/CMakeLists.txt. This is mostly helpful because it guarantees you're going to be using an up-to-date clang compiler for the build.

Right now what libclc does would be pretty easy to port to a normal LLVM_ENABLE_RUNTIMES=libclc build, because we currently create a few jobs and compile them with --target=. So, that should likely still work. You can also do cross-compiling through the runtimes interface, which is my request. So likely we'd have LLVM_ENABLE_RUNTIMES=libclc work as it does now, But also for -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libclc to basically just enable amdgcn-- currently.

I have an unfortunate amount of experience mangling the runtimes builds, so if you need help feel free to ask.

@arsenm
Copy link
Contributor

arsenm commented Feb 13, 2025

Yeah, amdgcn-amd-amdhsa should be roughly equivalent to amdgcn--

These are not equivalent

@jhuber6
Copy link
Contributor

jhuber6 commented Feb 13, 2025

Yeah, amdgcn-amd-amdhsa should be roughly equivalent to amdgcn--

These are not equivalent

That's something I've been meaning to ask, the difference I know of right now is that amdgcn-- will not create a ROCm toolchain so we don't do any of that. However, I don't know to what extent the hsa in the triple matters, because there's certain builtins that are tied to the implicit arguments. I never knew if that was an hsa thing or we considered the code object universal to amdgcn--. I mostly just meant that IR compiled with amdgcn-- should be able to link with the HSA os version.

@arsenm
Copy link
Contributor

arsenm commented Feb 13, 2025

amdgcn-- is whatever mesa was doing many years ago, and amdgcn-mesa-mesa3d is "almost HSA" but not quite. The kernel inputs are different, and that should propagate through the entire function Abi as well. That part was probably never implemented though since clover wasn't maintained

@frasercrmck
Copy link
Contributor Author

An update on this: in adding a clang elementwise builtin for ldexp (and for clz/ctz) I found it necessary to first refactor the clang diagnostics. I opened a PR for it (#125673) which isn't getting much traction so that's slowing things down.

Unfortunately lots of other functions rely on ldexp so there's a bottleneck here.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an OK temporary step

This function was already conceptually in the CLC namespace - this just
formally moves it over.

Note however that this commit marks a change in how libclc functions may
be overridden by targets.

Until now we have been using a purely build-system-based approach where
targets could register identically-named files which took responsibility
for the implementation of the builtin in its entirety.

This system wasn't well equipped to deal with AMD's overriding of
__clc_ldexp for only a subset of types, and furthermore conditionally on
a pre-defined macro.

One option for handling this would be to require AMD to duplicate code
for the versions of __clc_ldexp it's *not* interested in overriding. We
could also make it easier for targets to re-define CLC functions through
macros or .inc files. Both of these have obvious downsides. We could
also keep AMD's overriding in the OpenCL layer and bypass CLC
altogether, but this has limited use.

We could use weak linkage on the "base" implementations of CLC
functions, and allow targets to opt-in to providing their own
implementations on a much finer granulaity. This commit supports this as
a proof of concept; we could expand it to all CLC builtins if accepted.

Note that the existing filename-based "claiming" approach is still in
effect, so targets have to name their overrides differently to have both
files compiled. This could also be refined.
@frasercrmck frasercrmck merged commit d5038b3 into llvm:main Feb 26, 2025
11 checks passed
@frasercrmck frasercrmck deleted the libclc-clc-ldexp branch February 26, 2025 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libclc libclc OpenCL library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants