[libclc] Move __clc_ldexp to CLC library #126078

frasercrmck · 2025-02-06T14:54:05Z

This function was already conceptually in the CLC namespace - this just formally moves it over.

Note however that this commit marks a change in how libclc functions may be overridden by targets.

Until now we have been using a purely build-system-based approach where targets could register identically-named files which took responsibility for the implementation of the builtin in its entirety.

This system wasn't well equipped to deal with AMD's overriding of __clc_ldexp for only a subset of types, and furthermore conditionally on a pre-defined macro.

One option for handling this would be to require AMD to duplicate code for the versions of __clc_ldexp it's not interested in overriding. We could also make it easier for targets to re-define CLC functions through macros or .inc files. Both of these have obvious downsides. We could also keep AMD's overriding in the OpenCL layer and bypass CLC altogether, but this has limited use.

We could use weak linkage on the "base" implementations of CLC functions, and allow targets to opt-in to providing their own implementations on a much finer granularity. This commit supports this as a proof of concept; we could expand it to all CLC builtins if accepted.

Note that the existing filename-based "claiming" approach is still in effect, so targets have to name their overrides differently to have both files compiled. This could also be refined.

arsenm · 2025-02-06T15:07:21Z

The amdgpu customization should be removed. This is legacy code from before we had the ldexp intrinsic. Regular __builtin_ldexp* should work now

frasercrmck · 2025-02-06T15:23:24Z

The amdgpu customization should be removed. This is legacy code from before we had the ldexp intrinsic. Regular __builtin_ldexp* should work now

Good to know, thanks. No one else in libclc is using __builtin_ldexp, so in its current form it'd still have to be some sort of a customization. Everyone is opting for the software implementation.

We could have __builtin_ldexp* in the default implementation, but guard it behind a macro that only AMD defines. Would we ideally have an __builtin_elementwise_ldexp to target the vector intrinsics directly?

arsenm · 2025-02-06T15:26:06Z

Good to know, thanks. No one else in libclc is using __builtin_ldexp, so in its current form it'd still have to be some sort of a customization. Everyone is opting for the software implementation.

Really we just want to get to llvm.ldexp. It's unfortunate we're stuck routing through these old GCC builtins that have any libm baggage associated with them

Would we ideally have an __builtin_elementwise_ldexp to target the vector intrinsics directly?

Yes

frasercrmck · 2025-02-06T16:41:24Z

Would we ideally have an __builtin_elementwise_ldexp to target the vector intrinsics directly?

Yes

I'll go add that builtin and come back to this - thanks.

What do you think about libclc making use of weak linkage in principle?

jhuber6 · 2025-02-06T18:01:03Z

I'll go add that builtin and come back to this - thanks.

What do you think about libclc making use of weak linkage in principle?

Weak should be handled properly by the targets, the one downside is that it prevents optimization within a TU because the symbol's initializer could change, but LTO / noRDC-mode compilation makes that irrelevant. Also, even though NVPTX is an ELF platform it doesn't handle extern weak properly (It's supposed to be defined to zero if not defined). I think using weak is fine in the sense that it's well supported by LLVM and the GPU targets, but I don't know OpenCL's perspective.

Also completely unrelated question since you seem to know about libclc. It seems to require clang to be compiled, but can't be built through the runtimes interface. Do you know how difficult it would be to make that work? I'm somewhat interested in having it work through the compilation steps I use for other GPU libraries like compiler-rt, libc, libcxx, libcxxabi, and soon openmp.

Basically it would look like this.

-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='libc;libclc

That would create a separate CMake invocation that would run with the https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER_TARGET.html set to amdgcn-amd-amdhsa (or whatever triple you prefer.) So you'd be doing one job with only the AMD triple set. I don't know where the libraries get stored, but by default they'd go in lib/amdgcn-amd-amdhsa which clang knows where to find.

frasercrmck · 2025-02-13T10:18:57Z

I'll go add that builtin and come back to this - thanks.
What do you think about libclc making use of weak linkage in principle?

Weak should be handled properly by the targets, the one downside is that it prevents optimization within a TU because the symbol's initializer could change, but LTO / noRDC-mode compilation makes that irrelevant. Also, even though NVPTX is an ELF platform it doesn't handle extern weak properly (It's supposed to be defined to zero if not defined). I think using weak is fine in the sense that it's well supported by LLVM and the GPU targets, but I don't know OpenCL's perspective.

Thanks for the information. I don't think we rely on intra-TU optimizations too heavily as we generally build each builtin function in isolation. As for what OpenCL thinks, I suspect it's left loose in the specification. It "works" as far as clang's concerned. The 'weak' CLC functions will generally always be inlined into a regular non-weak user-facing builtin at the interface layer.

Also completely unrelated question since you seem to know about libclc. It seems to require clang to be compiled, but can't be built through the runtimes interface. Do you know how difficult it would be to make that work? I'm somewhat interested in having it work through the compilation steps I use for other GPU libraries like compiler-rt, libc, libcxx, libcxxabi, and soon openmp.

Basically it would look like this.
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='libc;libclc
That would create a separate CMake invocation that would run with the https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER_TARGET.html set to amdgcn-amd-amdhsa (or whatever triple you prefer.) So you'd be doing one job with only the AMD triple set. I don't know where the libraries get stored, but by default they'd go in lib/amdgcn-amd-amdhsa which clang knows where to find.

I hadn't heard of this mechanism, thanks for pointing it out. It makes complete sense to me to allow this. I will take a look at it - without knowing how it works it's hard to say whether it'll be a lot of work.

Regarding triples, libclc doesn't currently have a amdgcn-amd-amdhsa target, per se. It's got amdgcn-- or amdgcn--amdhsa. Again I don't know how the runtimes triples work and how strict the triple matching would be or if libclc would have to massage triples into its own internal set. If a triple isn't found, like x86_64-unknown-linux-gnu is it silently ignored, warned, errored?

As for where the libraries are placed, it's currently not at all integrated with clang. They're currently in, e.g., build/tools/libclc/amdgcn--amdhsa.bc. There's no in-tree use of libclc, which I'd like to change. I know that @arsenm has been proposing that the libraries are made known to clang but I've not had the chance to look into that either.

arsenm · 2025-02-13T10:31:05Z

We currently don't do anything with the vendor. i.e. amdgcn-amd-amdhsa and amdgcn--amdhsa are identical. I believe spirv-amd-... does something since recently in the clang driver

arsenm · 2025-02-13T10:31:33Z

As for where the libraries are placed, it's currently not at all integrated with clang.

I think the correct place is the clang resource directory

frasercrmck · 2025-02-13T11:12:33Z

Oh, also for context, there was #124709 a few weeks ago. So it appears that there are other people interested in moving libclc to the runtimes infrastructure.

jhuber6 · 2025-02-13T13:09:47Z

I think the correct place is the clang resource directory

Tough to say, the compiler resource directory is for things the compiler emits as a part of the language, yet we don't store libomp there. I'd say it makes sense, if you had lib/clang/21/lib/amdgcn-amd-amdhsa/libclang_rt.clc.a it would be in-line with things like builtins and instrumentation.

Regarding triples, libclc doesn't currently have a amdgcn-amd-amdhsa target, per se. It's got amdgcn-- or amdgcn--amdhsa. Again I don't know how the runtimes triples work and how strict the triple matching would be or if libclc would have to massage triples into its own internal set. If a triple isn't found, like x86_64-unknown-linux-gnu is it silently ignored, warned, errored?

Yeah, amdgcn-amd-amdhsa should be roughly equivalent to amdgcn--. The one problem I see is that if we do the per-target install directories, then amdgcn-- and amdgcn-amd-amdhsa will have different directories. I wonder if we could just unify those somehow, or just accept amdgcn-amd-amdhsa in libclc as the same thing.

Oh, also for context, there was #124709 a few weeks ago. So it appears that there are other people interested in moving libclc to the runtimes infrastructure.

Yep, part of why I'm bringing it up. So, the runtimes build is basically a CMake ExternalProject that points to the project. The file ./runtimes/CMakeLists.txt is the entry point for that build. So, the llvm/runtimes/CMakeListst.txt basically creates sub-jobs by pointing them to ./runtimes/CMakeLists.txt. This is mostly helpful because it guarantees you're going to be using an up-to-date clang compiler for the build.

Right now what libclc does would be pretty easy to port to a normal LLVM_ENABLE_RUNTIMES=libclc build, because we currently create a few jobs and compile them with --target=. So, that should likely still work. You can also do cross-compiling through the runtimes interface, which is my request. So likely we'd have LLVM_ENABLE_RUNTIMES=libclc work as it does now, But also for -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libclc to basically just enable amdgcn-- currently.

I have an unfortunate amount of experience mangling the runtimes builds, so if you need help feel free to ask.

arsenm · 2025-02-13T13:14:13Z

Yeah, amdgcn-amd-amdhsa should be roughly equivalent to amdgcn--

These are not equivalent

jhuber6 · 2025-02-13T13:17:52Z

Yeah, amdgcn-amd-amdhsa should be roughly equivalent to amdgcn--

These are not equivalent

That's something I've been meaning to ask, the difference I know of right now is that amdgcn-- will not create a ROCm toolchain so we don't do any of that. However, I don't know to what extent the hsa in the triple matters, because there's certain builtins that are tied to the implicit arguments. I never knew if that was an hsa thing or we considered the code object universal to amdgcn--. I mostly just meant that IR compiled with amdgcn-- should be able to link with the HSA os version.

arsenm · 2025-02-13T13:31:44Z

amdgcn-- is whatever mesa was doing many years ago, and amdgcn-mesa-mesa3d is "almost HSA" but not quite. The kernel inputs are different, and that should propagate through the entire function Abi as well. That part was probably never implemented though since clover wasn't maintained

frasercrmck · 2025-02-25T11:50:44Z

An update on this: in adding a clang elementwise builtin for ldexp (and for clz/ctz) I found it necessary to first refactor the clang diagnostics. I opened a PR for it (#125673) which isn't getting much traction so that's slowing things down.

Unfortunately lots of other functions rely on ldexp so there's a bottleneck here.

arsenm

This should be an OK temporary step

This function was already conceptually in the CLC namespace - this just formally moves it over. Note however that this commit marks a change in how libclc functions may be overridden by targets. Until now we have been using a purely build-system-based approach where targets could register identically-named files which took responsibility for the implementation of the builtin in its entirety. This system wasn't well equipped to deal with AMD's overriding of __clc_ldexp for only a subset of types, and furthermore conditionally on a pre-defined macro. One option for handling this would be to require AMD to duplicate code for the versions of __clc_ldexp it's *not* interested in overriding. We could also make it easier for targets to re-define CLC functions through macros or .inc files. Both of these have obvious downsides. We could also keep AMD's overriding in the OpenCL layer and bypass CLC altogether, but this has limited use. We could use weak linkage on the "base" implementations of CLC functions, and allow targets to opt-in to providing their own implementations on a much finer granulaity. This commit supports this as a proof of concept; we could expand it to all CLC builtins if accepted. Note that the existing filename-based "claiming" approach is still in effect, so targets have to name their overrides differently to have both files compiled. This could also be refined.

frasercrmck added the libclc libclc OpenCL library label Feb 6, 2025

frasercrmck requested a review from arsenm February 6, 2025 14:54

arsenm approved these changes Feb 26, 2025

View reviewed changes

frasercrmck force-pushed the libclc-clc-ldexp branch from dc4bcfc to 27f4e94 Compare February 26, 2025 10:37

frasercrmck merged commit d5038b3 into llvm:main Feb 26, 2025
11 checks passed

frasercrmck deleted the libclc-clc-ldexp branch February 26, 2025 11:20

huaatian mentioned this pull request Feb 28, 2025

fix live interval empty issue huaatian/llvm-project#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[libclc] Move __clc_ldexp to CLC library #126078

[libclc] Move __clc_ldexp to CLC library #126078

frasercrmck commented Feb 6, 2025

arsenm commented Feb 6, 2025

frasercrmck commented Feb 6, 2025

arsenm commented Feb 6, 2025

frasercrmck commented Feb 6, 2025

jhuber6 commented Feb 6, 2025 •

edited

Loading

frasercrmck commented Feb 13, 2025

arsenm commented Feb 13, 2025

arsenm commented Feb 13, 2025

frasercrmck commented Feb 13, 2025

jhuber6 commented Feb 13, 2025

arsenm commented Feb 13, 2025

jhuber6 commented Feb 13, 2025

arsenm commented Feb 13, 2025

frasercrmck commented Feb 25, 2025

arsenm left a comment

[libclc] Move __clc_ldexp to CLC library #126078

[libclc] Move __clc_ldexp to CLC library #126078

Conversation

frasercrmck commented Feb 6, 2025

arsenm commented Feb 6, 2025

frasercrmck commented Feb 6, 2025

arsenm commented Feb 6, 2025

frasercrmck commented Feb 6, 2025

jhuber6 commented Feb 6, 2025 • edited Loading

frasercrmck commented Feb 13, 2025

arsenm commented Feb 13, 2025

arsenm commented Feb 13, 2025

frasercrmck commented Feb 13, 2025

jhuber6 commented Feb 13, 2025

arsenm commented Feb 13, 2025

jhuber6 commented Feb 13, 2025

arsenm commented Feb 13, 2025

frasercrmck commented Feb 25, 2025

arsenm left a comment

Choose a reason for hiding this comment

jhuber6 commented Feb 6, 2025 •

edited

Loading