-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[libclc] Move __clc_ldexp to CLC library #126078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The amdgpu customization should be removed. This is legacy code from before we had the ldexp intrinsic. Regular __builtin_ldexp* should work now |
Good to know, thanks. No one else in libclc is using We could have |
Really we just want to get to llvm.ldexp. It's unfortunate we're stuck routing through these old GCC builtins that have any libm baggage associated with them
Yes |
I'll go add that builtin and come back to this - thanks. What do you think about libclc making use of weak linkage in principle? |
Weak should be handled properly by the targets, the one downside is that it prevents optimization within a TU because the symbol's initializer could change, but LTO / noRDC-mode compilation makes that irrelevant. Also, even though NVPTX is an ELF platform it doesn't handle Also completely unrelated question since you seem to know about Basically it would look like this.
That would create a separate CMake invocation that would run with the https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER_TARGET.html set to |
Thanks for the information. I don't think we rely on intra-TU optimizations too heavily as we generally build each builtin function in isolation. As for what OpenCL thinks, I suspect it's left loose in the specification. It "works" as far as clang's concerned. The 'weak' CLC functions will generally always be inlined into a regular non-weak user-facing builtin at the interface layer.
I hadn't heard of this mechanism, thanks for pointing it out. It makes complete sense to me to allow this. I will take a look at it - without knowing how it works it's hard to say whether it'll be a lot of work. Regarding triples, libclc doesn't currently have a As for where the libraries are placed, it's currently not at all integrated with clang. They're currently in, e.g., |
We currently don't do anything with the vendor. i.e. amdgcn-amd-amdhsa and amdgcn--amdhsa are identical. I believe spirv-amd-... does something since recently in the clang driver |
I think the correct place is the clang resource directory |
Oh, also for context, there was #124709 a few weeks ago. So it appears that there are other people interested in moving libclc to the runtimes infrastructure. |
Tough to say, the compiler resource directory is for things the compiler emits as a part of the language, yet we don't store
Yeah,
Yep, part of why I'm bringing it up. So, the runtimes build is basically a CMake ExternalProject that points to the project. The file Right now what I have an unfortunate amount of experience mangling the runtimes builds, so if you need help feel free to ask. |
These are not equivalent |
That's something I've been meaning to ask, the difference I know of right now is that |
amdgcn-- is whatever mesa was doing many years ago, and amdgcn-mesa-mesa3d is "almost HSA" but not quite. The kernel inputs are different, and that should propagate through the entire function Abi as well. That part was probably never implemented though since clover wasn't maintained |
An update on this: in adding a clang elementwise builtin for ldexp (and for clz/ctz) I found it necessary to first refactor the clang diagnostics. I opened a PR for it (#125673) which isn't getting much traction so that's slowing things down. Unfortunately lots of other functions rely on ldexp so there's a bottleneck here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an OK temporary step
This function was already conceptually in the CLC namespace - this just formally moves it over. Note however that this commit marks a change in how libclc functions may be overridden by targets. Until now we have been using a purely build-system-based approach where targets could register identically-named files which took responsibility for the implementation of the builtin in its entirety. This system wasn't well equipped to deal with AMD's overriding of __clc_ldexp for only a subset of types, and furthermore conditionally on a pre-defined macro. One option for handling this would be to require AMD to duplicate code for the versions of __clc_ldexp it's *not* interested in overriding. We could also make it easier for targets to re-define CLC functions through macros or .inc files. Both of these have obvious downsides. We could also keep AMD's overriding in the OpenCL layer and bypass CLC altogether, but this has limited use. We could use weak linkage on the "base" implementations of CLC functions, and allow targets to opt-in to providing their own implementations on a much finer granulaity. This commit supports this as a proof of concept; we could expand it to all CLC builtins if accepted. Note that the existing filename-based "claiming" approach is still in effect, so targets have to name their overrides differently to have both files compiled. This could also be refined.
dc4bcfc
to
27f4e94
Compare
This function was already conceptually in the CLC namespace - this just formally moves it over.
Note however that this commit marks a change in how libclc functions may be overridden by targets.
Until now we have been using a purely build-system-based approach where targets could register identically-named files which took responsibility for the implementation of the builtin in its entirety.
This system wasn't well equipped to deal with AMD's overriding of __clc_ldexp for only a subset of types, and furthermore conditionally on a pre-defined macro.
One option for handling this would be to require AMD to duplicate code for the versions of __clc_ldexp it's not interested in overriding. We could also make it easier for targets to re-define CLC functions through macros or .inc files. Both of these have obvious downsides. We could also keep AMD's overriding in the OpenCL layer and bypass CLC altogether, but this has limited use.
We could use weak linkage on the "base" implementations of CLC functions, and allow targets to opt-in to providing their own implementations on a much finer granularity. This commit supports this as a proof of concept; we could expand it to all CLC builtins if accepted.
Note that the existing filename-based "claiming" approach is still in effect, so targets have to name their overrides differently to have both files compiled. This could also be refined.