Skip to content

[libclc] Move fma to the CLC library #126052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 24, 2025
Merged

Conversation

frasercrmck
Copy link
Contributor

This builtin is a little more involved than others as targets deal with
fma in various different ways.

Fundamentally, the CLC __clc_fma builtin compiles to
__builtin_elementwise_fma, which compiles to the @llvm.fma intrinsic.
However, in the case of fp32 fma some targets call the __clc_sw_fma
function, which provides a software implementation of the builtin. This
in principle is controlled by the __CLC_HAVE_HW_FMA32 macro and may be a
runtime decision, depending on how the target defines that macro.

All targets build the CLC fma functions for all types. This is to the
CLC library can have a reliable internal implementation for its own
purposes.

For AMD/NVPTX targets there are no meaningful changes to the generated
LLVM bytecode. Some blocks of code have moved around, which confounds
llvm-diff.

For the clspv and SPIR-V/Mesa targets, only fp32 fma is of interest. Its
use in libclc is tightly controlled by checking __CLC_HAVE_HW_FMA32
first. This can either be a compile-time constant (1, for clspv) or a
runtime function for SPIR-V/Mesa.

The SPIR-V/Mesa target only provided fp32 fma in the OpenCL layer. It
unconditionally mapped that to the __clc_sw_fma builtin, even though the
generic version in theory had a runtime toggle through
__CLC_HAVE_HW_FMA32 specifically for that target. Callers of fma,
though, would end up using the ExtInst fma, not calling the _Z3fmafff
function provided by libclc.

This commit keeps this system in place in the OpenCL layer, by mapping
fma to __clc_sw_fma. Where other builtins would previously call fma
(i.e., result in the ExtInst), they now call __clc_fma. This function
checks the __CLC_HAVE_HW_FMA32 runtime toggle, which selects between the
slow version or the quick version. The quick version is the LLVM fma
intrinsic which llvm-spirv translates to the ExtInst.

The clspv target had its own software implementation of fp32 fma, which
it called unconditionally - even though __CLC_HAVE_HW_FMA32 is 1 for
that target. This is potentially just so its library ships a software
version which it can fall back on. In the OpenCL layer, the target
doesn't provide fp64 fma, and maps fp16 fma to fp32 mad.

This commit keeps this system roughly in place: in the OpenCL layer it
maps fp32 fma to __clc_sw_fma, and fp16 fma to mad. Where builtins would
previously call into fma, they now call __clc_fma, which compiles to the
LLVM intrinsic. If this goes through a translation to SPIR-V it will
become the fma ExtInst, or the intrinsic could be replaced by the
_Z3fmafff software implementation.

The clspv and SPIR-V/Mesa targets could potentially be cleaned up later,
depending on their needs.

@frasercrmck
Copy link
Contributor Author

FYI @rjodinchr this breaks downstream clspv testing.

# | declare !kernel_arg_name !7 float @llvm.fma.f32(float, float, float) #2
# | Unsupported llvm intrinsic
# | UNREACHABLE executed at /clspv/lib/SPIRVProducerPass.cpp:2390!

I was wondering if clspv should be translating llvm.fma to OpExtInst fma as llvm-spirv does.

@frasercrmck frasercrmck requested a review from arsenm February 6, 2025 11:58
@frasercrmck frasercrmck added the libclc libclc OpenCL library label Feb 6, 2025
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I still think libclc shouldn't be in the business of providing software fma. This is a problem for the ultimate codegen to decide

@rjodinchr
Copy link
Contributor

FYI @rjodinchr this breaks downstream clspv testing.

# | declare !kernel_arg_name !7 float @llvm.fma.f32(float, float, float) #2
# | Unsupported llvm intrinsic
# | UNREACHABLE executed at /clspv/lib/SPIRVProducerPass.cpp:2390!

I was wondering if clspv should be translating llvm.fma to OpExtInst fma as llvm-spirv does.

I guess it would just be a question of adding llvm.fma here: https://github.com/google/clspv/blob/0878df110af85e84d0c336e6571767278ad5eec8/lib/Builtins.cpp#L703

This builtin is a little more involved than others as targets deal with
fma in various different ways.

Fundamentally, the CLC __clc_fma builtin compiles to
__builtin_elementwise_fma, which compiles to the @llvm.fma intrinsic.
However, in the case of fp32 fma some targets call the __clc_sw_fma
function, which provides a software implementation of the builtin. This
in principle is controlled by the __CLC_HAVE_HW_FMA32 macro and may be a
runtime decision, depending on how the target defines that macro.

All targets build the CLC fma functions for all types. This is to the
CLC library can have a reliable internal implementation for its own
purposes.

For AMD/NVPTX targets there are no meaningful changes to the generated
LLVM bytecode. Some blocks of code have moved around, which confounds
llvm-diff.

For the clspv and SPIR-V/Mesa targets, only fp32 fma is of interest. Its
use in libclc is tightly controlled by checking __CLC_HAVE_HW_FMA32
first. This can either be a compile-time constant (0, for clspv) or a
runtime function for SPIR-V/Mesa.

The SPIR-V/Mesa target only provided fp32 fma in the OpenCL layer. It
unconditionally mapped that to the __clc_sw_fma builtin, even though the
generic version in theory had a runtime toggle through
__CLC_HAVE_HW_FMA32 specifically for that target. Callers of fma,
though, would end up using the ExtInst fma, *not* calling the _Z3fmafff
function provided by libclc.

This commit keeps this system in place in the OpenCL layer, by mapping
fma to __clc_sw_fma. Where other builtins would previously call fma
(i.e., result in the ExtInst), they now call __clc_fma. This function
checks the __CLC_HAVE_HW_FMA32 runtime toggle, which selects between the
slow version or the quick version. The quick version is the LLVM fma
intrinsic which llvm-spirv translates to the ExtInst.

The clspv target had its own software implementation of fp32 fma, which
it called unconditionally - even though __CLC_HAVE_HW_FMA32 is 1 for
that target. This is potentially just so its library ships a software
version which it can fall back on. In the OpenCL layer, the target
doesn't provide fp64 fma, and maps fp16 fma to fp32 mad.

This commit keeps this system roughly in place: in the OpenCL layer it
maps fp32 fma to __clc_sw_fma, and fp16 fma to mad. Where builtins would
previously call into fma, they now call __clc_fma, which compiles to the
LLVM intrinsic. If this goes through a translation to SPIR-V it will
become the fma ExtInst, or the intrinsic could be replaced by the
_Z3fmafff software implementation.

The clspv and SPIR-V/Mesa targets could potentially be cleaned up later,
depending on their needs.
@frasercrmck
Copy link
Contributor Author

FYI @rjodinchr this breaks downstream clspv testing.

# | declare !kernel_arg_name !7 float @llvm.fma.f32(float, float, float) #2
# | Unsupported llvm intrinsic
# | UNREACHABLE executed at /clspv/lib/SPIRVProducerPass.cpp:2390!

I was wondering if clspv should be translating llvm.fma to OpExtInst fma as llvm-spirv does.

I guess it would just be a question of adding llvm.fma here: https://github.com/google/clspv/blob/0878df110af85e84d0c336e6571767278ad5eec8/lib/Builtins.cpp#L703

With that change, check-spirv does indeed pass. Does that mean this PR is good to go, do you think?

@rjodinchr
Copy link
Contributor

FYI @rjodinchr this breaks downstream clspv testing.

# | declare !kernel_arg_name !7 float @llvm.fma.f32(float, float, float) #2
# | Unsupported llvm intrinsic
# | UNREACHABLE executed at /clspv/lib/SPIRVProducerPass.cpp:2390!

I was wondering if clspv should be translating llvm.fma to OpExtInst fma as llvm-spirv does.

I guess it would just be a question of adding llvm.fma here: https://github.com/google/clspv/blob/0878df110af85e84d0c336e6571767278ad5eec8/lib/Builtins.cpp#L703

With that change, check-spirv does indeed pass. Does that mean this PR is good to go, do you think?

Yes, I'll update clspv when it is needed.

@frasercrmck frasercrmck merged commit e7ad07f into llvm:main Feb 24, 2025
11 checks passed
@frasercrmck frasercrmck deleted the libclc-clc-fma branch February 24, 2025 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libclc libclc OpenCL library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants