Skip to content

[X86] Improve __bf16 code generation #134859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

antoniofrighetto
Copy link
Contributor

bf16 is a typedef short type introduced in AVX-512_BF16 and should be able to leverage SSE/AVX registers used for f16.

Fixes: #134222.

`bf16` is a typedef short type introduced in AVX-512_BF16 and
should be able to leverage SSE/AVX registers used for `f16`.

Fixes: llvm#134222.
Comment on lines 2459 to 2462
let Predicates = [HasAVX512, HasBF16] in {
def : Pat<(f16 (bitconvert (bf16 FR16X:$src))), (f16 FR16X:$src)>;
def : Pat<(bf16 (bitconvert (f16 FR16X:$src))), (bf16 FR16X:$src)>;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phoebewang Could we reuse FR16X for bf16 as well here? Changing FR16X definition to include bf16:


def FR16X : RegisterClass<"X86", [f16, bf16], 16, (add FR32X)> {let Size = 32;}

Seems to lead to a bunch of compilation issues due to ambiguity in instruction pattern :( Should we create a new class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can use FR16X for bf16. We used in this way for different vector types within the same size. It just requires updating affected patterns with explicit type cast.

@phoebewang
Copy link
Contributor

Thanks for the fix! We intended not to assign __bf16 a register classes because 1) there's not a single scalar BF16 instrcution supported on x86 so far; 2) we can leverage the target independent expansion to lower BF16 operations.

It is not a big problem though. We support FP16 without any the scalar feature enabled too. But there's more work to do than the current change. The FP16 type enabling patch would be a good example.

OTOH, I'm not sure if it's a big hammer for the issue here. Would it be better to explore some small change in fastisel side for it?

@antoniofrighetto
Copy link
Contributor Author

@phoebewang Thanks for the suggestion (and sorry for getting back only now), agree we should be using the already-existing FR16X register class, updated affected patterns (first fixup commit)!

In the second fixup commit, I took the chance to add custom hook for extension/truncation: while fptrunc is already handled correctly (https://llvm.godbolt.org/z/K87Tor8qP), I don't think that is the case for fpext (redundant shift of bits + call to __truncsfbf2, which shouId just be a call to __extendbfsf2), though I believe this should be orthogonal to the current miscompilation (might as well add in a different patch if any).

I tried to restrain the changes to FastISel only, however, even so, there were some other patterns to be added. There are some test failures mostly related to missing instruction selection for some nodes, e.g., bf16 = ConstantFP<APFloat(0)> (think we miss some move pattern here, this comes due to the fact that a v32f16 = BUILD_VECTOR node of 32 ConstantFP:f16<APFloat(0)> elements should have been correctly replaced with a single v32f16 = X86ISD::VBROADCAST ConstantFP:bf16<APFloat(0)>), but if possible it'd be nice to confirm this is the correct direction before.

@antoniofrighetto antoniofrighetto changed the title [X86] Add support for __bf16 to f16 conversion [X86] Improve __bf16 code generation Apr 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect float conversion via __bf16 at -O0
2 participants