-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[X86] Improve __bf16
code generation
#134859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[X86] Improve __bf16
code generation
#134859
Conversation
`bf16` is a typedef short type introduced in AVX-512_BF16 and should be able to leverage SSE/AVX registers used for `f16`. Fixes: llvm#134222.
let Predicates = [HasAVX512, HasBF16] in { | ||
def : Pat<(f16 (bitconvert (bf16 FR16X:$src))), (f16 FR16X:$src)>; | ||
def : Pat<(bf16 (bitconvert (f16 FR16X:$src))), (bf16 FR16X:$src)>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phoebewang Could we reuse FR16X
for bf16 as well here? Changing FR16X definition to include bf16:
def FR16X : RegisterClass<"X86", [f16, bf16], 16, (add FR32X)> {let Size = 32;}
Seems to lead to a bunch of compilation issues due to ambiguity in instruction pattern :( Should we create a new class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can use FR16X
for bf16. We used in this way for different vector types within the same size. It just requires updating affected patterns with explicit type cast.
Thanks for the fix! We intended not to assign It is not a big problem though. We support FP16 without any the scalar feature enabled too. But there's more work to do than the current change. The FP16 type enabling patch would be a good example. OTOH, I'm not sure if it's a big hammer for the issue here. Would it be better to explore some small change in fastisel side for it? |
@phoebewang Thanks for the suggestion (and sorry for getting back only now), agree we should be using the already-existing In the second fixup commit, I took the chance to add custom hook for extension/truncation: while I tried to restrain the changes to FastISel only, however, even so, there were some other patterns to be added. There are some test failures mostly related to missing instruction selection for some nodes, e.g., |
__bf16
to f16
conversion__bf16
code generation
bf16
is a typedef short type introduced in AVX-512_BF16 and should be able to leverage SSE/AVX registers used forf16
.Fixes: #134222.