-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[NVPTX] Disable incorrect peephole optimizations #79920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Link to original comment: #67866 (comment) |
I agree that the pattern is wrong, though the problem here is that we pass $a and $b directly to Also, the example in the description seems to be invalid:
Translated into a more readable pseudo-code:
AFAICT, |
That makes sense!
It seems like we're on the same page.
Assuming there are no other review comments, can I submit this PR to get rid of the incorrect transformation? |
Pinging @Artem-B @ThomasRaoux for review |
Would you be willing to fix it, instead of removing it? |
TLDR: I won't have the bandwidth to fix this in the foreseeable future. I implemented the fix that you suggested (and I agree that it's likely the right way to go). However, it wasn't trivial to fix the unit tests. That gave me pause about submitting a fix that I don't fully understand. e.g. I haven't run tests to assess the performance impact of the fix and I don't understand the original rationale for the change. So, instead of doing a deep dive on the fix, I'd like to remove the erroneous code for the time being and revisit it when necessary. |
Fair enough. Thank you for bringing the issue to my attention and giving it a try. Let me try fixing it forward, and if it does turn out too hairy, I'll merge this change. |
This should fix the issue: https://github.com/llvm/llvm-project/compare/main...Artem-B:llvm-project:bfe-fix?expand=1 While I was tinkering with this, there may be some opportunities for further optimizations for v4i8 types. One remaining quirk is that the patterns above create an interesting quirk -- we end up extracting the same fields twice, with different signedness. Once, to be used in comparison, and another if the same field is used for something else (and uses the BFE lowered for extractelt). I'll need to see what works best for ptxas:
|
Closing in favor of #81308 |
The removed patterns are not correct. Doing the comparison, without first ensuring that the value only uses the lower 8 bits, can lead to incorrect results.
For example, when
%a := Ox100
and%b := Ox1
,setugt (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255)))
produces a false predicate. WhereasSETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpHI
will produce a true predicate.