[NVPTX] Disable incorrect peephole optimizations #79920

justinfargnoli · 2024-01-30T00:36:14Z

The removed patterns are not correct. Doing the comparison, without first ensuring that the value only uses the lower 8 bits, can lead to incorrect results.

For example, when %a := Ox100 and %b := Ox1,
setugt (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))) produces a false predicate. Whereas SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpHI will produce a true predicate.

This reverts commit ff77058.

justinfargnoli · 2024-01-30T00:40:48Z

Link to original comment: #67866 (comment)

Artem-B · 2024-01-30T01:32:24Z

def: Pat<(setule (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))),
         (SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpLS)>;

I agree that the pattern is wrong, though the problem here is that we pass $a and $b directly to SETP_u32rr.
It should've been (SETP_u32rr BFE(Int32Regs:$a), BFE(Int32Regs:$b), CmpLS) as in the test cases changed by the patch.

Also, the example in the description seems to be invalid:

For example, when %a := Ox100 and %b := Ox1,
setugt (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))) produces a false predicate. Whereas SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpHI will produce a true predicate.

Translated into a more readable pseudo-code:

a = 0x100
b = 0x1
pred =  ((uint16_t)(a) & 0xff) > ((uint16_t)(b) & 0xff);

AFAICT, pred should be false here. It's not the matching part of the pattern that we need to fix, but rather the replacement.

justinfargnoli · 2024-02-02T23:15:26Z

It should've been (SETP_u32rr BFE(Int32Regs:$a), BFE(Int32Regs:$b), CmpLS) as in the test cases changed by the patch.

That makes sense!

Also, the example in the description seems to be invalid:

For example, when %a := Ox100 and %b := Ox1, setugt (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))) produces a false predicate. Whereas SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpHI will produce a true predicate.
Translated into a more readable pseudo-code:
a = 0x100
b = 0x1
pred =  ((uint16_t)(a) & 0xff) > ((uint16_t)(b) & 0xff);
AFAICT, pred should be false here. It's not the matching part of the pattern that we need to fix, but rather the replacement.

It seems like we're on the same page. pred should be false. But, the replacement will change the code to behave like the below pseudo code. That pseudo code will set pred to true, producing an incorrect result.

a = 0x100
b = 0x1
pred =  a > b;

Assuming there are no other review comments, can I submit this PR to get rid of the incorrect transformation?

justinfargnoli · 2024-02-08T17:44:25Z

Pinging @Artem-B @ThomasRaoux for review

Artem-B · 2024-02-08T19:11:30Z

Would you be willing to fix it, instead of removing it?

justinfargnoli · 2024-02-08T21:35:08Z

TLDR: I won't have the bandwidth to fix this in the foreseeable future.

I implemented the fix that you suggested (and I agree that it's likely the right way to go). However, it wasn't trivial to fix the unit tests. That gave me pause about submitting a fix that I don't fully understand. e.g. I haven't run tests to assess the performance impact of the fix and I don't understand the original rationale for the change. So, instead of doing a deep dive on the fix, I'd like to remove the erroneous code for the time being and revisit it when necessary.

Artem-B · 2024-02-08T22:18:56Z

Fair enough. Thank you for bringing the issue to my attention and giving it a try. Let me try fixing it forward, and if it does turn out too hairy, I'll merge this change.

Artem-B · 2024-02-09T01:32:09Z

This should fix the issue: https://github.com/llvm/llvm-project/compare/main...Artem-B:llvm-project:bfe-fix?expand=1

While I was tinkering with this, there may be some opportunities for further optimizations for v4i8 types.
We could potentially use PRMT instruction to split it into v2i16, do vectorized ops on v2i16, and then pack results back into v4i8 with PRMT.

One remaining quirk is that the patterns above create an interesting quirk -- we end up extracting the same fields twice, with different signedness. Once, to be used in comparison, and another if the same field is used for something else (and uses the BFE lowered for extractelt). I'll need to see what works best for ptxas:

bfe.s32 + bfs.u32
bfe.u32 + sign-extend i8->i16.

justinfargnoli · 2024-02-09T22:53:28Z

Closing in favor of #81308

justinfargnoli and others added 4 commits January 29, 2024 16:15

Revert "Disable incorrect peephole optimizations"

6c86d89

This reverts commit ff77058.

Merge branch 'llvm:main' into main

36e519a

[NVPTX] Disable incorrect peephole optimizations

4487d30

Add newline to end of test

0170bcd

justinfargnoli self-assigned this Jan 30, 2024

justinfargnoli requested review from ThomasRaoux and Artem-B January 30, 2024 00:39

justinfargnoli marked this pull request as ready for review January 30, 2024 00:40

justinfargnoli closed this Feb 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[NVPTX] Disable incorrect peephole optimizations #79920

[NVPTX] Disable incorrect peephole optimizations #79920

Uh oh!

justinfargnoli commented Jan 30, 2024 •

edited

Loading

Uh oh!

justinfargnoli commented Jan 30, 2024

Uh oh!

Artem-B commented Jan 30, 2024

Uh oh!

justinfargnoli commented Feb 2, 2024

Uh oh!

justinfargnoli commented Feb 8, 2024

Uh oh!

Artem-B commented Feb 8, 2024

Uh oh!

justinfargnoli commented Feb 8, 2024

Uh oh!

Artem-B commented Feb 8, 2024

Uh oh!

Artem-B commented Feb 9, 2024

Uh oh!

justinfargnoli commented Feb 9, 2024

Uh oh!

Uh oh!

[NVPTX] Disable incorrect peephole optimizations #79920

[NVPTX] Disable incorrect peephole optimizations #79920

Uh oh!

Conversation

justinfargnoli commented Jan 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

justinfargnoli commented Jan 30, 2024

Uh oh!

Artem-B commented Jan 30, 2024

Uh oh!

justinfargnoli commented Feb 2, 2024

Uh oh!

justinfargnoli commented Feb 8, 2024

Uh oh!

Artem-B commented Feb 8, 2024

Uh oh!

justinfargnoli commented Feb 8, 2024

Uh oh!

Artem-B commented Feb 8, 2024

Uh oh!

Artem-B commented Feb 9, 2024

Uh oh!

justinfargnoli commented Feb 9, 2024

Uh oh!

Uh oh!

justinfargnoli commented Jan 30, 2024 •

edited

Loading