-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[InstCombine] Fold Xor with or disjoint #105992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Amr Hesham (AmrDeveloper) ChangesImplement a missing optimization fold from ((select C1, C2) | A) ^ C3) to be ((Select C1 ^ C3, C2 ^ C3) | A) Fixes: #79266 Full diff: https://github.com/llvm/llvm-project/pull/105992.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index b703bc7d04db58..59bd133e56b118 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -4056,6 +4056,25 @@ Instruction *InstCombinerImpl::visitOr(BinaryOperator &I) {
return nullptr;
}
+static Instruction *foldXorToOr(BinaryOperator &I,
+ InstCombiner::BuilderTy &Builder) {
+ assert(I.getOpcode() == Instruction::Xor);
+ Value *A, *B, *C, *D, *E;
+
+ // ((select C, A, B) | E) ^ D) -> (select C, A ^ D, B ^ D) | E)
+ if (match(&I,
+ m_c_Xor(m_c_DisjointOr(m_Select(m_Value(C), m_Value(A), m_Value(B)),
+ m_Value(E)),
+ m_Value(D)))) {
+ Value *XorAB = Builder.CreateXor(A, D);
+ Value *XorBD = Builder.CreateXor(B, D);
+ Value *S = Builder.CreateSelect(C, XorAB, XorBD);
+ return BinaryOperator::CreateDisjointOr(S, E);
+ }
+
+ return nullptr;
+}
+
/// A ^ B can be specified using other logic ops in a variety of patterns. We
/// can fold these early and efficiently by morphing an existing instruction.
static Instruction *foldXorToXor(BinaryOperator &I,
@@ -4658,6 +4677,9 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) {
if (Instruction *NewXor = foldXorToXor(I, Builder))
return NewXor;
+ if (Instruction *NewOr = foldXorToOr(I, Builder))
+ return NewOr;
+
// (A&B)^(A&C) -> A&(B^C) etc
if (Value *V = foldUsingDistributiveLaws(I))
return replaceInstUsesWith(I, V);
diff --git a/llvm/test/Transforms/InstCombine/disjoint_or.ll b/llvm/test/Transforms/InstCombine/disjoint_or.ll
new file mode 100644
index 00000000000000..81fa4c33199fd4
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/disjoint_or.ll
@@ -0,0 +1,32 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define i32 @fold_xor_with_disjoint_or(i32 %a, i1 %c) {
+; CHECK-LABEL: define i32 @fold_xor_with_disjoint_or(
+; CHECK-SAME: i32 [[A:%.*]], i1 [[C:%.*]]) {
+; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A]], 4
+; CHECK-NEXT: [[TMP1:%.*]] = select i1 [[C]], i32 4, i32 0
+; CHECK-NEXT: [[XOR:%.*]] = or disjoint i32 [[TMP1]], [[SHL]]
+; CHECK-NEXT: ret i32 [[XOR]]
+;
+ %s = select i1 %c, i32 0, i32 4
+ %shl = shl i32 %a, 4
+ %or = or disjoint i32 %s, %shl
+ %xor = xor i32 %or, 4
+ ret i32 %xor
+}
+
+define <2 x i32> @fold_xor_with_disjoint_or_vec(<2 x i32> %a, i1 %c) {
+; CHECK-LABEL: define <2 x i32> @fold_xor_with_disjoint_or_vec(
+; CHECK-SAME: <2 x i32> [[A:%.*]], i1 [[C:%.*]]) {
+; CHECK-NEXT: [[SHL:%.*]] = shl <2 x i32> [[A]], <i32 4, i32 4>
+; CHECK-NEXT: [[TMP1:%.*]] = select i1 [[C]], <2 x i32> <i32 4, i32 4>, <2 x i32> zeroinitializer
+; CHECK-NEXT: [[XOR:%.*]] = or disjoint <2 x i32> [[TMP1]], [[SHL]]
+; CHECK-NEXT: ret <2 x i32> [[XOR]]
+;
+ %s = select i1 %c, <2 x i32> <i32 0, i32 0>, <2 x i32> <i32 4, i32 4>
+ %shl = shl <2 x i32> %a, <i32 4, i32 4>
+ %or = or disjoint <2 x i32> %s, %shl
+ %xor = xor <2 x i32> %or, <i32 4, i32 4>
+ ret <2 x i32> %xor
+}
|
7c579c8
to
edd78ee
Compare
@@ -0,0 +1,32 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 | |||
; RUN: opt < %s -passes=instcombine -S | FileCheck %s | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you should place the tests in the xor.ll
.
%or = or disjoint <2 x i32> %s, %shl | ||
%xor = xor <2 x i32> %or, <i32 4, i32 4> | ||
ret <2 x i32> %xor | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need more test cases e.g., negative tests, one use, more than one use case, etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we need test cases for commutative operations (or
and xor
) to see if the optimization works when the LHS and RHS are shuffled.
hmm, I think we need only for or as it has a non-const operand A
, but we don't need for xor
as the canonicalizer always places constants on the right-hand side.
Value *A, *B, *C, *D, *E; | ||
|
||
// ((select C, A, B) | E) ^ D) -> (select C, A ^ D, B ^ D) | E) | ||
if (match(&I, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this condition to the InstCombinerImpl::visitXor
function.
I can help with the condition, but I let you think.
further hints:
I think I should refer to @dtcxzyw to double-check my suggestions:
- I think the issue was about matching constants in place of
A
,B
, andD
, so It's better to usem_APInt
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this check be profitable if the or
has more than one use? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this check be profitable if the
or
has more than one use? 🤔
Probably not.
It's my fault for not providing generalized proof of the issue, but I don't think your pattern here is correct. |
Thank you, i change the pattern, test code ...etc |
Is this optimization profitable? We still have the same number of instructions, I think it depends on the target architecture if it has a special handling or if the pipelining prefers XOR. this is just a guess. |
This fold is profitable if |
edd78ee
to
9742ec2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a great work, thanks.
@@ -4706,6 +4706,13 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) { | |||
return Xor; | |||
|
|||
Value *X, *Y; | |||
|
|||
// (A | B) ^ C -> (A ^ C) ^ B | |||
if (match(Op0, m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will handle C ^ (A | B)
!
if (match(Op0, m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))))) { | |
if (match(&I, m_c_Xor(m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))), m_Value(M)))) { |
There's an M
variable declared above if this is confusing you can declare a variable *Z in the if
clause
if (Value *Z; ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for suggestion, i used M
; CHECK-NEXT: [[OR:%.*]] = or disjoint <2 x i32> [[A:%.*]], [[B:%.*]] | ||
; CHECK-NEXT: [[XOR:%.*]] = xor <2 x i32> [[OR]], [[C:%.*]] | ||
; CHECK-NEXT: [[TMP0:%.*]] = xor <2 x i32> [[A:%.*]], [[C:%.*]] | ||
; CHECK-NEXT: [[XOR:%.*]] = xor <2 x i32> [[TMP0]], [[B:%.*]] | ||
; CHECK-NEXT: ret <2 x i32> [[XOR]] | ||
; | ||
entry: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add more negative test cases?
you are using m_OneUse
we should have a test to check if this works as expected.
and a test for the C ^ (A | B)
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
9d1106e
to
b2ecae0
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
639dd85
to
48c88e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some tests to demonstrate the usefulness of this patch (e.g., the original issue #79266)?
48c88e6
to
2c2bdda
Compare
Done |
Any idea why it timeout on |
m_Value(M)))) { | ||
Value *XorAC = Builder.CreateXor(X, M); | ||
return BinaryOperator::CreateXor(XorAC, Y); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this desirable? I'd say disjoint or
is preferable to an xor
in most if not all respects.
I would limit this if X ^ M
folds. You can test it with:
if (Value * XorAC = simplifyBinOp(Instruction::Xor, X, M, SQ)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also try if Y, M
simplify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So can we do
- if X ^ M folds -> (X ^ M) ^ Y
- if Y ^ M folds -> (Y ^ M) ^ X
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now i checked simplify and update samples, this is alive2 for current test depending that A ^ M will folded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With simplifyBinOp
, it could not handle the motivating case anymore. @dtcxzyw, could you test the impact of this patch? If it optimizes other cases still, I wouldn't argue about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An unrelated idea occurred to me: could we rank select cond, C1, C2
with the same number as constants in the Reassociate pass?
Probably an inf loop. Typically we do |
2c2bdda
to
f89cd57
Compare
f89cd57
to
f16a121
Compare
Yes it passed now |
f16a121
to
f6c06b9
Compare
Any updates for this PR review @XChy, @goldsteinn, @dtcxzyw? |
LGTM |
Thank you, you can accpet it :D, once other accepted we can merge |
@nikic Can you check this PR when you have time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable.
A possible alternative would be to try to integrate this inside SimplifyAssociativeOrCommutative, by replacing the opcode ==
matches there with a stronger check that knows that or disjoint
is compatible with or, xor and add. I'm okay with the separate fold though.
|
||
// (A | B) ^ C -> (A ^ C) ^ B | ||
// C ^ (A | B) -> B ^ (A ^ C) | ||
if (match(&I, m_c_Xor(m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (match(&I, m_c_Xor(m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))), | |
if (match(&I, m_c_Xor(m_OneUse(m_DisjointOr(m_Value(X), m_Value(Y))), |
Commutative matcher doesn't make sense if you are only capturing values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// (A | B) ^ C -> (A ^ C) ^ B | ||
// C ^ (A | B) -> B ^ (A ^ C) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// (A | B) ^ C -> (A ^ C) ^ B | |
// C ^ (A | B) -> B ^ (A ^ C) | |
// (X | Y) ^ M -> (X ^ M) ^ Y | |
// (X | Y) ^ M -> (Y ^ M) ^ X |
Try to use the same names as in the code to make it easier to compare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// C ^ (A | B) -> B ^ (A ^ C) | ||
if (match(&I, m_c_Xor(m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))), | ||
m_Value(M)))) { | ||
if (Value *XorAC = simplifyBinOp(Instruction::Xor, X, M, SQ)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Value *XorAC = simplifyBinOp(Instruction::Xor, X, M, SQ)) | |
if (Value *XorAC = simplifyBinOp(Instruction::Xor, X, M, SQ.getWithInstruction(&I))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
; CHECK-NEXT: ret i32 [[B:%.*]] | ||
; | ||
entry: | ||
%or = or disjoint i32 %a, %b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also have a commuted variant with %b, %a
, if I didn't miss is somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ret i32 %xor | ||
} | ||
|
||
define i32 @xor_with_or_disjoint(i32 %a, i32 %b, i32 %c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused argument %c
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Implement a missing optimization to fold (A | B) ^ C to (A ^ C) ^ B
f6c06b9
to
7107113
Compare
Fixes -> "Related to" or something? I don't think this PR fixes that issue as-is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if (match(&I, m_c_Xor(m_OneUse(m_DisjointOr(m_Value(X), m_Value(Y))), | ||
m_Value(M)))) { | ||
if (Value *XorAC = | ||
simplifyBinOp(Instruction::Xor, X, M, SQ.getWithInstruction(&I))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplifyBinOp(Instruction::Xor, X, M, SQ.getWithInstruction(&I))) | |
simplifyXorInst(X, M, SQ.getWithInstruction(&I))) |
simplifyBinOp(Instruction::Xor, X, M, SQ.getWithInstruction(&I))) | ||
return BinaryOperator::CreateXor(XorAC, Y); | ||
|
||
if (Value *XorBC = simplifyBinOp(Instruction::Xor, Y, M, SQ)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Value *XorBC = simplifyBinOp(Instruction::Xor, Y, M, SQ)) | |
if (Value *XorBC = simplifyXorInst(Y, M, SQ.getWithInstruction(&I))) |
Yes we changed the case, i will remove issue mention and describe the current fix Thank you |
Followup to #105992, use the simplifyXorInst helper and use getWithInstruction consistently.
Implement a missing optimization fold
(X | Y) ^ M to (X ^ M) ^ Y
and(X | Y) ^ M to (Y ^ M) ^ X