Skip to content

[InstCombine] Fold Xor with or disjoint #105992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 22, 2024

Conversation

AmrDeveloper
Copy link
Member

@AmrDeveloper AmrDeveloper commented Aug 25, 2024

Implement a missing optimization fold (X | Y) ^ M to (X ^ M) ^ Y and (X | Y) ^ M to (Y ^ M) ^ X

@llvmbot
Copy link
Member

llvmbot commented Aug 25, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Amr Hesham (AmrDeveloper)

Changes

Implement a missing optimization fold from ((select C1, C2) | A) ^ C3) to be ((Select C1 ^ C3, C2 ^ C3) | A)

Fixes: #79266


Full diff: https://github.com/llvm/llvm-project/pull/105992.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (+22)
  • (added) llvm/test/Transforms/InstCombine/disjoint_or.ll (+32)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index b703bc7d04db58..59bd133e56b118 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -4056,6 +4056,25 @@ Instruction *InstCombinerImpl::visitOr(BinaryOperator &I) {
   return nullptr;
 }
 
+static Instruction *foldXorToOr(BinaryOperator &I,
+                                InstCombiner::BuilderTy &Builder) {
+  assert(I.getOpcode() == Instruction::Xor);
+  Value *A, *B, *C, *D, *E;
+
+  // ((select C, A, B) | E) ^ D) -> (select C, A ^ D, B ^ D) | E)
+  if (match(&I,
+            m_c_Xor(m_c_DisjointOr(m_Select(m_Value(C), m_Value(A), m_Value(B)),
+                                   m_Value(E)),
+                    m_Value(D)))) {
+    Value *XorAB = Builder.CreateXor(A, D);
+    Value *XorBD = Builder.CreateXor(B, D);
+    Value *S = Builder.CreateSelect(C, XorAB, XorBD);
+    return BinaryOperator::CreateDisjointOr(S, E);
+  }
+
+  return nullptr;
+}
+
 /// A ^ B can be specified using other logic ops in a variety of patterns. We
 /// can fold these early and efficiently by morphing an existing instruction.
 static Instruction *foldXorToXor(BinaryOperator &I,
@@ -4658,6 +4677,9 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) {
   if (Instruction *NewXor = foldXorToXor(I, Builder))
     return NewXor;
 
+  if (Instruction *NewOr = foldXorToOr(I, Builder))
+    return NewOr;
+
   // (A&B)^(A&C) -> A&(B^C) etc
   if (Value *V = foldUsingDistributiveLaws(I))
     return replaceInstUsesWith(I, V);
diff --git a/llvm/test/Transforms/InstCombine/disjoint_or.ll b/llvm/test/Transforms/InstCombine/disjoint_or.ll
new file mode 100644
index 00000000000000..81fa4c33199fd4
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/disjoint_or.ll
@@ -0,0 +1,32 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define i32 @fold_xor_with_disjoint_or(i32 %a, i1 %c) {
+; CHECK-LABEL: define i32 @fold_xor_with_disjoint_or(
+; CHECK-SAME: i32 [[A:%.*]], i1 [[C:%.*]]) {
+; CHECK-NEXT:    [[SHL:%.*]] = shl i32 [[A]], 4
+; CHECK-NEXT:    [[TMP1:%.*]] = select i1 [[C]], i32 4, i32 0
+; CHECK-NEXT:    [[XOR:%.*]] = or disjoint i32 [[TMP1]], [[SHL]]
+; CHECK-NEXT:    ret i32 [[XOR]]
+;
+  %s = select i1 %c, i32 0, i32 4
+  %shl = shl i32 %a, 4
+  %or = or disjoint i32 %s, %shl
+  %xor = xor i32 %or, 4
+  ret i32 %xor
+}
+
+define <2 x i32> @fold_xor_with_disjoint_or_vec(<2 x i32> %a, i1 %c) {
+; CHECK-LABEL: define <2 x i32> @fold_xor_with_disjoint_or_vec(
+; CHECK-SAME: <2 x i32> [[A:%.*]], i1 [[C:%.*]]) {
+; CHECK-NEXT:    [[SHL:%.*]] = shl <2 x i32> [[A]], <i32 4, i32 4>
+; CHECK-NEXT:    [[TMP1:%.*]] = select i1 [[C]], <2 x i32> <i32 4, i32 4>, <2 x i32> zeroinitializer
+; CHECK-NEXT:    [[XOR:%.*]] = or disjoint <2 x i32> [[TMP1]], [[SHL]]
+; CHECK-NEXT:    ret <2 x i32> [[XOR]]
+;
+  %s = select i1 %c, <2 x i32> <i32 0, i32 0>, <2 x i32> <i32 4, i32 4>
+  %shl = shl <2 x i32> %a, <i32 4, i32 4>
+  %or = or disjoint <2 x i32> %s, %shl
+  %xor = xor <2 x i32> %or, <i32 4, i32 4>
+  ret <2 x i32> %xor
+}

@AmrDeveloper AmrDeveloper force-pushed the instcom_optimize_fold branch from 7c579c8 to edd78ee Compare August 25, 2024 16:47
@nikic nikic requested a review from dtcxzyw August 25, 2024 19:05
@@ -0,0 +1,32 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt < %s -passes=instcombine -S | FileCheck %s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you should place the tests in the xor.ll.

%or = or disjoint <2 x i32> %s, %shl
%xor = xor <2 x i32> %or, <i32 4, i32 4>
ret <2 x i32> %xor
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need more test cases e.g., negative tests, one use, more than one use case, etc...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we need test cases for commutative operations (or and xor) to see if the optimization works when the LHS and RHS are shuffled.
hmm, I think we need only for or as it has a non-const operand A, but we don't need for xor as the canonicalizer always places constants on the right-hand side.

Value *A, *B, *C, *D, *E;

// ((select C, A, B) | E) ^ D) -> (select C, A ^ D, B ^ D) | E)
if (match(&I,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this condition to the InstCombinerImpl::visitXor function.
I can help with the condition, but I let you think.
further hints:
I think I should refer to @dtcxzyw to double-check my suggestions:

  • I think the issue was about matching constants in place of A, B, and D, so It's better to use m_APInt instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this check be profitable if the or has more than one use? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this check be profitable if the or has more than one use? 🤔

Probably not.

@dtcxzyw dtcxzyw requested a review from goldsteinn August 26, 2024 06:10
@XChy
Copy link
Member

XChy commented Aug 26, 2024

It's my fault for not providing generalized proof of the issue, but I don't think your pattern here is correct.
IIUC, you try to fold (A | B) ^ C into (A ^ C) | B iff A ^ C can be simplified, but it's wrong:
https://alive2.llvm.org/ce/z/s7VV8u
The correct pattern should be (A | B) ^ C --> (A ^ C) ^ B: https://alive2.llvm.org/ce/z/u2vco3
The key is to treat or disjoint as xor.

@AmrDeveloper
Copy link
Member Author

It's my fault for not providing generalized proof of the issue, but I don't think your pattern here is correct. IIUC, you try to fold (A | B) ^ C into (A ^ C) | B iff A ^ C can be simplified, but it's wrong: https://alive2.llvm.org/ce/z/s7VV8u The correct pattern should be (A | B) ^ C --> (A ^ C) ^ B: https://alive2.llvm.org/ce/z/u2vco3 The key is to treat or disjoint as xor.

Thank you, i change the pattern, test code ...etc

@elhewaty
Copy link
Member

The correct pattern should be (A | B) ^ C --> (A ^ C) ^ B: https://alive2.llvm.org/ce/z/u2vco3
The key is to treat or disjoint as xor.

Is this optimization profitable? We still have the same number of instructions, I think it depends on the target architecture if it has a special handling or if the pipelining prefers XOR.

this is just a guess.

@XChy
Copy link
Member

XChy commented Aug 27, 2024

The correct pattern should be (A | B) ^ C --> (A ^ C) ^ B: https://alive2.llvm.org/ce/z/u2vco3
The key is to treat or disjoint as xor.

Is this optimization profitable? We still have the same number of instructions, I think it depends on the target architecture if it has a special handling or if the pipelining prefers XOR.

this is just a guess.

This fold is profitable if A ^ C folds. ((select C1, C2) | A) ^ C3 --> (select C1^C3, C2^C3) | A) is just a specific case.

@AmrDeveloper AmrDeveloper force-pushed the instcom_optimize_fold branch from edd78ee to 9742ec2 Compare August 29, 2024 19:40
Copy link
Member

@elhewaty elhewaty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great work, thanks.

@@ -4706,6 +4706,13 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) {
return Xor;

Value *X, *Y;

// (A | B) ^ C -> (A ^ C) ^ B
if (match(Op0, m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will handle C ^ (A | B)!

Suggested change
if (match(Op0, m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))))) {
if (match(&I, m_c_Xor(m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))), m_Value(M)))) {

There's an M variable declared above if this is confusing you can declare a variable *Z in the if clause

 if (Value *Z; ...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestion, i used M

; CHECK-NEXT: [[OR:%.*]] = or disjoint <2 x i32> [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[XOR:%.*]] = xor <2 x i32> [[OR]], [[C:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = xor <2 x i32> [[A:%.*]], [[C:%.*]]
; CHECK-NEXT: [[XOR:%.*]] = xor <2 x i32> [[TMP0]], [[B:%.*]]
; CHECK-NEXT: ret <2 x i32> [[XOR]]
;
entry:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add more negative test cases?
you are using m_OneUse we should have a test to check if this works as expected.
and a test for the C ^ (A | B) case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@AmrDeveloper AmrDeveloper force-pushed the instcom_optimize_fold branch 2 times, most recently from 9d1106e to b2ecae0 Compare August 30, 2024 17:40
Copy link

github-actions bot commented Aug 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@AmrDeveloper AmrDeveloper force-pushed the instcom_optimize_fold branch 2 times, most recently from 639dd85 to 48c88e6 Compare August 31, 2024 06:20
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests to demonstrate the usefulness of this patch (e.g., the original issue #79266)?

@AmrDeveloper
Copy link
Member Author

Can you add some tests to demonstrate the usefulness of this patch (e.g., the original issue #79266)?

Done

@AmrDeveloper
Copy link
Member Author

Can you add some tests to demonstrate the usefulness of this patch (e.g., the original issue #79266)?

Any idea why it timeout on ninja runtimes C++03 on linux CI, i did rebase to the code but still same issue

m_Value(M)))) {
Value *XorAC = Builder.CreateXor(X, M);
return BinaryOperator::CreateXor(XorAC, Y);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this desirable? I'd say disjoint or is preferable to an xor in most if not all respects.

I would limit this if X ^ M folds. You can test it with:
if (Value * XorAC = simplifyBinOp(Instruction::Xor, X, M, SQ)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also try if Y, M simplify.

Copy link
Member Author

@AmrDeveloper AmrDeveloper Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can we do

  • if X ^ M folds -> (X ^ M) ^ Y
  • if Y ^ M folds -> (Y ^ M) ^ X

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now i checked simplify and update samples, this is alive2 for current test depending that A ^ M will folded

https://alive2.llvm.org/ce/z/NLhfhN

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With simplifyBinOp, it could not handle the motivating case anymore. @dtcxzyw, could you test the impact of this patch? If it optimizes other cases still, I wouldn't argue about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An unrelated idea occurred to me: could we rank select cond, C1, C2 with the same number as constants in the Reassociate pass?

@goldsteinn
Copy link
Contributor

Can you add some tests to demonstrate the usefulness of this patch (e.g., the original issue #79266)?

Any idea why it timeout on ninja runtimes C++03 on linux CI, i did rebase to the code but still same issue

Probably an inf loop. Typically we do xor -> disjoint or. If the three value all dont share bits I think this will inf loop between (xor (or disjoint x, y), z) -> (xor (xor x, z), y)

@AmrDeveloper
Copy link
Member Author

Can you add some tests to demonstrate the usefulness of this patch (e.g., the original issue #79266)?

Any idea why it timeout on ninja runtimes C++03 on linux CI, i did rebase to the code but still same issue

Probably an inf loop. Typically we do xor -> disjoint or. If the three value all dont share bits I think this will inf loop between (xor (or disjoint x, y), z) -> (xor (xor x, z), y)

Yes it passed now

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Sep 8, 2024
@AmrDeveloper
Copy link
Member Author

Any updates for this PR review @XChy, @goldsteinn, @dtcxzyw?

@goldsteinn
Copy link
Contributor

LGTM

@AmrDeveloper
Copy link
Member Author

LGTM

Thank you, you can accpet it :D, once other accepted we can merge

@AmrDeveloper
Copy link
Member Author

@nikic Can you check this PR when you have time

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable.

A possible alternative would be to try to integrate this inside SimplifyAssociativeOrCommutative, by replacing the opcode == matches there with a stronger check that knows that or disjoint is compatible with or, xor and add. I'm okay with the separate fold though.


// (A | B) ^ C -> (A ^ C) ^ B
// C ^ (A | B) -> B ^ (A ^ C)
if (match(&I, m_c_Xor(m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (match(&I, m_c_Xor(m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))),
if (match(&I, m_c_Xor(m_OneUse(m_DisjointOr(m_Value(X), m_Value(Y))),

Commutative matcher doesn't make sense if you are only capturing values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 4698 to 4699
// (A | B) ^ C -> (A ^ C) ^ B
// C ^ (A | B) -> B ^ (A ^ C)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// (A | B) ^ C -> (A ^ C) ^ B
// C ^ (A | B) -> B ^ (A ^ C)
// (X | Y) ^ M -> (X ^ M) ^ Y
// (X | Y) ^ M -> (Y ^ M) ^ X

Try to use the same names as in the code to make it easier to compare.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// C ^ (A | B) -> B ^ (A ^ C)
if (match(&I, m_c_Xor(m_OneUse(m_c_DisjointOr(m_Value(X), m_Value(Y))),
m_Value(M)))) {
if (Value *XorAC = simplifyBinOp(Instruction::Xor, X, M, SQ))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Value *XorAC = simplifyBinOp(Instruction::Xor, X, M, SQ))
if (Value *XorAC = simplifyBinOp(Instruction::Xor, X, M, SQ.getWithInstruction(&I)))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

; CHECK-NEXT: ret i32 [[B:%.*]]
;
entry:
%or = or disjoint i32 %a, %b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also have a commuted variant with %b, %a, if I didn't miss is somewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ret i32 %xor
}

define i32 @xor_with_or_disjoint(i32 %a, i32 %b, i32 %c) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused argument %c.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Implement a missing optimization to fold (A | B) ^ C to (A ^ C) ^ B
@nikic
Copy link
Contributor

nikic commented Sep 22, 2024

Fixes: #79266

Fixes -> "Related to" or something? I don't think this PR fixes that issue as-is.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (match(&I, m_c_Xor(m_OneUse(m_DisjointOr(m_Value(X), m_Value(Y))),
m_Value(M)))) {
if (Value *XorAC =
simplifyBinOp(Instruction::Xor, X, M, SQ.getWithInstruction(&I)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
simplifyBinOp(Instruction::Xor, X, M, SQ.getWithInstruction(&I)))
simplifyXorInst(X, M, SQ.getWithInstruction(&I)))

simplifyBinOp(Instruction::Xor, X, M, SQ.getWithInstruction(&I)))
return BinaryOperator::CreateXor(XorAC, Y);

if (Value *XorBC = simplifyBinOp(Instruction::Xor, Y, M, SQ))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Value *XorBC = simplifyBinOp(Instruction::Xor, Y, M, SQ))
if (Value *XorBC = simplifyXorInst(Y, M, SQ.getWithInstruction(&I)))

@AmrDeveloper
Copy link
Member Author

Fixes: #79266

Fixes -> "Related to" or something? I don't think this PR fixes that issue as-is.

Yes we changed the case, i will remove issue mention and describe the current fix

Thank you

@AmrDeveloper AmrDeveloper merged commit 9614f69 into llvm:main Sep 22, 2024
8 checks passed
nikic added a commit that referenced this pull request Sep 23, 2024
Followup to #105992,
use the simplifyXorInst helper and use getWithInstruction
consistently.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants