Skip to content

[InstCombine] Preserve the nsw/nuw flags for (X | Op01C) + Op1C --> X + (Op01C + Op1C) #94586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 8, 2024

Conversation

csstormq
Copy link
Contributor

@csstormq csstormq commented Jun 6, 2024

This patch simplifies sdiv to udiv by preserving the nsw flag for (X | Op01C) + Op1C --> X + (Op01C + Op1C) if the sum of Op01C and Op1C will not overflow, and preserves the nuw flag unconditionally.

Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT, https://alive2.llvm.org/ce/z/YnJHnH

@csstormq csstormq requested a review from nikic as a code owner June 6, 2024 07:30
@llvmbot
Copy link
Member

llvmbot commented Jun 6, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (csstormq)

Changes

This patch simplifies sdiv to udiv by preserving the nsw flag for (X | Op01C) + Op1C --> X + (Op01C + Op1C).


Full diff: https://github.com/llvm/llvm-project/pull/94586.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp (+11-2)
  • (modified) llvm/test/Transforms/InstCombine/sadd-with-overflow.ll (+1-1)
  • (added) llvm/test/Transforms/InstCombine/sdiv-simplify.ll (+15)
  • (modified) llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll (+9-9)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 8205b49dfbe2f..b2c1cfcd1148c 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -905,8 +905,17 @@ Instruction *InstCombinerImpl::foldAddWithConstant(BinaryOperator &Add) {
 
   // (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the `or` is actually an `add`
   Constant *Op01C;
-  if (match(Op0, m_DisjointOr(m_Value(X), m_ImmConstant(Op01C))))
-    return BinaryOperator::CreateAdd(X, ConstantExpr::getAdd(Op01C, Op1C));
+  if (match(Op0, m_DisjointOr(m_Value(X), m_ImmConstant(Op01C)))) {
+    bool HasNSW = Add.hasNoSignedWrap();
+    BinaryOperator *NewAdd =
+        BinaryOperator::CreateAdd(X, ConstantExpr::getAdd(Op01C, Op1C));
+    // Preserve the nsw flag so that there is a chance to make some other
+    // transformations.
+    // For some cases, sdiv can be converted to udiv when the newly created add
+    // carrying the nsw flag is one of its operands.
+    NewAdd->setHasNoSignedWrap(HasNSW);
+    return NewAdd;
+  }
 
   // (X | C2) + C --> (X | C2) ^ C2 iff (C2 == -C)
   const APInt *C2;
diff --git a/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll b/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
index 729ca03ddfd15..e4dd2d10637d3 100644
--- a/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
+++ b/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
@@ -125,7 +125,7 @@ define { i32, i1 } @fold_sub_simple(i32 %x) {
 
 define { i32, i1 } @fold_with_distjoin_or(i32 %x) {
 ; CHECK-LABEL: @fold_with_distjoin_or(
-; CHECK-NEXT:    [[B:%.*]] = add i32 [[X:%.*]], 6
+; CHECK-NEXT:    [[B:%.*]] = add nsw i32 [[X:%.*]], 6
 ; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { i32, i1 } { i32 poison, i1 false }, i32 [[B]], 0
 ; CHECK-NEXT:    ret { i32, i1 } [[TMP1]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/sdiv-simplify.ll b/llvm/test/Transforms/InstCombine/sdiv-simplify.ll
new file mode 100644
index 0000000000000..91d648e9093a7
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/sdiv-simplify.ll
@@ -0,0 +1,15 @@
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define i32 @sdiv_to_udiv(i32 %arg0, i32 %arg1) {
+; CHECK-LABEL: @sdiv_to_udiv(
+; CHECK-NEXT:    [[T0:%.*]] = shl nuw nsw i32 [[ARG0:%.*]], 8
+; CHECK-NEXT:    [[T2:%.*]] = add nuw nsw i32 [[T0:%.*]], 6242049
+; CHECK-NEXT:    [[T3:%.*]] = udiv i32 [[T2]], 192
+; CHECK-NEXT:    ret i32 [[T3]]
+;
+  %t0 = shl nuw nsw i32 %arg0, 8
+  %t1 = or disjoint i32 %t0, 1
+  %t2 = add nuw nsw i32 %t1, 6242048
+  %t3 = sdiv i32 %t2, 192
+  ret i32 %t3
+}
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
index 5cbf50e06fbe8..c4cd2379a7be2 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
@@ -182,11 +182,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
 ; CHECK:       vector.body.1:
 ; CHECK-NEXT:    [[INDEX_1:%.*]] = phi i64 [ 0, [[VECTOR_PH_1]] ], [ [[INDEX_NEXT_1:%.*]], [[VECTOR_BODY_1]] ]
 ; CHECK-NEXT:    [[TMP33:%.*]] = add nuw nsw i64 [[INDEX_1]], 15
-; CHECK-NEXT:    [[TMP34:%.*]] = add i64 [[INDEX_1]], 16
+; CHECK-NEXT:    [[TMP34:%.*]] = add nsw i64 [[INDEX_1]], 16
 ; CHECK-NEXT:    [[TMP35:%.*]] = insertelement <2 x i64> poison, i64 [[TMP33]], i64 0
 ; CHECK-NEXT:    [[TMP36:%.*]] = insertelement <2 x i64> [[TMP35]], i64 [[TMP34]], i64 1
-; CHECK-NEXT:    [[TMP37:%.*]] = add i64 [[INDEX_1]], 17
-; CHECK-NEXT:    [[TMP38:%.*]] = add i64 [[INDEX_1]], 18
+; CHECK-NEXT:    [[TMP37:%.*]] = add nsw i64 [[INDEX_1]], 17
+; CHECK-NEXT:    [[TMP38:%.*]] = add nsw i64 [[INDEX_1]], 18
 ; CHECK-NEXT:    [[TMP39:%.*]] = insertelement <2 x i64> poison, i64 [[TMP37]], i64 0
 ; CHECK-NEXT:    [[TMP40:%.*]] = insertelement <2 x i64> [[TMP39]], i64 [[TMP38]], i64 1
 ; CHECK-NEXT:    [[TMP41:%.*]] = icmp ult <2 x i64> [[TMP36]], <i64 225, i64 225>
@@ -259,11 +259,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
 ; CHECK:       vector.body.2:
 ; CHECK-NEXT:    [[INDEX_2:%.*]] = phi i64 [ 0, [[VECTOR_PH_2]] ], [ [[INDEX_NEXT_2:%.*]], [[VECTOR_BODY_2]] ]
 ; CHECK-NEXT:    [[TMP64:%.*]] = add nuw nsw i64 [[INDEX_2]], 30
-; CHECK-NEXT:    [[TMP65:%.*]] = add i64 [[INDEX_2]], 31
+; CHECK-NEXT:    [[TMP65:%.*]] = add nsw i64 [[INDEX_2]], 31
 ; CHECK-NEXT:    [[TMP66:%.*]] = insertelement <2 x i64> poison, i64 [[TMP64]], i64 0
 ; CHECK-NEXT:    [[TMP67:%.*]] = insertelement <2 x i64> [[TMP66]], i64 [[TMP65]], i64 1
-; CHECK-NEXT:    [[TMP68:%.*]] = add i64 [[INDEX_2]], 32
-; CHECK-NEXT:    [[TMP69:%.*]] = add i64 [[INDEX_2]], 33
+; CHECK-NEXT:    [[TMP68:%.*]] = add nsw i64 [[INDEX_2]], 32
+; CHECK-NEXT:    [[TMP69:%.*]] = add nsw i64 [[INDEX_2]], 33
 ; CHECK-NEXT:    [[TMP70:%.*]] = insertelement <2 x i64> poison, i64 [[TMP68]], i64 0
 ; CHECK-NEXT:    [[TMP71:%.*]] = insertelement <2 x i64> [[TMP70]], i64 [[TMP69]], i64 1
 ; CHECK-NEXT:    [[TMP72:%.*]] = icmp ult <2 x i64> [[TMP67]], <i64 225, i64 225>
@@ -336,11 +336,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
 ; CHECK:       vector.body.3:
 ; CHECK-NEXT:    [[INDEX_3:%.*]] = phi i64 [ 0, [[VECTOR_PH_3]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY_3]] ]
 ; CHECK-NEXT:    [[TMP95:%.*]] = add nuw nsw i64 [[INDEX_3]], 45
-; CHECK-NEXT:    [[TMP96:%.*]] = add i64 [[INDEX_3]], 46
+; CHECK-NEXT:    [[TMP96:%.*]] = add nsw i64 [[INDEX_3]], 46
 ; CHECK-NEXT:    [[TMP97:%.*]] = insertelement <2 x i64> poison, i64 [[TMP95]], i64 0
 ; CHECK-NEXT:    [[TMP98:%.*]] = insertelement <2 x i64> [[TMP97]], i64 [[TMP96]], i64 1
-; CHECK-NEXT:    [[TMP99:%.*]] = add i64 [[INDEX_3]], 47
-; CHECK-NEXT:    [[TMP100:%.*]] = add i64 [[INDEX_3]], 48
+; CHECK-NEXT:    [[TMP99:%.*]] = add nsw i64 [[INDEX_3]], 47
+; CHECK-NEXT:    [[TMP100:%.*]] = add nsw i64 [[INDEX_3]], 48
 ; CHECK-NEXT:    [[TMP101:%.*]] = insertelement <2 x i64> poison, i64 [[TMP99]], i64 0
 ; CHECK-NEXT:    [[TMP102:%.*]] = insertelement <2 x i64> [[TMP101]], i64 [[TMP100]], i64 1
 ; CHECK-NEXT:    [[TMP103:%.*]] = icmp ult <2 x i64> [[TMP98]], <i64 225, i64 225>

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please always provide alive2 proofs for InstCombine changes, see https://llvm.org/docs/InstCombineContributorGuide.html#proofs.

The transform as implemented is incorrect: https://alive2.llvm.org/ce/z/pEk7wC

@nikic
Copy link
Contributor

nikic commented Jun 6, 2024

This would be a correct variant of the transform: https://alive2.llvm.org/ce/z/nrdCZT

@csstormq
Copy link
Contributor Author

csstormq commented Jun 6, 2024

Please always provide alive2 proofs for InstCombine changes, see https://llvm.org/docs/InstCombineContributorGuide.html#proofs.

Alive2 Proof: https://alive2.llvm.org/ce/z/CRKtkU

@csstormq csstormq requested a review from nikic June 6, 2024 08:43
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proof is not for the transform you actually implement. The transform is still incorrect in general, you merely picked out one example where it is correct. I have already shared what you need to do to make it correct in my second comment. The necessary pre-condition can be implemented using the willNotOverflowSignedAdd helper.

@csstormq
Copy link
Contributor Author

csstormq commented Jun 7, 2024

Your proof is not for the transform you actually implement. The transform is still incorrect in general, you merely picked out one example where it is correct. I have already shared what you need to do to make it correct in my second comment. The necessary pre-condition can be implemented using the willNotOverflowSignedAdd helper.

I guess I have fixed up, and add more test cases. Thanks a lot for your help. Please review again.

@csstormq csstormq requested a review from nikic June 7, 2024 07:48
csstormq added 2 commits June 7, 2024 17:33
1. Preserve the nuw flag also

2. Remove unnecessary comment

3. Move test cases to the better place
@csstormq csstormq changed the title [InstCombine] Preserve the nsw flag for (X | Op01C) + Op1C --> X + (Op01C + Op1C) [InstCombine] Preserve the nsw/nuw flags for (X | Op01C) + Op1C --> X + (Op01C + Op1C) Jun 7, 2024
@csstormq
Copy link
Contributor Author

csstormq commented Jun 7, 2024

Update the title and description to reflect the intent of the latest code.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

1. Save the overflow check if not needed

2. Add preserve nuw test case
@csstormq
Copy link
Contributor Author

csstormq commented Jun 7, 2024

@nikic Thank you very much for your review. I have learned a lot from your comments.

@csstormq csstormq merged commit 96af114 into llvm:main Jun 8, 2024
7 checks passed
@csstormq csstormq deleted the instcmb branch June 8, 2024 01:20
nekoshirro pushed a commit to nekoshirro/Alchemist-LLVM that referenced this pull request Jun 9, 2024
… + (Op01C + Op1C) (llvm#94586)

This patch simplifies `sdiv` to `udiv` by preserving the `nsw` flag for
`(X | Op01C) + Op1C --> X + (Op01C + Op1C)` if the sum of `Op01C` and
`Op1C` will not overflow, and preserves the `nuw` flag unconditionally.

Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT,
https://alive2.llvm.org/ce/z/YnJHnH

Signed-off-by: Hafidz Muzakky <[email protected]>
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants