-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[InstCombine] Preserve the nsw/nuw flags for (X | Op01C) + Op1C --> X + (Op01C + Op1C) #94586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: None (csstormq) ChangesThis patch simplifies Full diff: https://github.com/llvm/llvm-project/pull/94586.diff 4 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 8205b49dfbe2f..b2c1cfcd1148c 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -905,8 +905,17 @@ Instruction *InstCombinerImpl::foldAddWithConstant(BinaryOperator &Add) {
// (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the `or` is actually an `add`
Constant *Op01C;
- if (match(Op0, m_DisjointOr(m_Value(X), m_ImmConstant(Op01C))))
- return BinaryOperator::CreateAdd(X, ConstantExpr::getAdd(Op01C, Op1C));
+ if (match(Op0, m_DisjointOr(m_Value(X), m_ImmConstant(Op01C)))) {
+ bool HasNSW = Add.hasNoSignedWrap();
+ BinaryOperator *NewAdd =
+ BinaryOperator::CreateAdd(X, ConstantExpr::getAdd(Op01C, Op1C));
+ // Preserve the nsw flag so that there is a chance to make some other
+ // transformations.
+ // For some cases, sdiv can be converted to udiv when the newly created add
+ // carrying the nsw flag is one of its operands.
+ NewAdd->setHasNoSignedWrap(HasNSW);
+ return NewAdd;
+ }
// (X | C2) + C --> (X | C2) ^ C2 iff (C2 == -C)
const APInt *C2;
diff --git a/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll b/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
index 729ca03ddfd15..e4dd2d10637d3 100644
--- a/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
+++ b/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
@@ -125,7 +125,7 @@ define { i32, i1 } @fold_sub_simple(i32 %x) {
define { i32, i1 } @fold_with_distjoin_or(i32 %x) {
; CHECK-LABEL: @fold_with_distjoin_or(
-; CHECK-NEXT: [[B:%.*]] = add i32 [[X:%.*]], 6
+; CHECK-NEXT: [[B:%.*]] = add nsw i32 [[X:%.*]], 6
; CHECK-NEXT: [[TMP1:%.*]] = insertvalue { i32, i1 } { i32 poison, i1 false }, i32 [[B]], 0
; CHECK-NEXT: ret { i32, i1 } [[TMP1]]
;
diff --git a/llvm/test/Transforms/InstCombine/sdiv-simplify.ll b/llvm/test/Transforms/InstCombine/sdiv-simplify.ll
new file mode 100644
index 0000000000000..91d648e9093a7
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/sdiv-simplify.ll
@@ -0,0 +1,15 @@
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define i32 @sdiv_to_udiv(i32 %arg0, i32 %arg1) {
+; CHECK-LABEL: @sdiv_to_udiv(
+; CHECK-NEXT: [[T0:%.*]] = shl nuw nsw i32 [[ARG0:%.*]], 8
+; CHECK-NEXT: [[T2:%.*]] = add nuw nsw i32 [[T0:%.*]], 6242049
+; CHECK-NEXT: [[T3:%.*]] = udiv i32 [[T2]], 192
+; CHECK-NEXT: ret i32 [[T3]]
+;
+ %t0 = shl nuw nsw i32 %arg0, 8
+ %t1 = or disjoint i32 %t0, 1
+ %t2 = add nuw nsw i32 %t1, 6242048
+ %t3 = sdiv i32 %t2, 192
+ ret i32 %t3
+}
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
index 5cbf50e06fbe8..c4cd2379a7be2 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
@@ -182,11 +182,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
; CHECK: vector.body.1:
; CHECK-NEXT: [[INDEX_1:%.*]] = phi i64 [ 0, [[VECTOR_PH_1]] ], [ [[INDEX_NEXT_1:%.*]], [[VECTOR_BODY_1]] ]
; CHECK-NEXT: [[TMP33:%.*]] = add nuw nsw i64 [[INDEX_1]], 15
-; CHECK-NEXT: [[TMP34:%.*]] = add i64 [[INDEX_1]], 16
+; CHECK-NEXT: [[TMP34:%.*]] = add nsw i64 [[INDEX_1]], 16
; CHECK-NEXT: [[TMP35:%.*]] = insertelement <2 x i64> poison, i64 [[TMP33]], i64 0
; CHECK-NEXT: [[TMP36:%.*]] = insertelement <2 x i64> [[TMP35]], i64 [[TMP34]], i64 1
-; CHECK-NEXT: [[TMP37:%.*]] = add i64 [[INDEX_1]], 17
-; CHECK-NEXT: [[TMP38:%.*]] = add i64 [[INDEX_1]], 18
+; CHECK-NEXT: [[TMP37:%.*]] = add nsw i64 [[INDEX_1]], 17
+; CHECK-NEXT: [[TMP38:%.*]] = add nsw i64 [[INDEX_1]], 18
; CHECK-NEXT: [[TMP39:%.*]] = insertelement <2 x i64> poison, i64 [[TMP37]], i64 0
; CHECK-NEXT: [[TMP40:%.*]] = insertelement <2 x i64> [[TMP39]], i64 [[TMP38]], i64 1
; CHECK-NEXT: [[TMP41:%.*]] = icmp ult <2 x i64> [[TMP36]], <i64 225, i64 225>
@@ -259,11 +259,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
; CHECK: vector.body.2:
; CHECK-NEXT: [[INDEX_2:%.*]] = phi i64 [ 0, [[VECTOR_PH_2]] ], [ [[INDEX_NEXT_2:%.*]], [[VECTOR_BODY_2]] ]
; CHECK-NEXT: [[TMP64:%.*]] = add nuw nsw i64 [[INDEX_2]], 30
-; CHECK-NEXT: [[TMP65:%.*]] = add i64 [[INDEX_2]], 31
+; CHECK-NEXT: [[TMP65:%.*]] = add nsw i64 [[INDEX_2]], 31
; CHECK-NEXT: [[TMP66:%.*]] = insertelement <2 x i64> poison, i64 [[TMP64]], i64 0
; CHECK-NEXT: [[TMP67:%.*]] = insertelement <2 x i64> [[TMP66]], i64 [[TMP65]], i64 1
-; CHECK-NEXT: [[TMP68:%.*]] = add i64 [[INDEX_2]], 32
-; CHECK-NEXT: [[TMP69:%.*]] = add i64 [[INDEX_2]], 33
+; CHECK-NEXT: [[TMP68:%.*]] = add nsw i64 [[INDEX_2]], 32
+; CHECK-NEXT: [[TMP69:%.*]] = add nsw i64 [[INDEX_2]], 33
; CHECK-NEXT: [[TMP70:%.*]] = insertelement <2 x i64> poison, i64 [[TMP68]], i64 0
; CHECK-NEXT: [[TMP71:%.*]] = insertelement <2 x i64> [[TMP70]], i64 [[TMP69]], i64 1
; CHECK-NEXT: [[TMP72:%.*]] = icmp ult <2 x i64> [[TMP67]], <i64 225, i64 225>
@@ -336,11 +336,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
; CHECK: vector.body.3:
; CHECK-NEXT: [[INDEX_3:%.*]] = phi i64 [ 0, [[VECTOR_PH_3]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY_3]] ]
; CHECK-NEXT: [[TMP95:%.*]] = add nuw nsw i64 [[INDEX_3]], 45
-; CHECK-NEXT: [[TMP96:%.*]] = add i64 [[INDEX_3]], 46
+; CHECK-NEXT: [[TMP96:%.*]] = add nsw i64 [[INDEX_3]], 46
; CHECK-NEXT: [[TMP97:%.*]] = insertelement <2 x i64> poison, i64 [[TMP95]], i64 0
; CHECK-NEXT: [[TMP98:%.*]] = insertelement <2 x i64> [[TMP97]], i64 [[TMP96]], i64 1
-; CHECK-NEXT: [[TMP99:%.*]] = add i64 [[INDEX_3]], 47
-; CHECK-NEXT: [[TMP100:%.*]] = add i64 [[INDEX_3]], 48
+; CHECK-NEXT: [[TMP99:%.*]] = add nsw i64 [[INDEX_3]], 47
+; CHECK-NEXT: [[TMP100:%.*]] = add nsw i64 [[INDEX_3]], 48
; CHECK-NEXT: [[TMP101:%.*]] = insertelement <2 x i64> poison, i64 [[TMP99]], i64 0
; CHECK-NEXT: [[TMP102:%.*]] = insertelement <2 x i64> [[TMP101]], i64 [[TMP100]], i64 1
; CHECK-NEXT: [[TMP103:%.*]] = icmp ult <2 x i64> [[TMP98]], <i64 225, i64 225>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always provide alive2 proofs for InstCombine changes, see https://llvm.org/docs/InstCombineContributorGuide.html#proofs.
The transform as implemented is incorrect: https://alive2.llvm.org/ce/z/pEk7wC
This would be a correct variant of the transform: https://alive2.llvm.org/ce/z/nrdCZT |
Alive2 Proof: https://alive2.llvm.org/ce/z/CRKtkU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your proof is not for the transform you actually implement. The transform is still incorrect in general, you merely picked out one example where it is correct. I have already shared what you need to do to make it correct in my second comment. The necessary pre-condition can be implemented using the willNotOverflowSignedAdd helper.
I guess I have fixed up, and add more test cases. Thanks a lot for your help. Please review again. |
1. Preserve the nuw flag also 2. Remove unnecessary comment 3. Move test cases to the better place
Update the title and description to reflect the intent of the latest code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
1. Save the overflow check if not needed 2. Add preserve nuw test case
@nikic Thank you very much for your review. I have learned a lot from your comments. |
… + (Op01C + Op1C) (llvm#94586) This patch simplifies `sdiv` to `udiv` by preserving the `nsw` flag for `(X | Op01C) + Op1C --> X + (Op01C + Op1C)` if the sum of `Op01C` and `Op1C` will not overflow, and preserves the `nuw` flag unconditionally. Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT, https://alive2.llvm.org/ce/z/YnJHnH Signed-off-by: Hafidz Muzakky <[email protected]>
This patch simplifies
sdiv
toudiv
by preserving thensw
flag for(X | Op01C) + Op1C --> X + (Op01C + Op1C)
if the sum ofOp01C
andOp1C
will not overflow, and preserves thenuw
flag unconditionally.Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT, https://alive2.llvm.org/ce/z/YnJHnH