[InstCombine] Preserve the nsw/nuw flags for (X | Op01C) + Op1C --> X + (Op01C + Op1C) #94586

csstormq · 2024-06-06T07:30:26Z

This patch simplifies sdiv to udiv by preserving the nsw flag for (X | Op01C) + Op1C --> X + (Op01C + Op1C) if the sum of Op01C and Op1C will not overflow, and preserves the nuw flag unconditionally.

Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT, https://alive2.llvm.org/ce/z/YnJHnH
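
Why the preserved flags enable the `sdiv` to `udiv` rewrite can be checked by brute force on the `sdiv_to_udiv` test from this PR's diff. The following is a minimal standalone sketch, not LLVM code: the function name `countSdivUdivMismatches` is hypothetical, and the constants 8, 1, 6242048, 6242049 and 192 are taken from that test. It assumes that inputs violating a `nuw`/`nsw` flag yield poison and can be skipped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts inputs for which the folded form or the
// unsigned quotient disagrees with the original signed computation.
long countSdivUdivMismatches() {
  long mismatches = 0;
  // `shl nuw nsw i32 %arg0, 8` is only non-poison for 0 <= %arg0 < 2^23.
  for (uint32_t x = 0; x < (1u << 23); ++x) {
    uint32_t t0 = x << 8;
    assert((t0 & 1u) == 0);  // low bit is clear, so `or disjoint ..., 1` is valid
    uint32_t t1 = t0 | 1u;   // equals t0 + 1 because the bits do not overlap
    uint64_t wide = static_cast<uint64_t>(t1) + 6242048u;
    if (wide > static_cast<uint64_t>(INT32_MAX))
      continue;              // `add nuw nsw` would be poison here
    uint32_t t2 = static_cast<uint32_t>(wide);
    if (t2 != t0 + 6242049u) {
      ++mismatches;          // folded constant would be wrong
      continue;
    }
    // With nsw kept, t2 is known non-negative, so both divisions agree.
    int32_t signedQuot = static_cast<int32_t>(t2) / 192;
    uint32_t unsignedQuot = t2 / 192u;
    if (static_cast<uint32_t>(signedQuot) != unsignedQuot)
      ++mismatches;
  }
  return mismatches;
}
```

Calling `countSdivUdivMismatches()` should return 0. The sketch only covers this one test input shape; it says nothing about the general transform.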

llvmbot · 2024-06-06T07:30:58Z

@llvm/pr-subscribers-llvm-transforms

Author: None (csstormq)

Changes

This patch simplifies sdiv to udiv by preserving the nsw flag for (X | Op01C) + Op1C --> X + (Op01C + Op1C).

Full diff: https://github.com/llvm/llvm-project/pull/94586.diff

4 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp (+11-2)
(modified) llvm/test/Transforms/InstCombine/sadd-with-overflow.ll (+1-1)
(added) llvm/test/Transforms/InstCombine/sdiv-simplify.ll (+15)
(modified) llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll (+9-9)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 8205b49dfbe2f..b2c1cfcd1148c 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -905,8 +905,17 @@ Instruction *InstCombinerImpl::foldAddWithConstant(BinaryOperator &Add) {
 
   // (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the `or` is actually an `add`
   Constant *Op01C;
-  if (match(Op0, m_DisjointOr(m_Value(X), m_ImmConstant(Op01C))))
-    return BinaryOperator::CreateAdd(X, ConstantExpr::getAdd(Op01C, Op1C));
+  if (match(Op0, m_DisjointOr(m_Value(X), m_ImmConstant(Op01C)))) {
+    bool HasNSW = Add.hasNoSignedWrap();
+    BinaryOperator *NewAdd =
+        BinaryOperator::CreateAdd(X, ConstantExpr::getAdd(Op01C, Op1C));
+    // Preserve the nsw flag so that there is a chance to make some other
+    // transformations.
+    // For some cases, sdiv can be converted to udiv when the newly created add
+    // carrying the nsw flag is one of its operands.
+    NewAdd->setHasNoSignedWrap(HasNSW);
+    return NewAdd;
+  }
 
   // (X | C2) + C --> (X | C2) ^ C2 iff (C2 == -C)
   const APInt *C2;
diff --git a/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll b/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
index 729ca03ddfd15..e4dd2d10637d3 100644
--- a/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
+++ b/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
@@ -125,7 +125,7 @@ define { i32, i1 } @fold_sub_simple(i32 %x) {
 
 define { i32, i1 } @fold_with_distjoin_or(i32 %x) {
 ; CHECK-LABEL: @fold_with_distjoin_or(
-; CHECK-NEXT:    [[B:%.*]] = add i32 [[X:%.*]], 6
+; CHECK-NEXT:    [[B:%.*]] = add nsw i32 [[X:%.*]], 6
 ; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { i32, i1 } { i32 poison, i1 false }, i32 [[B]], 0
 ; CHECK-NEXT:    ret { i32, i1 } [[TMP1]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/sdiv-simplify.ll b/llvm/test/Transforms/InstCombine/sdiv-simplify.ll
new file mode 100644
index 0000000000000..91d648e9093a7
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/sdiv-simplify.ll
@@ -0,0 +1,15 @@
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define i32 @sdiv_to_udiv(i32 %arg0, i32 %arg1) {
+; CHECK-LABEL: @sdiv_to_udiv(
+; CHECK-NEXT:    [[T0:%.*]] = shl nuw nsw i32 [[ARG0:%.*]], 8
+; CHECK-NEXT:    [[T2:%.*]] = add nuw nsw i32 [[T0]], 6242049
+; CHECK-NEXT:    [[T3:%.*]] = udiv i32 [[T2]], 192
+; CHECK-NEXT:    ret i32 [[T3]]
+;
+  %t0 = shl nuw nsw i32 %arg0, 8
+  %t1 = or disjoint i32 %t0, 1
+  %t2 = add nuw nsw i32 %t1, 6242048
+  %t3 = sdiv i32 %t2, 192
+  ret i32 %t3
+}
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
index 5cbf50e06fbe8..c4cd2379a7be2 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll
@@ -182,11 +182,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
 ; CHECK:       vector.body.1:
 ; CHECK-NEXT:    [[INDEX_1:%.*]] = phi i64 [ 0, [[VECTOR_PH_1]] ], [ [[INDEX_NEXT_1:%.*]], [[VECTOR_BODY_1]] ]
 ; CHECK-NEXT:    [[TMP33:%.*]] = add nuw nsw i64 [[INDEX_1]], 15
-; CHECK-NEXT:    [[TMP34:%.*]] = add i64 [[INDEX_1]], 16
+; CHECK-NEXT:    [[TMP34:%.*]] = add nsw i64 [[INDEX_1]], 16
 ; CHECK-NEXT:    [[TMP35:%.*]] = insertelement <2 x i64> poison, i64 [[TMP33]], i64 0
 ; CHECK-NEXT:    [[TMP36:%.*]] = insertelement <2 x i64> [[TMP35]], i64 [[TMP34]], i64 1
-; CHECK-NEXT:    [[TMP37:%.*]] = add i64 [[INDEX_1]], 17
-; CHECK-NEXT:    [[TMP38:%.*]] = add i64 [[INDEX_1]], 18
+; CHECK-NEXT:    [[TMP37:%.*]] = add nsw i64 [[INDEX_1]], 17
+; CHECK-NEXT:    [[TMP38:%.*]] = add nsw i64 [[INDEX_1]], 18
 ; CHECK-NEXT:    [[TMP39:%.*]] = insertelement <2 x i64> poison, i64 [[TMP37]], i64 0
 ; CHECK-NEXT:    [[TMP40:%.*]] = insertelement <2 x i64> [[TMP39]], i64 [[TMP38]], i64 1
 ; CHECK-NEXT:    [[TMP41:%.*]] = icmp ult <2 x i64> [[TMP36]], <i64 225, i64 225>
@@ -259,11 +259,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
 ; CHECK:       vector.body.2:
 ; CHECK-NEXT:    [[INDEX_2:%.*]] = phi i64 [ 0, [[VECTOR_PH_2]] ], [ [[INDEX_NEXT_2:%.*]], [[VECTOR_BODY_2]] ]
 ; CHECK-NEXT:    [[TMP64:%.*]] = add nuw nsw i64 [[INDEX_2]], 30
-; CHECK-NEXT:    [[TMP65:%.*]] = add i64 [[INDEX_2]], 31
+; CHECK-NEXT:    [[TMP65:%.*]] = add nsw i64 [[INDEX_2]], 31
 ; CHECK-NEXT:    [[TMP66:%.*]] = insertelement <2 x i64> poison, i64 [[TMP64]], i64 0
 ; CHECK-NEXT:    [[TMP67:%.*]] = insertelement <2 x i64> [[TMP66]], i64 [[TMP65]], i64 1
-; CHECK-NEXT:    [[TMP68:%.*]] = add i64 [[INDEX_2]], 32
-; CHECK-NEXT:    [[TMP69:%.*]] = add i64 [[INDEX_2]], 33
+; CHECK-NEXT:    [[TMP68:%.*]] = add nsw i64 [[INDEX_2]], 32
+; CHECK-NEXT:    [[TMP69:%.*]] = add nsw i64 [[INDEX_2]], 33
 ; CHECK-NEXT:    [[TMP70:%.*]] = insertelement <2 x i64> poison, i64 [[TMP68]], i64 0
 ; CHECK-NEXT:    [[TMP71:%.*]] = insertelement <2 x i64> [[TMP70]], i64 [[TMP69]], i64 1
 ; CHECK-NEXT:    [[TMP72:%.*]] = icmp ult <2 x i64> [[TMP67]], <i64 225, i64 225>
@@ -336,11 +336,11 @@ define void @matrix_extract_insert_loop(i32 %i, ptr nonnull align 8 dereferencea
 ; CHECK:       vector.body.3:
 ; CHECK-NEXT:    [[INDEX_3:%.*]] = phi i64 [ 0, [[VECTOR_PH_3]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY_3]] ]
 ; CHECK-NEXT:    [[TMP95:%.*]] = add nuw nsw i64 [[INDEX_3]], 45
-; CHECK-NEXT:    [[TMP96:%.*]] = add i64 [[INDEX_3]], 46
+; CHECK-NEXT:    [[TMP96:%.*]] = add nsw i64 [[INDEX_3]], 46
 ; CHECK-NEXT:    [[TMP97:%.*]] = insertelement <2 x i64> poison, i64 [[TMP95]], i64 0
 ; CHECK-NEXT:    [[TMP98:%.*]] = insertelement <2 x i64> [[TMP97]], i64 [[TMP96]], i64 1
-; CHECK-NEXT:    [[TMP99:%.*]] = add i64 [[INDEX_3]], 47
-; CHECK-NEXT:    [[TMP100:%.*]] = add i64 [[INDEX_3]], 48
+; CHECK-NEXT:    [[TMP99:%.*]] = add nsw i64 [[INDEX_3]], 47
+; CHECK-NEXT:    [[TMP100:%.*]] = add nsw i64 [[INDEX_3]], 48
 ; CHECK-NEXT:    [[TMP101:%.*]] = insertelement <2 x i64> poison, i64 [[TMP99]], i64 0
 ; CHECK-NEXT:    [[TMP102:%.*]] = insertelement <2 x i64> [[TMP101]], i64 [[TMP100]], i64 1
 ; CHECK-NEXT:    [[TMP103:%.*]] = icmp ult <2 x i64> [[TMP98]], <i64 225, i64 225>

nikic

Please always provide alive2 proofs for InstCombine changes, see https://llvm.org/docs/InstCombineContributorGuide.html#proofs.

The transform as implemented is incorrect: https://alive2.llvm.org/ce/z/pEk7wC

nikic · 2024-06-06T07:45:17Z

This would be a correct variant of the transform: https://alive2.llvm.org/ce/z/nrdCZT
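
The difference between the unconditional and the guarded flag preservation can be illustrated with an exhaustive check over 8-bit values. This is a hedged sketch that is independent of the linked Alive2 proofs and does not replace them. All names (`FlagCheck`, `checkFlagPreservationI8`, and the helpers) are hypothetical, and i8 is chosen only to keep the search small.

```cpp
#include <cassert>

// Counters for cases where the original add is non-poison but the new add's
// flag would be violated.
struct FlagCheck {
  long nswBrokenUnguarded = 0;  // nsw copied unconditionally
  long nswBrokenGuarded = 0;    // nsw copied only if Op01C + Op1C does not overflow
  long nuwBroken = 0;           // nuw copied unconditionally
};

// Reinterpret an 8-bit pattern (0..255) as a signed value.
static int toSigned8(int v) {
  assert(v >= 0 && v <= 255);
  return v >= 128 ? v - 256 : v;
}
static bool fitsSigned8(int a, int b) {
  int s = a + b;
  return s >= -128 && s <= 127;
}
static bool fitsUnsigned8(int a, int b) { return a + b <= 255; }

FlagCheck checkFlagPreservationI8() {
  FlagCheck r;
  for (int x = 0; x < 256; ++x)
    for (int c1 = 0; c1 < 256; ++c1) {
      if ((x & c1) != 0)
        continue;  // `or disjoint` requires no common bits
      int orVal = x | c1;
      for (int c2 = 0; c2 < 256; ++c2) {
        int folded = (c1 + c2) & 0xFF;  // Op01C + Op1C with wrapping
        // Original `add nsw` holds, but the new `add nsw` would overflow.
        if (fitsSigned8(toSigned8(orVal), toSigned8(c2)) &&
            !fitsSigned8(toSigned8(x), toSigned8(folded))) {
          ++r.nswBrokenUnguarded;
          if (fitsSigned8(toSigned8(c1), toSigned8(c2)))
            ++r.nswBrokenGuarded;
        }
        // Original `add nuw` holds, but the new `add nuw` would overflow.
        if (fitsUnsigned8(orVal, c2) && !fitsUnsigned8(x, folded))
          ++r.nuwBroken;
      }
    }
  return r;
}
```

Under these assumptions, `nswBrokenUnguarded` comes out positive (for example X = -128, Op01C = 64, Op1C = 64), while `nswBrokenGuarded` and `nuwBroken` are both 0.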

csstormq · 2024-06-06T08:42:53Z

> Please always provide alive2 proofs for InstCombine changes, see https://llvm.org/docs/InstCombineContributorGuide.html#proofs.

Alive2 Proof: https://alive2.llvm.org/ce/z/CRKtkU

nikic

Your proof is not for the transform you actually implement. The transform is still incorrect in general, you merely picked out one example where it is correct. I have already shared what you need to do to make it correct in my second comment. The necessary pre-condition can be implemented using the willNotOverflowSignedAdd helper.
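
The pre-condition described here can be modeled outside LLVM as follows. This is only a sketch under stated assumptions: `signedAddOverflowsI32` is a hypothetical stand-in for the `willNotOverflowSignedAdd` helper (whose real signature operates on LLVM values and is not shown on this page), `WrapFlags` and `flagsForFoldedAdd` are made-up names, and the constants are fixed to 32 bits for simplicity.

```cpp
#include <cassert>  // used by the usage example in the note below
#include <cstdint>

// Flags to place on the new `add X, (Op01C + Op1C)`.
struct WrapFlags {
  bool nsw;
  bool nuw;
};

// Hypothetical stand-in for an overflow query on two i32 constants.
static bool signedAddOverflowsI32(int32_t a, int32_t b) {
  int64_t s = static_cast<int64_t>(a) + static_cast<int64_t>(b);
  return s < INT32_MIN || s > INT32_MAX;
}

WrapFlags flagsForFoldedAdd(int32_t op01C, int32_t op1C, bool addHasNSW,
                            bool addHasNUW) {
  WrapFlags f;
  // Keep nsw only if the original add had it and the constant sum does not
  // overflow; && short-circuits, so the check is skipped when nsw is absent.
  f.nsw = addHasNSW && !signedAddOverflowsI32(op01C, op1C);
  // nuw carries over without an extra condition.
  f.nuw = addHasNUW;
  return f;
}
```

For the constants in the `sdiv_to_udiv` test, `flagsForFoldedAdd(1, 6242048, true, true)` keeps both flags, whereas `flagsForFoldedAdd(INT32_MAX, 1, true, false)` drops `nsw`.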

csstormq · 2024-06-07T07:47:51Z

> Your proof is not for the transform you actually implement. The transform is still incorrect in general, you merely picked out one example where it is correct. I have already shared what you need to do to make it correct in my second comment. The necessary pre-condition can be implemented using the willNotOverflowSignedAdd helper.

I believe I have fixed this and added more test cases. Thanks a lot for your help. Please review again.

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

llvm/test/Transforms/InstCombine/add.ll

1. Preserve the nuw flag also 2. Remove unnecessary comment 3. Move test cases to the better place

csstormq · 2024-06-07T09:57:33Z

Update the title and description to reflect the intent of the latest code.

nikic

LGTM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

llvm/test/Transforms/InstCombine/add.ll

1. Save the overflow check if not needed 2. Add preserve nuw test case

llvm/test/Transforms/InstCombine/add.ll

csstormq · 2024-06-07T10:53:40Z

@nikic Thank you very much for your review. I have learned a lot from your comments.

@nikic

… + (Op01C + Op1C) (llvm#94586) This patch simplifies `sdiv` to `udiv` by preserving the `nsw` flag for `(X | Op01C) + Op1C --> X + (Op01C + Op1C)` if the sum of `Op01C` and `Op1C` will not overflow, and preserves the `nuw` flag unconditionally. Alive2 Proofs (provided by @nikic): https://alive2.llvm.org/ce/z/nrdCZT, https://alive2.llvm.org/ce/z/YnJHnH Signed-off-by: Hafidz Muzakky <[email protected]>

[InstCombine] Preserve the nsw flag for (X | Op01C) + Op1C --> X + (Op01C + Op1C)

170cfa2

csstormq requested a review from nikic as a code owner June 6, 2024 07:30

llvmbot added the llvm:transforms label Jun 6, 2024

nikic requested changes Jun 6, 2024

csstormq requested a review from nikic June 6, 2024 08:43

update test case using the update_test_checks.py script

8b6eca1

nikic requested changes Jun 6, 2024

csstormq added 2 commits June 7, 2024 11:47

fixup! Preserve the nsw flag iff the sum of Op01C and Op1C will not overflow

3cf7a70

Add more test cases

40037ff

csstormq requested a review from nikic June 7, 2024 07:48

nikic reviewed Jun 7, 2024

csstormq added 2 commits June 7, 2024 17:33

Resolve conversations

d2f4bee

1. Preserve the nuw flag also 2. Remove unnecessary comment 3. Move test cases to the better place

Remove added new lines

8997ab0

csstormq changed the title ~~[InstCombine] Preserve the nsw flag for (X | Op01C) + Op1C --> X + (Op01C + Op1C)~~ [InstCombine] Preserve the nsw/nuw flags for (X | Op01C) + Op1C --> X + (Op01C + Op1C) Jun 7, 2024

nikic approved these changes Jun 7, 2024

Resolve conversations

045f3ba

1. Save the overflow check if not needed 2. Add preserve nuw test case

nikic reviewed Jun 7, 2024

Fix test to not always return poison

1050bde

csstormq merged commit 96af114 into llvm:main Jun 8, 2024
7 checks passed

csstormq deleted the instcmb branch June 8, 2024 01:20

HerrCai0907 mentioned this pull request Jun 13, 2024

tidy #95384

Closed
