-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[InstCombine] Infer nuw for gep inbounds from base of object #119225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well. The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices. Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-amdgpu Author: Nikita Popov (nikic) ChangesWhen we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well. The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices. Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138). Patch is 90.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119225.diff 29 Files Affected:
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 6b3cad5708835b..be4c7f07e92150 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -1043,7 +1043,7 @@ int test12_a, test12_b;
// NO-SANITIZE-WITH-ATTR-NEXT: call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR11:[0-9]+]]
// NO-SANITIZE-WITH-ATTR-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
// NO-SANITIZE-WITH-ATTR-NEXT: [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
// NO-SANITIZE-WITH-ATTR-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
// NO-SANITIZE-WITH-ATTR-NEXT: store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
// NO-SANITIZE-WITH-ATTR-NEXT: [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
@@ -1085,7 +1085,7 @@ int test12_a, test12_b;
// NO-SANITIZE-WITHOUT-ATTR-NEXT: call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR9:[0-9]+]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
diff --git a/clang/test/CodeGen/union-tbaa1.c b/clang/test/CodeGen/union-tbaa1.c
index 0f7a67cb7eccd1..7d44c9a3fbe6bb 100644
--- a/clang/test/CodeGen/union-tbaa1.c
+++ b/clang/test/CodeGen/union-tbaa1.c
@@ -16,17 +16,17 @@ void bar(vect32 p[][2]);
// CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]]
// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
// CHECK-NEXT: [[MUL:%.*]] = mul i32 [[TMP1]], [[NUM]]
-// CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
+// CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
// CHECK-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX2]], align 8, !tbaa [[TBAA6:![0-9]+]]
// CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]], i32 1
// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
// CHECK-NEXT: [[MUL6:%.*]] = mul i32 [[TMP2]], [[NUM]]
-// CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
+// CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
// CHECK-NEXT: store i32 [[MUL6]], ptr [[ARRAYIDX8]], align 4, !tbaa [[TBAA6]]
// CHECK-NEXT: [[TMP3:%.*]] = lshr i32 [[MUL]], 16
// CHECK-NEXT: store i32 [[TMP3]], ptr [[VEC]], align 4, !tbaa [[TBAA2]]
// CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[INDEX]], align 4, !tbaa [[TBAA2]]
-// CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
+// CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
// CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX14]], i32 2
// CHECK-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX15]], align 2, !tbaa [[TBAA6]]
// CHECK-NEXT: [[CONV16:%.*]] = zext i16 [[TMP5]] to i32
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index ac9a4fdcf304d4..8f55e5b3cc28a2 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3119,6 +3119,26 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) {
}
}
+ // The single (non-zero) index of an inbounds GEP of a base object cannot
+ // be negative.
+ auto HasOneNonZeroIndex = [&]() {
+ bool FoundNonZero = false;
+ for (Value *Idx : GEP.indices()) {
+ auto *C = dyn_cast<Constant>(Idx);
+ if (C && C->isNullValue())
+ continue;
+ if (FoundNonZero)
+ return false;
+ FoundNonZero = true;
+ }
+ return true;
+ };
+ if (GEP.isInBounds() && !GEP.hasNoUnsignedWrap() && isBaseOfObject(PtrOp) &&
+ HasOneNonZeroIndex()) {
+ GEP.setNoWrapFlags(GEP.getNoWrapFlags() | GEPNoWrapFlags::noUnsignedWrap());
+ return &GEP;
+ }
+
// nusw + nneg -> nuw
if (GEP.hasNoUnsignedSignedWrap() && !GEP.hasNoUnsignedWrap() &&
all_of(GEP.indices(), [&](Value *Idx) {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
index c14d61b51ad77a..0e5b4dedd4a0d2 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
@@ -53,7 +53,7 @@ define i64 @memcpy_constant_arg_ptr_to_alloca_load_atomic(ptr addrspace(4) noali
; CHECK-LABEL: @memcpy_constant_arg_ptr_to_alloca_load_atomic(
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [32 x i64], align 8, addrspace(5)
; CHECK-NEXT: call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 8 dereferenceable(256) [[ALLOCA]], ptr addrspace(4) noundef align 8 dereferenceable(256) [[ARG:%.*]], i64 256, i1 false)
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds nuw [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
; CHECK-NEXT: [[LOAD:%.*]] = load atomic i64, ptr addrspace(5) [[GEP]] syncscope("somescope") acquire, align 8
; CHECK-NEXT: ret i64 [[LOAD]]
;
@@ -101,7 +101,7 @@ define amdgpu_kernel void @memcpy_constant_byref_arg_ptr_to_alloca_too_many_byte
; CHECK-LABEL: @memcpy_constant_byref_arg_ptr_to_alloca_too_many_bytes(
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
; CHECK-NEXT: call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(31) [[ALLOCA]], ptr addrspace(4) noundef align 4 dereferenceable(31) [[ARG:%.*]], i64 31, i1 false)
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
; CHECK-NEXT: [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
; CHECK-NEXT: store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
; CHECK-NEXT: ret void
@@ -120,7 +120,7 @@ define amdgpu_kernel void @memcpy_constant_intrinsic_ptr_to_alloca(ptr addrspace
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
; CHECK-NEXT: [[KERNARG_SEGMENT_PTR:%.*]] = call align 16 dereferenceable(32) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
; CHECK-NEXT: call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(32) [[ALLOCA]], ptr addrspace(4) noundef align 16 dereferenceable(32) [[KERNARG_SEGMENT_PTR]], i64 32, i1 false)
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
; CHECK-NEXT: [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
; CHECK-NEXT: store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
; CHECK-NEXT: ret void
diff --git a/llvm/test/Transforms/InstCombine/cast_phi.ll b/llvm/test/Transforms/InstCombine/cast_phi.ll
index aafe5f57c4c724..f289e1459000aa 100644
--- a/llvm/test/Transforms/InstCombine/cast_phi.ll
+++ b/llvm/test/Transforms/InstCombine/cast_phi.ll
@@ -31,8 +31,8 @@ define void @MainKernel(i32 %iNumSteps, i32 %tid, i32 %base) {
; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[I12_06]], [[BASE:%.*]]
; CHECK-NEXT: [[ADD:%.*]] = add nuw i32 [[I12_06]], 1
; CHECK-NEXT: [[CONV_I9:%.*]] = sext i32 [[ADD]] to i64
-; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
-; CHECK-NEXT: [[ARRAYIDX24:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT: [[ARRAYIDX24:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
; CHECK-NEXT: [[CMP40:%.*]] = icmp ult i32 [[I12_06]], [[BASE]]
; CHECK-NEXT: br i1 [[TMP3]], label [[DOTBB4:%.*]], label [[DOTBB5:%.*]]
; CHECK: .bb4:
diff --git a/llvm/test/Transforms/InstCombine/load-cmp.ll b/llvm/test/Transforms/InstCombine/load-cmp.ll
index 12be81b8f815d0..531258935bb822 100644
--- a/llvm/test/Transforms/InstCombine/load-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/load-cmp.ll
@@ -339,7 +339,7 @@ define i1 @test10_struct_arr_noinbounds_i64(i64 %x) {
define i1 @pr93017(i64 %idx) {
; CHECK-LABEL: @pr93017(
; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[IDX:%.*]] to i32
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds nuw [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
; CHECK-NEXT: [[V:%.*]] = load ptr, ptr [[GEP]], align 4
; CHECK-NEXT: [[CMP:%.*]] = icmp ne ptr [[V]], null
; CHECK-NEXT: ret i1 [[CMP]]
diff --git a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
index d6624010acb219..f931b41eb0f71d 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
@@ -6,7 +6,7 @@
define void @test_load(ptr addrspace(1) %out, i64 %x) {
; CHECK-LABEL: @test_load(
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -45,7 +45,7 @@ entry:
define void @test_load_bitcast_chain(ptr addrspace(1) %out, i64 %x) {
; CHECK-LABEL: @test_load_bitcast_chain(
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -66,7 +66,7 @@ define void @test_call(ptr addrspace(1) %out, i64 %x) {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DATA:%.*]] = alloca [8 x i32], align 4
; CHECK-NEXT: call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -87,8 +87,8 @@ define void @test_call_no_null_opt(ptr addrspace(1) %out, i64 %x) #0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DATA:%.*]] = alloca [8 x i32], align 4
; CHECK-NEXT: call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
-; CHECK-NEXT: [[TMP0:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
; CHECK-NEXT: ret void
@@ -108,7 +108,7 @@ define void @test_load_and_call(ptr addrspace(1) %out, i64 %x, i64 %y) {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DATA:%.*]] = alloca [8 x i32], align 4
; CHECK-NEXT: call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -135,11 +135,11 @@ define void @test_load_and_call_no_null_opt(ptr addrspace(1) %out, i64 %x, i64 %
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DATA:%.*]] = alloca [8 x i32], align 4
; CHECK-NEXT: call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
-; CHECK-NEXT: [[TMP1:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT]], i64 [[Y:%.*]]
; CHECK-NEXT: store i32 [[TMP1]], ptr addrspace(1) [[ARRAYIDX2]], align 4
; CHECK-NEXT: ret void
diff --git a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
index 34e6c601f494a5..845b1ad703596b 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
@@ -322,7 +322,7 @@ define float @test11_volatile(i64 %i) {
; CHECK-NEXT: [[A:%.*]] = alloca [4 x float], align 4
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 16, ptr nonnull [[A]])
; CHECK-NEXT: call void @llvm.memcpy.p0.p1.i64(ptr align 4 [[A]], ptr addrspace(1) align 4 @I, i64 16, i1 true)
-; CHECK-NEXT: [[G:%.*]] = getelementptr inbounds [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
+; CHECK-NEXT: [[G:%.*]] = getelementptr inbounds nuw [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
; CHECK-NEXT: [[R:%.*]] = load float, ptr [[G]], align 4
; CHECK-NEXT: ret float [[R]]
;
diff --git a/llvm/test/Transforms/InstCombine/stpcpy-1.ll b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
index 2ddacb20974426..88d2ab9bfec420 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
@@ -25,7 +25,7 @@ define ptr @test_simplify1() {
define ptr @test_simplify2() {
; CHECK-LABEL: @test_simplify2(
; CHECK-NEXT: [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT: [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT: [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
; CHECK-NEXT: ret ptr [[RET]]
;
%ret = call ptr @stpcpy(ptr @a, ptr @a)
diff --git a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
index 2d775f35c8bda4..8a1ce0dfa9f9be 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
@@ -93,7 +93,7 @@ define ptr @test_simplify5() {
define ptr @test_simplify6() {
; CHECK-LABEL: @test_simplify6(
; CHECK-NEXT: [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT: [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT: [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
; CHECK-NEXT: ret ptr [[RET]]
;
diff --git a/llvm/test/Transforms/InstCombine/strlen-1.ll b/llvm/test/Transforms/InstCombine/strlen-1.ll
index facf4f7d0973f2..2d28139bf75a80 100644
--- a/llvm/test/Transforms/InstCombine/strlen-1.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-1.ll
@@ -155,7 +155,7 @@ define i32 @test_no_simplify1() {
define i32 @test_no_simplify2(i32 %x) {
; CHECK-LABEL: @test_no_simplify2(
-; CHECK-NEXT: [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT: [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
; CHECK-NEXT: [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
; CHECK-NEXT: ret i32 [[HELLO_L]]
;
@@ -166,8 +166,8 @@ define i32 @test_no_simplify2(i32 %x) {
define i32 @test_no_simplify2_no_null_opt(i32 %x) #0 {
; CHECK-LABEL: @test_no_simplify2_no_null_opt(
-; CHECK-NEXT: [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
-; CHECK-NEXT: [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef [[HELLO_P]])
+; CHECK-NEXT: [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT: [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
; CHECK-NEXT: ret i32 [[HELLO_L]]
;
%hello_p = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 %x
diff --git a/llvm/test/Transforms/InstCombine/strlen-4.ll b/llvm/test/Transforms/InstCombine/strlen-4.ll
index 58d04e8a0d4bea..ca01ce93be8835 100644
--- a/llvm/test/Transforms/InstCombine/strlen-4.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-4.ll
@@ -18,7 +18,7 @@ declare i64 @strlen(ptr)
define i64 @fold_strlen_s3_pi_s5(i1 %X, i64 %I) {
; CHECK-LABEL: @fold_strlen_s3_pi_s5(
-; CHECK-NEXT: [[PS3_PI:%.*]] = getelementptr inbounds [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
+; CHECK-NEXT: [[PS3_PI:%.*]] = getelementptr inbounds nuw [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[X:%.*]], ptr [[PS3_PI]], ptr @s5
; CHECK-NEXT: [[LEN:%.*]] = tail call i64 @strlen(ptr noundef nonnull derefe...
[truncated]
|
@llvm/pr-subscribers-llvm-transforms Author: Nikita Popov (nikic) ChangesWhen we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well. The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices. Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138). Patch is 90.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119225.diff 29 Files Affected:
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 6b3cad5708835b..be4c7f07e92150 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -1043,7 +1043,7 @@ int test12_a, test12_b;
// NO-SANITIZE-WITH-ATTR-NEXT: call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR11:[0-9]+]]
// NO-SANITIZE-WITH-ATTR-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
// NO-SANITIZE-WITH-ATTR-NEXT: [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
// NO-SANITIZE-WITH-ATTR-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
// NO-SANITIZE-WITH-ATTR-NEXT: store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
// NO-SANITIZE-WITH-ATTR-NEXT: [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
@@ -1085,7 +1085,7 @@ int test12_a, test12_b;
// NO-SANITIZE-WITHOUT-ATTR-NEXT: call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR9:[0-9]+]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
// NO-SANITIZE-WITHOUT-ATTR-NEXT: [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
diff --git a/clang/test/CodeGen/union-tbaa1.c b/clang/test/CodeGen/union-tbaa1.c
index 0f7a67cb7eccd1..7d44c9a3fbe6bb 100644
--- a/clang/test/CodeGen/union-tbaa1.c
+++ b/clang/test/CodeGen/union-tbaa1.c
@@ -16,17 +16,17 @@ void bar(vect32 p[][2]);
// CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]]
// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
// CHECK-NEXT: [[MUL:%.*]] = mul i32 [[TMP1]], [[NUM]]
-// CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
+// CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
// CHECK-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX2]], align 8, !tbaa [[TBAA6:![0-9]+]]
// CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]], i32 1
// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
// CHECK-NEXT: [[MUL6:%.*]] = mul i32 [[TMP2]], [[NUM]]
-// CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
+// CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
// CHECK-NEXT: store i32 [[MUL6]], ptr [[ARRAYIDX8]], align 4, !tbaa [[TBAA6]]
// CHECK-NEXT: [[TMP3:%.*]] = lshr i32 [[MUL]], 16
// CHECK-NEXT: store i32 [[TMP3]], ptr [[VEC]], align 4, !tbaa [[TBAA2]]
// CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[INDEX]], align 4, !tbaa [[TBAA2]]
-// CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
+// CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
// CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX14]], i32 2
// CHECK-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX15]], align 2, !tbaa [[TBAA6]]
// CHECK-NEXT: [[CONV16:%.*]] = zext i16 [[TMP5]] to i32
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index ac9a4fdcf304d4..8f55e5b3cc28a2 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3119,6 +3119,26 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) {
}
}
+ // The single (non-zero) index of an inbounds GEP of a base object cannot
+ // be negative.
+ auto HasOneNonZeroIndex = [&]() {
+ bool FoundNonZero = false;
+ for (Value *Idx : GEP.indices()) {
+ auto *C = dyn_cast<Constant>(Idx);
+ if (C && C->isNullValue())
+ continue;
+ if (FoundNonZero)
+ return false;
+ FoundNonZero = true;
+ }
+ return true;
+ };
+ if (GEP.isInBounds() && !GEP.hasNoUnsignedWrap() && isBaseOfObject(PtrOp) &&
+ HasOneNonZeroIndex()) {
+ GEP.setNoWrapFlags(GEP.getNoWrapFlags() | GEPNoWrapFlags::noUnsignedWrap());
+ return &GEP;
+ }
+
// nusw + nneg -> nuw
if (GEP.hasNoUnsignedSignedWrap() && !GEP.hasNoUnsignedWrap() &&
all_of(GEP.indices(), [&](Value *Idx) {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
index c14d61b51ad77a..0e5b4dedd4a0d2 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
@@ -53,7 +53,7 @@ define i64 @memcpy_constant_arg_ptr_to_alloca_load_atomic(ptr addrspace(4) noali
; CHECK-LABEL: @memcpy_constant_arg_ptr_to_alloca_load_atomic(
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [32 x i64], align 8, addrspace(5)
; CHECK-NEXT: call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 8 dereferenceable(256) [[ALLOCA]], ptr addrspace(4) noundef align 8 dereferenceable(256) [[ARG:%.*]], i64 256, i1 false)
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds nuw [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
; CHECK-NEXT: [[LOAD:%.*]] = load atomic i64, ptr addrspace(5) [[GEP]] syncscope("somescope") acquire, align 8
; CHECK-NEXT: ret i64 [[LOAD]]
;
@@ -101,7 +101,7 @@ define amdgpu_kernel void @memcpy_constant_byref_arg_ptr_to_alloca_too_many_byte
; CHECK-LABEL: @memcpy_constant_byref_arg_ptr_to_alloca_too_many_bytes(
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
; CHECK-NEXT: call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(31) [[ALLOCA]], ptr addrspace(4) noundef align 4 dereferenceable(31) [[ARG:%.*]], i64 31, i1 false)
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
; CHECK-NEXT: [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
; CHECK-NEXT: store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
; CHECK-NEXT: ret void
@@ -120,7 +120,7 @@ define amdgpu_kernel void @memcpy_constant_intrinsic_ptr_to_alloca(ptr addrspace
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
; CHECK-NEXT: [[KERNARG_SEGMENT_PTR:%.*]] = call align 16 dereferenceable(32) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
; CHECK-NEXT: call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(32) [[ALLOCA]], ptr addrspace(4) noundef align 16 dereferenceable(32) [[KERNARG_SEGMENT_PTR]], i64 32, i1 false)
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
; CHECK-NEXT: [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
; CHECK-NEXT: store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
; CHECK-NEXT: ret void
diff --git a/llvm/test/Transforms/InstCombine/cast_phi.ll b/llvm/test/Transforms/InstCombine/cast_phi.ll
index aafe5f57c4c724..f289e1459000aa 100644
--- a/llvm/test/Transforms/InstCombine/cast_phi.ll
+++ b/llvm/test/Transforms/InstCombine/cast_phi.ll
@@ -31,8 +31,8 @@ define void @MainKernel(i32 %iNumSteps, i32 %tid, i32 %base) {
; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[I12_06]], [[BASE:%.*]]
; CHECK-NEXT: [[ADD:%.*]] = add nuw i32 [[I12_06]], 1
; CHECK-NEXT: [[CONV_I9:%.*]] = sext i32 [[ADD]] to i64
-; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
-; CHECK-NEXT: [[ARRAYIDX24:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT: [[ARRAYIDX24:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
; CHECK-NEXT: [[CMP40:%.*]] = icmp ult i32 [[I12_06]], [[BASE]]
; CHECK-NEXT: br i1 [[TMP3]], label [[DOTBB4:%.*]], label [[DOTBB5:%.*]]
; CHECK: .bb4:
diff --git a/llvm/test/Transforms/InstCombine/load-cmp.ll b/llvm/test/Transforms/InstCombine/load-cmp.ll
index 12be81b8f815d0..531258935bb822 100644
--- a/llvm/test/Transforms/InstCombine/load-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/load-cmp.ll
@@ -339,7 +339,7 @@ define i1 @test10_struct_arr_noinbounds_i64(i64 %x) {
define i1 @pr93017(i64 %idx) {
; CHECK-LABEL: @pr93017(
; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[IDX:%.*]] to i32
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds nuw [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
; CHECK-NEXT: [[V:%.*]] = load ptr, ptr [[GEP]], align 4
; CHECK-NEXT: [[CMP:%.*]] = icmp ne ptr [[V]], null
; CHECK-NEXT: ret i1 [[CMP]]
diff --git a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
index d6624010acb219..f931b41eb0f71d 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
@@ -6,7 +6,7 @@
define void @test_load(ptr addrspace(1) %out, i64 %x) {
; CHECK-LABEL: @test_load(
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -45,7 +45,7 @@ entry:
define void @test_load_bitcast_chain(ptr addrspace(1) %out, i64 %x) {
; CHECK-LABEL: @test_load_bitcast_chain(
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -66,7 +66,7 @@ define void @test_call(ptr addrspace(1) %out, i64 %x) {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DATA:%.*]] = alloca [8 x i32], align 4
; CHECK-NEXT: call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -87,8 +87,8 @@ define void @test_call_no_null_opt(ptr addrspace(1) %out, i64 %x) #0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DATA:%.*]] = alloca [8 x i32], align 4
; CHECK-NEXT: call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
-; CHECK-NEXT: [[TMP0:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
; CHECK-NEXT: ret void
@@ -108,7 +108,7 @@ define void @test_load_and_call(ptr addrspace(1) %out, i64 %x, i64 %y) {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DATA:%.*]] = alloca [8 x i32], align 4
; CHECK-NEXT: call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -135,11 +135,11 @@ define void @test_load_and_call_no_null_opt(ptr addrspace(1) %out, i64 %x, i64 %
; CHECK-NEXT: entry:
; CHECK-NEXT: [[DATA:%.*]] = alloca [8 x i32], align 4
; CHECK-NEXT: call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
-; CHECK-NEXT: [[TMP1:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT]], i64 [[Y:%.*]]
; CHECK-NEXT: store i32 [[TMP1]], ptr addrspace(1) [[ARRAYIDX2]], align 4
; CHECK-NEXT: ret void
diff --git a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
index 34e6c601f494a5..845b1ad703596b 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
@@ -322,7 +322,7 @@ define float @test11_volatile(i64 %i) {
; CHECK-NEXT: [[A:%.*]] = alloca [4 x float], align 4
; CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 16, ptr nonnull [[A]])
; CHECK-NEXT: call void @llvm.memcpy.p0.p1.i64(ptr align 4 [[A]], ptr addrspace(1) align 4 @I, i64 16, i1 true)
-; CHECK-NEXT: [[G:%.*]] = getelementptr inbounds [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
+; CHECK-NEXT: [[G:%.*]] = getelementptr inbounds nuw [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
; CHECK-NEXT: [[R:%.*]] = load float, ptr [[G]], align 4
; CHECK-NEXT: ret float [[R]]
;
diff --git a/llvm/test/Transforms/InstCombine/stpcpy-1.ll b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
index 2ddacb20974426..88d2ab9bfec420 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
@@ -25,7 +25,7 @@ define ptr @test_simplify1() {
define ptr @test_simplify2() {
; CHECK-LABEL: @test_simplify2(
; CHECK-NEXT: [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT: [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT: [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
; CHECK-NEXT: ret ptr [[RET]]
;
%ret = call ptr @stpcpy(ptr @a, ptr @a)
diff --git a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
index 2d775f35c8bda4..8a1ce0dfa9f9be 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
@@ -93,7 +93,7 @@ define ptr @test_simplify5() {
define ptr @test_simplify6() {
; CHECK-LABEL: @test_simplify6(
; CHECK-NEXT: [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT: [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT: [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
; CHECK-NEXT: ret ptr [[RET]]
;
diff --git a/llvm/test/Transforms/InstCombine/strlen-1.ll b/llvm/test/Transforms/InstCombine/strlen-1.ll
index facf4f7d0973f2..2d28139bf75a80 100644
--- a/llvm/test/Transforms/InstCombine/strlen-1.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-1.ll
@@ -155,7 +155,7 @@ define i32 @test_no_simplify1() {
define i32 @test_no_simplify2(i32 %x) {
; CHECK-LABEL: @test_no_simplify2(
-; CHECK-NEXT: [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT: [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
; CHECK-NEXT: [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
; CHECK-NEXT: ret i32 [[HELLO_L]]
;
@@ -166,8 +166,8 @@ define i32 @test_no_simplify2(i32 %x) {
define i32 @test_no_simplify2_no_null_opt(i32 %x) #0 {
; CHECK-LABEL: @test_no_simplify2_no_null_opt(
-; CHECK-NEXT: [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
-; CHECK-NEXT: [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef [[HELLO_P]])
+; CHECK-NEXT: [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT: [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
; CHECK-NEXT: ret i32 [[HELLO_L]]
;
%hello_p = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 %x
diff --git a/llvm/test/Transforms/InstCombine/strlen-4.ll b/llvm/test/Transforms/InstCombine/strlen-4.ll
index 58d04e8a0d4bea..ca01ce93be8835 100644
--- a/llvm/test/Transforms/InstCombine/strlen-4.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-4.ll
@@ -18,7 +18,7 @@ declare i64 @strlen(ptr)
define i64 @fold_strlen_s3_pi_s5(i1 %X, i64 %I) {
; CHECK-LABEL: @fold_strlen_s3_pi_s5(
-; CHECK-NEXT: [[PS3_PI:%.*]] = getelementptr inbounds [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
+; CHECK-NEXT: [[PS3_PI:%.*]] = getelementptr inbounds nuw [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[X:%.*]], ptr [[PS3_PI]], ptr @s5
; CHECK-NEXT: [[LEN:%.*]] = tail call i64 @strlen(ptr noundef nonnull derefe...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/8300 Here is the relevant piece of the build log for the reference
|
Seems that this change causes Segment Fault on multiple targets including aarch64, loongarch64 and riscv64. This is detected by a LoongArch buildbot and manually checked on aarch64 and riscv64 QEMUs. I wonder why aarch64 buildbot doesn't report this. I find that UPDATE: This buildbot(aarch64) also fail which uses Reproduce$ clang llvm-test-suite/SingleSource/Benchmarks/CoyoteBench/huffbench.c -O3 -o huffbench |
Hey @nikic, The reproducer is unfortunately not pleasant: |
Reproduced on Loongson-3A5000. I will provide a reduced test case later. |
From a quick look, it's probably due to UB on this line: https://github.com/llvm/llvm-test-suite/blob/4f14adb8a2f2c5fa02c6b3643d45e74adfcb31f9/SingleSource/Benchmarks/CoyoteBench/huffbench.c#L114 It indexes below the start of an object. |
|
I think llvm/llvm-test-suite#190 should fix this issue, but haven't tested on an affected arch. I'm a bit worried that we don't have a sanitizer to catch this issue (the negative left shift issue is an unrelated one). |
Tested on an aarch64 machine where I've been able to reproduce. Works like a charm, thanks @nikic |
|
Heads up: apart from a number of instances of actual UB in the code (which are quite tedious to localize and understand due to a lack of specialized tooling) we're investigating a bad interaction of this commit with msan, which seems to result in a miscompilation. |
Thanks for the heads up! As I mentioned on another thread, I'm happy to just entirely revert this patch due to lack of good sanitizer support (I can do it tomorrow, but feel free to do it yourself earlier). Having a reproducer for the msan issue is probably still valuable, as it may indicate a larger problem. |
Thanks! Sent a PR with a revert: #120460 As mentioned, a test case is in the works. |
…#120460) Reverts #119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: #119225 (comment) #118472 (comment)
I was working on this miscompile. I cannot give a reproducer right now, but the following observations: The generated IR is clearly incorrect (the
This is some interaction between this CL and these two InstCombines:
If I disable any of the three, the error goes away. The observation is that with the two above, we get an The comment in the first link implies that it maybe should be removing
but it then goes and only removes the wrap flags:
If I change this to
the error also goes away. If I disable the second link (i8 canonicalization) the access is then I don't feel like this is a sanitizer issue, it seems like MSan might just have gotten unlucky. Also, I think this CL is correct in isolation, but is worsening some existing problem. |
@fmayer The It's pretty likely that the root cause here is indeed incorrect inbounds preservation somewhere, but I think the logic in that transform is correct. |
Seems like the root cause was an vectorizer bug, sent #120730 that fixes it. |
… of object" (#120460) Reverts llvm/llvm-project#119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: llvm/llvm-project#119225 (comment) llvm/llvm-project#118472 (comment)
This is landed, so from that side we could reland this. |
When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well.
The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices.
Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).