[InstCombine] Infer nuw for gep inbounds from base of object #119225

nikic · 2024-12-09T15:55:19Z

When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).

When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well. The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices. Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).

llvmbot · 2024-12-09T15:55:57Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Nikita Popov (nikic)

Changes

When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).

Patch is 90.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119225.diff

29 Files Affected:

(modified) clang/test/CodeGen/attr-counted-by.c (+2-2)
(modified) clang/test/CodeGen/union-tbaa1.c (+3-3)
(modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+20)
(modified) llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/cast_phi.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/memcpy-addrspace.ll (+8-8)
(modified) llvm/test/Transforms/InstCombine/memcpy-from-global.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/stpcpy-1.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/strlen-1.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/strlen-4.ll (+8-8)
(modified) llvm/test/Transforms/InstCombine/strncat-2.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/strnlen-3.ll (+9-9)
(modified) llvm/test/Transforms/InstCombine/strnlen-4.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/strnlen-5.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/wcslen-1.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/wcslen-3.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/wcslen-5.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+19-19)
(modified) llvm/test/Transforms/LoopVectorize/X86/x86_fp80-vector-store.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/non-const-n.ll (+3-3)
(modified) llvm/test/Transforms/PhaseOrdering/X86/excessive-unrolling.ll (+87-87)
(modified) llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll (+8-8)

diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 6b3cad5708835b..be4c7f07e92150 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -1043,7 +1043,7 @@ int test12_a, test12_b;
 // NO-SANITIZE-WITH-ATTR-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR11:[0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
@@ -1085,7 +1085,7 @@ int test12_a, test12_b;
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR9:[0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
diff --git a/clang/test/CodeGen/union-tbaa1.c b/clang/test/CodeGen/union-tbaa1.c
index 0f7a67cb7eccd1..7d44c9a3fbe6bb 100644
--- a/clang/test/CodeGen/union-tbaa1.c
+++ b/clang/test/CodeGen/union-tbaa1.c
@@ -16,17 +16,17 @@ void bar(vect32 p[][2]);
 // CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]]
 // CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[MUL:%.*]] = mul i32 [[TMP1]], [[NUM]]
-// CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
+// CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
 // CHECK-NEXT:    store i32 [[MUL]], ptr [[ARRAYIDX2]], align 8, !tbaa [[TBAA6:![0-9]+]]
 // CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]], i32 1
 // CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[MUL6:%.*]] = mul i32 [[TMP2]], [[NUM]]
-// CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
+// CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
 // CHECK-NEXT:    store i32 [[MUL6]], ptr [[ARRAYIDX8]], align 4, !tbaa [[TBAA6]]
 // CHECK-NEXT:    [[TMP3:%.*]] = lshr i32 [[MUL]], 16
 // CHECK-NEXT:    store i32 [[TMP3]], ptr [[VEC]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[INDEX]], align 4, !tbaa [[TBAA2]]
-// CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
+// CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
 // CHECK-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX14]], i32 2
 // CHECK-NEXT:    [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX15]], align 2, !tbaa [[TBAA6]]
 // CHECK-NEXT:    [[CONV16:%.*]] = zext i16 [[TMP5]] to i32
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index ac9a4fdcf304d4..8f55e5b3cc28a2 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3119,6 +3119,26 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) {
     }
   }
 
+  // The single (non-zero) index of an inbounds GEP of a base object cannot
+  // be negative.
+  auto HasOneNonZeroIndex = [&]() {
+    bool FoundNonZero = false;
+    for (Value *Idx : GEP.indices()) {
+      auto *C = dyn_cast<Constant>(Idx);
+      if (C && C->isNullValue())
+        continue;
+      if (FoundNonZero)
+        return false;
+      FoundNonZero = true;
+    }
+    return true;
+  };
+  if (GEP.isInBounds() && !GEP.hasNoUnsignedWrap() && isBaseOfObject(PtrOp) &&
+      HasOneNonZeroIndex()) {
+    GEP.setNoWrapFlags(GEP.getNoWrapFlags() | GEPNoWrapFlags::noUnsignedWrap());
+    return &GEP;
+  }
+
   // nusw + nneg -> nuw
   if (GEP.hasNoUnsignedSignedWrap() && !GEP.hasNoUnsignedWrap() &&
       all_of(GEP.indices(), [&](Value *Idx) {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
index c14d61b51ad77a..0e5b4dedd4a0d2 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
@@ -53,7 +53,7 @@ define i64 @memcpy_constant_arg_ptr_to_alloca_load_atomic(ptr addrspace(4) noali
 ; CHECK-LABEL: @memcpy_constant_arg_ptr_to_alloca_load_atomic(
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i64], align 8, addrspace(5)
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 8 dereferenceable(256) [[ALLOCA]], ptr addrspace(4) noundef align 8 dereferenceable(256) [[ARG:%.*]], i64 256, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load atomic i64, ptr addrspace(5) [[GEP]] syncscope("somescope") acquire, align 8
 ; CHECK-NEXT:    ret i64 [[LOAD]]
 ;
@@ -101,7 +101,7 @@ define amdgpu_kernel void @memcpy_constant_byref_arg_ptr_to_alloca_too_many_byte
 ; CHECK-LABEL: @memcpy_constant_byref_arg_ptr_to_alloca_too_many_bytes(
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(31) [[ALLOCA]], ptr addrspace(4) noundef align 4 dereferenceable(31) [[ARG:%.*]], i64 31, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
 ; CHECK-NEXT:    store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
 ; CHECK-NEXT:    ret void
@@ -120,7 +120,7 @@ define amdgpu_kernel void @memcpy_constant_intrinsic_ptr_to_alloca(ptr addrspace
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
 ; CHECK-NEXT:    [[KERNARG_SEGMENT_PTR:%.*]] = call align 16 dereferenceable(32) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(32) [[ALLOCA]], ptr addrspace(4) noundef align 16 dereferenceable(32) [[KERNARG_SEGMENT_PTR]], i64 32, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
 ; CHECK-NEXT:    store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/InstCombine/cast_phi.ll b/llvm/test/Transforms/InstCombine/cast_phi.ll
index aafe5f57c4c724..f289e1459000aa 100644
--- a/llvm/test/Transforms/InstCombine/cast_phi.ll
+++ b/llvm/test/Transforms/InstCombine/cast_phi.ll
@@ -31,8 +31,8 @@ define void @MainKernel(i32 %iNumSteps, i32 %tid, i32 %base) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp ugt i32 [[I12_06]], [[BASE:%.*]]
 ; CHECK-NEXT:    [[ADD:%.*]] = add nuw i32 [[I12_06]], 1
 ; CHECK-NEXT:    [[CONV_I9:%.*]] = sext i32 [[ADD]] to i64
-; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
-; CHECK-NEXT:    [[ARRAYIDX24:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT:    [[ARRAYIDX24:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
 ; CHECK-NEXT:    [[CMP40:%.*]] = icmp ult i32 [[I12_06]], [[BASE]]
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[DOTBB4:%.*]], label [[DOTBB5:%.*]]
 ; CHECK:       .bb4:
diff --git a/llvm/test/Transforms/InstCombine/load-cmp.ll b/llvm/test/Transforms/InstCombine/load-cmp.ll
index 12be81b8f815d0..531258935bb822 100644
--- a/llvm/test/Transforms/InstCombine/load-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/load-cmp.ll
@@ -339,7 +339,7 @@ define i1 @test10_struct_arr_noinbounds_i64(i64 %x) {
 define i1 @pr93017(i64 %idx) {
 ; CHECK-LABEL: @pr93017(
 ; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[IDX:%.*]] to i32
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
 ; CHECK-NEXT:    [[V:%.*]] = load ptr, ptr [[GEP]], align 4
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ne ptr [[V]], null
 ; CHECK-NEXT:    ret i1 [[CMP]]
diff --git a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
index d6624010acb219..f931b41eb0f71d 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
@@ -6,7 +6,7 @@
 define void @test_load(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-LABEL: @test_load(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -45,7 +45,7 @@ entry:
 define void @test_load_bitcast_chain(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-LABEL: @test_load_bitcast_chain(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -66,7 +66,7 @@ define void @test_call(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -87,8 +87,8 @@ define void @test_call_no_null_opt(ptr addrspace(1) %out, i64 %x) #0 {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
-; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
 ; CHECK-NEXT:    ret void
@@ -108,7 +108,7 @@ define void @test_load_and_call(ptr addrspace(1) %out, i64 %x, i64 %y) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -135,11 +135,11 @@ define void @test_load_and_call_no_null_opt(ptr addrspace(1) %out, i64 %x, i64 %
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT]], i64 [[Y:%.*]]
 ; CHECK-NEXT:    store i32 [[TMP1]], ptr addrspace(1) [[ARRAYIDX2]], align 4
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
index 34e6c601f494a5..845b1ad703596b 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
@@ -322,7 +322,7 @@ define float @test11_volatile(i64 %i) {
 ; CHECK-NEXT:    [[A:%.*]] = alloca [4 x float], align 4
 ; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 16, ptr nonnull [[A]])
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p1.i64(ptr align 4 [[A]], ptr addrspace(1) align 4 @I, i64 16, i1 true)
-; CHECK-NEXT:    [[G:%.*]] = getelementptr inbounds [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
+; CHECK-NEXT:    [[G:%.*]] = getelementptr inbounds nuw [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
 ; CHECK-NEXT:    [[R:%.*]] = load float, ptr [[G]], align 4
 ; CHECK-NEXT:    ret float [[R]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/stpcpy-1.ll b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
index 2ddacb20974426..88d2ab9bfec420 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
@@ -25,7 +25,7 @@ define ptr @test_simplify1() {
 define ptr @test_simplify2() {
 ; CHECK-LABEL: @test_simplify2(
 ; CHECK-NEXT:    [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
 ; CHECK-NEXT:    ret ptr [[RET]]
 ;
   %ret = call ptr @stpcpy(ptr @a, ptr @a)
diff --git a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
index 2d775f35c8bda4..8a1ce0dfa9f9be 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
@@ -93,7 +93,7 @@ define ptr @test_simplify5() {
 define ptr @test_simplify6() {
 ; CHECK-LABEL: @test_simplify6(
 ; CHECK-NEXT:    [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
 ; CHECK-NEXT:    ret ptr [[RET]]
 ;
 
diff --git a/llvm/test/Transforms/InstCombine/strlen-1.ll b/llvm/test/Transforms/InstCombine/strlen-1.ll
index facf4f7d0973f2..2d28139bf75a80 100644
--- a/llvm/test/Transforms/InstCombine/strlen-1.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-1.ll
@@ -155,7 +155,7 @@ define i32 @test_no_simplify1() {
 
 define i32 @test_no_simplify2(i32 %x) {
 ; CHECK-LABEL: @test_no_simplify2(
-; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
 ; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
 ; CHECK-NEXT:    ret i32 [[HELLO_L]]
 ;
@@ -166,8 +166,8 @@ define i32 @test_no_simplify2(i32 %x) {
 
 define i32 @test_no_simplify2_no_null_opt(i32 %x) #0 {
 ; CHECK-LABEL: @test_no_simplify2_no_null_opt(
-; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
-; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef [[HELLO_P]])
+; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
 ; CHECK-NEXT:    ret i32 [[HELLO_L]]
 ;
   %hello_p = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 %x
diff --git a/llvm/test/Transforms/InstCombine/strlen-4.ll b/llvm/test/Transforms/InstCombine/strlen-4.ll
index 58d04e8a0d4bea..ca01ce93be8835 100644
--- a/llvm/test/Transforms/InstCombine/strlen-4.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-4.ll
@@ -18,7 +18,7 @@ declare i64 @strlen(ptr)
 
 define i64 @fold_strlen_s3_pi_s5(i1 %X, i64 %I) {
 ; CHECK-LABEL: @fold_strlen_s3_pi_s5(
-; CHECK-NEXT:    [[PS3_PI:%.*]] = getelementptr inbounds [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
+; CHECK-NEXT:    [[PS3_PI:%.*]] = getelementptr inbounds nuw [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[X:%.*]], ptr [[PS3_PI]], ptr @s5
 ; CHECK-NEXT:    [[LEN:%.*]] = tail call i64 @strlen(ptr noundef nonnull derefe...
[truncated]

llvmbot · 2024-12-09T15:55:58Z

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).

Patch is 90.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119225.diff

29 Files Affected:

(modified) clang/test/CodeGen/attr-counted-by.c (+2-2)
(modified) clang/test/CodeGen/union-tbaa1.c (+3-3)
(modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+20)
(modified) llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/cast_phi.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/memcpy-addrspace.ll (+8-8)
(modified) llvm/test/Transforms/InstCombine/memcpy-from-global.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/stpcpy-1.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/strlen-1.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/strlen-4.ll (+8-8)
(modified) llvm/test/Transforms/InstCombine/strncat-2.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/strnlen-3.ll (+9-9)
(modified) llvm/test/Transforms/InstCombine/strnlen-4.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/strnlen-5.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/wcslen-1.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/wcslen-3.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/wcslen-5.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+19-19)
(modified) llvm/test/Transforms/LoopVectorize/X86/x86_fp80-vector-store.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/non-const-n.ll (+3-3)
(modified) llvm/test/Transforms/PhaseOrdering/X86/excessive-unrolling.ll (+87-87)
(modified) llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll (+8-8)

diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 6b3cad5708835b..be4c7f07e92150 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -1043,7 +1043,7 @@ int test12_a, test12_b;
 // NO-SANITIZE-WITH-ATTR-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR11:[0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
@@ -1085,7 +1085,7 @@ int test12_a, test12_b;
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR9:[0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
diff --git a/clang/test/CodeGen/union-tbaa1.c b/clang/test/CodeGen/union-tbaa1.c
index 0f7a67cb7eccd1..7d44c9a3fbe6bb 100644
--- a/clang/test/CodeGen/union-tbaa1.c
+++ b/clang/test/CodeGen/union-tbaa1.c
@@ -16,17 +16,17 @@ void bar(vect32 p[][2]);
 // CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]]
 // CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[MUL:%.*]] = mul i32 [[TMP1]], [[NUM]]
-// CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
+// CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
 // CHECK-NEXT:    store i32 [[MUL]], ptr [[ARRAYIDX2]], align 8, !tbaa [[TBAA6:![0-9]+]]
 // CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]], i32 1
 // CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[MUL6:%.*]] = mul i32 [[TMP2]], [[NUM]]
-// CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
+// CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
 // CHECK-NEXT:    store i32 [[MUL6]], ptr [[ARRAYIDX8]], align 4, !tbaa [[TBAA6]]
 // CHECK-NEXT:    [[TMP3:%.*]] = lshr i32 [[MUL]], 16
 // CHECK-NEXT:    store i32 [[TMP3]], ptr [[VEC]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[INDEX]], align 4, !tbaa [[TBAA2]]
-// CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
+// CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
 // CHECK-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX14]], i32 2
 // CHECK-NEXT:    [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX15]], align 2, !tbaa [[TBAA6]]
 // CHECK-NEXT:    [[CONV16:%.*]] = zext i16 [[TMP5]] to i32
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index ac9a4fdcf304d4..8f55e5b3cc28a2 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3119,6 +3119,26 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) {
     }
   }
 
+  // The single (non-zero) index of an inbounds GEP of a base object cannot
+  // be negative.
+  auto HasOneNonZeroIndex = [&]() {
+    bool FoundNonZero = false;
+    for (Value *Idx : GEP.indices()) {
+      auto *C = dyn_cast<Constant>(Idx);
+      if (C && C->isNullValue())
+        continue;
+      if (FoundNonZero)
+        return false;
+      FoundNonZero = true;
+    }
+    return true;
+  };
+  if (GEP.isInBounds() && !GEP.hasNoUnsignedWrap() && isBaseOfObject(PtrOp) &&
+      HasOneNonZeroIndex()) {
+    GEP.setNoWrapFlags(GEP.getNoWrapFlags() | GEPNoWrapFlags::noUnsignedWrap());
+    return &GEP;
+  }
+
   // nusw + nneg -> nuw
   if (GEP.hasNoUnsignedSignedWrap() && !GEP.hasNoUnsignedWrap() &&
       all_of(GEP.indices(), [&](Value *Idx) {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
index c14d61b51ad77a..0e5b4dedd4a0d2 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
@@ -53,7 +53,7 @@ define i64 @memcpy_constant_arg_ptr_to_alloca_load_atomic(ptr addrspace(4) noali
 ; CHECK-LABEL: @memcpy_constant_arg_ptr_to_alloca_load_atomic(
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i64], align 8, addrspace(5)
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 8 dereferenceable(256) [[ALLOCA]], ptr addrspace(4) noundef align 8 dereferenceable(256) [[ARG:%.*]], i64 256, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load atomic i64, ptr addrspace(5) [[GEP]] syncscope("somescope") acquire, align 8
 ; CHECK-NEXT:    ret i64 [[LOAD]]
 ;
@@ -101,7 +101,7 @@ define amdgpu_kernel void @memcpy_constant_byref_arg_ptr_to_alloca_too_many_byte
 ; CHECK-LABEL: @memcpy_constant_byref_arg_ptr_to_alloca_too_many_bytes(
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(31) [[ALLOCA]], ptr addrspace(4) noundef align 4 dereferenceable(31) [[ARG:%.*]], i64 31, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
 ; CHECK-NEXT:    store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
 ; CHECK-NEXT:    ret void
@@ -120,7 +120,7 @@ define amdgpu_kernel void @memcpy_constant_intrinsic_ptr_to_alloca(ptr addrspace
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
 ; CHECK-NEXT:    [[KERNARG_SEGMENT_PTR:%.*]] = call align 16 dereferenceable(32) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(32) [[ALLOCA]], ptr addrspace(4) noundef align 16 dereferenceable(32) [[KERNARG_SEGMENT_PTR]], i64 32, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
 ; CHECK-NEXT:    store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/InstCombine/cast_phi.ll b/llvm/test/Transforms/InstCombine/cast_phi.ll
index aafe5f57c4c724..f289e1459000aa 100644
--- a/llvm/test/Transforms/InstCombine/cast_phi.ll
+++ b/llvm/test/Transforms/InstCombine/cast_phi.ll
@@ -31,8 +31,8 @@ define void @MainKernel(i32 %iNumSteps, i32 %tid, i32 %base) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp ugt i32 [[I12_06]], [[BASE:%.*]]
 ; CHECK-NEXT:    [[ADD:%.*]] = add nuw i32 [[I12_06]], 1
 ; CHECK-NEXT:    [[CONV_I9:%.*]] = sext i32 [[ADD]] to i64
-; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
-; CHECK-NEXT:    [[ARRAYIDX24:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT:    [[ARRAYIDX24:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
 ; CHECK-NEXT:    [[CMP40:%.*]] = icmp ult i32 [[I12_06]], [[BASE]]
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[DOTBB4:%.*]], label [[DOTBB5:%.*]]
 ; CHECK:       .bb4:
diff --git a/llvm/test/Transforms/InstCombine/load-cmp.ll b/llvm/test/Transforms/InstCombine/load-cmp.ll
index 12be81b8f815d0..531258935bb822 100644
--- a/llvm/test/Transforms/InstCombine/load-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/load-cmp.ll
@@ -339,7 +339,7 @@ define i1 @test10_struct_arr_noinbounds_i64(i64 %x) {
 define i1 @pr93017(i64 %idx) {
 ; CHECK-LABEL: @pr93017(
 ; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[IDX:%.*]] to i32
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
 ; CHECK-NEXT:    [[V:%.*]] = load ptr, ptr [[GEP]], align 4
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ne ptr [[V]], null
 ; CHECK-NEXT:    ret i1 [[CMP]]
diff --git a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
index d6624010acb219..f931b41eb0f71d 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
@@ -6,7 +6,7 @@
 define void @test_load(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-LABEL: @test_load(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -45,7 +45,7 @@ entry:
 define void @test_load_bitcast_chain(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-LABEL: @test_load_bitcast_chain(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -66,7 +66,7 @@ define void @test_call(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -87,8 +87,8 @@ define void @test_call_no_null_opt(ptr addrspace(1) %out, i64 %x) #0 {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
-; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
 ; CHECK-NEXT:    ret void
@@ -108,7 +108,7 @@ define void @test_load_and_call(ptr addrspace(1) %out, i64 %x, i64 %y) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -135,11 +135,11 @@ define void @test_load_and_call_no_null_opt(ptr addrspace(1) %out, i64 %x, i64 %
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT]], i64 [[Y:%.*]]
 ; CHECK-NEXT:    store i32 [[TMP1]], ptr addrspace(1) [[ARRAYIDX2]], align 4
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
index 34e6c601f494a5..845b1ad703596b 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
@@ -322,7 +322,7 @@ define float @test11_volatile(i64 %i) {
 ; CHECK-NEXT:    [[A:%.*]] = alloca [4 x float], align 4
 ; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 16, ptr nonnull [[A]])
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p1.i64(ptr align 4 [[A]], ptr addrspace(1) align 4 @I, i64 16, i1 true)
-; CHECK-NEXT:    [[G:%.*]] = getelementptr inbounds [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
+; CHECK-NEXT:    [[G:%.*]] = getelementptr inbounds nuw [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
 ; CHECK-NEXT:    [[R:%.*]] = load float, ptr [[G]], align 4
 ; CHECK-NEXT:    ret float [[R]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/stpcpy-1.ll b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
index 2ddacb20974426..88d2ab9bfec420 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
@@ -25,7 +25,7 @@ define ptr @test_simplify1() {
 define ptr @test_simplify2() {
 ; CHECK-LABEL: @test_simplify2(
 ; CHECK-NEXT:    [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
 ; CHECK-NEXT:    ret ptr [[RET]]
 ;
   %ret = call ptr @stpcpy(ptr @a, ptr @a)
diff --git a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
index 2d775f35c8bda4..8a1ce0dfa9f9be 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
@@ -93,7 +93,7 @@ define ptr @test_simplify5() {
 define ptr @test_simplify6() {
 ; CHECK-LABEL: @test_simplify6(
 ; CHECK-NEXT:    [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
 ; CHECK-NEXT:    ret ptr [[RET]]
 ;
 
diff --git a/llvm/test/Transforms/InstCombine/strlen-1.ll b/llvm/test/Transforms/InstCombine/strlen-1.ll
index facf4f7d0973f2..2d28139bf75a80 100644
--- a/llvm/test/Transforms/InstCombine/strlen-1.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-1.ll
@@ -155,7 +155,7 @@ define i32 @test_no_simplify1() {
 
 define i32 @test_no_simplify2(i32 %x) {
 ; CHECK-LABEL: @test_no_simplify2(
-; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
 ; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
 ; CHECK-NEXT:    ret i32 [[HELLO_L]]
 ;
@@ -166,8 +166,8 @@ define i32 @test_no_simplify2(i32 %x) {
 
 define i32 @test_no_simplify2_no_null_opt(i32 %x) #0 {
 ; CHECK-LABEL: @test_no_simplify2_no_null_opt(
-; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
-; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef [[HELLO_P]])
+; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
 ; CHECK-NEXT:    ret i32 [[HELLO_L]]
 ;
   %hello_p = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 %x
diff --git a/llvm/test/Transforms/InstCombine/strlen-4.ll b/llvm/test/Transforms/InstCombine/strlen-4.ll
index 58d04e8a0d4bea..ca01ce93be8835 100644
--- a/llvm/test/Transforms/InstCombine/strlen-4.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-4.ll
@@ -18,7 +18,7 @@ declare i64 @strlen(ptr)
 
 define i64 @fold_strlen_s3_pi_s5(i1 %X, i64 %I) {
 ; CHECK-LABEL: @fold_strlen_s3_pi_s5(
-; CHECK-NEXT:    [[PS3_PI:%.*]] = getelementptr inbounds [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
+; CHECK-NEXT:    [[PS3_PI:%.*]] = getelementptr inbounds nuw [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[X:%.*]], ptr [[PS3_PI]], ptr @s5
 ; CHECK-NEXT:    [[LEN:%.*]] = tail call i64 @strlen(ptr noundef nonnull derefe...
[truncated]

dtcxzyw

LGTM.

llvm-ci · 2024-12-10T09:23:21Z

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building clang,llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/8300

Here is the relevant piece of the build log for the reference

Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: functionalities/dyld-exec-linux/TestDyldExecLinux.py (443 of 2846)
UNSUPPORTED: lldb-api :: functionalities/fat_archives/TestFatArchives.py (444 of 2846)
PASS: lldb-api :: functionalities/executable_first/TestExecutableFirst.py (445 of 2846)
PASS: lldb-api :: functionalities/find-line-entry/TestFindLineEntry.py (446 of 2846)
UNSUPPORTED: lldb-api :: functionalities/fork/concurrent_vfork/TestConcurrentVFork.py (447 of 2846)
PASS: lldb-api :: functionalities/exec/TestExec.py (448 of 2846)
PASS: lldb-api :: functionalities/float-display/TestFloatDisplay.py (449 of 2846)
PASS: lldb-api :: functionalities/gdb_remote_client/TestAArch64XMLRegOffsets.py (450 of 2846)
PASS: lldb-api :: functionalities/dynamic_value_child_count/TestDynamicValueChildCount.py (451 of 2846)
PASS: lldb-api :: functionalities/gdb_remote_client/TestArmRegisterDefinition.py (452 of 2846)
FAIL: lldb-api :: functionalities/fork/resumes-child/TestForkResumesChild.py (453 of 2846)
******************** TEST 'lldb-api :: functionalities/fork/resumes-child/TestForkResumesChild.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/make --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/functionalities/fork/resumes-child -p TestForkResumesChild.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://github.com/llvm/llvm-project.git revision e21ab4d16b555c28ded307571d138f594f33e325)
  clang revision e21ab4d16b555c28ded307571d138f594f33e325
  llvm revision e21ab4d16b555c28ded307571d138f594f33e325
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_step_over_fork (TestForkResumesChild.TestForkResumesChild)
======================================================================
FAIL: test_step_over_fork (TestForkResumesChild.TestForkResumesChild)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/functionalities/fork/resumes-child/TestForkResumesChild.py", line 22, in test_step_over_fork
    self.expect("continue", substrs=["exited with status = 0"])
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2371, in expect
    self.runCmd(
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 996, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Command 'continue' did not return successfully
Error output:
error: Process must be launched.

Config=arm-/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 0.805s

FAILED (failures=1)

SixWeining · 2024-12-12T12:13:01Z

Seems that this change causes Segment Fault on multiple targets including aarch64, loongarch64 and riscv64.

This is detected by a LoongArch buildbot and manually checked on aarch64 and riscv64 QEMUs.

I wonder why aarch64 buildbot doesn't report this. I find that CMAKE_CXX_FLAGS_RELEASE is empty on aarch64 bot while it is -O3 -DNDEBUG on loongarch64. Is it an issue of the llvm-zorg framework?

UPDATE: This buildbot(aarch64) also fail which uses -O3.

Reproduce

$ clang llvm-test-suite/SingleSource/Benchmarks/CoyoteBench/huffbench.c -O3 -o huffbench
$ ./huffbench

serge-sans-paille · 2024-12-12T12:55:43Z

Hey @nikic,
I've bisected the failures from this buildbot:
https://lab.llvm.org/buildbot/#/builders/199/builds/57
down to this commit.

The reproducer is unfortunately not pleasant:
get a stage-2 build of clang then run the llvm-test suite. This reproducibility chokes on SingleSource/Benchmarks/CoyoteBench/huffbench.test, as can be seen on the buildbot log.

dtcxzyw · 2024-12-12T12:58:41Z

Seems that this change causes Segment Fault on multiple targets including aarch64, loongarch64 and riscv64.

This is detected by a LoongArch buildbot and manually checked on aarch64 and riscv64 QEMUs.

I wonder why aarch64 buildbot doesn't report this. I find that CMAKE_CXX_FLAGS_RELEASE is empty on aarch64 bot while it is -O3 -DNDEBUG on loongarch64. Is it an issue of the llvm-zorg framework?

UPDATE: This buildbot(aarch64) also fail which uses -O3.

Reproduce

$ clang llvm-test-suite/SingleSource/Benchmarks/CoyoteBench/huffbench.c -O3 -o huffbench $ ./huffbench

Reproduced on Loongson-3A5000. I will provide a reduced test case later.

nikic · 2024-12-12T13:12:12Z

From a quick look, it's probably due to UB on this line: https://github.com/llvm/llvm-test-suite/blob/4f14adb8a2f2c5fa02c6b3643d45e74adfcb31f9/SingleSource/Benchmarks/CoyoteBench/huffbench.c#L114 It indexes below the start of an object.

dtcxzyw · 2024-12-12T13:34:35Z

huffbench.c:319:10: runtime error: left shift of negative value -93
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior huffbench.c:319:10

nikic · 2024-12-12T16:17:23Z

I think llvm/llvm-test-suite#190 should fix this issue, but haven't tested on an affected arch.

I'm a bit worried that we don't have a sanitizer to catch this issue (the negative left shift issue is an unrelated one).

serge-sans-paille · 2024-12-13T18:49:22Z

Tested on an aarch64 machine where I've been able to reproduce. Works like a charm, thanks @nikic

vitalybuka · 2024-12-18T00:45:29Z

I think llvm/llvm-test-suite#190 should fix this issue, but haven't tested on an affected arch.

I'm a bit worried that we don't have a sanitizer to catch this issue (the negative left shift issue is an unrelated one).

-fsanitize=local-bounds is quite related, but I guess InstCombine happens before the check inserted

alexfh · 2024-12-18T17:33:01Z

Heads up: apart from a number of instances of actual UB in the code (which are quite tedious to localize and understand due to a lack of specialized tooling) we're investigating a bad interaction of this commit with msan, which seems to result in a miscompilation.

nikic · 2024-12-18T17:39:23Z

Heads up: apart from a number of instances of actual UB in the code (which are quite tedious to localize and understand due to a lack of specialized tooling) we're investigating a bad interaction of this commit with msan, which seems to result in a miscompilation.

Thanks for the heads up! As I mentioned on another thread, I'm happy to just entirely revert this patch due to lack of good sanitizer support (I can do it tomorrow, but feel free to do it yourself earlier). Having a reproducer for the msan issue is probably still valuable, as it may indicate a larger problem.

…119225)" This reverts commit e21ab4d.

alexfh · 2024-12-18T17:59:32Z

Heads up: apart from a number of instances of actual UB in the code (which are quite tedious to localize and understand due to a lack of specialized tooling) we're investigating a bad interaction of this commit with msan, which seems to result in a miscompilation.

Thanks for the heads up! As I mentioned on another thread, I'm happy to just entirely revert this patch due to lack of good sanitizer support (I can do it tomorrow, but feel free to do it yourself earlier). Having a reproducer for the msan issue is probably still valuable, as it may indicate a larger problem.

Thanks! Sent a PR with a revert: #120460

As mentioned, a test case is in the works.

…#120460) Reverts #119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: #119225 (comment) #118472 (comment)

fmayer · 2024-12-19T15:07:19Z

I was working on this miscompile. I cannot give a reproducer right now, but the following observations:

The generated IR is clearly incorrect (the inbounds nuw), of the form:

%x = alloca [12 x i32], align 16
[...]
%59 = getelementptr inbounds nuw i8, ptr %x, i64 -4
%wide.masked.load.2 = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr nonnull %59, i32 4, <4 x i1> <i1 false, i1 true, i1 true, i1 true>, <4 x i32> poison)

This is some interaction between this CL and these two InstCombines:

llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Line 2495 in 5a3f1ac

return replaceInstUsesWith(
llvm-project/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Line 2949 in 5a3f1ac

if (!GEPEltType->isIntegerTy(8) && GEP.hasAllConstantIndices()) {

If I disable any of the three, the error goes away. The observation is that with the two above, we get an inbounds getelementptr that is NOT inbounds (-4), that then gets masked vector read to correct for that.

The comment in the first link implies that it maybe should be removing inbounds:

      // Even if the total offset is inbounds, we may end up representing it
      // by first performing a larger negative offset, and then a smaller
      // positive one. The large negative offset might go out of bounds. Only
      // preserve inbounds if all signs are the same.

but it then goes and only removes the wrap flags:

      if (Idx.isNonNegative() != ConstIndices[0].isNonNegative())
        NW = NW.withoutNoUnsignedSignedWrap();
      if (!Idx.isNonNegative())
        NW = NW.withoutNoUnsignedWrap();

If I change this to

      if (Idx.isNonNegative() != ConstIndices[0].isNonNegative())
        NW = NW.withoutNoUnsignedSignedWrap().withoutInBounds();
      if (!Idx.isNonNegative())
        NW = NW.withoutNoUnsignedWrap().withoutInBounds();

the error also goes away.

If I disable the second link (i8 canonicalization) the access is then -1 (makes sense) but NOT marked as inbounds.

I don't feel like this is a sanitizer issue, it seems like MSan might just have gotten unlucky. Also, I think this CL is correct in isolation, but is worsening some existing problem.

nikic · 2024-12-19T15:23:13Z

@fmayer The withoutNoUnsignedSignedWrap() API already removes the inbounds flag as well (because inbounds requires nusw). So I think the effect of your change is to drop inbounds in case all indices are negative, which should generally not be necessary.

It's pretty likely that the root cause here is indeed incorrect inbounds preservation somewhere, but I think the logic in that transform is correct.

fmayer · 2024-12-20T13:30:53Z

Seems like the root cause was an vectorizer bug, sent #120730 that fixes it.

… of object" (#120460) Reverts llvm/llvm-project#119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: llvm/llvm-project#119225 (comment) llvm/llvm-project#118472 (comment)

fmayer · 2025-01-16T17:25:30Z

Seems like the root cause was an vectorizer bug, sent #120730 that fixes it.

This is landed, so from that side we could reland this.

nikic requested review from dtcxzyw and goldsteinn December 9, 2024 15:55

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU llvm:instcombine llvm:transforms labels Dec 9, 2024

dtcxzyw approved these changes Dec 9, 2024

View reviewed changes

nikic merged commit e21ab4d into llvm:main Dec 10, 2024
13 checks passed

nikic deleted the instcombine-base-gep-nuw branch December 10, 2024 09:00

alexfh added a commit that referenced this pull request Dec 18, 2024

Revert "[InstCombine] Infer nuw for gep inbounds from base of object (#…

18c4109

…119225)" This reverts commit e21ab4d.

alexfh mentioned this pull request Dec 18, 2024

Revert "[InstCombine] Infer nuw for gep inbounds from base of object" #120460

Merged

[InstCombine] Infer nuw for gep inbounds from base of object #119225

[InstCombine] Infer nuw for gep inbounds from base of object #119225

Uh oh!

Conversation

nikic commented Dec 9, 2024

Uh oh!

llvmbot commented Dec 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Dec 9, 2024

Uh oh!

dtcxzyw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Dec 10, 2024

Uh oh!

SixWeining commented Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reproduce

Uh oh!

serge-sans-paille commented Dec 12, 2024

Uh oh!

dtcxzyw commented Dec 12, 2024

Reproduce

Uh oh!

nikic commented Dec 12, 2024

Uh oh!

dtcxzyw commented Dec 12, 2024

Uh oh!

nikic commented Dec 12, 2024

Uh oh!

serge-sans-paille commented Dec 13, 2024

Uh oh!

vitalybuka commented Dec 18, 2024

Uh oh!

alexfh commented Dec 18, 2024

Uh oh!

nikic commented Dec 18, 2024

Uh oh!

alexfh commented Dec 18, 2024

Uh oh!

fmayer commented Dec 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic commented Dec 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fmayer commented Dec 20, 2024

Uh oh!

fmayer commented Jan 16, 2025

Uh oh!

Uh oh!

llvmbot commented Dec 9, 2024 •

edited

Loading

SixWeining commented Dec 12, 2024 •

edited

Loading

fmayer commented Dec 19, 2024 •

edited

Loading

nikic commented Dec 19, 2024 •

edited

Loading