Skip to content

[InstCombine] Infer nuw for gep inbounds from base of object #119225

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 10, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Dec 9, 2024

When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).

When we have a gep inbounds from the base of an object (e.g.
alloca or global), we know that the index cannot be negative,
as this would go out of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could
also accept one unknown index followed by known-non-negative
indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2
currently incorrectly doesn't require the inbounds for the alloca
case, see AliveToolkit/alive2#1138).
@nikic nikic requested review from dtcxzyw and goldsteinn December 9, 2024 15:55
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU llvm:instcombine llvm:transforms labels Dec 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Nikita Popov (nikic)

Changes

When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).


Patch is 90.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119225.diff

29 Files Affected:

  • (modified) clang/test/CodeGen/attr-counted-by.c (+2-2)
  • (modified) clang/test/CodeGen/union-tbaa1.c (+3-3)
  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+20)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/cast_phi.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-addrspace.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-from-global.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/stpcpy-1.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/strlen-1.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/strlen-4.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/strncat-2.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-3.ll (+9-9)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-4.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-5.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-1.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-3.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-5.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+19-19)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86_fp80-vector-store.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/non-const-n.ll (+3-3)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/excessive-unrolling.ll (+87-87)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll (+8-8)
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 6b3cad5708835b..be4c7f07e92150 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -1043,7 +1043,7 @@ int test12_a, test12_b;
 // NO-SANITIZE-WITH-ATTR-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR11:[0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
@@ -1085,7 +1085,7 @@ int test12_a, test12_b;
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR9:[0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
diff --git a/clang/test/CodeGen/union-tbaa1.c b/clang/test/CodeGen/union-tbaa1.c
index 0f7a67cb7eccd1..7d44c9a3fbe6bb 100644
--- a/clang/test/CodeGen/union-tbaa1.c
+++ b/clang/test/CodeGen/union-tbaa1.c
@@ -16,17 +16,17 @@ void bar(vect32 p[][2]);
 // CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]]
 // CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[MUL:%.*]] = mul i32 [[TMP1]], [[NUM]]
-// CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
+// CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
 // CHECK-NEXT:    store i32 [[MUL]], ptr [[ARRAYIDX2]], align 8, !tbaa [[TBAA6:![0-9]+]]
 // CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]], i32 1
 // CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[MUL6:%.*]] = mul i32 [[TMP2]], [[NUM]]
-// CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
+// CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
 // CHECK-NEXT:    store i32 [[MUL6]], ptr [[ARRAYIDX8]], align 4, !tbaa [[TBAA6]]
 // CHECK-NEXT:    [[TMP3:%.*]] = lshr i32 [[MUL]], 16
 // CHECK-NEXT:    store i32 [[TMP3]], ptr [[VEC]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[INDEX]], align 4, !tbaa [[TBAA2]]
-// CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
+// CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
 // CHECK-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX14]], i32 2
 // CHECK-NEXT:    [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX15]], align 2, !tbaa [[TBAA6]]
 // CHECK-NEXT:    [[CONV16:%.*]] = zext i16 [[TMP5]] to i32
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index ac9a4fdcf304d4..8f55e5b3cc28a2 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3119,6 +3119,26 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) {
     }
   }
 
+  // The single (non-zero) index of an inbounds GEP of a base object cannot
+  // be negative.
+  auto HasOneNonZeroIndex = [&]() {
+    bool FoundNonZero = false;
+    for (Value *Idx : GEP.indices()) {
+      auto *C = dyn_cast<Constant>(Idx);
+      if (C && C->isNullValue())
+        continue;
+      if (FoundNonZero)
+        return false;
+      FoundNonZero = true;
+    }
+    return true;
+  };
+  if (GEP.isInBounds() && !GEP.hasNoUnsignedWrap() && isBaseOfObject(PtrOp) &&
+      HasOneNonZeroIndex()) {
+    GEP.setNoWrapFlags(GEP.getNoWrapFlags() | GEPNoWrapFlags::noUnsignedWrap());
+    return &GEP;
+  }
+
   // nusw + nneg -> nuw
   if (GEP.hasNoUnsignedSignedWrap() && !GEP.hasNoUnsignedWrap() &&
       all_of(GEP.indices(), [&](Value *Idx) {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
index c14d61b51ad77a..0e5b4dedd4a0d2 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
@@ -53,7 +53,7 @@ define i64 @memcpy_constant_arg_ptr_to_alloca_load_atomic(ptr addrspace(4) noali
 ; CHECK-LABEL: @memcpy_constant_arg_ptr_to_alloca_load_atomic(
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i64], align 8, addrspace(5)
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 8 dereferenceable(256) [[ALLOCA]], ptr addrspace(4) noundef align 8 dereferenceable(256) [[ARG:%.*]], i64 256, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load atomic i64, ptr addrspace(5) [[GEP]] syncscope("somescope") acquire, align 8
 ; CHECK-NEXT:    ret i64 [[LOAD]]
 ;
@@ -101,7 +101,7 @@ define amdgpu_kernel void @memcpy_constant_byref_arg_ptr_to_alloca_too_many_byte
 ; CHECK-LABEL: @memcpy_constant_byref_arg_ptr_to_alloca_too_many_bytes(
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(31) [[ALLOCA]], ptr addrspace(4) noundef align 4 dereferenceable(31) [[ARG:%.*]], i64 31, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
 ; CHECK-NEXT:    store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
 ; CHECK-NEXT:    ret void
@@ -120,7 +120,7 @@ define amdgpu_kernel void @memcpy_constant_intrinsic_ptr_to_alloca(ptr addrspace
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
 ; CHECK-NEXT:    [[KERNARG_SEGMENT_PTR:%.*]] = call align 16 dereferenceable(32) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(32) [[ALLOCA]], ptr addrspace(4) noundef align 16 dereferenceable(32) [[KERNARG_SEGMENT_PTR]], i64 32, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
 ; CHECK-NEXT:    store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/InstCombine/cast_phi.ll b/llvm/test/Transforms/InstCombine/cast_phi.ll
index aafe5f57c4c724..f289e1459000aa 100644
--- a/llvm/test/Transforms/InstCombine/cast_phi.ll
+++ b/llvm/test/Transforms/InstCombine/cast_phi.ll
@@ -31,8 +31,8 @@ define void @MainKernel(i32 %iNumSteps, i32 %tid, i32 %base) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp ugt i32 [[I12_06]], [[BASE:%.*]]
 ; CHECK-NEXT:    [[ADD:%.*]] = add nuw i32 [[I12_06]], 1
 ; CHECK-NEXT:    [[CONV_I9:%.*]] = sext i32 [[ADD]] to i64
-; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
-; CHECK-NEXT:    [[ARRAYIDX24:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT:    [[ARRAYIDX24:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
 ; CHECK-NEXT:    [[CMP40:%.*]] = icmp ult i32 [[I12_06]], [[BASE]]
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[DOTBB4:%.*]], label [[DOTBB5:%.*]]
 ; CHECK:       .bb4:
diff --git a/llvm/test/Transforms/InstCombine/load-cmp.ll b/llvm/test/Transforms/InstCombine/load-cmp.ll
index 12be81b8f815d0..531258935bb822 100644
--- a/llvm/test/Transforms/InstCombine/load-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/load-cmp.ll
@@ -339,7 +339,7 @@ define i1 @test10_struct_arr_noinbounds_i64(i64 %x) {
 define i1 @pr93017(i64 %idx) {
 ; CHECK-LABEL: @pr93017(
 ; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[IDX:%.*]] to i32
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
 ; CHECK-NEXT:    [[V:%.*]] = load ptr, ptr [[GEP]], align 4
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ne ptr [[V]], null
 ; CHECK-NEXT:    ret i1 [[CMP]]
diff --git a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
index d6624010acb219..f931b41eb0f71d 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
@@ -6,7 +6,7 @@
 define void @test_load(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-LABEL: @test_load(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -45,7 +45,7 @@ entry:
 define void @test_load_bitcast_chain(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-LABEL: @test_load_bitcast_chain(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -66,7 +66,7 @@ define void @test_call(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -87,8 +87,8 @@ define void @test_call_no_null_opt(ptr addrspace(1) %out, i64 %x) #0 {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
-; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
 ; CHECK-NEXT:    ret void
@@ -108,7 +108,7 @@ define void @test_load_and_call(ptr addrspace(1) %out, i64 %x, i64 %y) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -135,11 +135,11 @@ define void @test_load_and_call_no_null_opt(ptr addrspace(1) %out, i64 %x, i64 %
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT]], i64 [[Y:%.*]]
 ; CHECK-NEXT:    store i32 [[TMP1]], ptr addrspace(1) [[ARRAYIDX2]], align 4
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
index 34e6c601f494a5..845b1ad703596b 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
@@ -322,7 +322,7 @@ define float @test11_volatile(i64 %i) {
 ; CHECK-NEXT:    [[A:%.*]] = alloca [4 x float], align 4
 ; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 16, ptr nonnull [[A]])
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p1.i64(ptr align 4 [[A]], ptr addrspace(1) align 4 @I, i64 16, i1 true)
-; CHECK-NEXT:    [[G:%.*]] = getelementptr inbounds [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
+; CHECK-NEXT:    [[G:%.*]] = getelementptr inbounds nuw [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
 ; CHECK-NEXT:    [[R:%.*]] = load float, ptr [[G]], align 4
 ; CHECK-NEXT:    ret float [[R]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/stpcpy-1.ll b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
index 2ddacb20974426..88d2ab9bfec420 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
@@ -25,7 +25,7 @@ define ptr @test_simplify1() {
 define ptr @test_simplify2() {
 ; CHECK-LABEL: @test_simplify2(
 ; CHECK-NEXT:    [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
 ; CHECK-NEXT:    ret ptr [[RET]]
 ;
   %ret = call ptr @stpcpy(ptr @a, ptr @a)
diff --git a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
index 2d775f35c8bda4..8a1ce0dfa9f9be 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
@@ -93,7 +93,7 @@ define ptr @test_simplify5() {
 define ptr @test_simplify6() {
 ; CHECK-LABEL: @test_simplify6(
 ; CHECK-NEXT:    [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
 ; CHECK-NEXT:    ret ptr [[RET]]
 ;
 
diff --git a/llvm/test/Transforms/InstCombine/strlen-1.ll b/llvm/test/Transforms/InstCombine/strlen-1.ll
index facf4f7d0973f2..2d28139bf75a80 100644
--- a/llvm/test/Transforms/InstCombine/strlen-1.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-1.ll
@@ -155,7 +155,7 @@ define i32 @test_no_simplify1() {
 
 define i32 @test_no_simplify2(i32 %x) {
 ; CHECK-LABEL: @test_no_simplify2(
-; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
 ; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
 ; CHECK-NEXT:    ret i32 [[HELLO_L]]
 ;
@@ -166,8 +166,8 @@ define i32 @test_no_simplify2(i32 %x) {
 
 define i32 @test_no_simplify2_no_null_opt(i32 %x) #0 {
 ; CHECK-LABEL: @test_no_simplify2_no_null_opt(
-; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
-; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef [[HELLO_P]])
+; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
 ; CHECK-NEXT:    ret i32 [[HELLO_L]]
 ;
   %hello_p = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 %x
diff --git a/llvm/test/Transforms/InstCombine/strlen-4.ll b/llvm/test/Transforms/InstCombine/strlen-4.ll
index 58d04e8a0d4bea..ca01ce93be8835 100644
--- a/llvm/test/Transforms/InstCombine/strlen-4.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-4.ll
@@ -18,7 +18,7 @@ declare i64 @strlen(ptr)
 
 define i64 @fold_strlen_s3_pi_s5(i1 %X, i64 %I) {
 ; CHECK-LABEL: @fold_strlen_s3_pi_s5(
-; CHECK-NEXT:    [[PS3_PI:%.*]] = getelementptr inbounds [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
+; CHECK-NEXT:    [[PS3_PI:%.*]] = getelementptr inbounds nuw [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[X:%.*]], ptr [[PS3_PI]], ptr @s5
 ; CHECK-NEXT:    [[LEN:%.*]] = tail call i64 @strlen(ptr noundef nonnull derefe...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

When we have a gep inbounds from the base of an object (e.g. alloca or global), we know that the index cannot be negative, as this would go out of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently incorrectly doesn't require the inbounds for the alloca case, see AliveToolkit/alive2#1138).


Patch is 90.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119225.diff

29 Files Affected:

  • (modified) clang/test/CodeGen/attr-counted-by.c (+2-2)
  • (modified) clang/test/CodeGen/union-tbaa1.c (+3-3)
  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+20)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/cast_phi.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-addrspace.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-from-global.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/stpcpy-1.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/strlen-1.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/strlen-4.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/strncat-2.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-3.ll (+9-9)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-4.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-5.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-1.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-3.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-5.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+19-19)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86_fp80-vector-store.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/non-const-n.ll (+3-3)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/excessive-unrolling.ll (+87-87)
  • (modified) llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll (+8-8)
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 6b3cad5708835b..be4c7f07e92150 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -1043,7 +1043,7 @@ int test12_a, test12_b;
 // NO-SANITIZE-WITH-ATTR-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR11:[0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
@@ -1085,7 +1085,7 @@ int test12_a, test12_b;
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    call void @llvm.lifetime.start.p0(i64 24, ptr nonnull [[BAZ]]) #[[ATTR9:[0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 4 dereferenceable(24) [[BAZ]], ptr noundef nonnull align 4 dereferenceable(24) @test12_bar, i64 24, i1 false), !tbaa.struct [[TBAA_STRUCT7:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [6 x i32], ptr [[BAZ]], i64 0, i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[TMP0]], ptr @test12_b, align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP1:%.*]] = load i32, ptr getelementptr inbounds nuw (i8, ptr @test12_foo, i64 4), align 4, !tbaa [[TBAA2]]
diff --git a/clang/test/CodeGen/union-tbaa1.c b/clang/test/CodeGen/union-tbaa1.c
index 0f7a67cb7eccd1..7d44c9a3fbe6bb 100644
--- a/clang/test/CodeGen/union-tbaa1.c
+++ b/clang/test/CodeGen/union-tbaa1.c
@@ -16,17 +16,17 @@ void bar(vect32 p[][2]);
 // CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]]
 // CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[MUL:%.*]] = mul i32 [[TMP1]], [[NUM]]
-// CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
+// CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]]
 // CHECK-NEXT:    store i32 [[MUL]], ptr [[ARRAYIDX2]], align 8, !tbaa [[TBAA6:![0-9]+]]
 // CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [2 x i32], ptr [[ARR]], i32 [[TMP0]], i32 1
 // CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[MUL6:%.*]] = mul i32 [[TMP2]], [[NUM]]
-// CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
+// CHECK-NEXT:    [[ARRAYIDX8:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP0]], i32 1
 // CHECK-NEXT:    store i32 [[MUL6]], ptr [[ARRAYIDX8]], align 4, !tbaa [[TBAA6]]
 // CHECK-NEXT:    [[TMP3:%.*]] = lshr i32 [[MUL]], 16
 // CHECK-NEXT:    store i32 [[TMP3]], ptr [[VEC]], align 4, !tbaa [[TBAA2]]
 // CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[INDEX]], align 4, !tbaa [[TBAA2]]
-// CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
+// CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds nuw [4 x [2 x %union.vect32]], ptr [[TMP]], i32 0, i32 [[TMP4]], i32 1
 // CHECK-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr inbounds nuw i8, ptr [[ARRAYIDX14]], i32 2
 // CHECK-NEXT:    [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX15]], align 2, !tbaa [[TBAA6]]
 // CHECK-NEXT:    [[CONV16:%.*]] = zext i16 [[TMP5]] to i32
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index ac9a4fdcf304d4..8f55e5b3cc28a2 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -3119,6 +3119,26 @@ Instruction *InstCombinerImpl::visitGetElementPtrInst(GetElementPtrInst &GEP) {
     }
   }
 
+  // The single (non-zero) index of an inbounds GEP of a base object cannot
+  // be negative.
+  auto HasOneNonZeroIndex = [&]() {
+    bool FoundNonZero = false;
+    for (Value *Idx : GEP.indices()) {
+      auto *C = dyn_cast<Constant>(Idx);
+      if (C && C->isNullValue())
+        continue;
+      if (FoundNonZero)
+        return false;
+      FoundNonZero = true;
+    }
+    return true;
+  };
+  if (GEP.isInBounds() && !GEP.hasNoUnsignedWrap() && isBaseOfObject(PtrOp) &&
+      HasOneNonZeroIndex()) {
+    GEP.setNoWrapFlags(GEP.getNoWrapFlags() | GEPNoWrapFlags::noUnsignedWrap());
+    return &GEP;
+  }
+
   // nusw + nneg -> nuw
   if (GEP.hasNoUnsignedSignedWrap() && !GEP.hasNoUnsignedWrap() &&
       all_of(GEP.indices(), [&](Value *Idx) {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
index c14d61b51ad77a..0e5b4dedd4a0d2 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll
@@ -53,7 +53,7 @@ define i64 @memcpy_constant_arg_ptr_to_alloca_load_atomic(ptr addrspace(4) noali
 ; CHECK-LABEL: @memcpy_constant_arg_ptr_to_alloca_load_atomic(
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i64], align 8, addrspace(5)
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 8 dereferenceable(256) [[ALLOCA]], ptr addrspace(4) noundef align 8 dereferenceable(256) [[ARG:%.*]], i64 256, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i64], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load atomic i64, ptr addrspace(5) [[GEP]] syncscope("somescope") acquire, align 8
 ; CHECK-NEXT:    ret i64 [[LOAD]]
 ;
@@ -101,7 +101,7 @@ define amdgpu_kernel void @memcpy_constant_byref_arg_ptr_to_alloca_too_many_byte
 ; CHECK-LABEL: @memcpy_constant_byref_arg_ptr_to_alloca_too_many_bytes(
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(31) [[ALLOCA]], ptr addrspace(4) noundef align 4 dereferenceable(31) [[ARG:%.*]], i64 31, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
 ; CHECK-NEXT:    store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
 ; CHECK-NEXT:    ret void
@@ -120,7 +120,7 @@ define amdgpu_kernel void @memcpy_constant_intrinsic_ptr_to_alloca(ptr addrspace
 ; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca [32 x i8], align 4, addrspace(5)
 ; CHECK-NEXT:    [[KERNARG_SEGMENT_PTR:%.*]] = call align 16 dereferenceable(32) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
 ; CHECK-NEXT:    call void @llvm.memcpy.p5.p4.i64(ptr addrspace(5) noundef align 4 dereferenceable(32) [[ALLOCA]], ptr addrspace(4) noundef align 16 dereferenceable(32) [[KERNARG_SEGMENT_PTR]], i64 32, i1 false)
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [32 x i8], ptr addrspace(5) [[ALLOCA]], i32 0, i32 [[IDX:%.*]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load i8, ptr addrspace(5) [[GEP]], align 1
 ; CHECK-NEXT:    store i8 [[LOAD]], ptr addrspace(1) [[OUT:%.*]], align 1
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/InstCombine/cast_phi.ll b/llvm/test/Transforms/InstCombine/cast_phi.ll
index aafe5f57c4c724..f289e1459000aa 100644
--- a/llvm/test/Transforms/InstCombine/cast_phi.ll
+++ b/llvm/test/Transforms/InstCombine/cast_phi.ll
@@ -31,8 +31,8 @@ define void @MainKernel(i32 %iNumSteps, i32 %tid, i32 %base) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp ugt i32 [[I12_06]], [[BASE:%.*]]
 ; CHECK-NEXT:    [[ADD:%.*]] = add nuw i32 [[I12_06]], 1
 ; CHECK-NEXT:    [[CONV_I9:%.*]] = sext i32 [[ADD]] to i64
-; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
-; CHECK-NEXT:    [[ARRAYIDX24:%.*]] = getelementptr inbounds [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT:    [[ARRAYIDX20:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLA]], i64 0, i64 [[CONV_I9]]
+; CHECK-NEXT:    [[ARRAYIDX24:%.*]] = getelementptr inbounds nuw [258 x float], ptr [[CALLB]], i64 0, i64 [[CONV_I9]]
 ; CHECK-NEXT:    [[CMP40:%.*]] = icmp ult i32 [[I12_06]], [[BASE]]
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[DOTBB4:%.*]], label [[DOTBB5:%.*]]
 ; CHECK:       .bb4:
diff --git a/llvm/test/Transforms/InstCombine/load-cmp.ll b/llvm/test/Transforms/InstCombine/load-cmp.ll
index 12be81b8f815d0..531258935bb822 100644
--- a/llvm/test/Transforms/InstCombine/load-cmp.ll
+++ b/llvm/test/Transforms/InstCombine/load-cmp.ll
@@ -339,7 +339,7 @@ define i1 @test10_struct_arr_noinbounds_i64(i64 %x) {
 define i1 @pr93017(i64 %idx) {
 ; CHECK-LABEL: @pr93017(
 ; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[IDX:%.*]] to i32
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds nuw [2 x ptr], ptr @table, i32 0, i32 [[TMP1]]
 ; CHECK-NEXT:    [[V:%.*]] = load ptr, ptr [[GEP]], align 4
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ne ptr [[V]], null
 ; CHECK-NEXT:    ret i1 [[CMP]]
diff --git a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
index d6624010acb219..f931b41eb0f71d 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-addrspace.ll
@@ -6,7 +6,7 @@
 define void @test_load(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-LABEL: @test_load(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr addrspace(2) @test.data, i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -45,7 +45,7 @@ entry:
 define void @test_load_bitcast_chain(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-LABEL: @test_load_bitcast_chain(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr addrspace(2) @test.data, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(2) [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -66,7 +66,7 @@ define void @test_call(ptr addrspace(1) %out, i64 %x) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -87,8 +87,8 @@ define void @test_call_no_null_opt(ptr addrspace(1) %out, i64 %x) #0 {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
-; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[TMP0:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
 ; CHECK-NEXT:    ret void
@@ -108,7 +108,7 @@ define void @test_load_and_call(ptr addrspace(1) %out, i64 %x, i64 %y) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
@@ -135,11 +135,11 @@ define void @test_load_and_call_no_null_opt(ptr addrspace(1) %out, i64 %x, i64 %
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[DATA:%.*]] = alloca [8 x i32], align 4
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p2.i64(ptr noundef nonnull align 4 dereferenceable(32) [[DATA]], ptr addrspace(2) noundef align 4 dereferenceable(32) @test.data, i64 32, i1 false)
-; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [8 x i32], ptr [[DATA]], i64 0, i64 [[X:%.*]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT:%.*]], i64 [[X]]
 ; CHECK-NEXT:    store i32 [[TMP0]], ptr addrspace(1) [[ARRAYIDX1]], align 4
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @foo(ptr [[ARRAYIDX]])
+; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @foo(ptr nonnull [[ARRAYIDX]])
 ; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr addrspace(1) [[OUT]], i64 [[Y:%.*]]
 ; CHECK-NEXT:    store i32 [[TMP1]], ptr addrspace(1) [[ARRAYIDX2]], align 4
 ; CHECK-NEXT:    ret void
diff --git a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
index 34e6c601f494a5..845b1ad703596b 100644
--- a/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
+++ b/llvm/test/Transforms/InstCombine/memcpy-from-global.ll
@@ -322,7 +322,7 @@ define float @test11_volatile(i64 %i) {
 ; CHECK-NEXT:    [[A:%.*]] = alloca [4 x float], align 4
 ; CHECK-NEXT:    call void @llvm.lifetime.start.p0(i64 16, ptr nonnull [[A]])
 ; CHECK-NEXT:    call void @llvm.memcpy.p0.p1.i64(ptr align 4 [[A]], ptr addrspace(1) align 4 @I, i64 16, i1 true)
-; CHECK-NEXT:    [[G:%.*]] = getelementptr inbounds [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
+; CHECK-NEXT:    [[G:%.*]] = getelementptr inbounds nuw [4 x float], ptr [[A]], i64 0, i64 [[I:%.*]]
 ; CHECK-NEXT:    [[R:%.*]] = load float, ptr [[G]], align 4
 ; CHECK-NEXT:    ret float [[R]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/stpcpy-1.ll b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
index 2ddacb20974426..88d2ab9bfec420 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy-1.ll
@@ -25,7 +25,7 @@ define ptr @test_simplify1() {
 define ptr @test_simplify2() {
 ; CHECK-LABEL: @test_simplify2(
 ; CHECK-NEXT:    [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
 ; CHECK-NEXT:    ret ptr [[RET]]
 ;
   %ret = call ptr @stpcpy(ptr @a, ptr @a)
diff --git a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
index 2d775f35c8bda4..8a1ce0dfa9f9be 100644
--- a/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
+++ b/llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll
@@ -93,7 +93,7 @@ define ptr @test_simplify5() {
 define ptr @test_simplify6() {
 ; CHECK-LABEL: @test_simplify6(
 ; CHECK-NEXT:    [[STRLEN:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) @a)
-; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds i8, ptr @a, i32 [[STRLEN]]
+; CHECK-NEXT:    [[RET:%.*]] = getelementptr inbounds nuw i8, ptr @a, i32 [[STRLEN]]
 ; CHECK-NEXT:    ret ptr [[RET]]
 ;
 
diff --git a/llvm/test/Transforms/InstCombine/strlen-1.ll b/llvm/test/Transforms/InstCombine/strlen-1.ll
index facf4f7d0973f2..2d28139bf75a80 100644
--- a/llvm/test/Transforms/InstCombine/strlen-1.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-1.ll
@@ -155,7 +155,7 @@ define i32 @test_no_simplify1() {
 
 define i32 @test_no_simplify2(i32 %x) {
 ; CHECK-LABEL: @test_no_simplify2(
-; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
 ; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
 ; CHECK-NEXT:    ret i32 [[HELLO_L]]
 ;
@@ -166,8 +166,8 @@ define i32 @test_no_simplify2(i32 %x) {
 
 define i32 @test_no_simplify2_no_null_opt(i32 %x) #0 {
 ; CHECK-LABEL: @test_no_simplify2_no_null_opt(
-; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
-; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef [[HELLO_P]])
+; CHECK-NEXT:    [[HELLO_P:%.*]] = getelementptr inbounds nuw [7 x i8], ptr @null_hello, i32 0, i32 [[X:%.*]]
+; CHECK-NEXT:    [[HELLO_L:%.*]] = call i32 @strlen(ptr noundef nonnull dereferenceable(1) [[HELLO_P]])
 ; CHECK-NEXT:    ret i32 [[HELLO_L]]
 ;
   %hello_p = getelementptr inbounds [7 x i8], ptr @null_hello, i32 0, i32 %x
diff --git a/llvm/test/Transforms/InstCombine/strlen-4.ll b/llvm/test/Transforms/InstCombine/strlen-4.ll
index 58d04e8a0d4bea..ca01ce93be8835 100644
--- a/llvm/test/Transforms/InstCombine/strlen-4.ll
+++ b/llvm/test/Transforms/InstCombine/strlen-4.ll
@@ -18,7 +18,7 @@ declare i64 @strlen(ptr)
 
 define i64 @fold_strlen_s3_pi_s5(i1 %X, i64 %I) {
 ; CHECK-LABEL: @fold_strlen_s3_pi_s5(
-; CHECK-NEXT:    [[PS3_PI:%.*]] = getelementptr inbounds [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
+; CHECK-NEXT:    [[PS3_PI:%.*]] = getelementptr inbounds nuw [4 x i8], ptr @s3, i64 0, i64 [[I:%.*]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[X:%.*]], ptr [[PS3_PI]], ptr @s5
 ; CHECK-NEXT:    [[LEN:%.*]] = tail call i64 @strlen(ptr noundef nonnull derefe...
[truncated]

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@nikic nikic merged commit e21ab4d into llvm:main Dec 10, 2024
13 checks passed
@nikic nikic deleted the instcombine-base-gep-nuw branch December 10, 2024 09:00
@llvm-ci
Copy link
Collaborator

llvm-ci commented Dec 10, 2024

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building clang,llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/8300

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: functionalities/dyld-exec-linux/TestDyldExecLinux.py (443 of 2846)
UNSUPPORTED: lldb-api :: functionalities/fat_archives/TestFatArchives.py (444 of 2846)
PASS: lldb-api :: functionalities/executable_first/TestExecutableFirst.py (445 of 2846)
PASS: lldb-api :: functionalities/find-line-entry/TestFindLineEntry.py (446 of 2846)
UNSUPPORTED: lldb-api :: functionalities/fork/concurrent_vfork/TestConcurrentVFork.py (447 of 2846)
PASS: lldb-api :: functionalities/exec/TestExec.py (448 of 2846)
PASS: lldb-api :: functionalities/float-display/TestFloatDisplay.py (449 of 2846)
PASS: lldb-api :: functionalities/gdb_remote_client/TestAArch64XMLRegOffsets.py (450 of 2846)
PASS: lldb-api :: functionalities/dynamic_value_child_count/TestDynamicValueChildCount.py (451 of 2846)
PASS: lldb-api :: functionalities/gdb_remote_client/TestArmRegisterDefinition.py (452 of 2846)
FAIL: lldb-api :: functionalities/fork/resumes-child/TestForkResumesChild.py (453 of 2846)
******************** TEST 'lldb-api :: functionalities/fork/resumes-child/TestForkResumesChild.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/make --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/functionalities/fork/resumes-child -p TestForkResumesChild.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://github.com/llvm/llvm-project.git revision e21ab4d16b555c28ded307571d138f594f33e325)
  clang revision e21ab4d16b555c28ded307571d138f594f33e325
  llvm revision e21ab4d16b555c28ded307571d138f594f33e325
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_step_over_fork (TestForkResumesChild.TestForkResumesChild)
======================================================================
FAIL: test_step_over_fork (TestForkResumesChild.TestForkResumesChild)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/functionalities/fork/resumes-child/TestForkResumesChild.py", line 22, in test_step_over_fork
    self.expect("continue", substrs=["exited with status = 0"])
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2371, in expect
    self.runCmd(
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 996, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Command 'continue' did not return successfully
Error output:
error: Process must be launched.

Config=arm-/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang
----------------------------------------------------------------------
Ran 1 test in 0.805s

FAILED (failures=1)


@SixWeining
Copy link
Contributor

SixWeining commented Dec 12, 2024

Seems that this change causes Segment Fault on multiple targets including aarch64, loongarch64 and riscv64.

This is detected by a LoongArch buildbot and manually checked on aarch64 and riscv64 QEMUs.

I wonder why aarch64 buildbot doesn't report this. I find that CMAKE_CXX_FLAGS_RELEASE is empty on aarch64 bot while it is -O3 -DNDEBUG on loongarch64. Is it an issue of the llvm-zorg framework?

UPDATE: This buildbot(aarch64) also fail which uses -O3.

Reproduce

$ clang llvm-test-suite/SingleSource/Benchmarks/CoyoteBench/huffbench.c -O3 -o huffbench
$ ./huffbench

@serge-sans-paille
Copy link
Collaborator

Hey @nikic,
I've bisected the failures from this buildbot:
https://lab.llvm.org/buildbot/#/builders/199/builds/57
down to this commit.

The reproducer is unfortunately not pleasant:
get a stage-2 build of clang then run the llvm-test suite. This reproducibility chokes on SingleSource/Benchmarks/CoyoteBench/huffbench.test, as can be seen on the buildbot log.

@dtcxzyw
Copy link
Member

dtcxzyw commented Dec 12, 2024

Seems that this change causes Segment Fault on multiple targets including aarch64, loongarch64 and riscv64.

This is detected by a LoongArch buildbot and manually checked on aarch64 and riscv64 QEMUs.

I wonder why aarch64 buildbot doesn't report this. I find that CMAKE_CXX_FLAGS_RELEASE is empty on aarch64 bot while it is -O3 -DNDEBUG on loongarch64. Is it an issue of the llvm-zorg framework?

UPDATE: This buildbot(aarch64) also fail which uses -O3.

Reproduce

$ clang llvm-test-suite/SingleSource/Benchmarks/CoyoteBench/huffbench.c -O3 -o huffbench $ ./huffbench

Reproduced on Loongson-3A5000. I will provide a reduced test case later.

@nikic
Copy link
Contributor Author

nikic commented Dec 12, 2024

From a quick look, it's probably due to UB on this line: https://github.com/llvm/llvm-test-suite/blob/4f14adb8a2f2c5fa02c6b3643d45e74adfcb31f9/SingleSource/Benchmarks/CoyoteBench/huffbench.c#L114 It indexes below the start of an object.

@dtcxzyw
Copy link
Member

dtcxzyw commented Dec 12, 2024

huffbench.c:319:10: runtime error: left shift of negative value -93
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior huffbench.c:319:10

@nikic
Copy link
Contributor Author

nikic commented Dec 12, 2024

I think llvm/llvm-test-suite#190 should fix this issue, but haven't tested on an affected arch.

I'm a bit worried that we don't have a sanitizer to catch this issue (the negative left shift issue is an unrelated one).

@serge-sans-paille
Copy link
Collaborator

Tested on an aarch64 machine where I've been able to reproduce. Works like a charm, thanks @nikic

@vitalybuka
Copy link
Collaborator

I think llvm/llvm-test-suite#190 should fix this issue, but haven't tested on an affected arch.

I'm a bit worried that we don't have a sanitizer to catch this issue (the negative left shift issue is an unrelated one).

-fsanitize=local-bounds is quite related, but I guess InstCombine happens before the check inserted

@alexfh
Copy link
Contributor

alexfh commented Dec 18, 2024

Heads up: apart from a number of instances of actual UB in the code (which are quite tedious to localize and understand due to a lack of specialized tooling) we're investigating a bad interaction of this commit with msan, which seems to result in a miscompilation.

@nikic
Copy link
Contributor Author

nikic commented Dec 18, 2024

Heads up: apart from a number of instances of actual UB in the code (which are quite tedious to localize and understand due to a lack of specialized tooling) we're investigating a bad interaction of this commit with msan, which seems to result in a miscompilation.

Thanks for the heads up! As I mentioned on another thread, I'm happy to just entirely revert this patch due to lack of good sanitizer support (I can do it tomorrow, but feel free to do it yourself earlier). Having a reproducer for the msan issue is probably still valuable, as it may indicate a larger problem.

@alexfh
Copy link
Contributor

alexfh commented Dec 18, 2024

Heads up: apart from a number of instances of actual UB in the code (which are quite tedious to localize and understand due to a lack of specialized tooling) we're investigating a bad interaction of this commit with msan, which seems to result in a miscompilation.

Thanks for the heads up! As I mentioned on another thread, I'm happy to just entirely revert this patch due to lack of good sanitizer support (I can do it tomorrow, but feel free to do it yourself earlier). Having a reproducer for the msan issue is probably still valuable, as it may indicate a larger problem.

Thanks! Sent a PR with a revert: #120460

As mentioned, a test case is in the works.

alexfh added a commit that referenced this pull request Dec 18, 2024
…#120460)

Reverts #119225 due to the lack of sanitizer support,
large potential of breaking code containing latent UB, non-trivial
localization and investigation, and what seems to be a bad interaction
with msan (a test is in the works).

Related discussions:
#119225 (comment)
#118472 (comment)
@fmayer
Copy link
Contributor

fmayer commented Dec 19, 2024

I was working on this miscompile. I cannot give a reproducer right now, but the following observations:

The generated IR is clearly incorrect (the inbounds nuw), of the form:

%x = alloca [12 x i32], align 16
[...]
%59 = getelementptr inbounds nuw i8, ptr %x, i64 -4
%wide.masked.load.2 = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr nonnull %59, i32 4, <4 x i1> <i1 false, i1 true, i1 true, i1 true>, <4 x i32> poison)

This is some interaction between this CL and these two InstCombines:

If I disable any of the three, the error goes away. The observation is that with the two above, we get an inbounds getelementptr that is NOT inbounds (-4), that then gets masked vector read to correct for that.

The comment in the first link implies that it maybe should be removing inbounds:

      // Even if the total offset is inbounds, we may end up representing it
      // by first performing a larger negative offset, and then a smaller
      // positive one. The large negative offset might go out of bounds. Only
      // preserve inbounds if all signs are the same.

but it then goes and only removes the wrap flags:

      if (Idx.isNonNegative() != ConstIndices[0].isNonNegative())
        NW = NW.withoutNoUnsignedSignedWrap();
      if (!Idx.isNonNegative())
        NW = NW.withoutNoUnsignedWrap();

If I change this to

      if (Idx.isNonNegative() != ConstIndices[0].isNonNegative())
        NW = NW.withoutNoUnsignedSignedWrap().withoutInBounds();
      if (!Idx.isNonNegative())
        NW = NW.withoutNoUnsignedWrap().withoutInBounds();

the error also goes away.

If I disable the second link (i8 canonicalization) the access is then -1 (makes sense) but NOT marked as inbounds.

I don't feel like this is a sanitizer issue, it seems like MSan might just have gotten unlucky. Also, I think this CL is correct in isolation, but is worsening some existing problem.

@nikic
Copy link
Contributor Author

nikic commented Dec 19, 2024

@fmayer The withoutNoUnsignedSignedWrap() API already removes the inbounds flag as well (because inbounds requires nusw). So I think the effect of your change is to drop inbounds in case all indices are negative, which should generally not be necessary.

It's pretty likely that the root cause here is indeed incorrect inbounds preservation somewhere, but I think the logic in that transform is correct.

@fmayer
Copy link
Contributor

fmayer commented Dec 20, 2024

Seems like the root cause was an vectorizer bug, sent #120730 that fixes it.

github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 10, 2025
… of object" (#120460)

Reverts llvm/llvm-project#119225 due to the lack of sanitizer support,
large potential of breaking code containing latent UB, non-trivial
localization and investigation, and what seems to be a bad interaction
with msan (a test is in the works).

Related discussions:
llvm/llvm-project#119225 (comment)
llvm/llvm-project#118472 (comment)
@fmayer
Copy link
Contributor

fmayer commented Jan 16, 2025

Seems like the root cause was an vectorizer bug, sent #120730 that fixes it.

This is landed, so from that side we could reland this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang Clang issues not falling into any other category llvm:instcombine llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants