[LV] Fix for miscompile with disjoint or #81922

annamthomas · 2024-02-15T21:26:17Z

Precommit test for issue in pr81872
[LoopVectorize] Fix for miscompile with disjoint or

When disjoint or is disjoint because of dominating conditions, we should not vectorize such loops. This is because vectorizing it causes us to no longer have the original meaning of the instruction (it is no longer disjoint, it becomes a regular or which is not an add). Fixes pr81872.

llvmbot · 2024-02-15T21:27:02Z

@llvm/pr-subscribers-llvm-transforms

Author: None (annamthomas)

Changes

Precommit test for issue in pr81872
[LoopVectorize] Fix for miscompile with disjoint or

Full diff: https://github.com/llvm/llvm-project/pull/81922.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+11)
(added) llvm/test/Transforms/LoopVectorize/X86/pr81872.ll (+102)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 37a356c43e29a4..f2744b5d66a26f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -888,6 +888,17 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
         return false;
       } // end of PHI handling
 
+      // or-disjoint instruction which is predicated maybe disjoint because of a
+      // dominating condition. Avoid vectorizing in such cases, since dropping
+      // the disjoint (to avoid poison) changes the meaning of the instruction
+      // which was originally an add.
+      // TODO: We should check the terminating condition of all
+      // dominating blocks in the loop to see if Op0 is used for their
+      // terminating condition.
+      if (auto *Op = dyn_cast<PossiblyDisjointInst>(&I))
+        if (Op->isDisjoint() && blockNeedsPredication(BB))
+          return false;
+
       // We handle calls that:
       //   * Are debug info intrinsics.
       //   * Have a mapping to an IR intrinsic.
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll b/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
new file mode 100644
index 00000000000000..20a6732e340b78
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr81872.ll
@@ -0,0 +1,102 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=loop-vectorize < %s | FileCheck %s
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+@global = external global ptr addrspace(1), align 8
+
+; PR 81872 explains the issue.
+
+; If we vectorize, we have a miscompile where array IV and thereby value stored in (arr[99],
+; arr[98]) is calculated incorrectly since disjoint or was only disjoint because
+; of dominating conditions. Dropping the disjoint to avoid poison still changes
+; the behaviour since now the or is not longer equivalent to the add.
+; Function Attrs: uwtable
+define void @test(ptr addrspace(1) noundef align 8 dereferenceable_or_null(16) %arg1) #0 {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: ptr addrspace(1) noundef align 8 dereferenceable_or_null(16) [[ARG1:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  bb5:
+; CHECK-NEXT:    [[GETELEMENTPTR:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[ARG1]], i64 16
+; CHECK-NEXT:    br label [[BB8:%.*]]
+; CHECK:       bb6:
+; CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[PHI9:%.*]], 1
+; CHECK-NEXT:    [[ICMP7:%.*]] = icmp sgt i32 [[PHI9]], -2
+; CHECK-NEXT:    br i1 [[ICMP7]], label [[BB10:%.*]], label [[BB8]]
+; CHECK:       bb8:
+; CHECK-NEXT:    [[PHI9]] = phi i32 [ -60516, [[BB5:%.*]] ], [ [[ADD]], [[BB6:%.*]] ]
+; CHECK-NEXT:    br label [[BB15:%.*]]
+; CHECK:       bb10:
+; CHECK-NEXT:    [[GETELEMENTPTR11:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[ARG1]], i64 96
+; CHECK-NEXT:    [[LOAD12:%.*]] = load i64, ptr addrspace(1) [[GETELEMENTPTR11]], align 8, !noundef [[META0:![0-9]+]]
+; CHECK-NEXT:    [[LOAD13:%.*]] = load ptr addrspace(1), ptr @global, align 8, !invariant.load [[META0]], !nonnull [[META0]], !dereferenceable_or_null !1, !align [[META2:![0-9]+]]
+; CHECK-NEXT:    [[GETELEMENTPTR14:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[LOAD13]], i64 848
+; CHECK-NEXT:    store i64 [[LOAD12]], ptr addrspace(1) [[GETELEMENTPTR14]], align 8
+; CHECK-NEXT:    ret void
+; CHECK:       bb15:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 99, [[BB8]] ], [ [[IV_NEXT:%.*]], [[BB20:%.*]] ]
+; CHECK-NEXT:    [[AND:%.*]] = and i64 [[IV]], 1
+; CHECK-NEXT:    [[ICMP17:%.*]] = icmp eq i64 [[AND]], 0
+; CHECK-NEXT:    br i1 [[ICMP17]], label [[BB18:%.*]], label [[BB20]], !prof [[PROF3:![0-9]+]]
+; CHECK:       bb18:
+; CHECK-NEXT:    [[OR:%.*]] = or disjoint i64 [[IV]], 1
+; CHECK-NEXT:    [[GETELEMENTPTR19:%.*]] = getelementptr inbounds i64, ptr addrspace(1) [[GETELEMENTPTR]], i64 [[OR]]
+; CHECK-NEXT:    store i64 1, ptr addrspace(1) [[GETELEMENTPTR19]], align 8
+; CHECK-NEXT:    br label [[BB20]]
+; CHECK:       bb20:
+; CHECK-NEXT:    [[IV_NEXT]] = add nsw i64 [[IV]], -1
+; CHECK-NEXT:    [[ICMP22:%.*]] = icmp eq i64 [[IV_NEXT]], 90
+; CHECK-NEXT:    br i1 [[ICMP22]], label [[BB6]], label [[BB15]], !prof [[PROF4:![0-9]+]]
+;
+bb5:
+  %getelementptr = getelementptr inbounds i8, ptr addrspace(1) %arg1, i64 16
+  br label %bb8
+
+bb6:                                              ; preds = %bb20
+  %add = add nsw i32 %phi9, 1
+  %icmp7 = icmp sgt i32 %phi9, -2
+  br i1 %icmp7, label %bb10, label %bb8
+
+bb8:                                              ; preds = %bb6, %bb5
+  %phi9 = phi i32 [ -60516, %bb5 ], [ %add, %bb6 ]
+  br label %bb15
+
+bb10:                                             ; preds = %bb6
+  %getelementptr11 = getelementptr inbounds i8, ptr addrspace(1) %arg1, i64 96
+  %load12 = load i64, ptr addrspace(1) %getelementptr11, align 8,  !noundef !4
+  %load13 = load ptr addrspace(1), ptr @global, align 8, !invariant.load !4, !nonnull !4, !dereferenceable_or_null !16, !align !17
+  %getelementptr14 = getelementptr inbounds i8, ptr addrspace(1) %load13, i64 848
+  store i64 %load12, ptr addrspace(1) %getelementptr14, align 8
+  ret void
+
+bb15:                                             ; preds = %bb20, %bb8
+  %iv = phi i64 [ 99, %bb8 ], [ %iv.next, %bb20 ]
+  %and = and i64 %iv, 1
+  %icmp17 = icmp eq i64 %and, 0
+  br i1 %icmp17, label %bb18, label %bb20, !prof !21
+
+bb18:                                             ; preds = %bb15
+  %or = or disjoint i64 %iv, 1
+  %getelementptr19 = getelementptr inbounds i64, ptr addrspace(1) %getelementptr, i64 %or
+  store i64 1, ptr addrspace(1) %getelementptr19, align 8
+  br label %bb20
+
+bb20:                                             ; preds = %bb18, %bb15
+  %iv.next = add nsw i64 %iv, -1
+  %icmp22 = icmp eq i64 %iv.next, 90
+  br i1 %icmp22, label %bb6, label %bb15, !prof !22
+}
+
+attributes #0 = {"target-cpu"="haswell" "target-features"="+avx2" }
+
+!4 = !{}
+!10 = !{i32 1}
+!16 = !{i64 864}
+!17 = !{i64 8}
+!21 = !{!"branch_weights", i32 1, i32 1}
+!22 = !{!"branch_weights", i32 1, i32 95}
+;.
+; CHECK: [[META0]] = !{}
+; CHECK: [[META2]] = !{i64 8}
+; CHECK: [[PROF3]] = !{!"branch_weights", i32 1, i32 1}
+; CHECK: [[PROF4]] = !{!"branch_weights", i32 1, i32 95}
+;.

annamthomas · 2024-02-15T21:29:02Z

@fhahn Some lit tests are failing because we are very conservative with the disable vectorization check with disjoint or. But I wanted to see if this is the general path forward. This is a fix for pr81872.

annamthomas · 2024-02-27T20:27:03Z

any comments here? Is this the correct direction for the fix? The miscompile is a functional bug here.

nikic · 2024-03-04T09:54:59Z

Is it possible / not too hard to convert the disjoint or into an add instead, when dropping poison flags?

fhahn

any comments here? Is this the correct direction for the fix? The miscompile is a functional bug here.

Sorry this dropped off my radar. I think in terms of correctness, this would be OK, but quite aggressive. IIUC we would only need to bail out if the OR was used by SCEV (and treated as ADD). But unfortunately there's no easy way to check if it was used in such a way.

As @nikic suggested, it should be fine to convert it to an ADD instead, which has less negative impact. I tried to see if that would be feasible and sketched it in #83821

llvm/test/Transforms/LoopVectorize/X86/pr81872.ll

fhahn · 2024-03-04T11:16:22Z

(It would be great if you could land the test separately regardless)

annamthomas

Landing the updated test case separately and closing this MR in favour of the linked one (changes disjoint or to add in poisonDroppingFlags API).

llvm/test/Transforms/LoopVectorize/X86/pr81872.ll

AZero13 · 2024-04-02T20:08:14Z

ping?

fhahn · 2024-04-02T21:15:18Z

I think this PR could be closed now IIUC?

annamthomas · 2024-04-03T12:45:00Z

yes, this PR can be closed since #83821 has landed.

annamthomas added 2 commits February 15, 2024 16:24

Precommit test for issue in pr81872

7977537

llvmbot added vectorizers llvm:transforms labels Feb 15, 2024

annamthomas changed the title ~~LV PR 81872~~ [ LV] Fix for miscompile with disjoint or Feb 15, 2024

annamthomas requested review from fhahn and davemgreen February 15, 2024 21:29

dtcxzyw changed the title ~~[ LV] Fix for miscompile with disjoint or~~ [LV] Fix for miscompile with disjoint or Feb 16, 2024

dtcxzyw linked an issue Feb 16, 2024 that may be closed by this pull request

Miscompile with LoopVectorizer: Incorrect code with disjoint or #81872

Closed

davemgreen requested review from preames and nikic March 4, 2024 09:53

fhahn reviewed Mar 4, 2024

View reviewed changes

llvm/test/Transforms/LoopVectorize/X86/pr81872.ll Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/X86/pr81872.ll Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/X86/pr81872.ll Show resolved Hide resolved

annamthomas commented Mar 11, 2024

View reviewed changes

llvm/test/Transforms/LoopVectorize/X86/pr81872.ll Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/X86/pr81872.ll Show resolved Hide resolved

annamthomas closed this Apr 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LV] Fix for miscompile with disjoint or #81922

[LV] Fix for miscompile with disjoint or #81922

annamthomas commented Feb 15, 2024

llvmbot commented Feb 15, 2024

annamthomas commented Feb 15, 2024

annamthomas commented Feb 27, 2024

nikic commented Mar 4, 2024

fhahn left a comment

fhahn commented Mar 4, 2024

annamthomas left a comment

AZero13 commented Apr 2, 2024

fhahn commented Apr 2, 2024

annamthomas commented Apr 3, 2024

[LV] Fix for miscompile with disjoint or #81922

[LV] Fix for miscompile with disjoint or #81922

Conversation

annamthomas commented Feb 15, 2024

llvmbot commented Feb 15, 2024

annamthomas commented Feb 15, 2024

annamthomas commented Feb 27, 2024

nikic commented Mar 4, 2024

fhahn left a comment

Choose a reason for hiding this comment

fhahn commented Mar 4, 2024

annamthomas left a comment

Choose a reason for hiding this comment

AZero13 commented Apr 2, 2024

fhahn commented Apr 2, 2024

annamthomas commented Apr 3, 2024