-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LoopInterchange] Remove 'S' Scalar Dependencies #119345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Sjoerd Meijer (sjoerdmeijer) ChangesThis makes loop-interchange more pessimistic, but also more correct as we are not handling 'S' scalar dependencies correctly and have at least the following miscompiles related to that: [LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" #54176 [LoopInterchange] Interchange breaks program correctness #46867 This should be a stopgap. We would like to get interchange enabled by default and thus prefer correctness over unsafe transforms, and later lift this restriction. Patch is 25.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119345.diff 9 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index a0c0080c0bda1c..909891f84dc191 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -187,11 +187,19 @@ static void interChangeDependencies(CharMatrix &DepMatrix, unsigned FromIndx,
// if the direction matrix, after the same permutation is applied to its
// columns, has no ">" direction as the leftmost non-"=" direction in any row.
static bool isLexicographicallyPositive(std::vector<char> &DV) {
+ bool HasScalar = false;
+ bool FirstPosScalar = true;
for (unsigned char Direction : DV) {
+ // FIXME: we reject scalar dependencies because we currently don't
+ // handle them correctly. This makes the legality check more conservative
+ // than needed, but is a stopgap to avoid miscompiles.
+ if (Direction == 'S' && !FirstPosScalar)
+ return false;
if (Direction == '<')
return true;
if (Direction == '>' || Direction == '*')
return false;
+ FirstPosScalar = false;
}
return true;
}
diff --git a/llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll b/llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll
index b3383655668980..2168cde7f5374b 100644
--- a/llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll
+++ b/llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll
@@ -29,19 +29,13 @@
define dso_local i32 @test1(i1 %cond) {
; CHECK-LABEL: define dso_local i32 @test1(
; CHECK-SAME: i1 [[COND:%.*]]) {
-; CHECK-NEXT: [[FOR_PREHEADER:.*:]]
-; CHECK-NEXT: br label %[[INNERLOOP_PREHEADER:.*]]
-; CHECK: [[OUTERLOOP_PREHEADER:.*]]:
+; CHECK-NEXT: [[FOR_PREHEADER:.*]]:
; CHECK-NEXT: br label %[[OUTERLOOP:.*]]
; CHECK: [[OUTERLOOP]]:
-; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INDVARS_IV_NEXT21_I:%.*]], %[[FOR_LATCH:.*]] ], [ 0, %[[OUTERLOOP_PREHEADER]] ]
-; CHECK-NEXT: br label %[[INNERLOOP_SPLIT:.*]]
-; CHECK: [[INNERLOOP_PREHEADER]]:
+; CHECK-NEXT: [[I:%.*]] = phi i64 [ 0, %[[FOR_PREHEADER]] ], [ [[INDVARS_IV_NEXT21_I:%.*]], %[[FOR_LATCH:.*]] ]
; CHECK-NEXT: br label %[[INNERLOOP:.*]]
; CHECK: [[INNERLOOP]]:
-; CHECK-NEXT: [[J:%.*]] = phi i64 [ [[TMP0:%.*]], %[[IF_END_SPLIT:.*]] ], [ 0, %[[INNERLOOP_PREHEADER]] ]
-; CHECK-NEXT: br label %[[OUTERLOOP_PREHEADER]]
-; CHECK: [[INNERLOOP_SPLIT]]:
+; CHECK-NEXT: [[J:%.*]] = phi i64 [ 0, %[[OUTERLOOP]] ], [ [[TMP0:%.*]], %[[IF_END:.*]] ]
; CHECK-NEXT: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x [9 x i32]], ptr @f, i64 0, i64 [[J]], i64 [[I]]
; CHECK-NEXT: [[I1:%.*]] = load i32, ptr [[ARRAYIDX6_I]], align 4
; CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp eq i32 [[I1]], 0
@@ -50,24 +44,20 @@ define dso_local i32 @test1(i1 %cond) {
; CHECK-NEXT: store i32 3, ptr @g, align 4
; CHECK-NEXT: br label %[[LAND_END]]
; CHECK: [[LAND_END]]:
-; CHECK-NEXT: br i1 [[COND]], label %[[IF_END:.*]], label %[[IF_THEN:.*]]
+; CHECK-NEXT: br i1 [[COND]], label %[[IF_END]], label %[[IF_THEN:.*]]
; CHECK: [[IF_THEN]]:
; CHECK-NEXT: [[I2:%.*]] = load i32, ptr @g, align 4
; CHECK-NEXT: [[INC_I:%.*]] = add i32 [[I2]], 1
; CHECK-NEXT: store i32 [[INC_I]], ptr @g, align 4
; CHECK-NEXT: br label %[[IF_END]]
; CHECK: [[IF_END]]:
-; CHECK-NEXT: [[J_NEXT:%.*]] = add nuw nsw i64 [[J]], 1
-; CHECK-NEXT: [[EXITCOND_I:%.*]] = icmp eq i64 [[J_NEXT]], 3
-; CHECK-NEXT: br label %[[FOR_LATCH]]
-; CHECK: [[IF_END_SPLIT]]:
; CHECK-NEXT: [[TMP0]] = add nuw nsw i64 [[J]], 1
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[TMP0]], 3
-; CHECK-NEXT: br i1 [[TMP1]], label %[[EXIT:.*]], label %[[INNERLOOP]]
+; CHECK-NEXT: br i1 [[TMP1]], label %[[FOR_LATCH]], label %[[INNERLOOP]]
; CHECK: [[FOR_LATCH]]:
; CHECK-NEXT: [[INDVARS_IV_NEXT21_I]] = add nsw i64 [[I]], 1
; CHECK-NEXT: [[CMP_I:%.*]] = icmp slt i64 [[I]], 2
-; CHECK-NEXT: br i1 [[CMP_I]], label %[[OUTERLOOP]], label %[[IF_END_SPLIT]]
+; CHECK-NEXT: br i1 [[CMP_I]], label %[[OUTERLOOP]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: [[I3:%.*]] = load i32, ptr @g, align 4
; CHECK-NEXT: ret i32 [[I3]]
@@ -139,19 +129,13 @@ exit:
define dso_local i32 @test2(i1 %cond) {
; CHECK-LABEL: define dso_local i32 @test2(
; CHECK-SAME: i1 [[COND:%.*]]) {
-; CHECK-NEXT: [[FOR_PREHEADER:.*:]]
-; CHECK-NEXT: br label %[[INNERLOOP_PREHEADER:.*]]
-; CHECK: [[OUTERLOOP_PREHEADER:.*]]:
+; CHECK-NEXT: [[FOR_PREHEADER:.*]]:
; CHECK-NEXT: br label %[[OUTERLOOP:.*]]
; CHECK: [[OUTERLOOP]]:
-; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INDVARS_IV_NEXT21_I:%.*]], %[[FOR_LATCH:.*]] ], [ 0, %[[OUTERLOOP_PREHEADER]] ]
-; CHECK-NEXT: br label %[[INNERLOOP_SPLIT:.*]]
-; CHECK: [[INNERLOOP_PREHEADER]]:
+; CHECK-NEXT: [[I:%.*]] = phi i64 [ 0, %[[FOR_PREHEADER]] ], [ [[INDVARS_IV_NEXT21_I:%.*]], %[[FOR_LATCH:.*]] ]
; CHECK-NEXT: br label %[[INNERLOOP:.*]]
; CHECK: [[INNERLOOP]]:
-; CHECK-NEXT: [[J:%.*]] = phi i64 [ [[TMP0:%.*]], %[[IF_END_SPLIT:.*]] ], [ 0, %[[INNERLOOP_PREHEADER]] ]
-; CHECK-NEXT: br label %[[OUTERLOOP_PREHEADER]]
-; CHECK: [[INNERLOOP_SPLIT]]:
+; CHECK-NEXT: [[J:%.*]] = phi i64 [ 0, %[[OUTERLOOP]] ], [ [[TMP0:%.*]], %[[IF_END:.*]] ]
; CHECK-NEXT: [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x [9 x i32]], ptr @f, i64 0, i64 [[J]], i64 [[I]]
; CHECK-NEXT: [[I1:%.*]] = load i32, ptr [[ARRAYIDX6_I]], align 4
; CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp eq i32 [[I1]], 0
@@ -160,24 +144,20 @@ define dso_local i32 @test2(i1 %cond) {
; CHECK: [[LAND_RHS]]:
; CHECK-NEXT: br label %[[LAND_END]]
; CHECK: [[LAND_END]]:
-; CHECK-NEXT: br i1 [[COND]], label %[[IF_END:.*]], label %[[IF_THEN:.*]]
+; CHECK-NEXT: br i1 [[COND]], label %[[IF_END]], label %[[IF_THEN:.*]]
; CHECK: [[IF_THEN]]:
; CHECK-NEXT: [[I2:%.*]] = load i32, ptr @g, align 4
; CHECK-NEXT: [[INC_I:%.*]] = add i32 [[I2]], 1
; CHECK-NEXT: store i32 [[INC_I]], ptr @g, align 4
; CHECK-NEXT: br label %[[IF_END]]
; CHECK: [[IF_END]]:
-; CHECK-NEXT: [[J_NEXT:%.*]] = add nuw nsw i64 [[J]], 1
-; CHECK-NEXT: [[EXITCOND_I:%.*]] = icmp eq i64 [[J_NEXT]], 3
-; CHECK-NEXT: br label %[[FOR_LATCH]]
-; CHECK: [[IF_END_SPLIT]]:
; CHECK-NEXT: [[TMP0]] = add nuw nsw i64 [[J]], 1
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[TMP0]], 3
-; CHECK-NEXT: br i1 [[TMP1]], label %[[EXIT:.*]], label %[[INNERLOOP]]
+; CHECK-NEXT: br i1 [[TMP1]], label %[[FOR_LATCH]], label %[[INNERLOOP]]
; CHECK: [[FOR_LATCH]]:
; CHECK-NEXT: [[INDVARS_IV_NEXT21_I]] = add nsw i64 [[I]], 1
; CHECK-NEXT: [[CMP_I:%.*]] = icmp slt i64 [[I]], 2
-; CHECK-NEXT: br i1 [[CMP_I]], label %[[OUTERLOOP]], label %[[IF_END_SPLIT]]
+; CHECK-NEXT: br i1 [[CMP_I]], label %[[OUTERLOOP]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: [[I3:%.*]] = load i32, ptr @g, align 4
; CHECK-NEXT: ret i32 [[I3]]
diff --git a/llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll b/llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll
index 79a1cb71169665..a2679c0f111cca 100644
--- a/llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll
+++ b/llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll
@@ -18,7 +18,7 @@
; CHECK: --- !Missed
; CHECK-NEXT: Pass: loop-interchange
-; CHECK-NEXT: Name: UnsupportedPHI
+; CHECK-NEXT: Name: Dependence
; CHECK-NEXT: Function: reduction_01
; IR-LABEL: @reduction_01(
@@ -71,7 +71,7 @@ for.end8: ; preds = %for.cond1.for.inc6_
; CHECK: --- !Missed
; CHECK-NEXT: Pass: loop-interchange
-; CHECK-NEXT: Name: UnsupportedPHIOuter
+; CHECK-NEXT: Name: UnsupportedPHIOuter
; CHECK-NEXT: Function: reduction_03
; IR-LABEL: @reduction_03(
diff --git a/llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll b/llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
index bad84224d445ab..691e8f11a0101a 100644
--- a/llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
+++ b/llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
@@ -28,20 +28,20 @@ define void @innermost_latch_uses_values_in_middle_header() {
; CHECK: [[INNERMOST_HEADER_PREHEADER]]:
; CHECK-NEXT: br label %[[INNERMOST_HEADER:.*]]
; CHECK: [[INNERMOST_HEADER]]:
-; CHECK-NEXT: [[INDVAR_INNERMOST:%.*]] = phi i64 [ [[TMP1:%.*]], %[[INNERMOST_LATCH_SPLIT:.*]] ], [ 4, %[[INNERMOST_HEADER_PREHEADER]] ]
+; CHECK-NEXT: [[INDVAR_INNERMOST:%.*]] = phi i64 [ [[TMP3:%.*]], %[[INNERMOST_LATCH_SPLIT:.*]] ], [ 4, %[[INNERMOST_HEADER_PREHEADER]] ]
; CHECK-NEXT: br label %[[MIDDLE_HEADER_PREHEADER]]
; CHECK: [[INNERMOST_BODY]]:
; CHECK-NEXT: [[ARRAYIDX9_I:%.*]] = getelementptr inbounds [1 x [6 x i32]], ptr @d, i64 0, i64 [[INDVAR_INNERMOST]], i64 [[INDVAR_MIDDLE]]
; CHECK-NEXT: store i32 0, ptr [[ARRAYIDX9_I]], align 4
; CHECK-NEXT: br label %[[INNERMOST_LATCH:.*]]
; CHECK: [[INNERMOST_LATCH]]:
-; CHECK-NEXT: [[INDVAR_INNERMOST_NEXT:%.*]] = add nsw i64 [[INDVAR_INNERMOST]], 1
-; CHECK-NEXT: [[TOBOOL5_I:%.*]] = icmp eq i64 [[INDVAR_INNERMOST_NEXT]], [[INDVAR_MIDDLE_WIDE]]
+; CHECK-NEXT: [[TMP1:%.*]] = add nsw i64 [[INDVAR_INNERMOST]], 1
+; CHECK-NEXT: [[TOBOOL5_I:%.*]] = icmp eq i64 [[TMP1]], [[INDVAR_MIDDLE_WIDE]]
; CHECK-NEXT: br label %[[MIDDLE_LATCH]]
; CHECK: [[INNERMOST_LATCH_SPLIT]]:
; CHECK-NEXT: [[INDVAR_MIDDLE_WIDE_LCSSA:%.*]] = phi i64 [ [[INDVAR_MIDDLE_WIDE]], %[[MIDDLE_LATCH]] ]
-; CHECK-NEXT: [[TMP1]] = add nsw i64 [[INDVAR_INNERMOST]], 1
-; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], [[INDVAR_MIDDLE_WIDE_LCSSA]]
+; CHECK-NEXT: [[TMP3]] = add nsw i64 [[INDVAR_INNERMOST]], 1
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP3]], [[INDVAR_MIDDLE_WIDE_LCSSA]]
; CHECK-NEXT: br i1 [[TMP2]], label %[[OUTERMOST_LATCH_LOOPEXIT:.*]], label %[[INNERMOST_HEADER]]
; CHECK: [[MIDDLE_LATCH]]:
; CHECK-NEXT: [[INDVAR_MIDDLE_NEXT]] = add nsw i64 [[INDVAR_MIDDLE]], -1
diff --git a/llvm/test/Transforms/LoopInterchange/lcssa.ll b/llvm/test/Transforms/LoopInterchange/lcssa.ll
index b41eba4ef56173..23db652e9a984d 100644
--- a/llvm/test/Transforms/LoopInterchange/lcssa.ll
+++ b/llvm/test/Transforms/LoopInterchange/lcssa.ll
@@ -177,7 +177,7 @@ for.end16: ; preds = %for.exit
}
; PHI node in inner latch with multiple predecessors.
-; REMARK: Interchanged
+; REMARK: Dependence
; REMARK-NEXT: lcssa_05
define void @lcssa_05(ptr %ptr) {
@@ -222,7 +222,7 @@ for.end16: ; preds = %for.exit
ret void
}
-; REMARK: UnsupportedExitPHI
+; REMARK: Dependence
; REMARK-NEXT: lcssa_06
define void @lcssa_06(ptr %ptr, ptr %ptr1) {
diff --git a/llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll b/llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll
index 6db95c09b175f9..34de449a87c2d3 100644
--- a/llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll
+++ b/llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -passes=loop-interchange -cache-line-size=64 -verify-dom-info -verify-loop-info -verify-loop-lcssa -S %s | FileCheck %s
@b = global [3 x [5 x [8 x i16]]] [[5 x [8 x i16]] zeroinitializer, [5 x [8 x i16]] [[8 x i16] zeroinitializer, [8 x i16] [i16 0, i16 0, i16 0, i16 6, i16 1, i16 6, i16 0, i16 0], [8 x i16] zeroinitializer, [8 x i16] zeroinitializer, [8 x i16] zeroinitializer], [5 x [8 x i16]] zeroinitializer], align 2
@@ -21,44 +22,37 @@
;; }
define void @test1() {
-;CHECK-LABEL: @test1(
-;CHECK: entry:
-;CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
-;CHECK: for.body.preheader:
-;CHECK-NEXT: br label [[FOR_BODY:%.*]]
-;CHECK: for.body:
-;CHECK-NEXT: [[INDVARS_IV22:%.*]] = phi i64 [ [[INDVARS_IV_NEXT23:%.*]], [[FOR_INC8:%.*]] ], [ 0, [[FOR_BODY_PREHEADER:%.*]] ]
-;CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i64 [[INDVARS_IV22:%.*]], 0
-;CHECK-NEXT: br i1 [[TOBOOL]], label [[FOR_BODY3_SPLIT1:%.*]], label [[FOR_BODY3_SPLIT:%.*]]
-;CHECK: for.cond1.preheader:
-;CHECK-NEXT: br label [[FOR_BODY3:%.*]]
-;CHECK: for.body3:
-;CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ %3, [[FOR_BODY3_SPLIT]] ]
-;CHECK-NEXT: br label [[FOR_BODY_PREHEADER]]
-;CHECK: for.body3.split1:
-;CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i64 [[INDVARS_IV22]], 5
-;CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds [3 x [5 x [8 x i16]]], ptr @b, i64 0, i64 [[INDVARS_IV]], i64 [[INDVARS_IV]], i64 [[TMP0]]
-;CHECK-NEXT: [[TMP1:%.*]] = load i16, ptr [[ARRAYIDX7]]
-;CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP1]] to i32
-;CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr @a
-;CHECK-NEXT: [[TMP_OR:%.*]] = or i32 [[TMP2]], [[CONV]]
-;CHECK-NEXT: store i32 [[TMP_OR]], ptr @a
-;CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
-;CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], 3
-;CHECK-NEXT: br label [[FOR_INC8_LOOPEXIT:%.*]]
-;CHECK: for.body3.split:
-;CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
-;CHECK-NEXT: [[TMP4:%.*]] = icmp ne i64 [[TMP3]], 3
-;CHECK-NEXT: br i1 %4, label [[FOR_BODY3]], label [[FOR_END10:%.*]]
-;CHECK: for.inc8.loopexit:
-;CHECK-NEXT: br label [[FOR_INC8]]
-;CHECK: for.inc8:
-;CHECK-NEXT: [[INDVARS_IV_NEXT23]] = add nuw nsw i64 [[INDVARS_IV22]], 1
-;CHECK-NEXT: [[EXITCOND25:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT23]], 3
-;CHECK-NEXT: br i1 [[EXITCOND25]], label [[FOR_BODY]], label [[FOR_BODY3_SPLIT]]
-;CHECK: for.end10:
-;CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr @a
-;CHECK-NEXT: ret void
+; CHECK-LABEL: define void @test1() {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: br label %[[FOR_BODY:.*]]
+; CHECK: [[FOR_BODY]]:
+; CHECK-NEXT: [[INDVARS_IV22:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT23:%.*]], %[[FOR_INC8:.*]] ]
+; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i64 [[INDVARS_IV22]], 0
+; CHECK-NEXT: br i1 [[TOBOOL]], label %[[FOR_COND1_PREHEADER:.*]], label %[[FOR_INC8]]
+; CHECK: [[FOR_COND1_PREHEADER]]:
+; CHECK-NEXT: br label %[[FOR_BODY3:.*]]
+; CHECK: [[FOR_BODY3]]:
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[FOR_COND1_PREHEADER]] ], [ [[TMP3:%.*]], %[[FOR_BODY3]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i64 [[INDVARS_IV22]], 5
+; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds [3 x [5 x [8 x i16]]], ptr @b, i64 0, i64 [[INDVARS_IV]], i64 [[INDVARS_IV]], i64 [[TMP0]]
+; CHECK-NEXT: [[TMP1:%.*]] = load i16, ptr [[ARRAYIDX7]], align 2
+; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP1]] to i32
+; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr @a, align 4
+; CHECK-NEXT: [[TMP_OR:%.*]] = or i32 [[TMP2]], [[CONV]]
+; CHECK-NEXT: store i32 [[TMP_OR]], ptr @a, align 4
+; CHECK-NEXT: [[TMP3]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT: [[TMP4:%.*]] = icmp ne i64 [[TMP3]], 3
+; CHECK-NEXT: br i1 [[TMP4]], label %[[FOR_BODY3]], label %[[FOR_INC8_LOOPEXIT:.*]]
+; CHECK: [[FOR_INC8_LOOPEXIT]]:
+; CHECK-NEXT: br label %[[FOR_INC8]]
+; CHECK: [[FOR_INC8]]:
+; CHECK-NEXT: [[INDVARS_IV_NEXT23]] = add nuw nsw i64 [[INDVARS_IV22]], 1
+; CHECK-NEXT: [[EXITCOND25:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT23]], 3
+; CHECK-NEXT: br i1 [[EXITCOND25]], label %[[FOR_BODY]], label %[[FOR_END10:.*]]
+; CHECK: [[FOR_END10]]:
+; CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr @a, align 4
+; CHECK-NEXT: ret void
+;
entry:
br label %for.body
@@ -118,45 +112,45 @@ for.end10: ; preds = %for.inc8
;; }
define void @test2() {
-; CHECK-LABEL: @test2(
-; CHECK-NEXT: entry:
-; CHECK-NEXT: br label [[OUTERMOST_HEADER:%.*]]
-; CHECK: outermost.header:
-; CHECK-NEXT: [[INDVAR_OUTERMOST:%.*]] = phi i32 [ 10, [[ENTRY:%.*]] ], [ [[INDVAR_OUTERMOST_NEXT:%.*]], [[OUTERMOST_LATCH:%.*]] ]
+; CHECK-LABEL: define void @test2() {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: br label %[[OUTERMOST_HEADER:.*]]
+; CHECK: [[OUTERMOST_HEADER]]:
+; CHECK-NEXT: [[INDVAR_OUTERMOST:%.*]] = phi i32 [ 10, %[[ENTRY]] ], [ [[INDVAR_OUTERMOST_NEXT:%.*]], %[[OUTERMOST_LATCH:.*]] ]
; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr @a, align 4
; CHECK-NEXT: [[TOBOOL71_I:%.*]] = icmp eq i32 [[TMP0]], 0
-; CHECK-NEXT: br label [[INNERMOST_PREHEADER:%.*]]
-; CHECK: middle.header.preheader:
-; CHECK-NEXT: br label [[MIDDLE_HEADER:%.*]]
-; CHECK: middle.header:
-; CHECK-NEXT: [[INDVAR_MIDDLE:%.*]] = phi i64 [ [[INDVAR_MIDDLE_NEXT:%.*]], [[MIDDLE_LATCH:%.*]] ], [ 4, [[MIDDLE_HEADER_PREHEADER:%.*]] ]
-; CHECK-NEXT: br i1 [[TOBOOL71_I]], label [[INNERMOST_BODY_SPLIT1:%.*]], label [[INNERMOST_BODY_SPLIT:%.*]]
-; CHECK: innermost.preheader:
-; CHECK-NEXT: br label [[INNERMOST_BODY:%.*]]
-; CHECK: innermost.body:
-; CHECK-NEXT: [[INDVAR_INNERMOST:%.*]] = phi i64 [ [[TMP1:%.*]], [[INNERMOST_BODY_SPLIT]] ], [ 4, [[INNERMOST_PREHEADER]] ]
-; CHECK-NEXT: br label [[MIDDLE_HEADER_PREHEADER]]
-; CHECK: innermost.body.split1:
+; CHECK-NEXT: br label %[[INNERMOST_PREHEADER:.*]]
+; CHECK: [[MIDDLE_HEADER_PREHEADER:.*]]:
+; CHECK-NEXT: br label %[[MIDDLE_HEADER:.*]]
+; CHECK: [[MIDDLE_HEADER]]:
+; CHECK-NEXT: [[INDVAR_MIDDLE:%.*]] = phi i64 [ [[INDVAR_MIDDLE_NEXT:%.*]], %[[MIDDLE_LATCH:.*]] ], [ 4, %[[MIDDLE_HEADER_PREHEADER]] ]
+; CHECK-NEXT: br i1 [[TOBOOL71_I]], label %[[INNERMOST_BODY_SPLIT1:.*]], label %[[INNERMOST_BODY_SPLIT:.*]]
+; CHECK: [[INNERMOST_PREHEADER]]:
+; CHECK-NEXT: br label %[[INNERMOST_BODY:.*]]
+; CHECK: [[INNERMOST_BODY]]:
+; CHECK-NEXT: [[INDVAR_INNERMOST:%.*]] = phi i64 [ [[TMP1:%.*]], %[[INNERMOST_BODY_SPLIT]] ], [ 4, %[[INNERMOST_PREHEADER]] ]
+; CHECK-NEXT: br label %[[MIDDLE_HEADER_PREHEADER]]
+; CHECK: [[INNERMOST_BODY_SPLIT1]]:
; CHECK-NEXT: [[ARRAYIDX9_I:%.*]] = getelementptr inbounds [1 x [6 x i32]], ptr @d, i64 0, i64 [[INDVAR_INNERMOST]], i64 [[INDVAR_MIDDLE]]
; CHECK-NEXT: store i32 0, ptr [[ARRAYIDX9_I]], align 4
; CHECK-NEXT: [[INDVAR_INNERMOST_NEXT:%.*]] = add nsw i64 [[INDVAR_INNERMOST]], -1
; CHECK-NEXT: [[TOBOOL5_I:%.*]] = icmp eq i64 [[INDVAR_INNERMOST_NEXT]], 0
-; CHECK-NEXT: br label [[MIDDLE_LATCH_LOOPEXIT:%.*]]
-; CHECK: innermost.body.split:
+; CHECK-NEXT: br label %[[INNERMOST_LOOPEXIT:.*]]
+; CHECK: [[INNERMOST_BODY_SPLIT]]:
; CHECK-NEXT: [[TMP1]] = add nsw i64 [[INDVAR_INNERMOST]], -1
; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 0
-; CHECK-NEXT: br i1 [[TMP2]], label [[OUTERMOST_LATCH]], label [[INNERMOST_BODY]]
-; CHECK: innermost.loopexit:
-; CHECK-NEXT: br label [[MIDDLE_LATCH]]
-; CHECK: middle.latch:
+; CHECK-NEXT: br i1 [[TMP2]], label %[[OUTERMOST_LATCH]], label %[[INNERMOST_BODY]]
+; CHECK: [[INNERMOST_LOOPEXIT]]:
+; CHECK-NEXT: br label %[[MIDDLE_LATCH]]
+; CHECK: [[MIDDLE_LATCH]]:
; CHECK-NEXT: [[INDVAR_MIDDLE_NEXT]] = add nsw i64 [[INDVAR_MIDDLE]], -1
; CHECK-NEXT: [[TOBOOL2_I:%.*]] = icmp eq i64 [[INDVAR_MIDDLE_NEXT]], 0
-; CHECK-NEXT: br i1 [[TOBOOL2_I]], label [[INNERMOST_BODY_SPLIT]], label [[MIDDLE_HEADER]]
-; CHECK: outermost.latch:
+; CHECK-NEXT: br i1 [[TOBOOL2_I]], label %[[INNERMOST_BODY_SPLIT]], label %[[MIDDLE_HEADER]]
+; CHECK: [[OUTERMOST_LATCH]]:
; CHECK-NEXT: [[INDVAR_OUTERMOST_NEXT]] = add nsw i32 [[INDVAR_OUTERMOST]], -5
; CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp eq i32 [[INDVAR_OUTERMOST_NEXT]], 0
-; CHECK-NEXT: br i1 [[TOBOOL_I]], label [[OUTERMOST_EXIT:%.*]], label [[OUTERMOST_HEADER]]
-; CHECK: outermost.exit:
+; CHECK-NEXT: br i1 [[TOBOOL_I]], label %[[OUTERMOST_EXIT:.*]], label %[[OUTERMOST_HEADER]]
+; CHECK: [[OUTERMOST_EXIT]]...
[truncated]
|
@@ -187,11 +187,19 @@ static void interChangeDependencies(CharMatrix &DepMatrix, unsigned FromIndx, | |||
// if the direction matrix, after the same permutation is applied to its | |||
// columns, has no ">" direction as the leftmost non-"=" direction in any row. | |||
static bool isLexicographicallyPositive(std::vector<char> &DV) { | |||
bool HasScalar = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unused?
@@ -187,11 +187,19 @@ static void interChangeDependencies(CharMatrix &DepMatrix, unsigned FromIndx, | |||
// if the direction matrix, after the same permutation is applied to its | |||
// columns, has no ">" direction as the leftmost non-"=" direction in any row. | |||
static bool isLexicographicallyPositive(std::vector<char> &DV) { | |||
bool HasScalar = false; | |||
bool FirstPosScalar = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand what FirstPosScalar
means. This legality check seems to allow the permutation of [S <]
to [< S]
. Is this transformation legal? Or can such a case not happen? I tried some loops that have a row of [S <]
, and as far as these cases, the legality check rejects the interchange because of another row like [S =]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would second this. It looks to me that this patch allows some specific pattern to be interchanged without a schematic reasoning.
3ab0e49
to
dae2cad
Compare
Thanks for the review @kasuga-fj and @CongzheUalberta! I agree with your feedback, I made a mess of the logic. I have now simplified things, the idea is to simply reject S dependencies:
I have updated the commit message / description of this patch with a better explanation. |
I honestly never understood why a dependency that is "scalar" (that is, the dimension is not referenced in the index expression) is handled fundamentally different. If not references it should be equivalent to accessing (any of) the previous iteration. How about we remove
in populateDependencyMatrix entirely, so a dependency is never In any case, the patch in the current form should be conservatively correct. |
I would be happy to remove it in the way you suggested, @Meinersbur. My only minor concern is that if we want to deal with these sort of dependencies later, we would need to restore some of this logic, so removing it now would mean more churn later? But let me know what you prefer, I am happy either way (leave it as it is because it is conservative/correct as you say, or implement your suggestion). |
I agree with removing the 'S'. I think leaving the incorrect dependencies risks causing correctness issues, especially when we change the code in the future. At the extreme, we need to insert something like |
@sjoerdmeijer What kind of interchange do you want to preserve with Besides, this PR currently will just reject everything with |
Okay, excellent points. I will prepare a new revision that removes |
dae2cad
to
95395bb
Compare
Thanks for the review @Meinersbur , @kasuga-fj: this now no longer inserts 'S' into the dependency matrix, it should default to '*'. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me. but some tests do not appear to have achieved their original objectives. I'm not sure what we should do in such cases, but in my opinion a mechanical update does not make sense. Marking XFAIL
may be one option.
llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll
Show resolved
Hide resolved
llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me. but some tests do not appear to have achieved their original objectives. I'm not sure what we should do in such cases, but in my opinion a mechanical update does not make sense. Marking
XFAIL
may be one option.
Thanks, and I also had a bit of a nagging feeling about the test cases. I quite like the idea about the XFAILs. I think what I will do is the following:
- I will first convert the test cases that were modified and check the full IR to tests that check the optimisation remark in a separate patch. Checking the the optimisation remark is more compact, less verbose and easier to see.
- I will then see if adding XFAILS here is better, I think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, and I also had a bit of a nagging feeling about the test cases. I quite like the idea about the XFAILs. I think what I will do is the following:
- I will first convert the test cases that were modified and check the full IR to tests that check the optimisation remark in a separate patch. Checking the the optimisation remark is more compact, less verbose and easier to see.
- I will then see if adding XFAILS here is better, I think so.
Cool! I second this idea.
Checking the remark message if interchange did or didn't happen is more straight forward than the full IR for these cases. This comment was also made when I moved some tests away from relying on debug builds in change llvm#116780, and this is a prep step for llvm#119345 that is going to change these steps.
Checking the remark message if interchange did or didn't happen is more straight forward than checking the full IR for these cases. This comment was also made when I moved some tests away from relying on debug builds in change llvm#116780, and this is a prep step for llvm#119345 that is going to change these steps.
Checking the remark message if interchange did or didn't happen is more straight forward than the full IR for these cases. This comment was also made when I moved some tests away from relying on debug builds in change #116780, and this is a prep step for #119345 that is going to change these test cases.
95395bb
to
ccc3110
Compare
I have now XFAIL'ed the test cases, with 2 exceptions where it was already rejected but for different reasons. |
ccc3110
to
4600d04
Compare
4600d04
to
ccc410e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM, now all that remains is to apply the formatter
Thanks a lot for the reviews, will do that before merging this. |
We are not handling 'S' scalar dependencies correctly and have at least the following miscompiles related to that: [LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" llvm#54176 [LoopInterchange] Interchange breaks program correctness llvm#46867 [LoopInterchange] Loops should not interchanged due to dependencies llvm#47259 [LoopInterchange] Loops should not interchanged due to control flow llvm#47401 This patch does no longer insert the "S" dependency/direction into the dependency matrix, so a dependency is never "S". We seem to have forgotten what the exact meaning is of this dependency type, and don't see why it should be treated differently. We prefer correctness over incorrect and more aggressive results. I.e., this prevents the miscompiles at the expense of handling less cases, i.e. making interchange more pessimistic. However, some of the cases that are now rejected for dependence analysis reasons, were rejected before too but for other reasons (e.g. profitability). So at least for the llvm regression tests, the number of regression are very reasonable. This should be a stopgap. We would like to get interchange enabled by default and thus prefer correctness over unsafe transforms, and later see if we can get solve the regressions.
ccc410e
to
383b011
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a work in progress patch to enable loop-interchange by default and is a continuation of the RFC: https://discourse.llvm.org/t/enabling-loop-interchange/82589 Basically, we promised to fix any compile-time and correctness issues in the different components involved here (loop-interchange and dependence analaysis.) before discussing enabling interchange by default. We think are close to complete this; I would like to explain where we are and wanted to check if there are any thoughts or concerns. A quick overview of the correctness and compile-time improvements that we have made include: Correctness: - [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345) - [LoopInterchange] Fix overflow in cost calculation (llvm#111807) - [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj - [DA] disambiguate evolution of base addresses (llvm#116628) Compile-times: - [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973) - [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128) - [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247) And in terms of remaining work, we think we are very close to fixing these depenence analysis issues: - [DA] do not handle array accesses of different offsets (llvm#123436) - [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630) - [DA] use NSW arithmetic llvm#116632 The compile-time increase with a geomean increase of 0.19% looks good (after committing llvm#124247), I think: stage1-O3: Benchmark kimwitu++ +0.10% sqlite3 +0.14% consumer-typeset +0.07% Bullet +0.06% tramp3d-v4 +0.21% mafft +0.39% ClamAVi +0.06% lencod +0.61% SPASS +0.17% 7zip +0.08% geomean +0.19% See also: http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au We might want to look into lencod to see if we can improve more, but not sure it is strictly necessary.
The incorrect handling of scalar dependencies in LoopInterchange was fixed by llvm#119345. This patch adds tests that are relative to the issues fixed by llvm#119345.
The incorrect handling of scalar dependencies in LoopInterchange was fixed by llvm#119345. This patch adds tests that are relative to the issues fixed by llvm#119345.
We are not handling 'S' scalar dependencies correctly and have at least
the following miscompiles related to that:
[LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" #54176
[LoopInterchange] Interchange breaks program correctness #46867
[LoopInterchange] Loops should not interchanged due to dependencies #47259
[LoopInterchange] Loops should not interchanged due to control flow #47401
This patch does no longer insert the "S" dependency/direction into the
dependency matrix, so a dependency is never "S". We seem to have
forgotten what the exact meaning is of this dependency type, and don't
see why it should be treated differently.
We prefer correctness over incorrect and more aggressive results. I.e.,
this prevents the miscompiles at the expense of handling less cases,
i.e. making interchange more pessimistic. However, some of the cases
that are now rejected for dependence analysis reasons, were rejected
before too but for other reasons (e.g. profitability). So at least for
the llvm regression tests, the number of regression are very reasonable.
This should be a stopgap. We would like to get interchange enabled by
default and thus prefer correctness over unsafe transforms, and later
see if we can get solve the regressions.