Skip to content

[LoopInterchange] Remove 'S' Scalar Dependencies #119345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 20, 2025

Conversation

sjoerdmeijer
Copy link
Collaborator

@sjoerdmeijer sjoerdmeijer commented Dec 10, 2024

We are not handling 'S' scalar dependencies correctly and have at least
the following miscompiles related to that:

[LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" #54176
[LoopInterchange] Interchange breaks program correctness #46867
[LoopInterchange] Loops should not interchanged due to dependencies #47259
[LoopInterchange] Loops should not interchanged due to control flow #47401

This patch does no longer insert the "S" dependency/direction into the
dependency matrix, so a dependency is never "S". We seem to have
forgotten what the exact meaning is of this dependency type, and don't
see why it should be treated differently.

We prefer correctness over incorrect and more aggressive results. I.e.,
this prevents the miscompiles at the expense of handling less cases,
i.e. making interchange more pessimistic. However, some of the cases
that are now rejected for dependence analysis reasons, were rejected
before too but for other reasons (e.g. profitability). So at least for
the llvm regression tests, the number of regression are very reasonable.
This should be a stopgap. We would like to get interchange enabled by
default and thus prefer correctness over unsafe transforms, and later
see if we can get solve the regressions.

@llvmbot
Copy link
Member

llvmbot commented Dec 10, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Sjoerd Meijer (sjoerdmeijer)

Changes

This makes loop-interchange more pessimistic, but also more correct as we are not handling 'S' scalar dependencies correctly and have at least the following miscompiles related to that:

[LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" #54176 [LoopInterchange] Interchange breaks program correctness #46867
[LoopInterchange] Loops should not interchanged due to dependencies #47259
[LoopInterchange] Loops should not interchanged due to control flow #47401

This should be a stopgap. We would like to get interchange enabled by default and thus prefer correctness over unsafe transforms, and later lift this restriction.


Patch is 25.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119345.diff

9 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+8)
  • (modified) llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll (+12-32)
  • (modified) llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopInterchange/lcssa.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll (+59-65)
  • (modified) llvm/test/Transforms/LoopInterchange/profitability.ll (+6-14)
  • (modified) llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopInterchange/vector-gep-operand.ll (+3-13)
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index a0c0080c0bda1c..909891f84dc191 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -187,11 +187,19 @@ static void interChangeDependencies(CharMatrix &DepMatrix, unsigned FromIndx,
 // if the direction matrix, after the same permutation is applied to its
 // columns, has no ">" direction as the leftmost non-"=" direction in any row.
 static bool isLexicographicallyPositive(std::vector<char> &DV) {
+  bool HasScalar = false;
+  bool FirstPosScalar = true;
   for (unsigned char Direction : DV) {
+    // FIXME: we reject scalar dependencies because we currently don't
+    // handle them correctly. This makes the legality check more conservative
+    // than needed, but is a stopgap to avoid miscompiles.
+    if (Direction == 'S' && !FirstPosScalar)
+      return false;
     if (Direction == '<')
       return true;
     if (Direction == '>' || Direction == '*')
       return false;
+    FirstPosScalar = false;
   }
   return true;
 }
diff --git a/llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll b/llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll
index b3383655668980..2168cde7f5374b 100644
--- a/llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll
+++ b/llvm/test/Transforms/LoopInterchange/gh54176-scalar-deps.ll
@@ -29,19 +29,13 @@
 define dso_local i32 @test1(i1 %cond) {
 ; CHECK-LABEL: define dso_local i32 @test1(
 ; CHECK-SAME: i1 [[COND:%.*]]) {
-; CHECK-NEXT:  [[FOR_PREHEADER:.*:]]
-; CHECK-NEXT:    br label %[[INNERLOOP_PREHEADER:.*]]
-; CHECK:       [[OUTERLOOP_PREHEADER:.*]]:
+; CHECK-NEXT:  [[FOR_PREHEADER:.*]]:
 ; CHECK-NEXT:    br label %[[OUTERLOOP:.*]]
 ; CHECK:       [[OUTERLOOP]]:
-; CHECK-NEXT:    [[I:%.*]] = phi i64 [ [[INDVARS_IV_NEXT21_I:%.*]], %[[FOR_LATCH:.*]] ], [ 0, %[[OUTERLOOP_PREHEADER]] ]
-; CHECK-NEXT:    br label %[[INNERLOOP_SPLIT:.*]]
-; CHECK:       [[INNERLOOP_PREHEADER]]:
+; CHECK-NEXT:    [[I:%.*]] = phi i64 [ 0, %[[FOR_PREHEADER]] ], [ [[INDVARS_IV_NEXT21_I:%.*]], %[[FOR_LATCH:.*]] ]
 ; CHECK-NEXT:    br label %[[INNERLOOP:.*]]
 ; CHECK:       [[INNERLOOP]]:
-; CHECK-NEXT:    [[J:%.*]] = phi i64 [ [[TMP0:%.*]], %[[IF_END_SPLIT:.*]] ], [ 0, %[[INNERLOOP_PREHEADER]] ]
-; CHECK-NEXT:    br label %[[OUTERLOOP_PREHEADER]]
-; CHECK:       [[INNERLOOP_SPLIT]]:
+; CHECK-NEXT:    [[J:%.*]] = phi i64 [ 0, %[[OUTERLOOP]] ], [ [[TMP0:%.*]], %[[IF_END:.*]] ]
 ; CHECK-NEXT:    [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x [9 x i32]], ptr @f, i64 0, i64 [[J]], i64 [[I]]
 ; CHECK-NEXT:    [[I1:%.*]] = load i32, ptr [[ARRAYIDX6_I]], align 4
 ; CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp eq i32 [[I1]], 0
@@ -50,24 +44,20 @@ define dso_local i32 @test1(i1 %cond) {
 ; CHECK-NEXT:    store i32 3, ptr @g, align 4
 ; CHECK-NEXT:    br label %[[LAND_END]]
 ; CHECK:       [[LAND_END]]:
-; CHECK-NEXT:    br i1 [[COND]], label %[[IF_END:.*]], label %[[IF_THEN:.*]]
+; CHECK-NEXT:    br i1 [[COND]], label %[[IF_END]], label %[[IF_THEN:.*]]
 ; CHECK:       [[IF_THEN]]:
 ; CHECK-NEXT:    [[I2:%.*]] = load i32, ptr @g, align 4
 ; CHECK-NEXT:    [[INC_I:%.*]] = add i32 [[I2]], 1
 ; CHECK-NEXT:    store i32 [[INC_I]], ptr @g, align 4
 ; CHECK-NEXT:    br label %[[IF_END]]
 ; CHECK:       [[IF_END]]:
-; CHECK-NEXT:    [[J_NEXT:%.*]] = add nuw nsw i64 [[J]], 1
-; CHECK-NEXT:    [[EXITCOND_I:%.*]] = icmp eq i64 [[J_NEXT]], 3
-; CHECK-NEXT:    br label %[[FOR_LATCH]]
-; CHECK:       [[IF_END_SPLIT]]:
 ; CHECK-NEXT:    [[TMP0]] = add nuw nsw i64 [[J]], 1
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i64 [[TMP0]], 3
-; CHECK-NEXT:    br i1 [[TMP1]], label %[[EXIT:.*]], label %[[INNERLOOP]]
+; CHECK-NEXT:    br i1 [[TMP1]], label %[[FOR_LATCH]], label %[[INNERLOOP]]
 ; CHECK:       [[FOR_LATCH]]:
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT21_I]] = add nsw i64 [[I]], 1
 ; CHECK-NEXT:    [[CMP_I:%.*]] = icmp slt i64 [[I]], 2
-; CHECK-NEXT:    br i1 [[CMP_I]], label %[[OUTERLOOP]], label %[[IF_END_SPLIT]]
+; CHECK-NEXT:    br i1 [[CMP_I]], label %[[OUTERLOOP]], label %[[EXIT:.*]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    [[I3:%.*]] = load i32, ptr @g, align 4
 ; CHECK-NEXT:    ret i32 [[I3]]
@@ -139,19 +129,13 @@ exit:
 define dso_local i32 @test2(i1 %cond) {
 ; CHECK-LABEL: define dso_local i32 @test2(
 ; CHECK-SAME: i1 [[COND:%.*]]) {
-; CHECK-NEXT:  [[FOR_PREHEADER:.*:]]
-; CHECK-NEXT:    br label %[[INNERLOOP_PREHEADER:.*]]
-; CHECK:       [[OUTERLOOP_PREHEADER:.*]]:
+; CHECK-NEXT:  [[FOR_PREHEADER:.*]]:
 ; CHECK-NEXT:    br label %[[OUTERLOOP:.*]]
 ; CHECK:       [[OUTERLOOP]]:
-; CHECK-NEXT:    [[I:%.*]] = phi i64 [ [[INDVARS_IV_NEXT21_I:%.*]], %[[FOR_LATCH:.*]] ], [ 0, %[[OUTERLOOP_PREHEADER]] ]
-; CHECK-NEXT:    br label %[[INNERLOOP_SPLIT:.*]]
-; CHECK:       [[INNERLOOP_PREHEADER]]:
+; CHECK-NEXT:    [[I:%.*]] = phi i64 [ 0, %[[FOR_PREHEADER]] ], [ [[INDVARS_IV_NEXT21_I:%.*]], %[[FOR_LATCH:.*]] ]
 ; CHECK-NEXT:    br label %[[INNERLOOP:.*]]
 ; CHECK:       [[INNERLOOP]]:
-; CHECK-NEXT:    [[J:%.*]] = phi i64 [ [[TMP0:%.*]], %[[IF_END_SPLIT:.*]] ], [ 0, %[[INNERLOOP_PREHEADER]] ]
-; CHECK-NEXT:    br label %[[OUTERLOOP_PREHEADER]]
-; CHECK:       [[INNERLOOP_SPLIT]]:
+; CHECK-NEXT:    [[J:%.*]] = phi i64 [ 0, %[[OUTERLOOP]] ], [ [[TMP0:%.*]], %[[IF_END:.*]] ]
 ; CHECK-NEXT:    [[ARRAYIDX6_I:%.*]] = getelementptr inbounds [4 x [9 x i32]], ptr @f, i64 0, i64 [[J]], i64 [[I]]
 ; CHECK-NEXT:    [[I1:%.*]] = load i32, ptr [[ARRAYIDX6_I]], align 4
 ; CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp eq i32 [[I1]], 0
@@ -160,24 +144,20 @@ define dso_local i32 @test2(i1 %cond) {
 ; CHECK:       [[LAND_RHS]]:
 ; CHECK-NEXT:    br label %[[LAND_END]]
 ; CHECK:       [[LAND_END]]:
-; CHECK-NEXT:    br i1 [[COND]], label %[[IF_END:.*]], label %[[IF_THEN:.*]]
+; CHECK-NEXT:    br i1 [[COND]], label %[[IF_END]], label %[[IF_THEN:.*]]
 ; CHECK:       [[IF_THEN]]:
 ; CHECK-NEXT:    [[I2:%.*]] = load i32, ptr @g, align 4
 ; CHECK-NEXT:    [[INC_I:%.*]] = add i32 [[I2]], 1
 ; CHECK-NEXT:    store i32 [[INC_I]], ptr @g, align 4
 ; CHECK-NEXT:    br label %[[IF_END]]
 ; CHECK:       [[IF_END]]:
-; CHECK-NEXT:    [[J_NEXT:%.*]] = add nuw nsw i64 [[J]], 1
-; CHECK-NEXT:    [[EXITCOND_I:%.*]] = icmp eq i64 [[J_NEXT]], 3
-; CHECK-NEXT:    br label %[[FOR_LATCH]]
-; CHECK:       [[IF_END_SPLIT]]:
 ; CHECK-NEXT:    [[TMP0]] = add nuw nsw i64 [[J]], 1
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i64 [[TMP0]], 3
-; CHECK-NEXT:    br i1 [[TMP1]], label %[[EXIT:.*]], label %[[INNERLOOP]]
+; CHECK-NEXT:    br i1 [[TMP1]], label %[[FOR_LATCH]], label %[[INNERLOOP]]
 ; CHECK:       [[FOR_LATCH]]:
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT21_I]] = add nsw i64 [[I]], 1
 ; CHECK-NEXT:    [[CMP_I:%.*]] = icmp slt i64 [[I]], 2
-; CHECK-NEXT:    br i1 [[CMP_I]], label %[[OUTERLOOP]], label %[[IF_END_SPLIT]]
+; CHECK-NEXT:    br i1 [[CMP_I]], label %[[OUTERLOOP]], label %[[EXIT:.*]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    [[I3:%.*]] = load i32, ptr @g, align 4
 ; CHECK-NEXT:    ret i32 [[I3]]
diff --git a/llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll b/llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll
index 79a1cb71169665..a2679c0f111cca 100644
--- a/llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll
+++ b/llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll
@@ -18,7 +18,7 @@
 
 ; CHECK: --- !Missed
 ; CHECK-NEXT: Pass:            loop-interchange
-; CHECK-NEXT: Name:            UnsupportedPHI
+; CHECK-NEXT: Name:            Dependence 
 ; CHECK-NEXT: Function:        reduction_01
 
 ; IR-LABEL: @reduction_01(
@@ -71,7 +71,7 @@ for.end8:                                         ; preds = %for.cond1.for.inc6_
 
 ; CHECK: --- !Missed
 ; CHECK-NEXT: Pass:            loop-interchange
-; CHECK-NEXT: Name:            UnsupportedPHIOuter
+; CHECK-NEXT: Name:            UnsupportedPHIOuter 
 ; CHECK-NEXT: Function:        reduction_03
 
 ; IR-LABEL: @reduction_03(
diff --git a/llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll b/llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
index bad84224d445ab..691e8f11a0101a 100644
--- a/llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
+++ b/llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
@@ -28,20 +28,20 @@ define void @innermost_latch_uses_values_in_middle_header() {
 ; CHECK:       [[INNERMOST_HEADER_PREHEADER]]:
 ; CHECK-NEXT:    br label %[[INNERMOST_HEADER:.*]]
 ; CHECK:       [[INNERMOST_HEADER]]:
-; CHECK-NEXT:    [[INDVAR_INNERMOST:%.*]] = phi i64 [ [[TMP1:%.*]], %[[INNERMOST_LATCH_SPLIT:.*]] ], [ 4, %[[INNERMOST_HEADER_PREHEADER]] ]
+; CHECK-NEXT:    [[INDVAR_INNERMOST:%.*]] = phi i64 [ [[TMP3:%.*]], %[[INNERMOST_LATCH_SPLIT:.*]] ], [ 4, %[[INNERMOST_HEADER_PREHEADER]] ]
 ; CHECK-NEXT:    br label %[[MIDDLE_HEADER_PREHEADER]]
 ; CHECK:       [[INNERMOST_BODY]]:
 ; CHECK-NEXT:    [[ARRAYIDX9_I:%.*]] = getelementptr inbounds [1 x [6 x i32]], ptr @d, i64 0, i64 [[INDVAR_INNERMOST]], i64 [[INDVAR_MIDDLE]]
 ; CHECK-NEXT:    store i32 0, ptr [[ARRAYIDX9_I]], align 4
 ; CHECK-NEXT:    br label %[[INNERMOST_LATCH:.*]]
 ; CHECK:       [[INNERMOST_LATCH]]:
-; CHECK-NEXT:    [[INDVAR_INNERMOST_NEXT:%.*]] = add nsw i64 [[INDVAR_INNERMOST]], 1
-; CHECK-NEXT:    [[TOBOOL5_I:%.*]] = icmp eq i64 [[INDVAR_INNERMOST_NEXT]], [[INDVAR_MIDDLE_WIDE]]
+; CHECK-NEXT:    [[TMP1:%.*]] = add nsw i64 [[INDVAR_INNERMOST]], 1
+; CHECK-NEXT:    [[TOBOOL5_I:%.*]] = icmp eq i64 [[TMP1]], [[INDVAR_MIDDLE_WIDE]]
 ; CHECK-NEXT:    br label %[[MIDDLE_LATCH]]
 ; CHECK:       [[INNERMOST_LATCH_SPLIT]]:
 ; CHECK-NEXT:    [[INDVAR_MIDDLE_WIDE_LCSSA:%.*]] = phi i64 [ [[INDVAR_MIDDLE_WIDE]], %[[MIDDLE_LATCH]] ]
-; CHECK-NEXT:    [[TMP1]] = add nsw i64 [[INDVAR_INNERMOST]], 1
-; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], [[INDVAR_MIDDLE_WIDE_LCSSA]]
+; CHECK-NEXT:    [[TMP3]] = add nsw i64 [[INDVAR_INNERMOST]], 1
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP3]], [[INDVAR_MIDDLE_WIDE_LCSSA]]
 ; CHECK-NEXT:    br i1 [[TMP2]], label %[[OUTERMOST_LATCH_LOOPEXIT:.*]], label %[[INNERMOST_HEADER]]
 ; CHECK:       [[MIDDLE_LATCH]]:
 ; CHECK-NEXT:    [[INDVAR_MIDDLE_NEXT]] = add nsw i64 [[INDVAR_MIDDLE]], -1
diff --git a/llvm/test/Transforms/LoopInterchange/lcssa.ll b/llvm/test/Transforms/LoopInterchange/lcssa.ll
index b41eba4ef56173..23db652e9a984d 100644
--- a/llvm/test/Transforms/LoopInterchange/lcssa.ll
+++ b/llvm/test/Transforms/LoopInterchange/lcssa.ll
@@ -177,7 +177,7 @@ for.end16:                                        ; preds = %for.exit
 }
 
 ; PHI node in inner latch with multiple predecessors.
-; REMARK: Interchanged
+; REMARK:      Dependence 
 ; REMARK-NEXT: lcssa_05
 
 define void @lcssa_05(ptr %ptr) {
@@ -222,7 +222,7 @@ for.end16:                                        ; preds = %for.exit
   ret void
 }
 
-; REMARK: UnsupportedExitPHI
+; REMARK:      Dependence 
 ; REMARK-NEXT: lcssa_06
 
 define void @lcssa_06(ptr %ptr, ptr %ptr1) {
diff --git a/llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll b/llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll
index 6db95c09b175f9..34de449a87c2d3 100644
--- a/llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll
+++ b/llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
 ; RUN: opt -passes=loop-interchange -cache-line-size=64 -verify-dom-info -verify-loop-info -verify-loop-lcssa -S %s | FileCheck %s
 
 @b = global [3 x [5 x [8 x i16]]] [[5 x [8 x i16]] zeroinitializer, [5 x [8 x i16]] [[8 x i16] zeroinitializer, [8 x i16] [i16 0, i16 0, i16 0, i16 6, i16 1, i16 6, i16 0, i16 0], [8 x i16] zeroinitializer, [8 x i16] zeroinitializer, [8 x i16] zeroinitializer], [5 x [8 x i16]] zeroinitializer], align 2
@@ -21,44 +22,37 @@
 ;; }
 
 define void @test1() {
-;CHECK-LABEL: @test1(
-;CHECK:          entry:
-;CHECK-NEXT:       br label [[FOR_COND1_PREHEADER:%.*]]
-;CHECK:          for.body.preheader:
-;CHECK-NEXT:       br label  [[FOR_BODY:%.*]]
-;CHECK:          for.body:
-;CHECK-NEXT:       [[INDVARS_IV22:%.*]] = phi i64 [ [[INDVARS_IV_NEXT23:%.*]], [[FOR_INC8:%.*]] ], [ 0, [[FOR_BODY_PREHEADER:%.*]] ]
-;CHECK-NEXT:       [[TOBOOL:%.*]] = icmp eq i64 [[INDVARS_IV22:%.*]], 0
-;CHECK-NEXT:       br i1 [[TOBOOL]], label [[FOR_BODY3_SPLIT1:%.*]], label [[FOR_BODY3_SPLIT:%.*]]
-;CHECK:          for.cond1.preheader:
-;CHECK-NEXT:       br label [[FOR_BODY3:%.*]]
-;CHECK:          for.body3:
-;CHECK-NEXT:       [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER]] ], [ %3, [[FOR_BODY3_SPLIT]] ]
-;CHECK-NEXT:        br label [[FOR_BODY_PREHEADER]]
-;CHECK:          for.body3.split1:
-;CHECK-NEXT:       [[TMP0:%.*]] = add nuw nsw i64 [[INDVARS_IV22]], 5
-;CHECK-NEXT:       [[ARRAYIDX7:%.*]] = getelementptr inbounds [3 x [5 x [8 x i16]]], ptr @b, i64 0, i64 [[INDVARS_IV]], i64 [[INDVARS_IV]], i64 [[TMP0]]
-;CHECK-NEXT:       [[TMP1:%.*]] = load i16, ptr [[ARRAYIDX7]]
-;CHECK-NEXT:       [[CONV:%.*]] = sext i16 [[TMP1]] to i32
-;CHECK-NEXT:       [[TMP2:%.*]] = load i32, ptr @a
-;CHECK-NEXT:       [[TMP_OR:%.*]] = or i32 [[TMP2]], [[CONV]]
-;CHECK-NEXT:       store i32 [[TMP_OR]], ptr @a
-;CHECK-NEXT:       [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
-;CHECK-NEXT:       [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], 3
-;CHECK-NEXT:       br label [[FOR_INC8_LOOPEXIT:%.*]]
-;CHECK:          for.body3.split:
-;CHECK-NEXT:       [[TMP3:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
-;CHECK-NEXT:       [[TMP4:%.*]] = icmp ne i64 [[TMP3]], 3
-;CHECK-NEXT:       br i1 %4, label [[FOR_BODY3]], label [[FOR_END10:%.*]]
-;CHECK:          for.inc8.loopexit:
-;CHECK-NEXT:       br label [[FOR_INC8]]
-;CHECK:          for.inc8:
-;CHECK-NEXT:       [[INDVARS_IV_NEXT23]] = add nuw nsw i64 [[INDVARS_IV22]], 1
-;CHECK-NEXT:       [[EXITCOND25:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT23]], 3
-;CHECK-NEXT:       br i1 [[EXITCOND25]], label [[FOR_BODY]], label [[FOR_BODY3_SPLIT]]
-;CHECK:         for.end10:
-;CHECK-NEXT:       [[TMP5:%.*]] = load i32, ptr @a
-;CHECK-NEXT:       ret void
+; CHECK-LABEL: define void @test1() {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK:       [[FOR_BODY]]:
+; CHECK-NEXT:    [[INDVARS_IV22:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT23:%.*]], %[[FOR_INC8:.*]] ]
+; CHECK-NEXT:    [[TOBOOL:%.*]] = icmp eq i64 [[INDVARS_IV22]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL]], label %[[FOR_COND1_PREHEADER:.*]], label %[[FOR_INC8]]
+; CHECK:       [[FOR_COND1_PREHEADER]]:
+; CHECK-NEXT:    br label %[[FOR_BODY3:.*]]
+; CHECK:       [[FOR_BODY3]]:
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[FOR_COND1_PREHEADER]] ], [ [[TMP3:%.*]], %[[FOR_BODY3]] ]
+; CHECK-NEXT:    [[TMP0:%.*]] = add nuw nsw i64 [[INDVARS_IV22]], 5
+; CHECK-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds [3 x [5 x [8 x i16]]], ptr @b, i64 0, i64 [[INDVARS_IV]], i64 [[INDVARS_IV]], i64 [[TMP0]]
+; CHECK-NEXT:    [[TMP1:%.*]] = load i16, ptr [[ARRAYIDX7]], align 2
+; CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr @a, align 4
+; CHECK-NEXT:    [[TMP_OR:%.*]] = or i32 [[TMP2]], [[CONV]]
+; CHECK-NEXT:    store i32 [[TMP_OR]], ptr @a, align 4
+; CHECK-NEXT:    [[TMP3]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ne i64 [[TMP3]], 3
+; CHECK-NEXT:    br i1 [[TMP4]], label %[[FOR_BODY3]], label %[[FOR_INC8_LOOPEXIT:.*]]
+; CHECK:       [[FOR_INC8_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[FOR_INC8]]
+; CHECK:       [[FOR_INC8]]:
+; CHECK-NEXT:    [[INDVARS_IV_NEXT23]] = add nuw nsw i64 [[INDVARS_IV22]], 1
+; CHECK-NEXT:    [[EXITCOND25:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT23]], 3
+; CHECK-NEXT:    br i1 [[EXITCOND25]], label %[[FOR_BODY]], label %[[FOR_END10:.*]]
+; CHECK:       [[FOR_END10]]:
+; CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr @a, align 4
+; CHECK-NEXT:    ret void
+;
 
 entry:
   br label %for.body
@@ -118,45 +112,45 @@ for.end10:                                        ; preds = %for.inc8
 ;; }
 
 define void @test2() {
-; CHECK-LABEL: @test2(
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br label [[OUTERMOST_HEADER:%.*]]
-; CHECK:       outermost.header:
-; CHECK-NEXT:    [[INDVAR_OUTERMOST:%.*]] = phi i32 [ 10, [[ENTRY:%.*]] ], [ [[INDVAR_OUTERMOST_NEXT:%.*]], [[OUTERMOST_LATCH:%.*]] ]
+; CHECK-LABEL: define void @test2() {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[OUTERMOST_HEADER:.*]]
+; CHECK:       [[OUTERMOST_HEADER]]:
+; CHECK-NEXT:    [[INDVAR_OUTERMOST:%.*]] = phi i32 [ 10, %[[ENTRY]] ], [ [[INDVAR_OUTERMOST_NEXT:%.*]], %[[OUTERMOST_LATCH:.*]] ]
 ; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr @a, align 4
 ; CHECK-NEXT:    [[TOBOOL71_I:%.*]] = icmp eq i32 [[TMP0]], 0
-; CHECK-NEXT:    br label [[INNERMOST_PREHEADER:%.*]]
-; CHECK:       middle.header.preheader:
-; CHECK-NEXT:    br label [[MIDDLE_HEADER:%.*]]
-; CHECK:       middle.header:
-; CHECK-NEXT:    [[INDVAR_MIDDLE:%.*]] = phi i64 [ [[INDVAR_MIDDLE_NEXT:%.*]], [[MIDDLE_LATCH:%.*]] ], [ 4, [[MIDDLE_HEADER_PREHEADER:%.*]] ]
-; CHECK-NEXT:    br i1 [[TOBOOL71_I]], label [[INNERMOST_BODY_SPLIT1:%.*]], label [[INNERMOST_BODY_SPLIT:%.*]]
-; CHECK:       innermost.preheader:
-; CHECK-NEXT:    br label [[INNERMOST_BODY:%.*]]
-; CHECK:       innermost.body:
-; CHECK-NEXT:    [[INDVAR_INNERMOST:%.*]] = phi i64 [ [[TMP1:%.*]], [[INNERMOST_BODY_SPLIT]] ], [ 4, [[INNERMOST_PREHEADER]] ]
-; CHECK-NEXT:    br label [[MIDDLE_HEADER_PREHEADER]]
-; CHECK:       innermost.body.split1:
+; CHECK-NEXT:    br label %[[INNERMOST_PREHEADER:.*]]
+; CHECK:       [[MIDDLE_HEADER_PREHEADER:.*]]:
+; CHECK-NEXT:    br label %[[MIDDLE_HEADER:.*]]
+; CHECK:       [[MIDDLE_HEADER]]:
+; CHECK-NEXT:    [[INDVAR_MIDDLE:%.*]] = phi i64 [ [[INDVAR_MIDDLE_NEXT:%.*]], %[[MIDDLE_LATCH:.*]] ], [ 4, %[[MIDDLE_HEADER_PREHEADER]] ]
+; CHECK-NEXT:    br i1 [[TOBOOL71_I]], label %[[INNERMOST_BODY_SPLIT1:.*]], label %[[INNERMOST_BODY_SPLIT:.*]]
+; CHECK:       [[INNERMOST_PREHEADER]]:
+; CHECK-NEXT:    br label %[[INNERMOST_BODY:.*]]
+; CHECK:       [[INNERMOST_BODY]]:
+; CHECK-NEXT:    [[INDVAR_INNERMOST:%.*]] = phi i64 [ [[TMP1:%.*]], %[[INNERMOST_BODY_SPLIT]] ], [ 4, %[[INNERMOST_PREHEADER]] ]
+; CHECK-NEXT:    br label %[[MIDDLE_HEADER_PREHEADER]]
+; CHECK:       [[INNERMOST_BODY_SPLIT1]]:
 ; CHECK-NEXT:    [[ARRAYIDX9_I:%.*]] = getelementptr inbounds [1 x [6 x i32]], ptr @d, i64 0, i64 [[INDVAR_INNERMOST]], i64 [[INDVAR_MIDDLE]]
 ; CHECK-NEXT:    store i32 0, ptr [[ARRAYIDX9_I]], align 4
 ; CHECK-NEXT:    [[INDVAR_INNERMOST_NEXT:%.*]] = add nsw i64 [[INDVAR_INNERMOST]], -1
 ; CHECK-NEXT:    [[TOBOOL5_I:%.*]] = icmp eq i64 [[INDVAR_INNERMOST_NEXT]], 0
-; CHECK-NEXT:    br label [[MIDDLE_LATCH_LOOPEXIT:%.*]]
-; CHECK:       innermost.body.split:
+; CHECK-NEXT:    br label %[[INNERMOST_LOOPEXIT:.*]]
+; CHECK:       [[INNERMOST_BODY_SPLIT]]:
 ; CHECK-NEXT:    [[TMP1]] = add nsw i64 [[INDVAR_INNERMOST]], -1
 ; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 0
-; CHECK-NEXT:    br i1 [[TMP2]], label [[OUTERMOST_LATCH]], label [[INNERMOST_BODY]]
-; CHECK:       innermost.loopexit:
-; CHECK-NEXT:    br label [[MIDDLE_LATCH]]
-; CHECK:       middle.latch:
+; CHECK-NEXT:    br i1 [[TMP2]], label %[[OUTERMOST_LATCH]], label %[[INNERMOST_BODY]]
+; CHECK:       [[INNERMOST_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[MIDDLE_LATCH]]
+; CHECK:       [[MIDDLE_LATCH]]:
 ; CHECK-NEXT:    [[INDVAR_MIDDLE_NEXT]] = add nsw i64 [[INDVAR_MIDDLE]], -1
 ; CHECK-NEXT:    [[TOBOOL2_I:%.*]] = icmp eq i64 [[INDVAR_MIDDLE_NEXT]], 0
-; CHECK-NEXT:    br i1 [[TOBOOL2_I]], label [[INNERMOST_BODY_SPLIT]], label [[MIDDLE_HEADER]]
-; CHECK:       outermost.latch:
+; CHECK-NEXT:    br i1 [[TOBOOL2_I]], label %[[INNERMOST_BODY_SPLIT]], label %[[MIDDLE_HEADER]]
+; CHECK:       [[OUTERMOST_LATCH]]:
 ; CHECK-NEXT:    [[INDVAR_OUTERMOST_NEXT]] = add nsw i32 [[INDVAR_OUTERMOST]], -5
 ; CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp eq i32 [[INDVAR_OUTERMOST_NEXT]], 0
-; CHECK-NEXT:    br i1 [[TOBOOL_I]], label [[OUTERMOST_EXIT:%.*]], label [[OUTERMOST_HEADER]]
-; CHECK:       outermost.exit:
+; CHECK-NEXT:    br i1 [[TOBOOL_I]], label %[[OUTERMOST_EXIT:.*]], label %[[OUTERMOST_HEADER]]
+; CHECK:       [[OUTERMOST_EXIT]]...
[truncated]

@@ -187,11 +187,19 @@ static void interChangeDependencies(CharMatrix &DepMatrix, unsigned FromIndx,
// if the direction matrix, after the same permutation is applied to its
// columns, has no ">" direction as the leftmost non-"=" direction in any row.
static bool isLexicographicallyPositive(std::vector<char> &DV) {
bool HasScalar = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unused?

@@ -187,11 +187,19 @@ static void interChangeDependencies(CharMatrix &DepMatrix, unsigned FromIndx,
// if the direction matrix, after the same permutation is applied to its
// columns, has no ">" direction as the leftmost non-"=" direction in any row.
static bool isLexicographicallyPositive(std::vector<char> &DV) {
bool HasScalar = false;
bool FirstPosScalar = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand what FirstPosScalar means. This legality check seems to allow the permutation of [S <] to [< S]. Is this transformation legal? Or can such a case not happen? I tried some loops that have a row of [S <], and as far as these cases, the legality check rejects the interchange because of another row like [S =].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would second this. It looks to me that this patch allows some specific pattern to be interchanged without a schematic reasoning.

@sjoerdmeijer
Copy link
Collaborator Author

Thanks for the review @kasuga-fj and @CongzheUalberta! I agree with your feedback, I made a mess of the logic. I have now simplified things, the idea is to simply reject S dependencies:

This patch checks that the dependency matrix has no "S" direction as the leftmost non-"=" direction in any row.

I have updated the commit message / description of this patch with a better explanation.

@Meinersbur
Copy link
Member

I honestly never understood why a dependency that is "scalar" (that is, the dimension is not referenced in the index expression) is handled fundamentally different. If not references it should be equivalent to accessing (any of) the previous iteration. How about we remove

          if (D->isScalar(II)) {
            Direction = 'S';
            Dep.push_back(Direction);

in populateDependencyMatrix entirely, so a dependency is never S?

In any case, the patch in the current form should be conservatively correct.

@sjoerdmeijer
Copy link
Collaborator Author

I honestly never understood why a dependency that is "scalar" (that is, the dimension is not referenced in the index expression) is handled fundamentally different. If not references it should be equivalent to accessing (any of) the previous iteration. How about we remove

          if (D->isScalar(II)) {
            Direction = 'S';
            Dep.push_back(Direction);

in populateDependencyMatrix entirely, so a dependency is never S?

In any case, the patch in the current form should be conservatively correct.

I would be happy to remove it in the way you suggested, @Meinersbur. My only minor concern is that if we want to deal with these sort of dependencies later, we would need to restore some of this logic, so removing it now would mean more churn later? But let me know what you prefer, I am happy either way (leave it as it is because it is conservative/correct as you say, or implement your suggestion).

@kasuga-fj
Copy link
Contributor

I agree with removing the 'S'. I think leaving the incorrect dependencies risks causing correctness issues, especially when we change the code in the future. At the extreme, we need to insert something like if (Direction == 'S') { ... } as this patch does when adding new functions.

@Meinersbur
Copy link
Member

@sjoerdmeijer What kind of interchange do you want to preserve with S? IMHO the current code doesn't even understand what it should mean (or it was forgotten about at some point), so if we want to re-add it, it should get another full review.

Besides, this PR currently will just reject everything with S, but DependenceInfo may be able to determine the direction already.

@sjoerdmeijer
Copy link
Collaborator Author

@sjoerdmeijer What kind of interchange do you want to preserve with S? IMHO the current code doesn't even understand what it should mean (or it was forgotten about at some point), so if we want to re-add it, it should get another full review.

Besides, this PR currently will just reject everything with S, but DependenceInfo may be able to determine the direction already.

Okay, excellent points. I will prepare a new revision that removes S.

@sjoerdmeijer sjoerdmeijer changed the title [LoopInterchange] Bail out for Scalar Dependencies [LoopInterchange] Remove 'S' Scalar Dependencies Jan 13, 2025
@sjoerdmeijer
Copy link
Collaborator Author

Thanks for the review @Meinersbur , @kasuga-fj: this now no longer inserts 'S' into the dependency matrix, it should default to '*'.

Copy link

github-actions bot commented Jan 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me. but some tests do not appear to have achieved their original objectives. I'm not sure what we should do in such cases, but in my opinion a mechanical update does not make sense. Marking XFAIL may be one option.

Copy link
Collaborator Author

@sjoerdmeijer sjoerdmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me. but some tests do not appear to have achieved their original objectives. I'm not sure what we should do in such cases, but in my opinion a mechanical update does not make sense. Marking XFAIL may be one option.

Thanks, and I also had a bit of a nagging feeling about the test cases. I quite like the idea about the XFAILs. I think what I will do is the following:

  • I will first convert the test cases that were modified and check the full IR to tests that check the optimisation remark in a separate patch. Checking the the optimisation remark is more compact, less verbose and easier to see.
  • I will then see if adding XFAILS here is better, I think so.

Copy link
Contributor

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and I also had a bit of a nagging feeling about the test cases. I quite like the idea about the XFAILs. I think what I will do is the following:

  • I will first convert the test cases that were modified and check the full IR to tests that check the optimisation remark in a separate patch. Checking the the optimisation remark is more compact, less verbose and easier to see.
  • I will then see if adding XFAILS here is better, I think so.

Cool! I second this idea.

sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Jan 15, 2025
Checking the remark message if interchange did or didn't happen is more
straight forward than the full IR for these cases. This comment was
also made when I moved some tests away from relying on debug builds in
change llvm#116780, and this is a prep step for llvm#119345 that is going to
change these steps.
sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Jan 16, 2025
Checking the remark message if interchange did or didn't happen is more
straight forward than checking the full IR for these cases. This comment was
also made when I moved some tests away from relying on debug builds in
change llvm#116780, and this is a prep step for llvm#119345 that is going to
change these steps.
sjoerdmeijer added a commit that referenced this pull request Jan 16, 2025
Checking the remark message if interchange did or didn't happen is more
straight forward than the full IR for these cases. This comment was also
made when I moved some tests away from relying on debug builds in change
#116780, and this is a prep step for #119345 that is going to change
these test cases.
@sjoerdmeijer
Copy link
Collaborator Author

I have now XFAIL'ed the test cases, with 2 exceptions where it was already rejected but for different reasons.

Copy link
Contributor

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM, now all that remains is to apply the formatter

@sjoerdmeijer
Copy link
Collaborator Author

Thanks, LGTM, now all that remains is to apply the formatter

Thanks a lot for the reviews, will do that before merging this.

We are not handling 'S' scalar dependencies correctly and have at least
the following miscompiles related to that:

[LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" llvm#54176
[LoopInterchange] Interchange breaks program correctness llvm#46867
[LoopInterchange] Loops should not interchanged due to dependencies llvm#47259
[LoopInterchange] Loops should not interchanged due to control flow llvm#47401

This patch does no longer insert the "S" dependency/direction into the
dependency matrix, so a dependency is never "S". We seem to have
forgotten what the exact meaning is of this dependency type, and don't
see why it should be treated differently.

We prefer correctness over incorrect and more aggressive results. I.e.,
this prevents the miscompiles at the expense of handling less cases,
i.e. making interchange more pessimistic. However, some of the cases
that are now rejected for dependence analysis reasons, were rejected
before too but for other reasons (e.g. profitability). So at least for
the llvm regression tests, the number of regression are very reasonable.
This should be a stopgap. We would like to get interchange enabled by
default and thus prefer correctness over unsafe transforms, and later
see if we can get solve the regressions.
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sjoerdmeijer sjoerdmeijer merged commit 456ec1c into llvm:main Jan 20, 2025
6 of 7 checks passed
@sjoerdmeijer sjoerdmeijer deleted the li-scalar-deps-fix branch January 20, 2025 13:05
sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Jan 29, 2025
This is a work in progress patch to enable loop-interchange by default
and is a continuation of the RFC:

https://discourse.llvm.org/t/enabling-loop-interchange/82589

Basically, we promised to fix any compile-time and correctness issues in
the different components involved here (loop-interchange and dependence
analaysis.) before discussing enabling interchange by default. We think
are close to complete this; I would like to explain where we are and
wanted to check if there are any thoughts or concerns. A quick overview
of the correctness and compile-time improvements that we have made include:

Correctness:
- [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345)
- [LoopInterchange] Fix overflow in cost calculation (llvm#111807)
- [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj
- [DA] disambiguate evolution of base addresses (llvm#116628)

Compile-times:
- [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973)
- [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128)
- [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247)

And in terms of remaining work, we think we are very close to fixing
these depenence analysis issues:
- [DA] do not handle array accesses of different offsets (llvm#123436)
- [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630)
- [DA] use NSW arithmetic llvm#116632

The compile-time increase with a geomean increase of 0.19% looks good
(after committing llvm#124247), I think:

  stage1-O3:
  Benchmark
  kimwitu++        +0.10%
  sqlite3          +0.14%
  consumer-typeset +0.07%
  Bullet           +0.06%
  tramp3d-v4       +0.21%
  mafft            +0.39%
  ClamAVi          +0.06%
  lencod           +0.61%
  SPASS            +0.17%
  7zip             +0.08%
  geomean          +0.19%

See also:
http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au

We might want to look into lencod to see if we can improve more, but not
sure it is strictly necessary.
kasuga-fj added a commit to kasuga-fj/llvm-project that referenced this pull request Jan 31, 2025
The incorrect handling of scalar dependencies in LoopInterchange was
fixed by llvm#119345. This patch adds tests that are relative to the issues
fixed by llvm#119345.
kasuga-fj added a commit that referenced this pull request Feb 3, 2025
The incorrect handling of scalar dependencies in LoopInterchange was
fixed by #119345. This patch adds tests that are relative to the issues
fixed by #119345.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
The incorrect handling of scalar dependencies in LoopInterchange was
fixed by llvm#119345. This patch adds tests that are relative to the issues
fixed by llvm#119345.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants