Skip to content

[LoopInterchange] Handle LE and GE correctly #124901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 29, 2025

Conversation

kasuga-fj
Copy link
Contributor

LoopInterchange have converted DVEntry::LE and DVEntry::GE in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' it lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead.

Resolve #123920

LoopInterchange have converted `DVEntry::LE` and `DVEntry::GE` in
direction vectors to '<' and '>' respectively. This handling is
incorrect because the information about the '=' it lost. This leads to
miscompilation in some cases. To resolve this issue, convert them to '*'
instead.

Resolve llvm#123920
@llvmbot
Copy link
Member

llvmbot commented Jan 29, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

LoopInterchange have converted DVEntry::LE and DVEntry::GE in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' it lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead.

Resolve #123920


Full diff: https://github.com/llvm/llvm-project/pull/124901.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+8-3)
  • (added) llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll (+75)
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index 38fc682698c53e..ca125d2c0c490c 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -160,11 +160,16 @@ static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
         unsigned Levels = D->getLevels();
         char Direction;
         for (unsigned II = 1; II <= Levels; ++II) {
+          // `DVEntry::LE` is converted to `*`. This is because `LE` means `<`
+          // or `=`, for which we don't have an equivalent representation, so
+          // that the conservative approximation is necessary. The same goes for
+          // `DVEntry::GE`.
+          // TODO: Use of fine-grained expressions allows for more accurate
+          // analysis.
           unsigned Dir = D->getDirection(II);
-          if (Dir == Dependence::DVEntry::LT || Dir == Dependence::DVEntry::LE)
+          if (Dir == Dependence::DVEntry::LT)
             Direction = '<';
-          else if (Dir == Dependence::DVEntry::GT ||
-                   Dir == Dependence::DVEntry::GE)
+          else if (Dir == Dependence::DVEntry::GT)
             Direction = '>';
           else if (Dir == Dependence::DVEntry::EQ)
             Direction = '=';
diff --git a/llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll b/llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll
new file mode 100644
index 00000000000000..c17d78f7cfce68
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll
@@ -0,0 +1,75 @@
+; RUN: opt < %s -passes=loop-interchange -pass-remarks-missed='loop-interchange' -pass-remarks-output=%t \
+; RUN:     -verify-dom-info -verify-loop-info -verify-loop-lcssa
+; RUN: FileCheck --input-file=%t %s
+
+;; The original code:
+;;
+;; #define N 4
+;; int a[N*N][N*N][N*N];
+;; void f() {
+;;   for (int i = 0; i < N; i++)
+;;     for (int j = 1; j < 2*N; j++)
+;;       for (int k = 1; k < 2*N; k++)
+;;         a[2*i][k+1][j-1] -= a[i+N-1][k][j];
+;; }
+;;
+;; The entry of the direction vector for the outermost loop is `DVEntry::LE`.
+;; We need to treat this as `*`, not `<`. See issue #123920 for details.
+
+; CHECK: --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Dependence
+; CHECK-NEXT: Function:        f
+; CHECK: --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Dependence
+; CHECK-NEXT: Function:        f
+
+@a = dso_local global [16 x [16 x [16 x i32]]] zeroinitializer, align 4
+
+define dso_local void @f() {
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:
+  %i.039 = phi i32 [ 0, %entry ], [ %inc26, %for.cond.cleanup3 ]
+  %sub = add nuw nsw i32 %i.039, 3
+  %idxprom = zext nneg i32 %sub to i64
+  %mul = shl nuw nsw i32 %i.039, 1
+  %idxprom13 = zext nneg i32 %mul to i64
+  br label %for.cond5.preheader
+
+for.cond.cleanup:
+  ret void
+
+for.cond5.preheader:
+  %j.038 = phi i32 [ 1, %for.cond1.preheader ], [ %inc23, %for.cond.cleanup7 ]
+  %idxprom11 = zext nneg i32 %j.038 to i64
+  %sub18 = add nsw i32 %j.038, -1
+  %idxprom19 = sext i32 %sub18 to i64
+  br label %for.body8
+
+for.cond.cleanup3:
+  %inc26 = add nuw nsw i32 %i.039, 1
+  %cmp = icmp samesign ult i32 %i.039, 3
+  br i1 %cmp, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.cond.cleanup7:
+  %inc23 = add nuw nsw i32 %j.038, 1
+  %cmp2 = icmp samesign ult i32 %j.038, 7
+  br i1 %cmp2, label %for.cond5.preheader, label %for.cond.cleanup3
+
+for.body8:
+  %k.037 = phi i32 [ 1, %for.cond5.preheader ], [ %add15, %for.body8 ]
+  %idxprom9 = zext nneg i32 %k.037 to i64
+  %arrayidx12 = getelementptr inbounds nuw [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %idxprom, i64 %idxprom9, i64 %idxprom11
+  %0 = load i32, ptr %arrayidx12, align 4
+  %add15 = add nuw nsw i32 %k.037, 1
+  %idxprom16 = zext nneg i32 %add15 to i64
+  %arrayidx20 = getelementptr inbounds [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %idxprom13, i64 %idxprom16, i64 %idxprom19
+  %1 = load i32, ptr %arrayidx20, align 4
+  %sub21 = sub nsw i32 %1, %0
+  store i32 %sub21, ptr %arrayidx20, align 4
+  %cmp6 = icmp samesign ult i32 %k.037, 7
+  br i1 %cmp6, label %for.body8, label %for.cond.cleanup7
+}

Copy link
Collaborator

@sjoerdmeijer sjoerdmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this direction, correctness should be the first priority.
Also, as you mentioned, this does not regress many cases.

LGTM

@kasuga-fj
Copy link
Contributor Author

Thanks for your review!

@kasuga-fj kasuga-fj merged commit 690f251 into llvm:main Jan 29, 2025
8 of 10 checks passed
sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Jan 29, 2025
This is a work in progress patch to enable loop-interchange by default
and is a continuation of the RFC:

https://discourse.llvm.org/t/enabling-loop-interchange/82589

Basically, we promised to fix any compile-time and correctness issues in
the different components involved here (loop-interchange and dependence
analaysis.) before discussing enabling interchange by default. We think
are close to complete this; I would like to explain where we are and
wanted to check if there are any thoughts or concerns. A quick overview
of the correctness and compile-time improvements that we have made include:

Correctness:
- [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345)
- [LoopInterchange] Fix overflow in cost calculation (llvm#111807)
- [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj
- [DA] disambiguate evolution of base addresses (llvm#116628)

Compile-times:
- [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973)
- [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128)
- [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247)

And in terms of remaining work, we think we are very close to fixing
these depenence analysis issues:
- [DA] do not handle array accesses of different offsets (llvm#123436)
- [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630)
- [DA] use NSW arithmetic llvm#116632

The compile-time increase with a geomean increase of 0.19% looks good
(after committing llvm#124247), I think:

  stage1-O3:
  Benchmark
  kimwitu++        +0.10%
  sqlite3          +0.14%
  consumer-typeset +0.07%
  Bullet           +0.06%
  tramp3d-v4       +0.21%
  mafft            +0.39%
  ClamAVi          +0.06%
  lencod           +0.61%
  SPASS            +0.17%
  7zip             +0.08%
  geomean          +0.19%

See also:
http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au

We might want to look into lencod to see if we can improve more, but not
sure it is strictly necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LoopInterchange] Interchange potentially breaks the program correctness
3 participants