[LoopInterchange] Handle LE and GE correctly #124901

kasuga-fj · 2025-01-29T09:01:00Z

LoopInterchange have converted DVEntry::LE and DVEntry::GE in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' it lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead.

Resolve #123920

LoopInterchange have converted `DVEntry::LE` and `DVEntry::GE` in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' it lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead. Resolve llvm#123920

llvmbot · 2025-01-29T09:01:34Z

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

LoopInterchange have converted DVEntry::LE and DVEntry::GE in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' it lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead.

Resolve #123920

Full diff: https://github.com/llvm/llvm-project/pull/124901.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+8-3)
(added) llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll (+75)

diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index 38fc682698c53e..ca125d2c0c490c 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -160,11 +160,16 @@ static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
         unsigned Levels = D->getLevels();
         char Direction;
         for (unsigned II = 1; II <= Levels; ++II) {
+          // `DVEntry::LE` is converted to `*`. This is because `LE` means `<`
+          // or `=`, for which we don't have an equivalent representation, so
+          // that the conservative approximation is necessary. The same goes for
+          // `DVEntry::GE`.
+          // TODO: Use of fine-grained expressions allows for more accurate
+          // analysis.
           unsigned Dir = D->getDirection(II);
-          if (Dir == Dependence::DVEntry::LT || Dir == Dependence::DVEntry::LE)
+          if (Dir == Dependence::DVEntry::LT)
             Direction = '<';
-          else if (Dir == Dependence::DVEntry::GT ||
-                   Dir == Dependence::DVEntry::GE)
+          else if (Dir == Dependence::DVEntry::GT)
             Direction = '>';
           else if (Dir == Dependence::DVEntry::EQ)
             Direction = '=';
diff --git a/llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll b/llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll
new file mode 100644
index 00000000000000..c17d78f7cfce68
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/outer-dependency-lte.ll
@@ -0,0 +1,75 @@
+; RUN: opt < %s -passes=loop-interchange -pass-remarks-missed='loop-interchange' -pass-remarks-output=%t \
+; RUN:     -verify-dom-info -verify-loop-info -verify-loop-lcssa
+; RUN: FileCheck --input-file=%t %s
+
+;; The original code:
+;;
+;; #define N 4
+;; int a[N*N][N*N][N*N];
+;; void f() {
+;;   for (int i = 0; i < N; i++)
+;;     for (int j = 1; j < 2*N; j++)
+;;       for (int k = 1; k < 2*N; k++)
+;;         a[2*i][k+1][j-1] -= a[i+N-1][k][j];
+;; }
+;;
+;; The entry of the direction vector for the outermost loop is `DVEntry::LE`.
+;; We need to treat this as `*`, not `<`. See issue #123920 for details.
+
+; CHECK: --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Dependence
+; CHECK-NEXT: Function:        f
+; CHECK: --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Dependence
+; CHECK-NEXT: Function:        f
+
+@a = dso_local global [16 x [16 x [16 x i32]]] zeroinitializer, align 4
+
+define dso_local void @f() {
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:
+  %i.039 = phi i32 [ 0, %entry ], [ %inc26, %for.cond.cleanup3 ]
+  %sub = add nuw nsw i32 %i.039, 3
+  %idxprom = zext nneg i32 %sub to i64
+  %mul = shl nuw nsw i32 %i.039, 1
+  %idxprom13 = zext nneg i32 %mul to i64
+  br label %for.cond5.preheader
+
+for.cond.cleanup:
+  ret void
+
+for.cond5.preheader:
+  %j.038 = phi i32 [ 1, %for.cond1.preheader ], [ %inc23, %for.cond.cleanup7 ]
+  %idxprom11 = zext nneg i32 %j.038 to i64
+  %sub18 = add nsw i32 %j.038, -1
+  %idxprom19 = sext i32 %sub18 to i64
+  br label %for.body8
+
+for.cond.cleanup3:
+  %inc26 = add nuw nsw i32 %i.039, 1
+  %cmp = icmp samesign ult i32 %i.039, 3
+  br i1 %cmp, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.cond.cleanup7:
+  %inc23 = add nuw nsw i32 %j.038, 1
+  %cmp2 = icmp samesign ult i32 %j.038, 7
+  br i1 %cmp2, label %for.cond5.preheader, label %for.cond.cleanup3
+
+for.body8:
+  %k.037 = phi i32 [ 1, %for.cond5.preheader ], [ %add15, %for.body8 ]
+  %idxprom9 = zext nneg i32 %k.037 to i64
+  %arrayidx12 = getelementptr inbounds nuw [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %idxprom, i64 %idxprom9, i64 %idxprom11
+  %0 = load i32, ptr %arrayidx12, align 4
+  %add15 = add nuw nsw i32 %k.037, 1
+  %idxprom16 = zext nneg i32 %add15 to i64
+  %arrayidx20 = getelementptr inbounds [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %idxprom13, i64 %idxprom16, i64 %idxprom19
+  %1 = load i32, ptr %arrayidx20, align 4
+  %sub21 = sub nsw i32 %1, %0
+  store i32 %sub21, ptr %arrayidx20, align 4
+  %cmp6 = icmp samesign ult i32 %k.037, 7
+  br i1 %cmp6, label %for.body8, label %for.cond.cleanup7
+}

sjoerdmeijer

I agree with this direction, correctness should be the first priority.
Also, as you mentioned, this does not regress many cases.

LGTM

kasuga-fj · 2025-01-29T10:30:47Z

Thanks for your review!

@kasuga-fj

This is a work in progress patch to enable loop-interchange by default and is a continuation of the RFC: https://discourse.llvm.org/t/enabling-loop-interchange/82589 Basically, we promised to fix any compile-time and correctness issues in the different components involved here (loop-interchange and dependence analaysis.) before discussing enabling interchange by default. We think are close to complete this; I would like to explain where we are and wanted to check if there are any thoughts or concerns. A quick overview of the correctness and compile-time improvements that we have made include: Correctness: - [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345) - [LoopInterchange] Fix overflow in cost calculation (llvm#111807) - [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj - [DA] disambiguate evolution of base addresses (llvm#116628) Compile-times: - [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973) - [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128) - [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247) And in terms of remaining work, we think we are very close to fixing these depenence analysis issues: - [DA] do not handle array accesses of different offsets (llvm#123436) - [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630) - [DA] use NSW arithmetic llvm#116632 The compile-time increase with a geomean increase of 0.19% looks good (after committing llvm#124247), I think: stage1-O3: Benchmark kimwitu++ +0.10% sqlite3 +0.14% consumer-typeset +0.07% Bullet +0.06% tramp3d-v4 +0.21% mafft +0.39% ClamAVi +0.06% lencod +0.61% SPASS +0.17% 7zip +0.08% geomean +0.19% See also: http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au We might want to look into lencod to see if we can improve more, but not sure it is strictly necessary.

llvmbot added the llvm:transforms label Jan 29, 2025

kasuga-fj requested review from madhur13490, CongzheUalberta and sjoerdmeijer January 29, 2025 09:01

sjoerdmeijer approved these changes Jan 29, 2025

View reviewed changes

kasuga-fj merged commit 690f251 into llvm:main Jan 29, 2025
8 of 10 checks passed

sjoerdmeijer mentioned this pull request Jan 29, 2025

[LoopInterchange] Enable it by default (WIP) #124911

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LoopInterchange] Handle LE and GE correctly #124901

[LoopInterchange] Handle LE and GE correctly #124901

Uh oh!

kasuga-fj commented Jan 29, 2025

Uh oh!

llvmbot commented Jan 29, 2025

Uh oh!

sjoerdmeijer left a comment

Uh oh!

kasuga-fj commented Jan 29, 2025

Uh oh!

Uh oh!

Uh oh!

[LoopInterchange] Handle LE and GE correctly #124901

[LoopInterchange] Handle LE and GE correctly #124901

Uh oh!

Conversation

kasuga-fj commented Jan 29, 2025

Uh oh!

llvmbot commented Jan 29, 2025

Uh oh!

sjoerdmeijer left a comment

Choose a reason for hiding this comment

Uh oh!

kasuga-fj commented Jan 29, 2025

Uh oh!

Uh oh!

Uh oh!