Skip to content

Commit 5ff4eb2

Browse files
committed
[LoopInterchange] Relax the legality check to accept more patterns
We lose opportunities to interchange loops because the current legality check is stricter than necessary. This patch relaxes the restriction and increases the number of acceptable patterns. Here is a motivating example. ``` // From TSVC s231 for (int nl=0;nl<100;nl++) { for (int i=0;i<256;i++) { for (int j=1;j<256;j++) aa[j][i] = aa[j-1][i] + bb[j][i]; } dummy(aa, bb); } ``` This patch allows us to interchange the two innermost in the above code. Note, however, that the current implementation interchanges these loops twice so that they end up going back in the original order.
1 parent 9534d27 commit 5ff4eb2

File tree

3 files changed

+330
-19
lines changed

3 files changed

+330
-19
lines changed

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

Lines changed: 120 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,15 @@ using LoopVector = SmallVector<Loop *, 8>;
7272
// TODO: Check if we can use a sparse matrix here.
7373
using CharMatrix = std::vector<std::vector<char>>;
7474

75+
// Classification of a direction vector by the leftmost element after removing
76+
// '=' and 'I' from it.
77+
enum class DirectionVectorPattern {
78+
Zero, ///< The direction vector contains only '=' or 'I'.
79+
Positive, ///< The leftmost element after removing '=' and 'I' is '<'.
80+
Negative, ///< The leftmost element after removing '=' and 'I' is '>'.
81+
All, ///< The leftmost element after removing '=' and 'I' is '*'.
82+
};
83+
7584
} // end anonymous namespace
7685

7786
// Minimum loop depth supported.
@@ -199,35 +208,128 @@ static void interChangeDependencies(CharMatrix &DepMatrix, unsigned FromIndx,
199208
std::swap(DepMatrix[I][ToIndx], DepMatrix[I][FromIndx]);
200209
}
201210

202-
// After interchanging, check if the direction vector is valid.
203-
// [Theorem] A permutation of the loops in a perfect nest is legal if and only
204-
// if the direction matrix, after the same permutation is applied to its
205-
// columns, has no ">" direction as the leftmost non-"=" direction in any row.
206-
static bool isLexicographicallyPositive(std::vector<char> &DV) {
207-
for (unsigned char Direction : DV) {
208-
if (Direction == '<')
211+
// Clasify the direction vector into the four patterns. The target vector is
212+
// [DV[Left], DV[Left+1], ..., DV[Right-1]], not the whole of \p DV.
213+
static DirectionVectorPattern
214+
classifyDirectionVector(const std::vector<char> &DV, unsigned Left,
215+
unsigned Right) {
216+
assert(Left <= Right && "Left must be less or equal to Right");
217+
for (unsigned I = Left; I < Right; I++) {
218+
unsigned char Direction = DV[I];
219+
switch (Direction) {
220+
case '<':
221+
return DirectionVectorPattern::Positive;
222+
case '>':
223+
return DirectionVectorPattern::Negative;
224+
case '*':
225+
return DirectionVectorPattern::All;
226+
case '=':
227+
case 'I':
228+
break;
229+
default:
230+
llvm_unreachable("Unknown element in direction vector");
231+
}
232+
}
233+
return DirectionVectorPattern::Zero;
234+
}
235+
236+
// Check whether the requested interchange is legal or not. The interchange is
237+
// valid if the following condition holds:
238+
//
239+
// [Cond] For two instructions that can access the same location, the execution
240+
// order of the instructions before and after interchanged is the same.
241+
//
242+
// If the direction vector doesn't contain '*', the above Cond is equivalent to
243+
// one of the following:
244+
//
245+
// - The leftmost non-'=' element is '<' before and after interchanging.
246+
// - The leftmost non-'=' element is '>' before and after interchanging.
247+
// - All the elements in the direction vector is '='.
248+
//
249+
// As for '*', we must treat it as having dependency in all directions. It could
250+
// be '<', it could be '>', it could be '='. We can eliminate '*'s from the
251+
// direction vector by enumerating all possible patterns by replacing '*' with
252+
// '<' or '>' or '=', and then doing the above checks for all of them. The
253+
// enumeration can grow exponentially, so it is not practical to run it as it
254+
// is. Fortunately, we can perform the following pruning.
255+
//
256+
// - For '*' to the left of \p OuterLoopId, replacing it with '=' is allowed.
257+
//
258+
// This is because, for patterns where '<' (or '>') is assigned to some '*' to
259+
// the left of \p OuterLoopId, the first (or second) condition above holds
260+
// regardless of interchanging. After doing this pruning, the interchange is
261+
// legal if the leftmost non-'=' element is the same before and after swapping
262+
// the element of \p OuterLoopId and \p InnerLoopId.
263+
//
264+
//
265+
// Example: Consider the following loop.
266+
//
267+
// ```
268+
// for (i=0; i<=32; i++)
269+
// for (j=0; j<N-1; j++)
270+
// for (k=0; k<N-1; k++) {
271+
// Src: A[i][j][k] = ...;
272+
// Dst: use(A[32-i][j+1][k+1]);
273+
// }
274+
// ```
275+
//
276+
// In this case, the direction vector is [* < <] (if the analysis is powerful
277+
// enough). The enumeration of all possible patterns by replacing '*' is as
278+
// follows:
279+
//
280+
// - [< < <] : when i < 16
281+
// - [= < <] : when i = 16
282+
// - [> < <] : when i > 16
283+
//
284+
// We can prove that it is safe to interchange the innermost two loops here,
285+
// because the interchange doesn't change the leftmost non-'=' element for all
286+
// enumerated vectors.
287+
//
288+
// TODO: There are cases where the interchange is legal but rejected. At least
289+
// the following patterns are legal:
290+
// - If both Dep[OuterLoopId] and Dep[InnerLoopId] are '=', the interchange is
291+
// legal regardless of any other elements.
292+
// - If the loops are adjacent to each other and at least one of them is '=',
293+
// the interchange is legal even if the other is '*'.
294+
static bool isLegalToInterchangeLoopsForRow(std::vector<char> Dep,
295+
unsigned InnerLoopId,
296+
unsigned OuterLoopId) {
297+
// Replace '*' to the left of OuterLoopId with '='. The presence of '<' means
298+
// that the direction vector is something like [= = = < ...], where the
299+
// interchange is safe.
300+
for (unsigned I = 0; I < OuterLoopId; I++) {
301+
if (Dep[I] == '<' || Dep[I] == '>')
209302
return true;
210-
if (Direction == '>' || Direction == '*')
211-
return false;
303+
Dep[I] = '=';
212304
}
213-
return true;
305+
306+
// From this point on, all elements to the left of OuterLoopId are considered
307+
// to be '='.
308+
309+
// Perform legality checks by comparing the leftmost non-'=' element between
310+
// before and after the interchange. If either one is '*', then the
311+
// interchange is unsafe. Otherwise it is safe if the element is equal.
312+
auto BeforePattern =
313+
classifyDirectionVector(Dep, OuterLoopId, InnerLoopId + 1);
314+
if (BeforePattern == DirectionVectorPattern::All)
315+
return false;
316+
std::swap(Dep[InnerLoopId], Dep[OuterLoopId]);
317+
auto AfterPattern =
318+
classifyDirectionVector(Dep, OuterLoopId, InnerLoopId + 1);
319+
return BeforePattern == AfterPattern;
214320
}
215321

216322
// Checks if it is legal to interchange 2 loops.
217323
static bool isLegalToInterChangeLoops(CharMatrix &DepMatrix,
218324
unsigned InnerLoopId,
219325
unsigned OuterLoopId) {
220326
unsigned NumRows = DepMatrix.size();
221-
std::vector<char> Cur;
222327
// For each row check if it is valid to interchange.
223328
for (unsigned Row = 0; Row < NumRows; ++Row) {
224-
// Create temporary DepVector check its lexicographical order
225-
// before and after swapping OuterLoop vs InnerLoop
226-
Cur = DepMatrix[Row];
227-
if (!isLexicographicallyPositive(Cur))
228-
return false;
229-
std::swap(Cur[InnerLoopId], Cur[OuterLoopId]);
230-
if (!isLexicographicallyPositive(Cur))
329+
// The vector is copied because the elements will be modified in the
330+
// `isLegalToInterchangeLoopsForRow`
331+
if (!isLegalToInterchangeLoopsForRow(DepMatrix[Row], InnerLoopId,
332+
OuterLoopId))
231333
return false;
232334
}
233335
return true;

llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,7 +74,7 @@ for.end8: ; preds = %for.cond1.for.inc6_
7474

7575
; CHECK: --- !Missed
7676
; CHECK-NEXT: Pass: loop-interchange
77-
; CHECK-NEXT: Name: Dependence
77+
; CHECK-NEXT: Name: UnsupportedPHIOuter
7878
; CHECK-NEXT: Function: reduction_03
7979

8080
; IR-LABEL: @reduction_03(
Lines changed: 209 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,209 @@
1+
; REQUIRES: asserts
2+
; RUN: opt < %s -passes=loop-interchange -verify-dom-info -verify-loop-info \
3+
; RUN: -disable-output -debug 2>&1 | FileCheck %s
4+
5+
@a = dso_local global [20 x [20 x [20 x i32]]] zeroinitializer, align 4
6+
@aa = dso_local global [256 x [256 x float]] zeroinitializer, align 64
7+
@bb = dso_local global [256 x [256 x float]] zeroinitializer, align 64
8+
9+
;; for (int nl=0;nl<100;++nl)
10+
;; for (int i=0;i<256;++i)
11+
;; for (int j=1;j<256;++j)
12+
;; aa[j][i] = aa[j-1][i] + bb[j][i];
13+
;;
14+
;; The direction vector of `aa` is [* = >]. We can interchange the innermost
15+
;; two loops, The direction vector after interchanging will be [* > =].
16+
17+
; CHECK: Dependency matrix before interchange:
18+
; CHECK-NEXT: * = >
19+
; CHECK-NEXT: * = =
20+
; CHECK-NEXT: Processing InnerLoopId = 2 and OuterLoopId = 1
21+
; CHECK-NEXT: Checking if loops are tightly nested
22+
; CHECK-NEXT: Checking instructions in Loop header and Loop latch
23+
; CHECK-NEXT: Loops are perfectly nested
24+
; CHECK-NEXT: Loops are legal to interchange
25+
; CHECK: Dependency matrix after interchange:
26+
; CHECK-NEXT: * > =
27+
; CHECK-NEXT: * = =
28+
29+
define void @all_eq_gt() {
30+
entry:
31+
br label %for.cond1.preheader
32+
33+
for.cond1.preheader:
34+
%nl.036 = phi i32 [ 0, %entry ], [ %inc23, %for.cond.cleanup3 ]
35+
br label %for.cond5.preheader
36+
37+
for.cond.cleanup3:
38+
%inc23 = add nuw nsw i32 %nl.036, 1
39+
%exitcond43 = icmp ne i32 %inc23, 100
40+
br i1 %exitcond43, label %for.cond1.preheader, label %for.cond.cleanup
41+
42+
for.cond.cleanup7:
43+
%indvars.iv.next40 = add nuw nsw i64 %indvars.iv39, 1
44+
%exitcond42 = icmp ne i64 %indvars.iv.next40, 256
45+
br i1 %exitcond42, label %for.cond5.preheader, label %for.cond.cleanup3
46+
47+
for.body8:
48+
%indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
49+
%0 = add nsw i64 %indvars.iv, -1
50+
%arrayidx10 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %0, i64 %indvars.iv39
51+
%1 = load float, ptr %arrayidx10, align 4
52+
%arrayidx14 = getelementptr inbounds [256 x [256 x float]], ptr @bb, i64 0, i64 %indvars.iv, i64 %indvars.iv39
53+
%2 = load float, ptr %arrayidx14, align 4
54+
%add = fadd fast float %2, %1
55+
%arrayidx18 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv, i64 %indvars.iv39
56+
store float %add, ptr %arrayidx18, align 4
57+
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
58+
%exitcond = icmp ne i64 %indvars.iv.next, 256
59+
br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
60+
61+
for.cond5.preheader:
62+
%indvars.iv39 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next40, %for.cond.cleanup7 ]
63+
br label %for.body8
64+
65+
for.cond.cleanup:
66+
ret void
67+
}
68+
69+
;; for (int i=0;i<256;++i)
70+
;; for (int j=1;j<256;++j)
71+
;; aa[j][i] = aa[j-1][255-i] + bb[j][i];
72+
;;
73+
;; The direction vector of `aa` is [* >]. We cannot interchange the loops
74+
;; because we must handle a `*` dependence conservatively.
75+
76+
; CHECK: Dependency matrix before interchange:
77+
; CHECK-NEXT: * >
78+
; CHECK-NEXT: Processing InnerLoopId = 1 and OuterLoopId = 0
79+
; CHECK-NEXT: Failed interchange InnerLoopId = 1 and OuterLoopId = 0 due to dependence
80+
; CHECK-NEXT: Not interchanging loops. Cannot prove legality.
81+
82+
define void @all_gt() {
83+
entry:
84+
br label %for.cond1.preheader
85+
86+
for.cond1.preheader:
87+
%indvars.iv31 = phi i64 [ 0, %entry ], [ %indvars.iv.next32, %for.cond.cleanup3 ]
88+
%0 = sub nuw nsw i64 255, %indvars.iv31
89+
br label %for.body4
90+
91+
for.cond.cleanup3:
92+
%indvars.iv.next32 = add nuw nsw i64 %indvars.iv31, 1
93+
%exitcond35 = icmp ne i64 %indvars.iv.next32, 256
94+
br i1 %exitcond35, label %for.cond1.preheader, label %for.cond.cleanup
95+
96+
for.body4:
97+
%indvars.iv = phi i64 [ 1, %for.cond1.preheader ], [ %indvars.iv.next, %for.body4 ]
98+
%1 = add nsw i64 %indvars.iv, -1
99+
%arrayidx7 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %1, i64 %0
100+
%2 = load float, ptr %arrayidx7, align 4
101+
%arrayidx11 = getelementptr inbounds [256 x [256 x float]], ptr @bb, i64 0, i64 %indvars.iv, i64 %indvars.iv31
102+
%3 = load float, ptr %arrayidx11, align 4
103+
%add = fadd fast float %3, %2
104+
%arrayidx15 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv, i64 %indvars.iv31
105+
store float %add, ptr %arrayidx15, align 4
106+
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
107+
%exitcond = icmp ne i64 %indvars.iv.next, 256
108+
br i1 %exitcond, label %for.body4, label %for.cond.cleanup3
109+
110+
for.cond.cleanup:
111+
ret void
112+
}
113+
114+
;; for (int i=0;i<255;++i)
115+
;; for (int j=1;j<256;++j)
116+
;; aa[j][i] = aa[j-1][i+1] + bb[j][i];
117+
;;
118+
;; The direciton vector of `aa` is [< >]. We cannot interchange the loops
119+
;; because the read/write order for `aa` cannot be changed.
120+
121+
; CHECK: Dependency matrix before interchange:
122+
; CHECK-NEXT: < >
123+
; CHECK-NEXT: Processing InnerLoopId = 1 and OuterLoopId = 0
124+
; CHECK-NEXT: Failed interchange InnerLoopId = 1 and OuterLoopId = 0 due to dependence
125+
; CHECK-NEXT: Not interchanging loops. Cannot prove legality.
126+
127+
define void @lt_gt() {
128+
entry:
129+
br label %for.cond1.preheader
130+
131+
for.cond1.preheader:
132+
%indvars.iv31 = phi i64 [ 0, %entry ], [ %indvars.iv.next32, %for.cond.cleanup3 ]
133+
%indvars.iv.next32 = add nuw nsw i64 %indvars.iv31, 1
134+
br label %for.body4
135+
136+
for.cond.cleanup3:
137+
%exitcond34 = icmp ne i64 %indvars.iv.next32, 255
138+
br i1 %exitcond34, label %for.cond1.preheader, label %for.cond.cleanup
139+
140+
for.body4:
141+
%indvars.iv = phi i64 [ 1, %for.cond1.preheader ], [ %indvars.iv.next, %for.body4 ]
142+
%0 = add nsw i64 %indvars.iv, -1
143+
%arrayidx6 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %0, i64 %indvars.iv.next32
144+
%1 = load float, ptr %arrayidx6, align 4
145+
%arrayidx10 = getelementptr inbounds [256 x [256 x float]], ptr @bb, i64 0, i64 %indvars.iv, i64 %indvars.iv31
146+
%2 = load float, ptr %arrayidx10, align 4
147+
%add11 = fadd fast float %2, %1
148+
%arrayidx15 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv, i64 %indvars.iv31
149+
store float %add11, ptr %arrayidx15, align 4
150+
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
151+
%exitcond = icmp ne i64 %indvars.iv.next, 256
152+
br i1 %exitcond, label %for.body4, label %for.cond.cleanup3
153+
154+
for.cond.cleanup:
155+
ret void
156+
}
157+
158+
;; for (int i=0;i<20;i++)
159+
;; for (int j=0;j<20;j++)
160+
;; for (int k=1;k<20;k++)
161+
;; a[i][j][k] = a[i][5][k-1];
162+
;;
163+
;; The direction vector of `a` is [= * >]. We cannot interchange all the loops.
164+
165+
; CHECK: Dependency matrix before interchange:
166+
; CHECK-NEXT: = * >
167+
; CHECK-NEXT: Processing InnerLoopId = 2 and OuterLoopId = 1
168+
; CHECK-NEXT: Failed interchange InnerLoopId = 2 and OuterLoopId = 1 due to dependence
169+
; CHECK-NEXT: Not interchanging loops. Cannot prove legality.
170+
; CHECK-NEXT: Processing InnerLoopId = 1 and OuterLoopId = 0
171+
; CHECK-NEXT: Failed interchange InnerLoopId = 1 and OuterLoopId = 0 due to dependence
172+
; CHECK-NEXT: Not interchanging loops. Cannot prove legality.
173+
174+
define void @eq_all_gt() {
175+
entry:
176+
br label %for.cond1.preheader
177+
178+
for.cond1.preheader:
179+
%indvars.iv44 = phi i64 [ 0, %entry ], [ %indvars.iv.next45, %for.cond.cleanup3 ]
180+
br label %for.cond5.preheader
181+
182+
for.cond.cleanup3:
183+
%indvars.iv.next45 = add nuw nsw i64 %indvars.iv44, 1
184+
%exitcond47 = icmp ne i64 %indvars.iv.next45, 20
185+
br i1 %exitcond47, label %for.cond1.preheader, label %for.cond.cleanup
186+
187+
for.cond.cleanup7:
188+
%indvars.iv.next41 = add nuw nsw i64 %indvars.iv40, 1
189+
%exitcond43 = icmp ne i64 %indvars.iv.next41, 20
190+
br i1 %exitcond43, label %for.cond5.preheader, label %for.cond.cleanup3
191+
192+
for.body8:
193+
%indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
194+
%0 = add nsw i64 %indvars.iv, -1
195+
%arrayidx11 = getelementptr inbounds [20 x [20 x [20 x i32]]], ptr @a, i64 0, i64 %indvars.iv44, i64 5, i64 %0
196+
%1 = load i32, ptr %arrayidx11, align 4
197+
%arrayidx17 = getelementptr inbounds nuw [20 x [20 x [20 x i32]]], ptr @a, i64 0, i64 %indvars.iv44, i64 %indvars.iv40, i64 %indvars.iv
198+
store i32 %1, ptr %arrayidx17, align 4
199+
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
200+
%exitcond = icmp ne i64 %indvars.iv.next, 20
201+
br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
202+
203+
for.cond5.preheader:
204+
%indvars.iv40 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next41, %for.cond.cleanup7 ]
205+
br label %for.body8
206+
207+
for.cond.cleanup:
208+
ret void
209+
}

0 commit comments

Comments
 (0)