Commit 0200626

[IndVars] An implementation of loop predication without a need for speculation
This patch implements a variation of a well-known technique for JIT compilers - we have an implementation in tree as LoopPredication - but with an interesting twist. This version does not assume the ability to execute a path which wasn't taken in the original program (such as a guard or widenable.condition intrinsic). The benefit is that this works for arbitrary IR from any frontend (including C/C++/Fortran). The tradeoff is that it's restricted to read-only loops without implicit exits.

This builds on SCEV, and can thus eliminate the loop-varying portion of any early exit where all exits are understandable by SCEV. A key advantage is that fixing any deficiency exposed in SCEV - already found one while writing test cases - will also benefit all of full redundancy elimination (and most other loop transforms).

I haven't seen anything in the literature which quite matches this. Given that, I'm not entirely sure that keeping the name "loop predication" is helpful. Anyone have suggestions for a better name? This is analogous to partial redundancy elimination - since we remove the condition flowing around the backedge - and has some parallels to our existing transforms which try to make conditions invariant in loops.

Factoring wise, I chose to put this in IndVarSimplify since it's generally applicable to all workloads. I could split this off into its own pass, but we'd then probably want to add that new pass every place we use IndVars. One solid argument for splitting it off into its own pass is that this transform is "too good". It breaks a huge number of existing IndVars test cases as they tend to be simple read-only loops. At the moment, I've left it off by default, but if we add this to IndVars and enable it, we'll have to update around 20 test files to add side effects or disable this transform.

Near term plan is to fuzz this extensively while off by default, reflect on and discuss the factoring issue mentioned just above, and then enable by default. I also need to give some thought to supporting widenable conditions in this framing.

Differential Revision: https://reviews.llvm.org/D67408
llvm-svn: 373351
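A source-level analogue may help picture the transform described in the message. The sketch below is not taken from the patch: the function names, the `Exit` enum and the choice of two exits are illustrative assumptions, and the real transform works on LLVM IR with SCEV exit counts rather than on C++ source. It models a read-only loop with two exits whose exit counts are `n` and `len`, so the exact backedge-taken count is their unsigned minimum, and each loop-varying test is replaced by a loop-invariant comparison of that exit's count against the minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is a hand-written model, not pass output.
enum class Exit { LoopBound, RangeCheck };

// Before: a read-only loop whose two exit tests both depend on i.
Exit originalLoop(uint32_t n, uint32_t len) {
  for (uint32_t i = 0;; ++i) {
    if (i >= n) // exit count for this exit: n
      return Exit::LoopBound;
    if (i >= len) // exit count for this exit: len
      return Exit::RangeCheck;
  }
}

// After: each exit test compares its own exit count with the exact
// backedge-taken count, so both conditions are loop invariant.
Exit predicatedLoop(uint32_t n, uint32_t len) {
  const uint32_t exactBTC = std::min(n, len);
  const bool takeLoopBound = (n == exactBTC);
  const bool takeRangeCheck = (len == exactBTC);
  for (;;) {
    if (takeLoopBound)
      return Exit::LoopBound;
    if (takeRangeCheck)
      return Exit::RangeCheck;
  }
}

// Exhaustively compares the two forms on small inputs.
bool agreeOnSmallInputs(uint32_t limit) {
  for (uint32_t n = 0; n < limit; ++n)
    for (uint32_t len = 0; len < limit; ++len)
      if (originalLoop(n, len) != predicatedLoop(n, len))
        return false;
  return true;
}
```

Because the loop has no side effects and neither exit uses a value computed inside it, leaving on the first iteration is indistinguishable from running to the original exit. When both counts are equal, both versions take whichever exit is tested first.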
1 parent 9dba603 commit 0200626

2 files changed, +928 -12 lines changed
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

Lines changed: 138 additions & 12 deletions
@@ -124,6 +124,11 @@ static cl::opt<bool>
 DisableLFTR("disable-lftr", cl::Hidden, cl::init(false),
             cl::desc("Disable Linear Function Test Replace optimization"));
 
+static cl::opt<bool>
+LoopPredication("indvars-predicate-loops", cl::Hidden, cl::init(false),
+                cl::desc("Predicate conditions in read only loops"));
+
+
 namespace {
 
 struct RewritePhi;
@@ -144,7 +149,7 @@ class IndVarSimplify {
   bool rewriteNonIntegerIVs(Loop *L);
 
   bool simplifyAndExtend(Loop *L, SCEVExpander &Rewriter, LoopInfo *LI);
-  bool optimizeLoopExits(Loop *L);
+  bool optimizeLoopExits(Loop *L, SCEVExpander &Rewriter);
 
   bool canLoopBeDeleted(Loop *L, SmallVector<RewritePhi, 8> &RewritePhiSet);
   bool rewriteLoopExitValues(Loop *L, SCEVExpander &Rewriter);
@@ -2641,7 +2646,7 @@ bool IndVarSimplify::sinkUnusedInvariants(Loop *L) {
   return MadeAnyChanges;
 }
 
-bool IndVarSimplify::optimizeLoopExits(Loop *L) {
+bool IndVarSimplify::optimizeLoopExits(Loop *L, SCEVExpander &Rewriter) {
   SmallVector<BasicBlock*, 16> ExitingBlocks;
   L->getExitingBlocks(ExitingBlocks);
 
@@ -2719,8 +2724,7 @@ bool IndVarSimplify::optimizeLoopExits(Loop *L) {
     assert(MaxExitCount->getType() == ExitCount->getType());
 
     // Can we prove that some other exit must be taken strictly before this
-    // one? TODO: handle cases where ule is known, and equality is covered
-    // by a dominating exit
+    // one?
     if (SE->isLoopEntryGuardedByCond(L, CmpInst::ICMP_ULT,
                                      MaxExitCount, ExitCount)) {
       bool ExitIfTrue = !L->contains(*succ_begin(ExitingBB));
@@ -2737,14 +2741,136 @@ bool IndVarSimplify::optimizeLoopExits(Loop *L) {
     // TODO: If we can prove that the exiting iteration is equal to the exit
     // count for this exit and that no previous exit opportunities exist within
     // the loop, then we can discharge all other exits. (May fall out of
-    // previous TODO.)
-
-    // TODO: If we can't prove any relation between our exit count and the
-    // loops exit count, but taking this exit doesn't require actually running
-    // the loop (i.e. no side effects, no computed values used in exit), then
-    // we can replace the exit test with a loop invariant test which exits on
-    // the first iteration.
+    // previous TODO.)
   }
+
+  // Finally, see if we can rewrite our exit conditions into a loop invariant
+  // form. If we have a read-only loop, and we can tell that we must exit down
+  // a path which does not need any of the values computed within the loop, we
+  // can rewrite the loop to exit on the first iteration. Note that this
+  // doesn't either a) tell us the loop exits on the first iteration (unless
+  // *all* exits are predicatable) or b) tell us *which* exit might be taken.
+  // This transformation looks a lot like a restricted form of dead loop
+  // elimination, but restricted to read-only loops and without necessarily
+  // needing to kill the loop entirely.
+  if (!LoopPredication)
+    return Changed;
+
+  if (!SE->hasLoopInvariantBackedgeTakenCount(L))
+    return Changed;
+
+  // Note: ExactBTC is the exact backedge taken count *iff* the loop exits
+  // through *explicit* control flow. We have to eliminate the possibility of
+  // implicit exits (see below) before we know it's truly exact.
+  const SCEV *ExactBTC = SE->getBackedgeTakenCount(L);
+  if (isa<SCEVCouldNotCompute>(ExactBTC) ||
+      !SE->isLoopInvariant(ExactBTC, L) ||
+      !isSafeToExpand(ExactBTC, *SE))
+    return Changed;
+
+  auto Filter = [&](BasicBlock *ExitingBB) {
+    // If our exiting block exits multiple loops, we can only rewrite the
+    // innermost one. Otherwise, we're changing how many times the innermost
+    // loop runs before it exits.
+    if (LI->getLoopFor(ExitingBB) != L)
+      return true;
+
+    // Can't rewrite non-branch yet.
+    BranchInst *BI = dyn_cast<BranchInst>(ExitingBB->getTerminator());
+    if (!BI)
+      return true;
+
+    // If already constant, nothing to do.
+    if (isa<Constant>(BI->getCondition()))
+      return true;
+
+    // If the exit block has phis, we need to be able to compute the values
+    // within the loop which contains them. This assumes trivially lcssa phis
+    // have already been removed; TODO: generalize
+    BasicBlock *ExitBlock =
+        BI->getSuccessor(L->contains(BI->getSuccessor(0)) ? 1 : 0);
+    if (!empty(ExitBlock->phis()))
+      return true;
+
+    const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
+    assert(!isa<SCEVCouldNotCompute>(ExactBTC) && "implied by having exact trip count");
+    if (!SE->isLoopInvariant(ExitCount, L) ||
+        !isSafeToExpand(ExitCount, *SE))
+      return true;
+
+    return false;
+  };
+  auto Erased = std::remove_if(ExitingBlocks.begin(), ExitingBlocks.end(),
+                               Filter);
+  ExitingBlocks.erase(Erased, ExitingBlocks.end());
+
+  if (ExitingBlocks.empty())
+    return Changed;
+
+  // We rely on not being able to reach an exiting block on a later iteration
+  // than its statically computed exit count. The implementation of
+  // getExitCount currently has this invariant, but assert it here so that
+  // breakage is obvious if this ever changes.
+  assert(llvm::all_of(ExitingBlocks, [&](BasicBlock *ExitingBB) {
+        return DT->dominates(ExitingBB, L->getLoopLatch());
+      }));
+
+  // At this point, ExitingBlocks consists of only those blocks which are
+  // predicatable. Given that, we know we have at least one exit we can
+  // predicate if the loop doesn't have side effects and doesn't have any
+  // implicit exits (because then our exact BTC isn't actually exact).
+  // @Reviewers - As structured, this is O(I^2) for loop nests. Any
+  // suggestions on how to improve this? I can obviously bail out for outer
+  // loops, but that seems less than ideal. MemorySSA can find memory writes,
+  // is that enough for *all* side effects?
+  for (BasicBlock *BB : L->blocks())
+    for (auto &I : *BB)
+      // TODO:isGuaranteedToTransfer
+      if (I.mayHaveSideEffects() || I.mayThrow())
+        return Changed;
+
+  // Finally, do the actual predication for all predicatable blocks. A couple
+  // of notes here:
+  // 1) We don't bother to constant fold dominated exits with identical exit
+  //    counts; that's simply a form of CSE/equality propagation and we leave
+  //    it for dedicated passes.
+  // 2) We insert the comparison at the branch. Hoisting introduces additional
+  //    legality constraints and we leave that to dedicated logic. We want to
+  //    predicate even if we can't insert a loop invariant expression as
+  //    peeling or unrolling will likely reduce the cost of the otherwise loop
+  //    varying check.
+  Rewriter.setInsertPoint(L->getLoopPreheader()->getTerminator());
+  IRBuilder<> B(L->getLoopPreheader()->getTerminator());
+  Value *ExactBTCV = nullptr; // lazily generated if needed
+  for (BasicBlock *ExitingBB : ExitingBlocks) {
+    const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
+
+    auto *BI = cast<BranchInst>(ExitingBB->getTerminator());
+    Value *NewCond;
+    if (ExitCount == ExactBTC) {
+      NewCond = L->contains(BI->getSuccessor(0)) ?
+        B.getFalse() : B.getTrue();
+    } else {
+      Value *ECV = Rewriter.expandCodeFor(ExitCount);
+      if (!ExactBTCV)
+        ExactBTCV = Rewriter.expandCodeFor(ExactBTC);
+      Value *RHS = ExactBTCV;
+      if (ECV->getType() != RHS->getType()) {
+        Type *WiderTy = SE->getWiderType(ECV->getType(), RHS->getType());
+        ECV = B.CreateZExt(ECV, WiderTy);
+        RHS = B.CreateZExt(RHS, WiderTy);
+      }
+      auto Pred = L->contains(BI->getSuccessor(0)) ?
+        ICmpInst::ICMP_NE : ICmpInst::ICMP_EQ;
+      NewCond = B.CreateICmp(Pred, ECV, RHS);
+    }
+    Value *OldCond = BI->getCondition();
+    BI->setCondition(NewCond);
+    if (OldCond->use_empty())
+      DeadInsts.push_back(OldCond);
+    Changed = true;
+  }
+
   return Changed;
 }
 
@@ -2804,7 +2930,7 @@ bool IndVarSimplify::run(Loop *L) {
   // Eliminate redundant IV cycles.
   NumElimIV += Rewriter.replaceCongruentIVs(L, DT, DeadInsts);
 
-  Changed |= optimizeLoopExits(L);
+  Changed |= optimizeLoopExits(L, Rewriter);
 
   // If we have a trip count expression, rewrite the loop's exit condition
   // using it.
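
The type-widening step in the large hunk of this diff (`getWiderType` followed by `CreateZExt` on both operands) can be illustrated without LLVM. The sketch below is a standalone model under stated assumptions: the helper names and the 8-bit/16-bit widths are invented for illustration, and plain integers stand in for the expanded SCEV values. Exit counts are unsigned quantities, so zero extension preserves their value, whereas sign extension of a narrow count with its top bit set would not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling the comparison of a narrow exit count
// against a wider exact backedge-taken count.

// Zero-extend the narrow count, then compare for equality (the ICMP_EQ case).
bool exitCountEqualsBTC(uint8_t exitCount, uint16_t exactBTC) {
  const uint16_t widened = static_cast<uint16_t>(exitCount); // zext
  return widened == exactBTC;
}

// What a sign extension would compare instead; shown only for contrast.
bool exitCountEqualsBTCSignExtended(uint8_t exitCount, uint16_t exactBTC) {
  const int8_t asSigned = static_cast<int8_t>(exitCount);
  const uint16_t widened =
      static_cast<uint16_t>(static_cast<int16_t>(asSigned)); // sext
  return widened == exactBTC;
}
```

With an exit count of 200 in eight bits, the zero-extended comparison against a 16-bit count of 200 succeeds, while the sign-extended variant compares a different value and fails.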
