Commit 82cac1d

[LoopVectorize] Add support for vectorisation of simple early exit loops
This patch adds support for vectorisation of a simple class of loops that
typically involves searching for something, i.e.

  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;

or

  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;

In this initial commit we only vectorise loops with the following criteria:

1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the above
   examples. I have referred to such exits as speculative early exits, to
   distinguish from existing support for early exits where the
   exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. There are no loads after the early exit block.
5. The loop must not contain reductions or recurrences. I don't see anything
   fundamental blocking vectorisation of such loops, but I just haven't done
   the work to support them yet.
6. We must be able to prove at compile-time that loops will not contain
   faulting loads.

For point 6, once this patch lands I intend to follow up by supporting some
limited cases of faulting loops where we can version the loop based on
pointer alignment. For example, it turns out in the SPEC2017 benchmark there
is a std::find loop that we can vectorise provided we add SCEV checks for the
initial pointer being aligned to a multiple of the VF. In practice, the
pointer is regularly aligned to at least 32/64 bytes and since the VF is a
power of 2, any vector load <= 32/64 bytes in size can only fault on the
first lane, following the same behaviour as the scalar loop. Given we already
do such speculative versioning for loops with unknown strides,
alignment-based versioning doesn't seem to be any worse.

This patch makes use of the existing experimental_cttz_elts intrinsic that's
required in the vectorised early exit block to determine the first lane that
triggered the exit. This intrinsic has generic lowering support so it's
guaranteed to work for all targets.

Tests have been added here:
Transforms/LoopVectorize/AArch64/simple_early_exit.ll
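
As a rough illustration of the transformation described in the commit message, the following is a scalar C++ model of how the first search loop might look once vectorised. It is only a sketch: `VF`, `firstActiveLane` and `findFirstEqual` are hypothetical names invented for this example, the "vector" operations are emulated with arrays, and the real vectoriser emits IR rather than anything resembling this source.

```cpp
#include <array>
#include <cstddef>

// Hypothetical vectorisation factor, chosen only for illustration.
constexpr std::size_t VF = 4;

// Scalar stand-in for a "first set lane" query on a mask.
// Returns VF when no lane is set.
inline std::size_t firstActiveLane(const std::array<bool, VF> &Mask) {
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    if (Mask[Lane])
      return Lane;
  return VF;
}

// Models the vectorised form of:
//   for (i = 0; i < n; i++) { if (p[i] == val) return i; }
//   return n;
inline std::size_t findFirstEqual(const int *P, std::size_t N, int Val) {
  std::size_t I = 0;
  // "Vector" body: all VF lanes are loaded and compared before the exit
  // condition is tested, which is why the loads must be known not to fault.
  for (; I + VF <= N; I += VF) {
    std::array<bool, VF> Mask{};
    bool AnyHit = false;
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      Mask[Lane] = (P[I + Lane] == Val);
      AnyHit |= Mask[Lane];
    }
    // Vectorised early exit: recover the first lane that triggered it.
    if (AnyHit)
      return I + firstActiveLane(Mask);
  }
  // Scalar remainder for the last N % VF elements.
  for (; I < N; ++I)
    if (P[I] == Val)
      return I;
  return N;
}
```

The model returns the same index as the scalar loop because the first set lane of the mask corresponds to the first iteration that would have taken the early exit.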
1 parent 496de32 commit 82cac1d

18 files changed, +3464 -88 lines changed

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

Lines changed: 36 additions & 0 deletions
@@ -587,6 +587,9 @@ class LoopAccessInfo {
   /// not legal to insert them.
   bool hasConvergentOp() const { return HasConvergentOp; }
 
+  /// Return true if the loop may fault due to memory accesses.
+  bool mayFault() const { return LoopMayFault; }
+
   const RuntimePointerChecking *getRuntimePointerChecking() const {
     return PtrRtChecking.get();
   }
@@ -608,6 +611,24 @@ class LoopAccessInfo {
   unsigned getNumStores() const { return NumStores; }
   unsigned getNumLoads() const { return NumLoads;}
 
+  /// Returns the block that exits early from the loop, if there is one.
+  /// Otherwise returns nullptr.
+  BasicBlock *getSpeculativeEarlyExitingBlock() const {
+    return SpeculativeEarlyExitingBB;
+  }
+
+  /// Returns the successor of the block that exits early from the loop, if
+  /// there is one. Otherwise returns nullptr.
+  BasicBlock *getSpeculativeEarlyExitBlock() const {
+    return SpeculativeEarlyExitBB;
+  }
+
+  /// Returns all blocks with a countable exit, i.e. the exit-not-taken count
+  /// is known exactly at compile time.
+  const SmallVector<BasicBlock *, 4> &getCountableEarlyExitingBlocks() const {
+    return CountableEarlyExitBlocks;
+  }
+
   /// The diagnostics report generated for the analysis. E.g. why we
   /// couldn't analyze the loop.
   const OptimizationRemarkAnalysis *getReport() const { return Report.get(); }
@@ -659,6 +680,10 @@ class LoopAccessInfo {
   /// pass.
   bool canAnalyzeLoop();
 
+  /// Returns true if this is a supported early exit loop that we can analyze
+  /// in this pass.
+  bool isAnalyzableEarlyExitLoop();
+
   /// Save the analysis remark.
   ///
   /// LAA does not directly emit the remarks. Instead it stores it which the
@@ -696,6 +721,17 @@ class LoopAccessInfo {
   /// Cache the result of analyzeLoop.
   bool CanVecMem = false;
   bool HasConvergentOp = false;
+  bool LoopMayFault = false;
+
+  /// Keeps track of the early-exiting block, if present.
+  BasicBlock *SpeculativeEarlyExitingBB = nullptr;
+
+  /// Keeps track of the successor of the early-exiting block, if present.
+  BasicBlock *SpeculativeEarlyExitBB = nullptr;
+
+  /// Keeps track of all the early exits with known or countable exit-not-taken
+  /// counts.
+  SmallVector<BasicBlock *, 4> CountableEarlyExitBlocks;
 
   /// Indicator that there are non vectorizable stores to a uniform address.
   bool HasDependenceInvolvingLoopInvariantAddress = false;
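
The `mayFault()` query relates to the alignment argument in the commit message: a vector load no larger than the pointer's alignment cannot straddle a page boundary, so if it faults at all it faults on its first lane. The sketch below checks that property numerically. It assumes page-granular memory protection and a hypothetical `PageSize` of 4096 bytes; the helper names are invented for this example and are not part of LoopAccessAnalysis.

```cpp
#include <cstdint>

// Hypothetical page size, used only for this illustration.
constexpr std::uint64_t PageSize = 4096;

inline bool isPowerOf2(std::uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }

// True if Addr is a multiple of Align, where Align is a power of 2.
inline bool isAligned(std::uint64_t Addr, std::uint64_t Align) {
  return isPowerOf2(Align) && (Addr & (Align - 1)) == 0;
}

// True if the byte range [Addr, Addr + Bytes) lies within a single page.
inline bool staysInOnePage(std::uint64_t Addr, std::uint64_t Bytes) {
  return Bytes != 0 && (Addr / PageSize) == ((Addr + Bytes - 1) / PageSize);
}

// True if every 32-byte load from a 32-byte aligned address in [0, Limit)
// stays within one page.
inline bool alignedLoadsNeverCrossPages(std::uint64_t Limit) {
  for (std::uint64_t Addr = 0; Addr < Limit; Addr += 32)
    if (!isAligned(Addr, 32) || !staysInOnePage(Addr, 32))
      return false;
  return true;
}
```

An unaligned 32-byte load starting 16 bytes before a page boundary does cross into the next page, which is the case the proposed alignment-based versioning would have to exclude at run time.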

llvm/include/llvm/Analysis/ScalarEvolution.h

Lines changed: 33 additions & 3 deletions
@@ -892,9 +892,13 @@ class ScalarEvolution {
   /// Similar to getBackedgeTakenCount, except it will add a set of
   /// SCEV predicates to Predicates that are required to be true in order for
   /// the answer to be correct. Predicates can be checked with run-time
-  /// checks and can be used to perform loop versioning.
-  const SCEV *getPredicatedBackedgeTakenCount(const Loop *L,
-      SmallVector<const SCEVPredicate *, 4> &Predicates);
+  /// checks and can be used to perform loop versioning. If \p Speculative is
+  /// true, this will attempt to return the speculative backedge count for loops
+  /// with early exits. However, this is only possible if we can formulate an
+  /// exact expression for the backedge count from the latch block.
+  const SCEV *getPredicatedBackedgeTakenCount(
+      const Loop *L, SmallVector<const SCEVPredicate *, 4> &Predicates,
+      bool Speculative = false);
 
   /// When successful, this returns a SCEVConstant that is greater than or equal
   /// to (i.e. a "conservative over-approximation") of the value returned by
@@ -912,6 +916,12 @@ class ScalarEvolution {
     return getBackedgeTakenCount(L, SymbolicMaximum);
   }
 
+  /// Return all the exiting blocks with exact exit counts.
+  void getExactExitingBlocks(const Loop *L,
+                             SmallVector<BasicBlock *, 4> *Blocks) {
+    getBackedgeTakenInfo(L).getExactExitingBlocks(L, this, Blocks);
+  }
+
   /// Return true if the backedge taken count is either the value returned by
   /// getConstantMaxBackedgeTakenCount or zero.
   bool isBackedgeTakenCountMaxOrZero(const Loop *L);
@@ -1534,13 +1544,27 @@ class ScalarEvolution {
     const SCEV *getExact(const Loop *L, ScalarEvolution *SE,
                          SmallVector<const SCEVPredicate *, 4> *Predicates = nullptr) const;
 
+    /// Similar to the above, except we permit unknown exit counts from
+    /// non-latch exit blocks. Any such early exit blocks must dominate the
+    /// latch and so the returned expression represents the speculative, or
+    /// maximum possible, *backedge-taken* count of the loop. If there is no
+    /// exact exit count for the latch this function returns
+    /// SCEVCouldNotCompute.
+    const SCEV *getSpeculative(
+        const Loop *L, ScalarEvolution *SE,
+        SmallVector<const SCEVPredicate *, 4> *Predicates = nullptr) const;
+
     /// Return the number of times this loop exit may fall through to the back
     /// edge, or SCEVCouldNotCompute. The loop is guaranteed not to exit via
     /// this block before this number of iterations, but may exit via another
     /// block.
     const SCEV *getExact(const BasicBlock *ExitingBlock,
                          ScalarEvolution *SE) const;
 
+    /// Return all the exiting blocks with exact exit counts.
+    void getExactExitingBlocks(const Loop *L, ScalarEvolution *SE,
+                               SmallVector<BasicBlock *, 4> *Blocks) const;
+
     /// Get the constant max backedge taken count for the loop.
     const SCEV *getConstantMax(ScalarEvolution *SE) const;
 
@@ -2316,6 +2340,9 @@ class PredicatedScalarEvolution {
   /// Get the (predicated) backedge count for the analyzed loop.
   const SCEV *getBackedgeTakenCount();
 
+  /// Get the (predicated) speculative backedge count for the analyzed loop.
+  const SCEV *getSpeculativeBackedgeTakenCount();
+
   /// Adds a new predicate.
   void addPredicate(const SCEVPredicate &Pred);
 
@@ -2384,6 +2411,9 @@ class PredicatedScalarEvolution {
 
   /// The backedge taken count.
   const SCEV *BackedgeCount = nullptr;
+
+  /// The speculative backedge taken count.
+  const SCEV *SpeculativeBackedgeCount = nullptr;
 };
 
 template <> struct DenseMapInfo<ScalarEvolution::FoldID> {
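
To make the "speculative, or maximum possible, backedge-taken count" concrete, here is a small scalar model of the search loop from the commit message. It is a sketch under stated assumptions: `BackedgeCounts` and `countBackedges` are hypothetical names for this example only, the loop is assumed to run at least once (`N > 0`), and nothing here reflects how ScalarEvolution actually represents these counts.

```cpp
#include <cstddef>

struct BackedgeCounts {
  std::size_t Speculative; // count if the early exit is never taken
  std::size_t Actual;      // count observed for this particular input
};

// Models: for (i = 0; i < n; i++) { if (p[i] == val) return i; }
// Requires N > 0, i.e. the loop body executes at least once.
inline BackedgeCounts countBackedges(const int *P, std::size_t N, int Val) {
  // The latch exit alone gives an exact count of N - 1 backedges.
  BackedgeCounts C{N - 1, 0};
  for (std::size_t I = 0; I < N; ++I) {
    if (P[I] == Val)
      break; // early exit: leaves the loop before reaching the latch
    if (I + 1 < N)
      ++C.Actual; // latch branches back to the header
  }
  return C;
}
```

For any input the actual count never exceeds the speculative one, and the two are equal exactly when the early exit is not taken before the final iteration.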

llvm/include/llvm/IR/IRBuilder.h

Lines changed: 7 additions & 0 deletions
@@ -2503,6 +2503,13 @@ class IRBuilderBase {
     return CreateShuffleVector(V, PoisonValue::get(V->getType()), Mask, Name);
   }
 
+  Value *CreateCountTrailingZeroElems(Type *ResTy, Value *Mask,
+                                      const Twine &Name = "") {
+    return CreateIntrinsic(
+        Intrinsic::experimental_cttz_elts, {ResTy, Mask->getType()},
+        {Mask, getInt1(/*ZeroIsPoison=*/true)}, nullptr, Name);
+  }
+
   Value *CreateExtractValue(Value *Agg, ArrayRef<unsigned> Idxs,
                             const Twine &Name = "") {
     if (auto *V = Folder.FoldExtractValue(Agg, Idxs))
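
The new builder helper always passes `ZeroIsPoison=true`. A scalar model of what that flag means for a count-trailing-zero-elements query might look like the following. This is an illustrative sketch only: `countTrailingZeroElems` is a hypothetical free function, lane 0 is treated as the least significant element, and `std::nullopt` merely stands in for a poison result.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Returns the number of leading-from-lane-0 false elements in Mask, i.e. the
// index of the first true lane. For an all-false mask the result is either
// "poison" (modelled as std::nullopt) or the number of lanes.
inline std::optional<std::size_t>
countTrailingZeroElems(const std::vector<bool> &Mask, bool ZeroIsPoison) {
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane)
    if (Mask[Lane])
      return Lane;
  if (ZeroIsPoison)
    return std::nullopt;
  return Mask.size();
}
```

Passing `true` is reasonable in the vectorised early exit block because that block is only reached when at least one lane triggered the exit, so the all-false case should not arise there.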

llvm/include/llvm/Support/GenericLoopInfo.h

Lines changed: 4 additions & 0 deletions
@@ -294,6 +294,10 @@ template <class BlockT, class LoopT> class LoopBase {
   /// Otherwise return null.
   BlockT *getUniqueExitBlock() const;
 
+  /// Return the exit block for the latch if one exists. This function assumes
+  /// the loop has a latch.
+  BlockT *getLatchExitBlock() const;
+
   /// Return true if this loop does not have any exit blocks.
   bool hasNoExitBlocks() const;

llvm/include/llvm/Support/GenericLoopInfoImpl.h

Lines changed: 10 additions & 0 deletions
@@ -159,6 +159,16 @@ BlockT *LoopBase<BlockT, LoopT>::getUniqueExitBlock() const {
   return getExitBlockHelper(this, true).first;
 }
 
+template <class BlockT, class LoopT>
+BlockT *LoopBase<BlockT, LoopT>::getLatchExitBlock() const {
+  BlockT *Latch = getLoopLatch();
+  assert(Latch && "Latch block must exist");
+  for (BlockT *Successor : children<BlockT *>(Latch))
+    if (!contains(Successor))
+      return Successor;
+  return nullptr;
+}
+
 /// getExitEdges - Return all pairs of (_inside_block_,_outside_block_).
 template <class BlockT, class LoopT>
 void LoopBase<BlockT, LoopT>::getExitEdges(
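
The behaviour of `getLatchExitBlock` can be tried out on a toy control-flow graph. The sketch below uses hypothetical stand-ins (`Block`, `ToyLoop`) rather than the real `LoopBase` templates, and only mirrors the shape of the function: return the first successor of the latch that lies outside the loop, or null if the latch does not exit.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, minimal stand-in for a basic block.
struct Block {
  std::vector<Block *> Successors;
};

// Hypothetical, minimal stand-in for a loop with a single latch.
struct ToyLoop {
  std::set<const Block *> Blocks;
  Block *Latch = nullptr;

  bool contains(const Block *B) const { return Blocks.count(B) != 0; }

  // First successor of the latch that is not part of the loop, if any.
  Block *getLatchExitBlock() const {
    assert(Latch && "Latch block must exist");
    for (Block *Succ : Latch->Successors)
      if (!contains(Succ))
        return Succ;
    return nullptr;
  }
};
```

In an early exit loop the header (or another block dominating the latch) has its own exit edge, so the latch exit block identifies the destination of the countable exit as distinct from the speculative one.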

llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h

Lines changed: 8 additions & 1 deletion
@@ -124,6 +124,11 @@ class SCEVExpander : public SCEVVisitor<SCEVExpander, Value *> {
   /// "expanded" form.
   bool LSRMode;
 
+  /// If the loop has an early exit we may have to use the speculative backedge
+  /// count, since the normal backedge count function is unable to compute a
+  /// SCEV expression.
+  bool UseSpeculativeBackedgeCount;
+
   typedef IRBuilder<InstSimplifyFolder, IRBuilderCallbackInserter> BuilderType;
   BuilderType Builder;
 
@@ -176,10 +181,12 @@ class SCEVExpander : public SCEVVisitor<SCEVExpander, Value *> {
 public:
   /// Construct a SCEVExpander in "canonical" mode.
   explicit SCEVExpander(ScalarEvolution &se, const DataLayout &DL,
-                        const char *name, bool PreserveLCSSA = true)
+                        const char *name, bool PreserveLCSSA = true,
+                        bool UseSpeculativeBackedgeCount = false)
       : SE(se), DL(DL), IVName(name), PreserveLCSSA(PreserveLCSSA),
         IVIncInsertLoop(nullptr), IVIncInsertPos(nullptr), CanonicalMode(true),
         LSRMode(false),
+        UseSpeculativeBackedgeCount(UseSpeculativeBackedgeCount),
         Builder(se.getContext(), InstSimplifyFolder(DL),
                 IRBuilderCallbackInserter(
                     [this](Instruction *I) { rememberInstruction(I); })) {

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Lines changed: 18 additions & 0 deletions
@@ -374,6 +374,24 @@ class LoopVectorizationLegality {
     return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
   }
 
+  /// Returns true if the loop has an early exit with an exact backedge
+  /// count that is speculative.
+  bool hasSpeculativeEarlyExit() const {
+    return LAI && LAI->getSpeculativeEarlyExitingBlock();
+  }
+
+  /// Returns the early exiting block in a loop with a speculative backedge
+  /// count.
+  BasicBlock *getSpeculativeEarlyExitingBlock() const {
+    return LAI->getSpeculativeEarlyExitingBlock();
+  }
+
+  /// Returns the destination of an early exiting block in a loop with a
+  /// speculative backedge count.
+  BasicBlock *getSpeculativeEarlyExitBlock() const {
+    return LAI->getSpeculativeEarlyExitBlock();
+  }
+
   /// Returns true if vector representation of the instruction \p I
   /// requires mask.
   bool isMaskRequired(const Instruction *I) const {
