
Commit 4826143

[LoopVectorize] Enable more early exit vectorisation tests
PR llvm#112138 introduced initial support for dispatching to multiple exit blocks via split middle blocks. This patch fixes a few issues so that more tests can use the new enable-early-exit-vectorization flag. The fixes are:

1. The code to bail out for any loop live-out values ran too late, because collectUsersInExitBlocks ignores induction variables, which are dealt with in fixupIVUsers. I've moved the check much earlier, into processLoop, where it looks for outside users of loop-defined values.
2. We shouldn't yet interleave when vectorising loops with uncountable early exits, since support for this hasn't been added.
3. Similarly, we shouldn't yet create vector epilogues.
4. Similarly, we shouldn't enable tail-folding.
5. The existing implementation doesn't yet support loops that require scalar epilogues, although I plan to add that as part of PR llvm#88385.
6. The new split middle blocks weren't being added to the parent loop.
7. VPIRInstruction::execute assumed that the VPIRBasicBlock predecessors correspond one-to-one, in the same order, with the predecessors of the scalar exit block prior to vectorisation. For example, collectUsersInExitBlocks adds the operands to the VPIRInstruction in the order returned by predecessors(ExitBB), whereas VPIRInstruction::execute processes them in the order of the VPIRBasicBlock predecessors. There is no guarantee that the two orders match, and in some cases (such as the yacr2 test in the LLVM test suite) they don't. I've fixed this by keeping the old behaviour when there is a single operand and, when there are two or more operands, using the same ordering as the BasicBlock predecessors.
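As a reading aid, here is a restatement of how fixes 1-5 gate a loop with an uncountable early exit. This is a hypothetical C++ model and not LLVM code. The names EarlyExitLoopFacts, EarlyExitPolicy and earlyExitPolicy are invented. Each field only stands in for the LLVM query named in its comment.

```cpp
#include <cassert>

// Hypothetical summary of the facts the patch consults for one loop; each
// comment names the LLVM query the field stands in for.
struct EarlyExitLoopFacts {
  bool HasUncountableEarlyExit;  // Legal->hasUncountableEarlyExit()
  bool HasLiveOuts;              // a loop-defined value is used outside L
  bool NeedsScalarEpilogue;      // IAI.requiresScalarEpilogue()
  bool HasNonLatchCountableExit; // a countable exiting block != loop latch
};

// What the patch permits; "true" only means "not ruled out by these checks".
struct EarlyExitPolicy {
  bool MayVectorize;
  bool MayInterleave;        // selectInterleaveCount() may return > 1
  bool MayVectorizeEpilogue; // isCandidateForEpilogueVectorization()
  bool MayFoldTail;          // computeMaxVF() may pick tail folding
};

EarlyExitPolicy earlyExitPolicy(const EarlyExitLoopFacts &F) {
  if (!F.HasUncountableEarlyExit)
    return {true, true, true, true};
  // Fixes 1 and 5: reject loops with live-outs or needing a scalar epilogue.
  if (F.HasLiveOuts || F.NeedsScalarEpilogue || F.HasNonLatchCountableExit)
    return {false, false, false, false};
  // Fixes 2-4: vectorize with IC = 1, no vector epilogue and no tail folding.
  EarlyExitPolicy P{true, false, false, false};
  assert(!P.MayInterleave && !P.MayVectorizeEpilogue && !P.MayFoldTail);
  return P;
}
```

For loops without an early exit, the model only says these particular checks do not apply. The normal legality and cost-model decisions still follow.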
Parent: b1a40e4

14 files changed, 453 lines added, 48 lines removed

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

Lines changed: 6 additions & 3 deletions
@@ -1375,9 +1375,12 @@ bool LoopVectorizationLegality::isFixedOrderRecurrence(
 }
 
 bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) const {
-  // When vectorizing early exits, create predicates for all blocks, except the
-  // header.
-  if (hasUncountableEarlyExit() && BB != TheLoop->getHeader())
+  // The only block currently permitted after the early exiting block is the
+  // loop latch, so only that blocks needs predication.
+  // FIXME: Once we support instructions in the loop that cannot be executed
+  // speculatively, such as stores, we will also need to predicate all blocks
+  // leading up to the early exit too.
+  if (hasUncountableEarlyExit() && BB == TheLoop->getLoopLatch())
     return true;
   return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);
 }

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 63 additions & 22 deletions
@@ -2945,6 +2945,21 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
   PSE.getSE()->forgetLoop(OrigLoop);
   PSE.getSE()->forgetBlockAndLoopDispositions();
 
+  // When dealing with uncountable early exits we create middle.split blocks
+  // between the vector loop region and the exit block. These blocks need
+  // adding to any outer loop.
+  VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion();
+  Loop *OuterLoop = OrigLoop->getParentLoop();
+  if (Legal->hasUncountableEarlyExit() && OuterLoop) {
+    VPBasicBlock *MiddleVPBB = State.Plan->getMiddleBlock();
+    VPBlockBase *PredVPBB = MiddleVPBB->getSinglePredecessor();
+    while (PredVPBB && PredVPBB != VectorRegion) {
+      BasicBlock *MiddleSplitBB = State.CFG.VPBB2IRBB[cast<VPBasicBlock>(PredVPBB)];
+      OuterLoop->addBasicBlockToLoop(MiddleSplitBB, *LI);
+      PredVPBB = PredVPBB->getSinglePredecessor();
+    }
+  }
+
   // After vectorization, the exit blocks of the original loop will have
   // additional predecessors. Invalidate SCEVs for the exit phis in case SE
   // looked through single-entry phis.
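The while loop in the hunk above walks single predecessors from the middle block back to the vector loop region. That way, every middle.split block created for an early exit is registered with the outer loop. The following is a minimal stand-alone model of that walk. It assumes string block names and a hypothetical map in place of getSinglePredecessor(). middleSplitBlocks is an invented name, and it returns the blocks the real code would pass to addBasicBlockToLoop().

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical CFG model: block name -> its single predecessor. A block with
// zero or several predecessors has no entry (like getSinglePredecessor()
// returning null).
using Block = std::string;
using SinglePredMap = std::map<Block, Block>;

// Walk up from the middle block until the vector loop region is reached,
// collecting the middle.split blocks in between.
std::vector<Block> middleSplitBlocks(const SinglePredMap &SinglePred,
                                     const Block &MiddleBlock,
                                     const Block &VectorRegion) {
  std::vector<Block> Result;
  auto It = SinglePred.find(MiddleBlock);
  while (It != SinglePred.end() && It->second != VectorRegion) {
    Result.push_back(It->second);
    // Each step uses a distinct map entry unless the chain is cyclic.
    assert(Result.size() <= SinglePred.size() && "cyclic predecessor chain");
    It = SinglePred.find(It->second);
  }
  return Result;
}
```

For a single early exit, the chain is middle.block, then middle.split, then the vector region. So exactly one block is collected.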
@@ -2975,7 +2990,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
   for (Instruction *PI : PredicatedInstructions)
     sinkScalarOperands(&*PI);
 
-  VPRegionBlock *VectorRegion = State.Plan->getVectorLoopRegion();
   VPBasicBlock *HeaderVPBB = VectorRegion->getEntryBasicBlock();
   BasicBlock *HeaderBB = State.CFG.VPBB2IRBB[HeaderVPBB];
 
@@ -4051,7 +4065,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
   // a bottom-test and a single exiting block. We'd have to handle the fact
   // that not every instruction executes on the last iteration. This will
   // require a lane mask which varies through the vector loop body. (TODO)
-  if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+  if (Legal->hasUncountableEarlyExit() ||
+      TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
     // If there was a tail-folding hint/switch, but we can't fold the tail by
     // masking, fallback to a vectorization with a scalar epilogue.
     if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
@@ -4670,7 +4685,9 @@ bool LoopVectorizationPlanner::isCandidateForEpilogueVectorization(
   // Epilogue vectorization code has not been auditted to ensure it handles
   // non-latch exits properly. It may be fine, but it needs auditted and
   // tested.
-  if (OrigLoop->getExitingBlock() != OrigLoop->getLoopLatch())
+  // TODO: Add support for loops with an early exit.
+  if (Legal->hasUncountableEarlyExit() ||
+      OrigLoop->getExitingBlock() != OrigLoop->getLoopLatch())
     return false;
 
   return true;
@@ -4920,6 +4937,10 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
   if (!Legal->isSafeForAnyVectorWidth())
     return 1;
 
+  // We don't attempt to perform interleaving for early exit loops.
+  if (Legal->hasUncountableEarlyExit())
+    return 1;
+
   auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);
   const bool HasReductions = !Legal->getReductionVars().empty();
 
@@ -7753,11 +7774,14 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
 
   // 2.5 Collect reduction resume values.
   auto *ExitVPBB = BestVPlan.getMiddleBlock();
-  if (VectorizingEpilogue)
+  if (VectorizingEpilogue) {
+    assert(!ILV.Legal->hasUncountableEarlyExit() &&
+           "Epilogue vectorisation not yet supported with early exits");
     for (VPRecipeBase &R : *ExitVPBB) {
       fixReductionScalarResumeWhenVectorizingEpilog(
           &R, State, State.CFG.VPBB2IRBB[ExitVPBB]);
     }
+  }
 
   // 2.6. Maintain Loop Hints
   // Keep all loop hints from the original loop on the vector loop (we'll
@@ -9227,21 +9251,6 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   addExitUsersForFirstOrderRecurrences(*Plan, ExitUsersToFix);
   addUsersInExitBlocks(*Plan, ExitUsersToFix);
 
-  // Currently only live-ins can be used by exit values. We also bail out if any
-  // exit value isn't handled in VPlan yet, i.e. a VPIRInstruction in the exit
-  // without any operands.
-  if (Legal->hasUncountableEarlyExit()) {
-    if (any_of(Plan->getExitBlocks(), [](VPIRBasicBlock *ExitBB) {
-          return any_of(*ExitBB, [](VPRecipeBase &R) {
-            auto VPIRI = cast<VPIRInstruction>(&R);
-            return VPIRI->getNumOperands() == 0 ||
-                   any_of(VPIRI->operands(),
-                          [](VPValue *Op) { return !Op->isLiveIn(); });
-          });
-        }))
-      return nullptr;
-  }
-
   // ---------------------------------------------------------------------------
   // Transform initial VPlan: Apply previously taken decisions, in order, to
   // bring the VPlan to its final state.
@@ -10003,13 +10012,29 @@ bool LoopVectorizePass::processLoop(Loop *L) {
   if (LVL.hasUncountableEarlyExit()) {
     if (!EnableEarlyExitVectorization) {
       reportVectorizationFailure("Auto-vectorization of loops with uncountable "
-                                 "early exit is not yet supported",
+                                 "early exit is disabled",
                                  "Auto-vectorization of loops with uncountable "
-                                 "early exit is not yet supported",
-                                 "UncountableEarlyExitLoopsUnsupported", ORE,
+                                 "early exit is disabled",
+                                 "UncountableEarlyExitLoopsDisabled", ORE,
                                  L);
       return false;
     }
+    for (BasicBlock *BB : L->blocks()) {
+      for (Instruction &I : *BB) {
+        for (User *U : I.users()) {
+          Instruction *UI = cast<Instruction>(U);
+          if (!L->contains(UI)) {
+            reportVectorizationFailure(
+                "Auto-vectorization of loops with uncountable "
+                "early exit and live-outs is not yet supported",
+                "Auto-vectorization of loop with uncountable "
+                "early exit and live-outs is not yet supported",
+                "UncountableEarlyExitLoopLiveOutsUnsupported", ORE, L);
+            return false;
+          }
+        }
+      }
+    }
   }
 
   // Entrance to the VPlan-native vectorization path. Outer loops are processed
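The scan added to processLoop above treats any user outside L of an in-loop instruction as a live-out. This means induction variables used after the loop are caught too, which the removed VPlan exit-value check missed. The sketch below repeats the same scan over a hypothetical Inst type that records a block id and its users. Inst and hasLoopLiveOuts are invented names, and integer block ids replace real BasicBlock objects.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified IR model: an instruction knows the id of its
// basic block and the instructions that use its value.
struct Inst {
  int Block;
  std::vector<const Inst *> Users;
};

// Returns true if any instruction in the loop has a user outside the loop's
// blocks, i.e. the loop has live-outs (induction variables included).
bool hasLoopLiveOuts(const std::vector<const Inst *> &LoopInsts,
                     const std::set<int> &LoopBlocks) {
  for (const Inst *I : LoopInsts) {
    assert(LoopBlocks.count(I->Block) && "instruction is not in the loop");
    for (const Inst *U : I->Users)
      if (!LoopBlocks.count(U->Block))
        return true; // the real pass reports a failure here and bails out
  }
  return false;
}
```

An exit-block phi that reads the induction variable lives outside the loop's blocks, so it is enough on its own to make this return true.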
@@ -10026,6 +10051,22 @@ bool LoopVectorizePass::processLoop(Loop *L) {
   InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL.getLAI());
   bool UseInterleaved = TTI->enableInterleavedAccessVectorization();
 
+  if (LVL.hasUncountableEarlyExit()) {
+    BasicBlock *LoopLatch = L->getLoopLatch();
+    if (IAI.requiresScalarEpilogue() ||
+        llvm::any_of(LVL.getCountableExitingBlocks(), [LoopLatch](BasicBlock *BB) {
+          return BB != LoopLatch;
+        })) {
+      reportVectorizationFailure("Auto-vectorization of early exit loops "
+                                 "requiring a scalar epilogue is unsupported",
+                                 "Auto-vectorization of early exit loops "
+                                 "requiring a scalar epilogue is unsupported",
+                                 "UncountableEarlyExitUnsupported", ORE,
+                                 L);
+      return false;
+    }
+  }
+
   // If an override option has been passed in for interleaved accesses, use it.
   if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)
     UseInterleaved = EnableInterleavedMemAccesses;

llvm/lib/Transforms/Vectorize/VPlan.h

Lines changed: 20 additions & 0 deletions
@@ -3762,6 +3762,10 @@ class VPlan {
   /// been modeled in VPlan directly.
   DenseMap<const SCEV *, VPValue *> SCEVToExpansion;
 
+  /// Mapping from the middle.split VPBasicBlock to the original early exiting
+  /// block.
+  DenseMap<BasicBlock *, VPBlockBase *> EarlyExitingBlocks;
+
 public:
   /// Construct a VPlan with original preheader \p Preheader, trip count \p TC,
   /// \p Entry to the plan and with \p ScalarHeader wrapping the original header
@@ -3842,6 +3846,22 @@ class VPlan {
   /// header.
   auto getExitBlocks();
 
+  /// Add a mapping of the exiting BasicBlock to the exiting VPBlockBase, which
+  /// is essentially the middle.split block used for uncountable early exits.
+  void addEarlyExitingBlockToMap(VPBlockBase *VPBB, BasicBlock *BB) {
+    EarlyExitingBlocks[BB] = VPBB;
+  }
+
+  /// Return the exiting VPBlockBase, i.e. the middle.split block, that
+  /// corresponds to the original loop's exiting block.
+  VPBlockBase *getExitingBlock(BasicBlock *BB) {
+    auto I = EarlyExitingBlocks.find(BB);
+    // If there is no entry for this block it must be the middle block.
+    if (I == EarlyExitingBlocks.end())
+      return getMiddleBlock();
+    return I->second;
+  }
+
   /// The trip count of the original loop.
   VPValue *getTripCount() const {
     assert(TripCount && "trip count needs to be set before accessing it");

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

Lines changed: 14 additions & 2 deletions
@@ -833,13 +833,25 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
 void VPIRInstruction::execute(VPTransformState &State) {
   assert((isa<PHINode>(&I) || getNumOperands() == 0) &&
          "Only PHINodes can have extra operands");
+  BasicBlock *ExitBB = cast<VPIRBasicBlock>(getParent())->getIRBasicBlock();
+  SmallVector<BasicBlock *, 4> OrigPreds(predecessors(ExitBB));
   for (const auto &[Idx, Op] : enumerate(operands())) {
     VPValue *ExitValue = Op;
     auto Lane = vputils::isUniformAfterVectorization(ExitValue)
                     ? VPLane::getFirstLane()
                     : VPLane::getLastLaneForVF(State.VF);
-    VPBlockBase *Pred = getParent()->getPredecessors()[Idx];
-    auto *PredVPBB = Pred->getExitingBasicBlock();
+
+    VPBasicBlock *PredVPBB;
+    // If there is just a single operand then we don't have to worry about
+    // early exits and mapping the blocks.
+    if (getNumOperands() == 1)
+      PredVPBB = cast<VPBasicBlock>(getParent()->getSinglePredecessor());
+    else {
+      // The operands are ordered according to predecessors in the original
+      // scalar loop.
+      BasicBlock *OrigPredBB = OrigPreds[Idx];
+      PredVPBB = cast<VPBasicBlock>(State.Plan->getExitingBlock(OrigPredBB));
+    }
     BasicBlock *PredBB = State.CFG.VPBB2IRBB[PredVPBB];
     // Set insertion point in PredBB in case an extract needs to be generated.
     // TODO: Model extracts explicitly.
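This sketch shows why the operand index has to be translated through the original predecessor list. It uses blocks named by strings. PlanModel, incomingBlocks and the block names are hypothetical. PlanModel::getExitingBlock mirrors the new VPlan::getExitingBlock, which falls back to the middle block for any block not in the early-exit map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in: blocks are identified by name.
using Block = std::string;

// Models VPlan::getExitingBlock(): an early-exiting IR block maps to its
// middle.split block; any other predecessor is taken to be the middle block.
struct PlanModel {
  std::map<Block, Block> EarlyExitingBlocks; // exiting IR block -> middle.split
  Block MiddleBlock = "middle.block";

  Block getExitingBlock(const Block &BB) const {
    auto It = EarlyExitingBlocks.find(BB);
    return It == EarlyExitingBlocks.end() ? MiddleBlock : It->second;
  }
};

// For each exit-phi operand, return the vector-side block the value should be
// extracted in. Operands are assumed to follow OrigPreds, the order of
// predecessors(ExitBB) in the scalar loop.
std::vector<Block> incomingBlocks(const PlanModel &Plan,
                                  const std::vector<Block> &OrigPreds,
                                  const std::vector<Block> &VPPreds,
                                  std::size_t NumOperands) {
  std::vector<Block> Result;
  if (NumOperands == 1) {
    // Single operand: the exit block has exactly one VPlan predecessor.
    assert(VPPreds.size() == 1 && "expected a single predecessor");
    Result.push_back(VPPreds.front());
    return Result;
  }
  // Several operands: translate each index through the ORIGINAL predecessor
  // list instead of indexing VPPreds, whose order may differ.
  assert(OrigPreds.size() >= NumOperands && "operand without a predecessor");
  for (std::size_t Idx = 0; Idx < NumOperands; ++Idx)
    Result.push_back(Plan.getExitingBlock(OrigPreds[Idx]));
  return Result;
}
```

Consider OrigPreds = {"for.inc", "for.body"}, with for.body mapped to middle.split, and VPPreds = {"middle.split", "middle.block"}. In that case the sketch yields {"middle.block", "middle.split"}. Indexing VPPreds directly, as the removed lines did, would have paired the for.inc value with middle.split.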

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Lines changed: 5 additions & 0 deletions
@@ -1839,6 +1839,11 @@ void VPlanTransforms::handleUncountableEarlyExit(
   VPBlockUtils::connectBlocks(NewMiddle, VPExitBlock);
   VPBlockUtils::connectBlocks(NewMiddle, MiddleVPBB);
 
+  // Establish a mapping between this new VPBasicBlock and the uncountable
+  // exiting block so that we can add incoming values to phis in the exit
+  // block correctly.
+  Plan.addEarlyExitingBlockToMap(NewMiddle, Exiting);
+
   VPBuilder MiddleBuilder(NewMiddle);
   MiddleBuilder.createNaryOp(VPInstruction::BranchOnCond, {EarlyExitTaken});
 }

llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll

Lines changed: 55 additions & 5 deletions
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S < %s -p loop-vectorize | FileCheck %s --check-prefixes=CHECK
+; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization | FileCheck %s --check-prefixes=CHECK
 
 target triple = "aarch64-unknown-linux-gnu"
 
@@ -272,22 +272,66 @@ define i32 @diff_exit_block_needs_scev_check(i32 %end) {
 ; CHECK-NEXT: call void @init_mem(ptr [[P1]], i64 1024)
 ; CHECK-NEXT: call void @init_mem(ptr [[P2]], i64 1024)
 ; CHECK-NEXT: [[END_CLAMPED:%.*]] = and i32 [[END]], 1023
+; CHECK-NEXT: [[TMP19:%.*]] = trunc i32 [[END]] to i10
+; CHECK-NEXT: [[TMP20:%.*]] = zext i10 [[TMP19]] to i64
+; CHECK-NEXT: [[UMAX1:%.*]] = call i64 @llvm.umax.i64(i64 [[TMP20]], i64 1)
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[UMAX1]], 12
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
+; CHECK: vector.scevcheck:
+; CHECK-NEXT: [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 [[END_CLAMPED]], i32 1)
+; CHECK-NEXT: [[TMP2:%.*]] = add nsw i32 [[UMAX]], -1
+; CHECK-NEXT: [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
+; CHECK-NEXT: [[TMP4:%.*]] = add i8 1, [[TMP3]]
+; CHECK-NEXT: [[TMP5:%.*]] = icmp ult i8 [[TMP4]], 1
+; CHECK-NEXT: [[TMP6:%.*]] = icmp ugt i32 [[TMP2]], 255
+; CHECK-NEXT: [[TMP7:%.*]] = or i1 [[TMP5]], [[TMP6]]
+; CHECK-NEXT: br i1 [[TMP7]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
+; CHECK: vector.ph:
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[UMAX1]], 4
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[UMAX1]], [[N_MOD_VF]]
+; CHECK-NEXT: [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i8
 ; CHECK-NEXT: br label [[FOR_BODY1:%.*]]
+; CHECK: vector.body:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[FOR_BODY1]] ]
+; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[P1]], i64 [[TMP8]]
+; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[TMP9]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP10]], align 4
+; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[P2]], i64 [[TMP8]]
+; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[TMP11]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD3:%.*]] = load <4 x i32>, ptr [[TMP12]], align 4
+; CHECK-NEXT: [[TMP13:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], [[WIDE_LOAD3]]
+; CHECK-NEXT: [[TMP14:%.*]] = xor <4 x i1> [[TMP13]], splat (i1 true)
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NEXT: [[TMP15:%.*]] = xor <4 x i1> [[TMP14]], splat (i1 true)
+; CHECK-NEXT: [[TMP16:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP15]])
+; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: [[TMP18:%.*]] = or i1 [[TMP16]], [[TMP17]]
+; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_SPLIT:%.*]], label [[FOR_BODY1]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: middle.split:
+; CHECK-NEXT: br i1 [[TMP16]], label [[FOUND:%.*]], label [[MIDDLE_BLOCK:%.*]]
+; CHECK: middle.block:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[UMAX1]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK: scalar.ph:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL2:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
+; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
-; CHECK-NEXT: [[IND:%.*]] = phi i8 [ [[IND_NEXT:%.*]], [[FOR_INC:%.*]] ], [ 0, [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[GEP_IND:%.*]] = phi i64 [ [[GEP_IND_NEXT:%.*]], [[FOR_INC]] ], [ 0, [[ENTRY]] ]
+; CHECK-NEXT: [[IND:%.*]] = phi i8 [ [[IND_NEXT:%.*]], [[FOR_INC:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
+; CHECK-NEXT: [[GEP_IND:%.*]] = phi i64 [ [[GEP_IND_NEXT:%.*]], [[FOR_INC]] ], [ [[BC_RESUME_VAL2]], [[SCALAR_PH]] ]
 ; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr [[P1]], i64 [[GEP_IND]]
 ; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX1]], align 4
 ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[P2]], i64 [[GEP_IND]]
 ; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4
 ; CHECK-NEXT: [[CMP_EARLY:%.*]] = icmp eq i32 [[TMP0]], [[TMP1]]
-; CHECK-NEXT: br i1 [[CMP_EARLY]], label [[FOUND:%.*]], label [[FOR_INC]]
+; CHECK-NEXT: br i1 [[CMP_EARLY]], label [[FOUND]], label [[FOR_INC]]
 ; CHECK: for.inc:
 ; CHECK-NEXT: [[IND_NEXT]] = add i8 [[IND]], 1
 ; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[IND_NEXT]] to i32
 ; CHECK-NEXT: [[GEP_IND_NEXT]] = add i64 [[GEP_IND]], 1
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[CONV]], [[END_CLAMPED]]
-; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY1]], label [[EXIT:%.*]]
+; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK: found:
 ; CHECK-NEXT: ret i32 1
 ; CHECK: exit:
@@ -331,3 +375,9 @@ declare <vscale x 4 x i32> @foo_vec(<vscale x 4 x i32>)
 
 attributes #0 = { "vector-function-abi-variant"="_ZGVsNxv_foo(foo_vec)" }
 attributes #1 = { "target-features"="+sve" vscale_range(1,16) }
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]]}
+;.
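The new CHECK lines may be easier to follow with their control flow written out by hand. The following C++ function is a simplified model and not compiler output. scalarLoop and vectorizedModel are hypothetical names. The model takes the trip count (UMAX1 in the test) directly as N, ignores the i8 induction variable, and assumes the vector.scevcheck overflow test passes. The constants 4 and 12 come from the N_MOD_VF and MIN_ITERS_CHECK lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for the test loop: return 1 at the first i in [Start, N)
// with P1[i] == P2[i] (the "found" exit), otherwise 0 (the "exit" block).
int scalarLoop(const std::vector<int32_t> &P1, const std::vector<int32_t> &P2,
               std::size_t Start, std::size_t N) {
  for (std::size_t I = Start; I < N; ++I)
    if (P1[I] == P2[I])
      return 1;
  return 0;
}

// Model of the vectorized control flow for VF = 4. N plays the role of UMAX1
// (the trip count); the vector.scevcheck overflow test is assumed to pass.
int vectorizedModel(const std::vector<int32_t> &P1,
                    const std::vector<int32_t> &P2, std::size_t N) {
  constexpr std::size_t VF = 4;
  constexpr std::size_t MinIters = 12; // MIN_ITERS_CHECK: icmp ult UMAX1, 12
  assert(P1.size() >= N && P2.size() >= N && "arrays too short");
  if (N < MinIters)
    return scalarLoop(P1, P2, 0, N); // branch straight to scalar.ph
  const std::size_t NVec = N - N % VF; // vector.ph: N_MOD_VF, N_VEC
  std::size_t Index = 0;
  bool EarlyExitTaken = false;
  do {
    // vector.body: compare VF lanes and OR-reduce the early-exit mask.
    EarlyExitTaken = false;
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      EarlyExitTaken |= (P1[Index + Lane] == P2[Index + Lane]);
    Index += VF;
  } while (!EarlyExitTaken && Index != NVec);
  if (EarlyExitTaken) // middle.split: some lane took the early exit
    return 1;
  if (N == NVec) // middle.block: no remainder iterations
    return 0;
  return scalarLoop(P1, P2, NVec, N); // scalar.ph: resume at N_VEC
}
```

In this test the early exit returns a constant, so middle.split can branch straight to found. A loop with live-outs would also need the values from the exiting lane, which this patch does not yet support.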

llvm/test/Transforms/LoopVectorize/early_exit_legality.ll

Lines changed: 3 additions & 3 deletions
@@ -11,7 +11,7 @@ define i32 @diff_exit_block_needs_scev_check(i32 %end) {
 ; CHECK-LABEL: LV: Checking a loop in 'diff_exit_block_needs_scev_check'
 ; CHECK: Found an early exit loop with symbolic max backedge taken count: (-1 + (1 umax (zext i10 (trunc i32 %end to i10) to i32)))<nsw>
 ; CHECK-NEXT: LV: We can vectorize this loop!
-; CHECK-NEXT: LV: Not vectorizing: Auto-vectorization of loops with uncountable early exit is not yet supported.
+; CHECK-NEXT: LV: Not vectorizing: Auto-vectorization of loops with uncountable early exit is disabled.
 entry:
   %p1 = alloca [1024 x i32]
   %p2 = alloca [1024 x i32]
@@ -49,7 +49,7 @@ define i64 @same_exit_block_pre_inc_use1() {
 ; CHECK-LABEL: LV: Checking a loop in 'same_exit_block_pre_inc_use1'
 ; CHECK: LV: Found an early exit loop with symbolic max backedge taken count: 63
 ; CHECK-NEXT: LV: We can vectorize this loop!
-; CHECK-NEXT: LV: Not vectorizing: Auto-vectorization of loops with uncountable early exit is not yet supported.
+; CHECK-NEXT: LV: Not vectorizing: Auto-vectorization of loops with uncountable early exit is disabled.
 entry:
   %p1 = alloca [1024 x i8]
   %p2 = alloca [1024 x i8]
@@ -141,7 +141,7 @@ define i64 @loop_contains_load_after_early_exit(ptr dereferenceable(1024) align(
 ; CHECK-LABEL: LV: Checking a loop in 'loop_contains_load_after_early_exit'
 ; CHECK: LV: Found an early exit loop with symbolic max backedge taken count: 63
 ; CHECK-NEXT: LV: We can vectorize this loop!
-; CHECK-NEXT: LV: Not vectorizing: Auto-vectorization of loops with uncountable early exit is not yet supported.
+; CHECK-NEXT: LV: Not vectorizing: Auto-vectorization of loops with uncountable early exit is disabled.
 entry:
   %p1 = alloca [1024 x i8]
   call void @init_mem(ptr %p1, i64 1024)

llvm/test/Transforms/LoopVectorize/multi_early_exit.ll

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S < %s -p loop-vectorize | FileCheck %s
+; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization | FileCheck %s
 
 declare void @init_mem(ptr, i64);
 
llvm/test/Transforms/LoopVectorize/multi_early_exit_live_outs.ll

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S < %s -p loop-vectorize | FileCheck %s
+; RUN: opt -S < %s -p loop-vectorize -enable-early-exit-vectorization | FileCheck %s
 
 declare void @init_mem(ptr, i64);
 