Skip to content

[VPlan] Refactor VPlan creation, add transform introducing region (NFC). #128419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 9, 2025
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ add_llvm_component_library(LLVMVectorize
VectorCombine.cpp
VPlan.cpp
VPlanAnalysis.cpp
VPlanConstruction.cpp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should files designed to hold VPlan tranforms be called VPlanTransforms*.cpp to help classify them in the absence of a subdirectory, as in VPlanTransformUnroll, VPlanTransformPredicate, etc.,?

What transform(s) are destines to reside in this file - all the preliminary Scalar Transforms according to roadmap (VPlanTransformsScalar.cpp?), or only some of them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should files designed to hold VPlan tranforms be called VPlanTransforms*.cpp to help classify them in the absence of a subdirectory, as in VPlanTransformUnroll, VPlanTransformPredicate, etc.,?

Sounds good! We could also introduce a subfolder?

What transform(s) are destines to reside in this file - all the preliminary Scalar Transforms according to roadmap (VPlanTransformsScalar.cpp?), or only some of them?

In the short term, in addition also the code to build the initial VPlan (i.e. what currently resides in VPHCFGBuilder) and possibly recipe widening?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's go with VPlanConstruction.cpp for now, which expresses what it's about rather than how (may or may not involve VPlanTransforms).

VPlanHCFGBuilder.cpp
VPlanRecipes.cpp
VPlanSLP.cpp
Expand Down
18 changes: 11 additions & 7 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9312,14 +9312,17 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
return !CM.requiresScalarEpilogue(VF.isVector());
},
Range);
VPlanPtr Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(),
PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);

auto Plan = std::make_unique<VPlan>(OrigLoop);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above explanation regarding "Create initial VPlan skeleton, having ..." remains intact, this change only affects how to get there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I think so.

// Build hierarchical CFG.
// Convert to VPlan-transform and consoliate all transforms for VPlan
// creation.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();

VPlanTransforms::introduceTopLevelVectorLoopRegion(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this introduce all regions, i.e., lifting a flat CFG into a hierarchical one? With the inverse lowing conversion taking place at the end, to simplify code-gen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I added a TODO for a follow-up. As this is for now only needed in the native path, it's probably best to do it separately.

*Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);

// Don't use getDecisionAndClampRange here, because we don't know the UF
// so this function is better to be conservative, rather than to split
// it up into different VPlans.
Expand Down Expand Up @@ -9615,13 +9618,14 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");

// Create new empty VPlan
auto Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(), PSE,
true, false, OrigLoop);

auto Plan = std::make_unique<VPlan>(OrigLoop);
// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();

VPlanTransforms::introduceTopLevelVectorLoopRegion(
*Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop);

for (ElementCount VF : Range)
Plan->addVF(VF);

Expand Down
91 changes: 7 additions & 84 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -880,85 +880,6 @@ VPlan::~VPlan() {
delete BackedgeTakenCount;
}

VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck,
bool TailFolded, Loop *TheLoop) {
auto Plan = std::make_unique<VPlan>(TheLoop);
VPBlockBase *ScalarHeader = Plan->getScalarHeader();

// Connect entry only to vector preheader initially. Entry will also be
// connected to the scalar preheader later, during skeleton creation when
// runtime guards are added as needed. Note that when executing the VPlan for
// an epilogue vector loop, the original entry block here will be replaced by
// a new VPIRBasicBlock wrapping the entry to the epilogue vector loop after
// generating code for the main vector loop.
VPBasicBlock *VecPreheader = Plan->createVPBasicBlock("vector.ph");
VPBlockUtils::connectBlocks(Plan->getEntry(), VecPreheader);

// Create SCEV and VPValue for the trip count.
// We use the symbolic max backedge-taken-count, which works also when
// vectorizing loops with uncountable early exits.
const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
"Invalid loop count");
ScalarEvolution &SE = *PSE.getSE();
const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
InductionTy, TheLoop);
Plan->TripCount =
vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);

// Create VPRegionBlock, with empty header and latch blocks, to be filled
// during processing later.
VPBasicBlock *HeaderVPBB = Plan->createVPBasicBlock("vector.body");
VPBasicBlock *LatchVPBB = Plan->createVPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
auto *TopRegion = Plan->createVPRegionBlock(
HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);

VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
VPBasicBlock *MiddleVPBB = Plan->createVPBasicBlock("middle.block");
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);

VPBasicBlock *ScalarPH = Plan->createVPBasicBlock("scalar.ph");
VPBlockUtils::connectBlocks(ScalarPH, ScalarHeader);
if (!RequiresScalarEpilogueCheck) {
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
return Plan;
}

// If needed, add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop. Three cases:
// 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
// Thus if tail is to be folded, we know we don't need to run the
// remainder and we can set the condition to true.
// 2) If we require a scalar epilogue, there is no conditional branch as
// we unconditionally branch to the scalar preheader. Do nothing.
// 3) Otherwise, construct a runtime check.
BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
VPIRBasicBlock *VPExitBlock = Plan->getExitBlock(IRExitBlock);
// The connection order corresponds to the operands of the conditional branch.
VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);

auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
// Here we use the same DebugLoc as the scalar loop latch terminator instead
// of the corresponding compare because they may have ended up with
// different line numbers and we want to avoid awkward line stepping while
// debugging. Eg. if the compare has got a line number inside the loop.
VPBuilder Builder(MiddleVPBB);
VPValue *Cmp =
TailFolded
? Plan->getOrAddLiveIn(ConstantInt::getTrue(
IntegerType::getInt1Ty(TripCount->getType()->getContext())))
: Builder.createICmp(CmpInst::ICMP_EQ, Plan->getTripCount(),
&Plan->getVectorTripCount(),
ScalarLatchTerm->getDebugLoc(), "cmp.n");
Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
ScalarLatchTerm->getDebugLoc());
return Plan;
}

void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
VPTransformState &State) {
Type *TCTy = TripCountV->getType();
Expand Down Expand Up @@ -1135,11 +1056,13 @@ void VPlan::printLiveIns(raw_ostream &O) const {
}

O << "\n";
if (TripCount->isLiveIn())
O << "Live-in ";
TripCount->printAsOperand(O, SlotTracker);
O << " = original trip-count";
O << "\n";
if (TripCount) {
if (TripCount->isLiveIn())
O << "Live-in ";
TripCount->printAsOperand(O, SlotTracker);
O << " = original trip-count";
O << "\n";
}
}

LLVM_DUMP_METHOD
Expand Down
24 changes: 8 additions & 16 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -3503,21 +3503,6 @@ class VPlan {
VPBB->setPlan(this);
}

/// Create initial VPlan, having an "entry" VPBasicBlock (wrapping
/// original scalar pre-header) which contains SCEV expansions that need
/// to happen before the CFG is modified (when executing a VPlan for the
/// epilogue vector loop, the original entry needs to be replaced by a new
/// one); a VPBasicBlock for the vector pre-header, followed by a region for
/// the vector loop, followed by the middle VPBasicBlock. If a check is needed
/// to guard executing the scalar epilogue loop, it will be added to the
/// middle block, together with VPBasicBlocks for the scalar preheader and
/// exit blocks. \p InductionTy is the type of the canonical induction and
/// used for related values, like the trip count expression.
static VPlanPtr createInitialVPlan(Type *InductionTy,
PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck,
bool TailFolded, Loop *TheLoop);

/// Prepare the plan for execution, setting up the required live-in values.
void prepareToExecute(Value *TripCount, Value *VectorTripCount,
VPTransformState &State);
Expand Down Expand Up @@ -3579,11 +3564,18 @@ class VPlan {
return TripCount;
}

/// Set the trip count assuming it is currently null; if it is not - use
/// resetTripCount().
void setTripCount(VPValue *NewTripCount) {
assert(!TripCount && NewTripCount && "TripCount should not be set yet.");
TripCount = NewTripCount;
}

/// Resets the trip count for the VPlan. The caller must make sure all uses of
/// the original trip count have been replaced.
void resetTripCount(VPValue *NewTripCount) {
assert(TripCount && NewTripCount && TripCount->getNumUsers() == 0 &&
"TripCount always must be set");
"TripCount must be set when resetting");
TripCount = NewTripCount;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above error message that "TripCount always must be set" needs to be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

}

Expand Down
101 changes: 101 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
//===-- VPlanConstruction.cpp - Transforms for initial VPlan construction -===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
///
/// \file
/// This file implements transforms for initial VPlan construction.
///
//===----------------------------------------------------------------------===//

#include "LoopVectorizationPlanner.h"
#include "VPlan.h"
#include "VPlanCFG.h"
#include "VPlanTransforms.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"

using namespace llvm;

void VPlanTransforms::introduceTopLevelVectorLoopRegion(
VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
// TODO: Generalize to introduce all loop regions.
auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);

VPBasicBlock *OriginalLatch =
cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
assert(OriginalLatch->getNumSuccessors() == 0 &&
"Plan should end at top level latch");

// Create SCEV and VPValue for the trip count.
// We use the symbolic max backedge-taken-count, which works also when
// vectorizing loops with uncountable early exits.
const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
"Invalid loop count");
ScalarEvolution &SE = *PSE.getSE();
const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
InductionTy, TheLoop);
Plan.setTripCount(
vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));

// Create VPRegionBlock, with existing header and new empty latch block, to be
// filled.
VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
auto *TopRegion = Plan.createVPRegionBlock(
HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
// All VPBB's reachable shallowly from HeaderVPBB belong to top level loop,
// because VPlan is expected to end at top level latch.
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
VPBB->setParent(TopRegion);
Comment on lines +58 to +59
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to justify setting parenthood by explaining something like

Suggested change
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
VPBB->setParent(TopRegion);
// All VPBB's reachable shallowly from HeaderVPBB belong to top level loop, because VPlan is expected to end at top level latch.
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
VPBB->setParent(TopRegion);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!


VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);

VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
if (!RequiresScalarEpilogueCheck) {
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
return;
}

// If needed, add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop. Three cases:
// 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
// Thus if tail is to be folded, we know we don't need to run the
// remainder and we can set the condition to true.
// 2) If we require a scalar epilogue, there is no conditional branch as
// we unconditionally branch to the scalar preheader. Do nothing.
// 3) Otherwise, construct a runtime check.
BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
// The connection order corresponds to the operands of the conditional branch.
VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);

auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
// Here we use the same DebugLoc as the scalar loop latch terminator instead
// of the corresponding compare because they may have ended up with
// different line numbers and we want to avoid awkward line stepping while
// debugging. Eg. if the compare has got a line number inside the loop.
VPBuilder Builder(MiddleVPBB);
VPValue *Cmp =
TailFolded
? Plan.getOrAddLiveIn(ConstantInt::getTrue(
IntegerType::getInt1Ty(TripCount->getType()->getContext())))
: Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
&Plan.getVectorTripCount(),
ScalarLatchTerm->getDebugLoc(), "cmp.n");
Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
ScalarLatchTerm->getDebugLoc());
}
Loading
Loading