Skip to content

[VPlan] Introduce all loop regions as VPlan transform. (NFC) #129402

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 16, 2025
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 8 additions & 8 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9312,14 +9312,14 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
Range);
auto Plan = std::make_unique<VPlan>(OrigLoop);
// Build hierarchical CFG.
// Convert to VPlan-transform and consoliate all transforms for VPlan
// TODO: Convert to VPlan-transform and consoliate all transforms for VPlan
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: Convert to VPlan-transform and consoliate all transforms for VPlan
// TODO: Convert to VPlan-transform and consolidate all transforms for VPlan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

// creation.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();
HCFGBuilder.buildPlainCFG();

VPlanTransforms::introduceTopLevelVectorLoopRegion(
*Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);
VPlanTransforms::createLoopRegions(*Plan, Legal->getWidestInductionType(),
PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);

// Don't use getDecisionAndClampRange here, because we don't know the UF
// so this function is better to be conservative, rather than to split
Expand Down Expand Up @@ -9619,10 +9619,10 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlan(VFRange &Range) {
auto Plan = std::make_unique<VPlan>(OrigLoop);
// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();
HCFGBuilder.buildPlainCFG();

VPlanTransforms::introduceTopLevelVectorLoopRegion(
*Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop);
VPlanTransforms::createLoopRegions(*Plan, Legal->getWidestInductionType(),
PSE, true, false, OrigLoop);

for (ElementCount VF : Range)
Plan->addVF(VF);
Expand Down
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ class SCEV;
class Type;
class VPBasicBlock;
class VPBuilder;
class VPDominatorTree;
class VPRegionBlock;
class VPlan;
class VPLane;
Expand Down
108 changes: 83 additions & 25 deletions llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,26 +14,93 @@
#include "LoopVectorizationPlanner.h"
#include "VPlan.h"
#include "VPlanCFG.h"
#include "VPlanDominatorTree.h"
#include "VPlanTransforms.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"

using namespace llvm;

void VPlanTransforms::introduceTopLevelVectorLoopRegion(
VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
// TODO: Generalize to introduce all loop regions.
auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
/// Checks if \p HeaderVPB is a loop header block in the plain CFG; that is, it
/// has exactly 2 predecessors (preheader and latch), where the block
/// dominates the latch and the preheader dominates the block. If it is a
/// header block, returns a pair with the corresponding preheader and latch
/// blocks. Otherwise return std::nullopt.
static std::optional<std::pair<VPBasicBlock *, VPBasicBlock *>>
getPreheaderAndLatch(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having the plain CFG-based VPlan which mirrors the IR have consistent recipes and CFG where the predecessors of header blocks are ordered preheader, latch? That would simplify a bool isHeader(VPB, VPDT) which would have only one option to check, and also createLoopRegion(VPB) which could directly retrieve the preheader and latch from the header? And possibly support other transforms that may operate on CFG-based VPlan after its construction and before its conversion to HCFG, if desired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means effectively going back to the orignial verison, which enforced the specific order? The current version moves the canonicalization of order outside the CFG construction, and leaves the plain CFG a 1-1 mapping of the incomig CFG, with both phi recipes and VPBB predecessor order matching the order in the original CFG.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// dominates the latch and the preheader dominates the block. If it is a
/// header block, returns a pair with the corresponding preheader and latch
/// blocks. Otherwise return std::nullopt.
static std::optional<std::pair<VPBasicBlock *, VPBasicBlock *>>
getPreheaderAndLatch(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) {
/// dominates the latch and the preheader dominates the block. If it is a
/// header block return true, making sure the preheader appears first and
/// the latch second. Otherwise return false.
static bool canonicalHeader(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

ArrayRef<VPBlockBase *> Preds = HeaderVPB->getPredecessors();
if (Preds.size() != 2)
return std::nullopt;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return std::nullopt;
return false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks


VPBasicBlock *OriginalLatch =
cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
assert(OriginalLatch->getNumSuccessors() == 0 &&
"Plan should end at top level latch");
auto *PreheaderVPBB = cast<VPBasicBlock>(Preds[0]);
auto *LatchVPBB = cast<VPBasicBlock>(Preds[1]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *PreheaderVPBB = cast<VPBasicBlock>(Preds[0]);
auto *LatchVPBB = cast<VPBasicBlock>(Preds[1]);
auto *PreheaderVPBB = Preds[0];
auto *LatchVPBB = Preds[1];

or are the casts needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks

if (VPDT.dominates(PreheaderVPBB, HeaderVPB) &&
VPDT.dominates(HeaderVPB, LatchVPBB))
return {std::make_pair(PreheaderVPBB, LatchVPBB)};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return {std::make_pair(PreheaderVPBB, LatchVPBB)};
return true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks


std::swap(PreheaderVPBB, LatchVPBB);
if (VPDT.dominates(PreheaderVPBB, HeaderVPB) &&
VPDT.dominates(HeaderVPB, LatchVPBB))
return {std::make_pair(PreheaderVPBB, LatchVPBB)};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPDT.dominates(HeaderVPB, LatchVPBB))
return {std::make_pair(PreheaderVPBB, LatchVPBB)};
VPDT.dominates(HeaderVPB, LatchVPBB)) {
// Canonicalize predecessors of header so that preheader is first and latch second.
HeaderVPB->swapPredecessors();
for (VPRecipeBase &R : cast<VPBasicBlock>(HeaderVPB)->phis())
R.swapOperands();
return true;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!


return std::nullopt;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return std::nullopt;
return false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

}

/// Create a new VPRegionBlock if there is a loop starting at \p HeaderVPB.
static void createLoopRegion(VPlan &Plan, VPBlockBase *HeaderVPB,
VPDominatorTree &VPDT) {
auto Res = getPreheaderAndLatch(HeaderVPB, VPDT);
if (!Res)
return;

const auto &[PreheaderVPBB, LatchVPBB] = *Res;

// Swap the operands of header phis if needed. After creating the region, the
// incoming value from the preheader must be the first operand and the one
// from the latch must be the second operand.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consistency is between recipes and CFG, so better set it correctly before introducing regions/HCFG? (Specifically, between operands of phi recipes and predecessors, and between conditions/cases of branch/switch recipes and successors).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it is consistent in the plain CFG, and needs to be update here to match the new order of predecessors introduced here.

if (HeaderVPB->getPredecessors()[0] != PreheaderVPBB) {
for (VPRecipeBase &R : cast<VPBasicBlock>(HeaderVPB)->phis()) {
VPValue *Inc0 = R.getOperand(0);
R.setOperand(0, R.getOperand(1));
R.setOperand(1, Inc0);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto Res = getPreheaderAndLatch(HeaderVPB, VPDT);
if (!Res)
return;
const auto &[PreheaderVPBB, LatchVPBB] = *Res;
// Swap the operands of header phis if needed. After creating the region, the
// incoming value from the preheader must be the first operand and the one
// from the latch must be the second operand.
if (HeaderVPB->getPredecessors()[0] != PreheaderVPBB) {
for (VPRecipeBase &R : cast<VPBasicBlock>(HeaderVPB)->phis()) {
VPValue *Inc0 = R.getOperand(0);
R.setOperand(0, R.getOperand(1));
R.setOperand(1, Inc0);
}
}
auto *PreheaderVPBB = HeaderVPB->getPredecessors()[0];
auto *LatchVPBB = HeaderVPB->getPredecessors()[1];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

VPBlockUtils::disconnectBlocks(PreheaderVPBB, HeaderVPB);
VPBlockUtils::disconnectBlocks(LatchVPBB, HeaderVPB);
VPBlockBase *Succ = LatchVPBB->getSingleSuccessor();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBlockBase *Succ = LatchVPBB->getSingleSuccessor();
assert(LatchVPBB->getNumSuccessors() <= 1 && "Latch has more than one successor");
VPBlockBase *Succ = LatchVPBB->getSingleSuccessor();

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

assert(LatchVPBB->getNumSuccessors() <= 1 &&
"Latch has more than one successor");
if (Succ)
VPBlockUtils::disconnectBlocks(LatchVPBB, Succ);

auto *R = Plan.createVPRegionBlock(HeaderVPB, LatchVPBB, "",
false /*isReplicator*/);
R->setParent(HeaderVPB->getParent());
// All VPBB's reachable shallowly from HeaderVPB belong to top level loop,
// because VPlan is expected to end at top level latch disconnected above.
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPB))
VPBB->setParent(R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about setting the parent of the newly introduced region itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!


VPBlockUtils::insertBlockAfter(R, PreheaderVPBB);
if (Succ)
VPBlockUtils::connectBlocks(R, Succ);
}

void VPlanTransforms::createLoopRegions(VPlan &Plan, Type *InductionTy,
PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck,
bool TailFolded, Loop *TheLoop) {
VPDominatorTree VPDT;
VPDT.recalculate(Plan);
for (VPBlockBase *HeaderVPB : vp_depth_first_shallow(Plan.getEntry()))
createLoopRegion(Plan, HeaderVPB, VPDT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
createLoopRegion(Plan, HeaderVPB, VPDT);
if (isHeader(HeaderVPB, VPDT))
createLoopRegion(Plan, HeaderVPB);

? Otherwise createLoopRegion() should more accurately be called tryToCreateLoopRegion()...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the name to tryToCreateLoopRegion, to avoid adding new arguments, thanks


VPRegionBlock *TopRegion = Plan.getVectorLoopRegion();
auto *OrigExiting = TopRegion->getExiting();
VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, OrigExiting);
TopRegion->setExiting(LatchVPBB);
Comment on lines +93 to +96
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this additional latch block really needed, or can it be removed (independently)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unsed only for convenience, when adjusting reductions it is used to place selects there if needed. I'll look into remove it separartely.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth a comment, along with explaining that this section assigns distinct names to the (just created) Top Region and its latch block, which is introduced as a convenience.
Can also reset the name of Top Region's header to "vector.body" here, instead of setting it during plain CFG construction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

TopRegion->setName("vector loop");

// Create SCEV and VPValue for the trip count.
// We use the symbolic max backedge-taken-count, which works also when
Expand All @@ -47,18 +114,9 @@ void VPlanTransforms::introduceTopLevelVectorLoopRegion(
Plan.setTripCount(
vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));

// Create VPRegionBlock, with existing header and new empty latch block, to be
// filled.
VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
auto *TopRegion = Plan.createVPRegionBlock(
HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
// All VPBB's reachable shallowly from HeaderVPBB belong to top level loop,
// because VPlan is expected to end at top level latch.
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
VPBB->setParent(TopRegion);

VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
VPBlockUtils::insertBlockAfter(VecPreheader, Plan.getEntry());

VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);

Expand Down
Loading