[VPlan] Implement interleaving as VPlan-to-VPlan transform. #95842

Merged
merged 33 commits into from
Sep 21, 2024
33 commits
0c3c293
[VPlan] Implement interleaving as VPlan-to-VPlan transform.
fhahn Jun 13, 2024
cba8b59
!fixup use pattern matching in a few more cases.
fhahn Jun 20, 2024
f3e47f5
!fixup rebase and fixup
fhahn Aug 12, 2024
4abc317
!fixup address latest comments, thanks!
fhahn Aug 14, 2024
9360440
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 4, 2024
cf7d783
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 5, 2024
41b7cc9
!fixup address comments, thanks
fhahn Sep 5, 2024
713eec1
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 5, 2024
ced94e8
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 5, 2024
6fd2416
!fixup remove empty vputils namespace from VPlan.h
fhahn Sep 5, 2024
548474c
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 10, 2024
ecdf378
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 12, 2024
4daee0a
!fixup address latest comments
fhahn Sep 15, 2024
23ac7f6
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 15, 2024
faf867c
!fixup address latest comments, thanks
fhahn Sep 15, 2024
26fc035
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 19, 2024
34595b8
!fixup fix build failure
fhahn Sep 19, 2024
e441720
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 19, 2024
a838eb4
!fixup address latest
fhahn Sep 19, 2024
1cd971c
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 19, 2024
2b0b1e9
!fixup add VPUnrollPartAccessor to access getUnrollPart[Operand]
fhahn Sep 19, 2024
470b374
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 19, 2024
7ff3b63
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 20, 2024
862121d
!fixup update to avoid unused variable.
fhahn Sep 20, 2024
bb7ddcf
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 20, 2024
1a5113c
!fixup update tests
fhahn Sep 20, 2024
6ce7bf8
!fixup address comments, thanks!
fhahn Sep 20, 2024
99bc59d
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 20, 2024
65150b5
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 21, 2024
d2073e5
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 21, 2024
2db5340
!fixup test updates
fhahn Sep 21, 2024
5de37ef
Merge remote-tracking branch 'origin/main' into vplan-unroll-explicit…
fhahn Sep 21, 2024
2e8535d
!fixup update comment
fhahn Sep 21, 2024
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -16,6 +16,7 @@ add_llvm_component_library(LLVMVectorize
VPlanRecipes.cpp
VPlanSLP.cpp
VPlanTransforms.cpp
VPlanUnroll.cpp
VPlanVerifier.cpp
VPlanUtils.cpp

15 changes: 15 additions & 0 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -156,6 +156,15 @@ class VPBuilder {
DebugLoc DL, const Twine &Name = "") {
return createInstruction(Opcode, Operands, DL, Name);
}
VPInstruction *createNaryOp(unsigned Opcode,
std::initializer_list<VPValue *> Operands,
std::optional<FastMathFlags> FMFs = {},
DebugLoc DL = {}, const Twine &Name = "") {
if (FMFs)
return tryInsertInstruction(
new VPInstruction(Opcode, Operands, *FMFs, DL, Name));
return createInstruction(Opcode, Operands, DL, Name);
}

VPInstruction *createOverflowingOp(unsigned Opcode,
std::initializer_list<VPValue *> Operands,
@@ -164,6 +173,7 @@ class VPBuilder {
return tryInsertInstruction(
new VPInstruction(Opcode, Operands, WrapFlags, DL, Name));
}

VPValue *createNot(VPValue *Operand, DebugLoc DL = {},
const Twine &Name = "") {
return createInstruction(VPInstruction::Not, {Operand}, DL, Name);
@@ -223,6 +233,11 @@ class VPBuilder {
return tryInsertInstruction(new VPScalarCastRecipe(Opcode, Op, ResultTy));
}

VPWidenCastRecipe *createWidenCast(Instruction::CastOps Opcode, VPValue *Op,
Type *ResultTy) {
return tryInsertInstruction(new VPWidenCastRecipe(Opcode, Op, ResultTy));
}

VPScalarIVStepsRecipe *
createScalarIVSteps(Instruction::BinaryOps InductionOpcode,
FPMathOperator *FPBinOp, VPValue *IV, VPValue *Step) {
6 changes: 5 additions & 1 deletion llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7507,6 +7507,10 @@ LoopVectorizationPlanner::executePlan(
"expanded SCEVs to reuse can only be used during epilogue vectorization");
(void)IsEpilogueVectorization;

// TODO: Move to VPlan transform stage once the transition to the VPlan-based
// cost model is complete for better cost estimates.
VPlanTransforms::unrollByUF(BestVPlan, BestUF,
OrigLoop->getHeader()->getModule()->getContext());
VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);

LLVM_DEBUG(dbgs() << "Executing best plan with VF=" << BestVF
@@ -7625,7 +7629,7 @@ LoopVectorizationPlanner::executePlan(
if (MiddleTerm->isConditional() &&
hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) {
// Assume that `Count % VectorTripCount` is equally distributed.
- unsigned TripCount = State.UF * State.VF.getKnownMinValue();
+ unsigned TripCount = BestVPlan.getUF() * State.VF.getKnownMinValue();
assert(TripCount > 0 && "trip count should not be zero");
const uint32_t Weights[] = {1, TripCount - 1};
setBranchWeights(*MiddleTerm, Weights, /*IsExpected=*/false);
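
The branch-weight arithmetic in this hunk can be checked with a small worked example. This is a sketch, not LLVM code: `middleBlockWeights` is a hypothetical helper that only mirrors the computation shown above (`UF * VF.getKnownMinValue()`, then weights `{1, TripCount - 1}`).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM. One vector-loop iteration covers
// Step = UF * KnownMinVF scalar iterations. If `Count % Step` is uniformly
// distributed, the remainder is zero in 1 out of Step cases, which gives
// the weights {1, Step - 1} used in the hunk.
std::array<uint32_t, 2> middleBlockWeights(unsigned UF, unsigned KnownMinVF) {
  unsigned Step = UF * KnownMinVF;
  assert(Step > 0 && "trip count should not be zero");
  return {1u, Step - 1};
}
```

With `UF = 2` and a known-minimum `VF` of 4, the step is 8 and the weights are `{1, 7}`. Reading the UF from the plan rather than from `State.UF` matters here because `VPlan::execute` now resets `State->UF` to 1.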
9 changes: 9 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -391,6 +391,7 @@ void VPTransformState::setDebugLocFrom(DebugLoc DL) {
->shouldEmitDebugInfoForProfiling() &&
!EnableFSDiscriminator) {
// FIXME: For scalable vectors, assume vscale=1.
unsigned UF = Plan->getUF();
auto NewDIL =
DIL->cloneByMultiplyingDuplicationFactor(UF * VF.getKnownMinValue());
if (NewDIL)
@@ -1018,6 +1019,10 @@ static void replaceVPBBWithIRVPBB(VPBasicBlock *VPBB, BasicBlock *IRBB) {
/// Assumes a single pre-header basic-block was created for this. Introduce
/// additional basic-blocks as needed, and fill them all.
void VPlan::execute(VPTransformState *State) {
// Set UF to 1, as the unrollByUF VPlan transform already explicitly unrolled
// the VPlan.
// TODO: Remove State::UF and all uses.
State->UF = 1;
Collaborator

A related thought: is InnerLoopUnroller still employed, when VF=1;UF>1, now that unrolling is implemented earlier in VPlan?

Contributor Author

It can be removed as a follow-up. It doesn't really serve any purpose even without this change, as it just offers a constructor that doesn't take a VF.

Collaborator

Hmm, InnerLoopUnroller used to override a few methods, but indeed seems ready to be retired.

// Initialize CFG state.
State->CFG.PrevVPBB = nullptr;
State->CFG.ExitBB = State->CFG.PrevBB->getSingleSuccessor();
@@ -1093,6 +1098,10 @@ void VPlan::execute(VPTransformState *State) {
// consistent placement of all induction updates.
Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1));
Inc->moveBefore(VectorLatchBB->getTerminator()->getPrevNode());

// Use the steps for the last part as backedge value for the induction.
if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R))
Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand(), 0));
continue;
}
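
The `State->UF = 1` comment in `VPlan::execute` relies on the plan already containing one copy of each recipe per unrolled part. The following is a deliberately simplified toy model of that idea, not the actual `VPlanTransforms::unrollByUF` implementation (which is not shown in this excerpt). `ToyRecipe` and `toyUnrollByUF` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy recipe: a name plus the unrolled part it computes.
struct ToyRecipe {
  std::string Name;
  unsigned Part;
};

// Toy model of explicit unrolling: emit UF copies of every recipe in the
// loop body, one per part. After this step an executor only needs to walk
// the list once, i.e. it can behave as if UF were 1.
std::vector<ToyRecipe> toyUnrollByUF(const std::vector<std::string> &Body,
                                     unsigned UF) {
  assert(UF > 0 && "unroll factor must be positive");
  std::vector<ToyRecipe> Out;
  for (const std::string &Name : Body)
    for (unsigned Part = 0; Part < UF; ++Part)
      Out.push_back({Name, Part});
  return Out;
}
```

Unrolling a two-recipe body with `UF = 2` yields four entries, with the copies of each recipe kept adjacent and ordered by part.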

84 changes: 76 additions & 8 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -532,6 +532,7 @@ class VPBlockBase {
VPBlocksTy &getSuccessors() { return Successors; }

iterator_range<VPBlockBase **> successors() { return Successors; }
iterator_range<VPBlockBase **> predecessors() { return Predecessors; }

const VPBlocksTy &getPredecessors() const { return Predecessors; }
VPBlocksTy &getPredecessors() { return Predecessors; }
@@ -724,6 +725,11 @@ class VPLiveOut : public VPUser {

PHINode *getPhi() const { return Phi; }

/// Live-outs are marked as only using the first part during the transition
/// to unrolling directly on VPlan.
/// TODO: Remove after unroller transition.
bool onlyFirstPartUsed(const VPValue *Op) const override { return true; }
Collaborator

A live-out user can only use the first part, given that only a single part is available now? I.e., the answer depends on interleaving taking place?

Contributor Author

Yep, mostly for the transition.

Collaborator

Perhaps worth a note.

Contributor Author

Added, thanks!


#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the VPLiveOut to \p O.
void print(raw_ostream &O, VPSlotTracker &SlotTracker) const;
@@ -1226,11 +1232,24 @@ class VPRecipeWithIRFlags : public VPSingleDefRecipe {
#endif
};

/// Helper to access the operand that contains the unroll part for this recipe
/// after unrolling.
template <unsigned PartOpIdx> class VPUnrollPartAccessor {
protected:
/// Return the VPValue operand containing the unroll part or null if there is
/// no such operand.
VPValue *getUnrollPartOperand(VPUser &U) const;

/// Return the unroll part.
unsigned getUnrollPart(VPUser &U) const;
};

/// This is a concrete Recipe that models a single VPlan-level instruction.
/// While as any Recipe it may generate a sequence of IR instructions when
/// executed, these instructions would always form a single-def expression as
/// the VPInstruction is also a single def-use vertex.
- class VPInstruction : public VPRecipeWithIRFlags {
+ class VPInstruction : public VPRecipeWithIRFlags,
+ public VPUnrollPartAccessor<1> {
friend class VPlanSlp;

public:
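
The `VPUnrollPartAccessor<PartOpIdx>` declaration above only shows the interface; its definitions are not part of this excerpt. As a minimal sketch of how such a mixin could work, assume the unroll part is stored as an extra trailing operand at index `PartOpIdx` that exists only after unrolling. `ToyUser` and `ToyUnrollPartAccessor` are hypothetical stand-ins for `VPUser` and the real accessor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a recipe's operand list; operands are plain integers.
struct ToyUser {
  std::vector<unsigned> Operands;
};

// Assumption: operand PartOpIdx is present only after unrolling and holds
// the part number. A recipe that was not unrolled has fewer operands.
template <std::size_t PartOpIdx> struct ToyUnrollPartAccessor {
  // Return the operand holding the unroll part, or nullptr if absent.
  static const unsigned *getUnrollPartOperand(const ToyUser &U) {
    return U.Operands.size() == PartOpIdx + 1 ? &U.Operands[PartOpIdx]
                                              : nullptr;
  }

  // Return the unroll part, defaulting to part 0 when not unrolled.
  static unsigned getUnrollPart(const ToyUser &U) {
    const unsigned *Op = getUnrollPartOperand(U);
    return Op ? *Op : 0;
  }
};
```

Under this convention, `ToyUnrollPartAccessor<1>` reports part 0 for a user with one operand and reads the part from the second operand when it is present. The template parameter lets each recipe class declare where its part operand lives, which matches the different indices (`<1>`, `<2>`, `<3>`) used in the diff.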
@@ -1764,7 +1783,8 @@ class VPWidenGEPRecipe : public VPRecipeWithIRFlags {
/// A recipe to compute the pointers for widened memory accesses of IndexTy for
/// all parts. If IsReverse is true, compute pointers for accessing the input in
/// reverse order per part.
- class VPVectorPointerRecipe : public VPRecipeWithIRFlags {
+ class VPVectorPointerRecipe : public VPRecipeWithIRFlags,
+ public VPUnrollPartAccessor<1> {
Type *IndexedTy;
bool IsReverse;

@@ -1789,7 +1809,7 @@ class VPVectorPointerRecipe : public VPRecipeWithIRFlags {
bool onlyFirstPartUsed(const VPValue *Op) const override {
assert(is_contained(operands(), Op) &&
"Op must be an operand of the recipe");
- assert(getNumOperands() == 1 && "must have a single operand");
+ assert(getNumOperands() <= 2 && "must have at most two operands");
return true;
}

@@ -1948,6 +1968,12 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
VPValue *getVFValue() { return getOperand(2); }
const VPValue *getVFValue() const { return getOperand(2); }

VPValue *getSplatVFValue() {
// If the recipe has been unrolled (4 operands), return the VPValue for the
Collaborator

Suggested change
- // If the recipe has been unrolled (4 operands), return the VPValue for the
+ // If the recipe has been unrolled (5 operands), return the VPValue for the

// induction increment.
Collaborator

Suggested change
- // induction increment.
+ // induction increment, otherwise return null.

return getNumOperands() == 5 ? getOperand(3) : nullptr;
Collaborator

Suggested change
- return getNumOperands() == 5 ? getOperand(3) : nullptr;
+ return getNumOperands() >= 4 ? getOperand(3) : nullptr;

suffice to ensure there's a 4th operand?

}

/// Returns the first defined value as TruncInst, if it is one or nullptr
/// otherwise.
TruncInst *getTruncInst() { return Trunc; }
@@ -1967,9 +1993,17 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
Type *getScalarType() const {
return Trunc ? Trunc->getType() : IV->getType();
}

/// Returns the VPValue representing the value of this induction at
/// the last unrolled part, if it exists. Returns itself if unrolling did not
/// take place.
VPValue *getLastUnrolledPartOperand() {
return getNumOperands() == 5 ? getOperand(4) : this;
}
Comment on lines +1996 to +2002
Collaborator

Better place this accessor for operand 4 above, after getSplatVFValue(), completing the accessors for operands 1,2 and 3.

};

- class VPWidenPointerInductionRecipe : public VPHeaderPHIRecipe {
+ class VPWidenPointerInductionRecipe : public VPHeaderPHIRecipe,
+ public VPUnrollPartAccessor<3> {
const InductionDescriptor &IndDesc;

bool IsScalarAfterVectorization;
@@ -2006,6 +2040,13 @@ class VPWidenPointerInductionRecipe : public VPHeaderPHIRecipe {
/// Returns the induction descriptor for the recipe.
const InductionDescriptor &getInductionDescriptor() const { return IndDesc; }

/// Returns the VPValue representing the value of this induction at
/// the first unrolled part, if it exists. Returns itself if unrolling did not
/// take place.
VPValue *getFirstUnrolledPartOperand() {
return getUnrollPart(*this) == 0 ? this : getOperand(2);
}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,
@@ -2088,7 +2129,8 @@ struct VPFirstOrderRecurrencePHIRecipe : public VPHeaderPHIRecipe {
/// A recipe for handling reduction phis. The start value is the first operand
/// of the recipe and the incoming value from the backedge is the second
/// operand.
- class VPReductionPHIRecipe : public VPHeaderPHIRecipe {
+ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
+ public VPUnrollPartAccessor<2> {
/// Descriptor for the reduction.
const RecurrenceDescriptor &RdxDesc;

@@ -2907,7 +2949,10 @@ class VPActiveLaneMaskPHIRecipe : public VPHeaderPHIRecipe {
~VPActiveLaneMaskPHIRecipe() override = default;

VPActiveLaneMaskPHIRecipe *clone() override {
- return new VPActiveLaneMaskPHIRecipe(getOperand(0), getDebugLoc());
+ auto *R = new VPActiveLaneMaskPHIRecipe(getOperand(0), getDebugLoc());
+ if (getNumOperands() == 2)
+ R->addOperand(getOperand(1));
+ return R;
}

VP_CLASSOF_IMPL(VPDef::VPActiveLaneMaskPHISC)
@@ -2966,7 +3011,8 @@ class VPEVLBasedIVPHIRecipe : public VPHeaderPHIRecipe {
};

/// A Recipe for widening the canonical induction variable of the vector loop.
- class VPWidenCanonicalIVRecipe : public VPSingleDefRecipe {
+ class VPWidenCanonicalIVRecipe : public VPSingleDefRecipe,
+ public VPUnrollPartAccessor<1> {
public:
VPWidenCanonicalIVRecipe(VPCanonicalIVPHIRecipe *CanonicalIV)
: VPSingleDefRecipe(VPDef::VPWidenCanonicalIVSC, {CanonicalIV}) {}
@@ -3052,7 +3098,8 @@ class VPDerivedIVRecipe : public VPSingleDefRecipe {

/// A recipe for handling phi nodes of integer and floating-point inductions,
/// producing their scalar values.
- class VPScalarIVStepsRecipe : public VPRecipeWithIRFlags {
+ class VPScalarIVStepsRecipe : public VPRecipeWithIRFlags,
+ public VPUnrollPartAccessor<2> {
Instruction::BinaryOps InductionOpcode;

public:
@@ -3548,6 +3595,11 @@ class VPlan {

bool hasUF(unsigned UF) const { return UFs.empty() || UFs.contains(UF); }

unsigned getUF() const {
Collaborator

Suggested change
unsigned getUF() const {
unsigned getUF() const {

Contributor Author

Done, thanks!

assert(UFs.size() == 1 && "Expected a single UF");
return UFs[0];
}

void setUF(unsigned UF) {
assert(hasUF(UF) && "Cannot set the UF not already in plan");
UFs.clear();
@@ -3732,6 +3784,22 @@ class VPBlockUtils {
connectBlocks(BlockPtr, NewBlock);
}

/// Insert disconnected block \p NewBlock before \p Blockptr. First
/// disconnects all predecessors of \p BlockPtr and connects them to \p
/// NewBlock. Add \p NewBlock as predecessor of \p BlockPtr and \p BlockPtr as
/// successor of \p NewBlock.
static void insertBlockBefore(VPBlockBase *NewBlock, VPBlockBase *BlockPtr) {
Collaborator

Deserves documentation.

Contributor Author

Added, thanks!

assert(NewBlock->getSuccessors().empty() &&
Collaborator

Alternatively, can insert after and swap the block contents, but these may be a series of recipes or a cfg of blocks...

Contributor Author

Left as-is for now.

NewBlock->getPredecessors().empty() &&
"Can't insert new block with predecessors or successors.");
NewBlock->setParent(BlockPtr->getParent());
for (VPBlockBase *Pred : to_vector(BlockPtr->predecessors())) {
disconnectBlocks(Pred, BlockPtr);
connectBlocks(Pred, NewBlock);
}
connectBlocks(NewBlock, BlockPtr);
}

/// Insert disconnected VPBlockBases \p IfTrue and \p IfFalse after \p
/// BlockPtr. Add \p IfTrue and \p IfFalse as succesors of \p BlockPtr and \p
/// BlockPtr as predecessor of \p IfTrue and \p IfFalse. Propagate \p BlockPtr
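
The rewiring that `insertBlockBefore` documents (move every predecessor edge of `BlockPtr` onto `NewBlock`, then link `NewBlock` to `BlockPtr`) can be sketched on a toy graph. `ToyBlock`, `connect`, `disconnect` and `insertBefore` are hypothetical simplifications; the real code works on `VPBlockBase` and also sets the parent region.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy CFG node with explicit predecessor and successor lists.
struct ToyBlock {
  std::vector<ToyBlock *> Preds, Succs;
};

// Add the edge From -> To.
void connect(ToyBlock *From, ToyBlock *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

// Remove the edge From -> To.
void disconnect(ToyBlock *From, ToyBlock *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}

// Insert the disconnected NewBlock before Block.
void insertBefore(ToyBlock *NewBlock, ToyBlock *Block) {
  assert(NewBlock->Preds.empty() && NewBlock->Succs.empty() &&
         "Can't insert new block with predecessors or successors.");
  // Iterate over a copy: disconnect() mutates Block->Preds.
  std::vector<ToyBlock *> Preds = Block->Preds;
  for (ToyBlock *Pred : Preds) {
    disconnect(Pred, Block);
    connect(Pred, NewBlock);
  }
  connect(NewBlock, Block);
}
```

The copy of the predecessor list plays the same role as `to_vector(BlockPtr->predecessors())` in the diff: the loop body edits the list it would otherwise be iterating over.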
4 changes: 4 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -144,6 +144,10 @@ struct UnaryRecipe_match {
return DefR && match(DefR);
}

bool match(const VPSingleDefRecipe *R) {
return match(static_cast<const VPRecipeBase *>(R));
}

bool match(const VPRecipeBase *R) {
if (!detail::MatchRecipeAndOpcode<Opcode, RecipeTys...>::match(R))
return false;
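
One plausible reason for the new `match(const VPSingleDefRecipe *)` overload is overload ambiguity: a type that derives from both a value class and a recipe base converts equally well to either pointer type. The PR does not state this rationale, so treat the following as an illustrative sketch with hypothetical toy types (`ToyValue`, `ToyRecipeBase`, `ToySingleDef`, `ToyMatcher`).

```cpp
#include <cassert>

struct ToyValue {};
struct ToyRecipeBase {
  int Opcode = 0;
};
// Like a single-def recipe, this toy type derives from both bases.
struct ToySingleDef : ToyRecipeBase, ToyValue {};

struct ToyMatcher {
  int Expected;

  // Toy simplification: a bare value has no defining recipe to inspect.
  bool match(const ToyValue *) const { return false; }

  bool match(const ToyRecipeBase *R) const { return R->Opcode == Expected; }

  // Without this overload, calling match() with a ToySingleDef pointer is
  // ambiguous, because both derived-to-base conversions rank equally.
  // The explicit cast selects the recipe overload.
  bool match(const ToySingleDef *R) const {
    return match(static_cast<const ToyRecipeBase *>(R));
  }
};
```

With the third overload in place, passing a `ToySingleDef *` is an exact match and forwards to the recipe-based check.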