Skip to content

[VPlan] Update scalar induction resume values in VPlan. #110577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Dec 6, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
33b3aac
[VPlan] Update induction resume values in VPlan.
fhahn Sep 30, 2024
8040cdb
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 9, 2024
c98b6d3
!fixup simplify code, update remaining tests.
fhahn Nov 9, 2024
a14749c
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 9, 2024
d3728f4
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 10, 2024
ad1f578
!fixup only update InductionBypassValues if there's a bypass.
fhahn Nov 10, 2024
a11bca4
!fixup update comment.
fhahn Nov 10, 2024
56e82ef
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 16, 2024
e8d78a9
!fixup address latest comments, thanks!
fhahn Nov 16, 2024
c7a5b03
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 20, 2024
5d2eb8b
!fixup address latest comments, thanks!
fhahn Nov 20, 2024
9393eda
!fixup fix formatting
fhahn Nov 20, 2024
c54e8f2
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 24, 2024
d8717f9
!fixup fix build failure
fhahn Nov 24, 2024
3e5dbff
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 28, 2024
a02b278
!fixup update remaining tests
fhahn Nov 28, 2024
674d15b
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 1, 2024
6998270
!fixup address latest comments, thanks
fhahn Dec 1, 2024
55cd843
!fixup remove redundant phi
fhahn Dec 1, 2024
e292a3d
!fixup formatting, undo unnecessary test changes
fhahn Dec 1, 2024
93f3304
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 4, 2024
ce214f5
!fixup address latest comments, thanks!
fhahn Dec 4, 2024
b37e297
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 5, 2024
a1d2a13
!fixup address comments and reduce some test diffs
fhahn Dec 5, 2024
61e6d95
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 6, 2024
e96323f
!fixup adjust comments, pass IV phi filter.
fhahn Dec 6, 2024
c964dad
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 6, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
163 changes: 101 additions & 62 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -512,16 +512,18 @@ class InnerLoopVectorizer {
/// Fix the non-induction PHIs in \p Plan.
void fixNonInductionPHIs(VPTransformState &State);

/// Create a new phi node for the induction variable \p OrigPhi to resume
/// iteration count in the scalar epilogue, from where the vectorized loop
/// left off. \p Step is the SCEV-expanded induction step to use. In cases
/// where the loop skeleton is more complicated (i.e., epilogue vectorization)
/// and the resume values can come from an additional bypass block, the \p
/// AdditionalBypass pair provides information about the bypass block and the
/// end value on the edge from bypass to this loop.
PHINode *createInductionResumeValue(
PHINode *OrigPhi, const InductionDescriptor &ID, Value *Step,
ArrayRef<BasicBlock *> BypassBlocks,
/// Create a ResumePHI VPInstruction for the induction variable \p OrigPhi to
/// resume iteration count in the scalar epilogue from where the vectorized
/// loop left off, and add it to the scalar preheader of VPlan. \p Step is the
/// SCEV-expanded induction step to use. In cases where the loop skeleton is
/// more complicated (i.e., epilogue vectorization) and the resume values can
/// come from an additional bypass block, the \p AdditionalBypass pair
/// provides this additional bypass block along with the resume value coming
/// from it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth mentioning that this function (still) also creates a (non VP)Value (or possibly two, via emitTransformedIndex()). W/o SCEV-expanding. Perhaps its avoidance of using SCEV expand should be revisited (independently)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks !

void createInductionResumeValue(
VPIRInstruction *PhiIRI, PHINode *OrigPhi, const InductionDescriptor &ID,
Value *Step, ArrayRef<BasicBlock *> BypassBlocks,
VPBuilder &ScalarPHBuilder,
std::pair<BasicBlock *, Value *> AdditionalBypass = {nullptr, nullptr});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Independent): an optional pair with default None may look better than a pair with default nulls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack


/// Returns the original loop trip count.
Expand All @@ -532,6 +534,11 @@ class InnerLoopVectorizer {
/// count of the original loop for both main loop and epilogue vectorization.
void setTripCount(Value *TC) { TripCount = TC; }

std::pair<BasicBlock *, Value *>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::pair<BasicBlock *, Value *>
/// Retrieve the bypass value and predecessor associated with an original induction header phi.
std::pair<BasicBlock *, Value *>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

getInductionBypassValue(PHINode *OrigPhi) const {
return InductionBypassValues.at(OrigPhi);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
getInductionBypassValue(PHINode *OrigPhi) const {
return InductionBypassValues.at(OrigPhi);
getInductionAdditionalBypass(PHINode *OrigPhi) const {
return Induction2AdditionalBypass.at(OrigPhi);

these are only the additional bypasses, and include both Value and its predecessor block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

}

protected:
friend class LoopVectorizationPlanner;

Expand Down Expand Up @@ -663,6 +670,11 @@ class InnerLoopVectorizer {
/// for cleaning the checks, if vectorization turns out unprofitable.
GeneratedRTChecks &RTChecks;

/// Mapping of induction phis to their bypass values and bypass blocks. They
/// need to be added to their phi nodes after the epilogue skeleton has been
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// need to be added to their phi nodes after the epilogue skeleton has been
/// need to be added as operands to phi nodes in the scalar loop preheader after the epilogue skeleton has been

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, than

/// created.
DenseMap<PHINode *, std::pair<BasicBlock *, Value *>> InductionBypassValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DenseMap<PHINode *, std::pair<BasicBlock *, Value *>> InductionBypassValues;
DenseMap<PHINode *, std::pair<BasicBlock *, Value *>> InductionAdditionalBypasses;

These are only the additional bypasses, and include their predecessors, i.e., not only values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks


VPlan &Plan;
};

Expand Down Expand Up @@ -2576,9 +2588,10 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
nullptr, Twine(Prefix) + "scalar.ph");
}

PHINode *InnerLoopVectorizer::createInductionResumeValue(
PHINode *OrigPhi, const InductionDescriptor &II, Value *Step,
ArrayRef<BasicBlock *> BypassBlocks,
void InnerLoopVectorizer::createInductionResumeValue(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth renaming createInductionResumeVPValue()?
This used to create only IR Values, in particular a PHINode in scalar preheader. Now it creates a ResumePhi recipe in scalar preheader instead, while still generating IR in vector preheader/bypass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

VPIRInstruction *PhiR, PHINode *OrigPhi, const InductionDescriptor &II,
Value *Step, ArrayRef<BasicBlock *> BypassBlocks,
VPBuilder &ScalarPHBuilder,
std::pair<BasicBlock *, Value *> AdditionalBypass) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPIRInstruction *PhiR, PHINode *OrigPhi, const InductionDescriptor &II,
Value *Step, ArrayRef<BasicBlock *> BypassBlocks,
VPBuilder &ScalarPHBuilder,
std::pair<BasicBlock *, Value *> AdditionalBypass) {
VPIRInstruction *PhiR, const InductionDescriptor &II,
Value *Step, ArrayRef<BasicBlock *> BypassBlocks,
VPBuilder &ScalarPHBuilder,
std::pair<BasicBlock *, Value *> AdditionalBypass) {
auto *OrigPhi = cast<PHINode>(&PhiR->getInstruction());

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the PHINode argument has been removed in the latest version

Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
assert(VectorTripCount && "Expected valid arguments");
Expand Down Expand Up @@ -2611,27 +2624,21 @@ PHINode *InnerLoopVectorizer::createInductionResumeValue(
}
}

// Create phi nodes to merge from the backedge-taken check block.
PHINode *BCResumeVal =
PHINode::Create(OrigPhi->getType(), 3, "bc.resume.val",
LoopScalarPreHeader->getFirstNonPHIIt());
// Copy original phi DL over to the new one.
BCResumeVal->setDebugLoc(OrigPhi->getDebugLoc());

// The new PHI merges the original incoming value, in case of a bypass,
// or the value at the end of the vectorized loop.
BCResumeVal->addIncoming(EndValue, LoopMiddleBlock);

// Fix the scalar body counter (PHI node).
// The old induction's phi node in the scalar body needs the truncated
// value.
for (BasicBlock *BB : BypassBlocks)
BCResumeVal->addIncoming(II.getStartValue(), BB);
auto *ResumePhiRecipe = ScalarPHBuilder.createNaryOp(
VPInstruction::ResumePhi,
{Plan.getOrAddLiveIn(EndValue), Plan.getOrAddLiveIn(II.getStartValue())},
OrigPhi->getDebugLoc(), "bc.resume.val");
assert(PhiR->getNumOperands() == 0 && "PhiR should not have any operands");
PhiR->addOperand(ResumePhiRecipe);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two Phi Recipes, better rename PhiR say into InductionPhiIRI to distinguish it from ResumePhiRecipe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks


if (AdditionalBypass.first)
BCResumeVal->setIncomingValueForBlock(AdditionalBypass.first,
EndValueFromAdditionalBypass);
return BCResumeVal;
if (AdditionalBypass.first) {
// Store the bypass values here, as they need to be added to their phi nodes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Store the bypass values here, as they need to be added to their phi nodes
// Store the bypass value here, as it needs to be added as operand to its scalar preheader phi node

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

// after the epilogue skeleton has been created.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value for "non-additional" bypasses also need to be added after creating epilog skeleton.

This is due to ResumePhi holding only two operands (one value coming from the vectorized loop and the other value bypassing it), whereas when vectorizing the epilog loop it should have three?
Perhaps allow ResumePhi to hold a third operand, sorted "upwards", according to https://llvm.org/docs/Vectorizers.html#epilogue-vectorization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, but maybe better done as follow up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well, perhaps worth leaving a TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks

assert(!InductionBypassValues.contains(OrigPhi) &&
"entry for OrigPhi already exits");
InductionBypassValues[OrigPhi] = {AdditionalBypass.first,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth asserting OrigPhi is not already there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

EndValueFromAdditionalBypass};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comment why the information is stored in InductionBypassValues to be handled later, rather than added as an operand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, thanks

}
}

/// Return the expanded step for \p ID using \p ExpandedSCEVs to look up SCEV
Expand All @@ -2655,19 +2662,24 @@ void InnerLoopVectorizer::createInductionResumeValues(
(!AdditionalBypass.first && !AdditionalBypass.second)) &&
"Inconsistent information about additional bypass.");
// We are going to resume the execution of the scalar loop.
// Go over all of the induction variables that we found and fix the
// PHIs that are left in the scalar version of the loop.
// The starting values of PHI nodes depend on the counter of the last
// iteration in the vectorized loop.
// If we come from a bypass edge then we need to start from the original
// Go over all of the induction variables in the scalar header and fix the
// PHIs that are left in the scalar version of the loop. The starting values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Go over all of the induction variables in the scalar header and fix the
// PHIs that are left in the scalar version of the loop. The starting values
// Go over all of the induction variable PHIs of the scalar loop header and fix their starting values, which depend on the counter of the last iteration of the vectorized loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

// of PHI nodes depend on the counter of the last iteration in the vectorized
// loop. If we come from a bypass edge then we need to start from the original
// start value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// of PHI nodes depend on the counter of the last iteration in the vectorized
// loop. If we come from a bypass edge then we need to start from the original
// start value.
// If we come from one of the LoopBypassBlocks then we need to start from the original
// start value. If we come from the AdditionalBypass then we need to start from its value.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

for (const auto &InductionEntry : Legal->getInductionVars()) {
PHINode *OrigPhi = InductionEntry.first;
const InductionDescriptor &II = InductionEntry.second;
PHINode *BCResumeVal = createInductionResumeValue(
OrigPhi, II, getExpandedStep(II, ExpandedSCEVs), LoopBypassBlocks,
AdditionalBypass);
OrigPhi->setIncomingValueForBlock(LoopScalarPreHeader, BCResumeVal);
VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader();
VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to iterate over the IRI's of scalar loop header, looking for those wrapping inductions? Something like:

  for (VPRecipeBase &R : *Plan.getScalarHeader()) {
    auto *IRI = cast<VPIRInstruction>(&R);
    auto *Inst = dyn_cast<PHINode>(IRI->getInstruction());
    if (!Inst)
      break;
    if (!Legal->getInductionVars().contains(Inst))
      continue;
    const InductionDescriptor &II = Legal->getInductionVars()[Inst];
    createInductionResumeValue(IRI, II, getExpandedStep(II, ExpandedSCEVs),
                               LoopBypassBlocks, ScalarPHBuilder,
                               AdditionalBypass);
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the long-term roadmap, it may be better to update eveny ResumePhi of the scalar preheader when each bypass block is introduced, to keep this introduction atomic, keeping control-flow predecessors up-to-date with data-flow phi's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

for (VPRecipeBase &R : *Plan.getScalarHeader()) {
auto *PhiR = cast<VPIRInstruction>(&R);
auto *Phi = dyn_cast<PHINode>(&PhiR->getInstruction());
if (!Phi)
break;
if (!Legal->getInductionVars().contains(Phi))
continue;
const InductionDescriptor &II = Legal->getInductionVars().find(Phi)->second;
createInductionResumeValue(
PhiR, Phi, II, getExpandedStep(II, ExpandedSCEVs), LoopBypassBlocks,
ScalarPHBuilder, AdditionalBypass);
}
}

Expand Down Expand Up @@ -7708,13 +7720,21 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(

BestVPlan.execute(&State);

// 2.5 Collect reduction resume values.
auto *ExitVPBB = BestVPlan.getMiddleBlock();
if (VectorizingEpilogue)
// 2.5 When vectorizing the epilogue, fix reduction resume values and
// induction resume values from the bypass blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// induction resume values from the bypass blocks.
// induction resume values from the additional bypass block - after the main loop and bypassing the epilog loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

if (VectorizingEpilogue) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// 2.5 When vectorizing the epilogue, fix reduction resume values and
// induction resume values from the bypass blocks.
if (VectorizingEpilogue) {
// 2.5 When vectorizing the epilogue, fix reduction and
// induction resume values from the additional bypass block.
if (VectorizingEpilogue) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

for (VPRecipeBase &R : *ExitVPBB) {
fixReductionScalarResumeWhenVectorizingEpilog(
&R, State, State.CFG.VPBB2IRBB[ExitVPBB]);
}
BasicBlock *PH = OrigLoop->getLoopPreheader();
for (const auto &[IVPhi, _] : Legal->getInductionVars()) {
auto *Inc = cast<PHINode>(IVPhi->getIncomingValueForBlock(PH));
const auto &[BB, V] = ILV.getInductionBypassValue(IVPhi);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth emphasizing that this is devoted to the additional bypass values only, e.g., rename getInductionAdditionalBypass()? - it returns not only the value but also its parent/predecessor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated,t hanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note/assert that BB is the same for all IV's. Perhaps worth retrieving it once, say via get "AdditionalBypassBlock(), so that getInductionAdditionalBypass() can be getInductionAdditionalBypassValue, and also feeding it to fixReductionScalarResumeWhenVectorizingEpilog().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Inc->setIncomingValueForBlock(BB, V);
}
}

// 2.6. Maintain Loop Hints
// Keep all loop hints from the original loop on the vector loop (we'll
Expand Down Expand Up @@ -7803,10 +7823,37 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
// Generate the induction variable.
EPI.VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);

// Skip induction resume value creation here because they will be created in
// the second pass for the scalar loop. The induction resume values for the
// inductions in the epilogue loop are created before executing the plan for
// the epilogue loop.
// Create induction resume values and ResumePhis for the inductions in the
// epilogue loop in the VPlan for the epilogue vector loop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sentence is somewhat cumbersome.

Below EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton() continues to call plural createInductionResumeValues(), for resume values of the scalar loop, with additional bypass.
Here EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton() now calls singular createInductionResumeValue(), repeatedly, for resume values of the epilog loop, w/o additional bypass.

Can plural createInductionResumeValues() be called here as well, w/o additional bypass? It seems to be doing exactly what's needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can, at the cost of creating more bypass values than needed; in particular we only need resume values for widened inductions, other inductions resume from the canonical one which will get created separately.

I updated it for now, but there are some tests which need to be updated with the redundant phis (which previously got removed). Thinking about it now, we might have to create them all here, as an induction being widened in the main loop may not be widened in the epilogue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there still more bypass values created than needed, leading to test changes?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Create induction resume values and ResumePhis for the inductions in the
// epilogue loop in the VPlan for the epilogue vector loop.
// Generate VPValues and ResumePhi recipes for inductions in the epilog loop to resume from the main loop or bypass it.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, tha

VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader();
VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin());
for (VPRecipeBase &R :
Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we iterate over the IRI's of the scalar loop header instead, as suggested above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think we can here, as we don't have a mapping from IR values to VPValues (other than live-outs here)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating over the IRI's of scalar header instead of the integer/fp induction header phi recipes of vector header, relieves the need for such a mapping - to find PhiR which wraps IndPhi (by searching thru the recipes of scalar header for each IndPhi), as in plural createInductionResumeValues()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here originally tried to just create resume value for wide phis, and the scalar VPIRInstructions don't have a link to the induction phi recipes. But I replaced this now as per the suggestion above.

// Create induction resume values for both widened pointer and
// integer/fp inductions and update the start value of the induction
// recipes to use the resume value.
PHINode *IndPhi = nullptr;
const InductionDescriptor *ID;
if (auto *Ind = dyn_cast<VPWidenPointerInductionRecipe>(&R)) {
IndPhi = cast<PHINode>(Ind->getUnderlyingValue());
ID = &Ind->getInductionDescriptor();
} else if (auto *WidenInd = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R)) {
IndPhi = WidenInd->getPHINode();
ID = &WidenInd->getInductionDescriptor();
} else {
continue;
}

auto *ScalarLoopHeader = Plan.getScalarHeader();
for (VPRecipeBase &R : *ScalarLoopHeader) {
auto *PhiR = cast<VPIRInstruction>(&R);
if (&PhiR->getInstruction() != IndPhi)
continue;
createInductionResumeValue(PhiR, IndPhi, *ID,
getExpandedStep(*ID, ExpandedSCEVs),
LoopBypassBlocks, ScalarPHBuilder);
}
}

return {LoopVectorPreHeader, nullptr};
}
Expand Down Expand Up @@ -10295,23 +10342,16 @@ bool LoopVectorizePass::processLoop(Loop *L) {
RdxDesc.getRecurrenceStartValue());
}
} else {
// Create induction resume values for both widened pointer and
// integer/fp inductions and update the start value of the induction
// recipes to use the resume value.
// Retrieve the induction resume values for wide inductions from
// their original phi nodes in the scalar loop.
PHINode *IndPhi = nullptr;
const InductionDescriptor *ID;
if (auto *Ind = dyn_cast<VPWidenPointerInductionRecipe>(&R)) {
IndPhi = cast<PHINode>(Ind->getUnderlyingValue());
ID = &Ind->getInductionDescriptor();
} else {
auto *WidenInd = cast<VPWidenIntOrFpInductionRecipe>(&R);
IndPhi = WidenInd->getPHINode();
ID = &WidenInd->getInductionDescriptor();
}

ResumeV = MainILV.createInductionResumeValue(
IndPhi, *ID, getExpandedStep(*ID, ExpandedSCEVs),
{EPI.MainLoopIterationCountCheck});
ResumeV = IndPhi->getIncomingValueForBlock(L->getLoopPreheader());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an existing VPValue to resume from, is it not already used as the start value of header phi induction recipes of epilog loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but the code here sets the start value for the header recipes in the epilogue. We could possibly get it from the ResumePhis in the main vector loop VPlan, but the VPValues would be defined in a different plan?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Perhaps worth a note explaining the (missing) connection to ResumePhi recipes in main loop VPlan, which were executed and hooked up to the Phi nodes of the scalar loop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like // Hook up to the PHINode generated by a ResumePhi recipe of main loop VPlan, which feeds the scalar loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

}
assert(ResumeV && "Must have a resume value");
VPValue *StartVal = BestEpiPlan.getOrAddLiveIn(ResumeV);
Expand All @@ -10323,7 +10363,6 @@ bool LoopVectorizePass::processLoop(Loop *L) {
LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV,
DT, true, &ExpandedSCEVs);
++LoopsEpilogueVectorized;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(independent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks

if (!MainILV.areSafetyChecksAdded())
DisableRuntimeUnroll = true;
} else {
Expand Down
5 changes: 4 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -629,7 +629,10 @@ Value *VPInstruction::generate(VPTransformState &State) {
State.CFG
.VPBB2IRBB[cast<VPBasicBlock>(getParent()->getSinglePredecessor())];
NewPhi->addIncoming(IncomingFromVPlanPred, VPlanPred);
for (auto *OtherPred : predecessors(Builder.GetInsertBlock())) {
// TODO: Predecessors are temporarily reversed to reduce test changes.
// Remove it and update remaining tests after functional change landed.
for (auto *OtherPred :
reverse(to_vector(predecessors(Builder.GetInsertBlock())))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better reverse the predecessors when they are set, rather than here during VPlan::execute()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to keep the number of test changes lower by trying to better match the order in the phis and should be dropped after landing this change as follow up. Added a TODO

assert(OtherPred != VPlanPred &&
"VPlan predecessors should not be connected yet");
NewPhi->addIncoming(IncomingFromOtherPreds, OtherPred);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -205,10 +205,10 @@ define void @test_widen_induction_variable_start(ptr %A, i64 %N, i64 %start) {
; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 4
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
; CHECK-NEXT: [[IND_END:%.*]] = add i64 [[START]], [[N_VEC]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[START]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i64> [[DOTSPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i64> [[DOTSPLAT]], <i64 0, i64 1>
; CHECK-NEXT: [[IND_END:%.*]] = add i64 [[START]], [[N_VEC]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
Expand Down Expand Up @@ -433,7 +433,7 @@ define void @test_widen_extended_induction(ptr %dst) {
; CHECK: vec.epilog.middle.block:
; CHECK-NEXT: br i1 true, label [[EXIT]], label [[VEC_EPILOG_SCALAR_PH]]
; CHECK: vec.epilog.scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i8 [ 16, [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ 16, [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[ITER_CHECK:%.*]] ]
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i8 [ 16, [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ 16, [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i8 [ [[BC_RESUME_VAL1]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
Expand Down
Loading
Loading