Skip to content

[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. #94760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 11, 2024
Merged
100 changes: 54 additions & 46 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -607,10 +607,6 @@ class InnerLoopVectorizer {
BasicBlock *MiddleBlock, BasicBlock *VectorHeader,
VPlan &Plan, VPTransformState &State);

/// Create the phi node for the resume value of first order recurrences in the
/// scalar preheader and update the users in the scalar loop.
void fixFixedOrderRecurrence(VPLiveOut *LO, VPTransformState &State);

/// Iteratively sink the scalarized operands of a predicated instruction into
/// the block that was created for it.
void sinkScalarOperands(Instruction *PredInst);
Expand Down Expand Up @@ -3294,19 +3290,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
if (EnableVPlanNativePath)
fixNonInductionPHIs(Plan, State);

// At this point every instruction in the original loop is widened to a
// vector form. Note that fixing reduction phis, as well as extracting the
// exit and resume values for fixed-order recurrences are already modeled in
// VPlan. All that remains to do here is to create a phi in the scalar
// pre-header for each fixed-order recurrence resume value.
// TODO: Also model creating phis in the scalar pre-header in VPlan.
for (const auto &[_, LO] : to_vector(Plan.getLiveOuts())) {
if (!Legal->isFixedOrderRecurrence(LO->getPhi()))
continue;
fixFixedOrderRecurrence(LO, State);
Plan.removeLiveOut(LO->getPhi());
Comment on lines -3298 to -3299
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more to remove here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, removed, thanks!

}

// Forget the original basic block.
PSE.getSE()->forgetLoop(OrigLoop);
PSE.getSE()->forgetBlockAndLoopDispositions();
Expand Down Expand Up @@ -3343,8 +3326,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
VectorLoop->getHeader(), Plan, State);
}

// Fix LCSSA phis not already fixed earlier. Extracts may need to be generated
// in the exit block, so update the builder.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to retain the first sentence: // Fix live-out phis not already fixed earlier., (LCSSA holds only for exit block phis, now live-outs are fixed also for non LCSSA header phis in scalar header) and remove the following line

  State.Builder.SetInsertPoint(State.CFG.ExitBB,
                               State.CFG.ExitBB->getFirstNonPHIIt());

?
(also good to remove the subsequent loop sinking scalar operands, when ready :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

State.Builder.SetInsertPoint(State.CFG.ExitBB,
State.CFG.ExitBB->getFirstNonPHIIt());
for (const auto &KV : Plan.getLiveOuts())
Expand Down Expand Up @@ -3374,33 +3355,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
VF.getKnownMinValue() * UF);
}

void InnerLoopVectorizer::fixFixedOrderRecurrence(VPLiveOut *LO,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (independent): is the second part of the following comment from above along with setting of insertion point still relevant?

  // Fix LCSSA phis not already fixed earlier. Extracts may need to be generated
  // in the exit block, so update the builder.
  State.Builder.SetInsertPoint(State.CFG.ExitBB,
                               State.CFG.ExitBB->getFirstNonPHIIt());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved setting the insert point to VPLiveOut based on the predecessor.

VPTransformState &State) {
// Extract the last vector element in the middle block. This will be the
// initial value for the recurrence when jumping to the scalar loop.
VPValue *VPExtract = LO->getOperand(0);
using namespace llvm::VPlanPatternMatch;
assert(match(VPExtract, m_VPInstruction<VPInstruction::ExtractFromEnd>(
m_VPValue(), m_VPValue())) &&
"FOR LiveOut expects to use an extract from end.");
Value *ResumeScalarFOR = State.get(VPExtract, UF - 1, true);

// Fix the initial value of the original recurrence in the scalar loop.
PHINode *ScalarHeaderPhi = LO->getPhi();
auto *InitScalarFOR =
ScalarHeaderPhi->getIncomingValueForBlock(LoopScalarPreHeader);
Builder.SetInsertPoint(LoopScalarPreHeader, LoopScalarPreHeader->begin());
auto *ScalarPreheaderPhi =
Builder.CreatePHI(ScalarHeaderPhi->getType(), 2, "scalar.recur.init");
for (auto *BB : predecessors(LoopScalarPreHeader)) {
auto *Incoming = BB == LoopMiddleBlock ? ResumeScalarFOR : InitScalarFOR;
ScalarPreheaderPhi->addIncoming(Incoming, BB);
}
ScalarHeaderPhi->setIncomingValueForBlock(LoopScalarPreHeader,
ScalarPreheaderPhi);
Comment on lines -3391 to -3392
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this incoming is set here, rather than by VPLiveIn's fixPhi.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScalarPreheaderPhi is being generated by ResumePhi instead, and is being set here as the incoming value of ScalarHeaderPhi - w/o any vector-to-scalar extraction - so can ResumePhi (in its current use) entail a vector-to-scalar extraction (as opposed to reusing it for feeding exit block phis)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, ResumePhi won't generate any extracts, as it will be fed by the extracted scalar resume value of the FOR. Possibly more accurately to mark ResumePhi as only having its first lane used instead of vector-to-scalar. (Done in latest version)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. Would be good to simply indicate somehow that its dealing with a single scalar - coming in (from each pred) and going out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an isSingleScalar helper to handle this

ScalarHeaderPhi->setName("scalar.recur");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this setting of scalar header phi's name to "scalar.recur" retained?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No at the moment it is left as is. It would need to be done as post-processing, but adjusting the name of the phi in the scalar loop seems a bit inconsistent already (we don't rename any other phis or values in the scalar loop), so IMO removing it makes sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, better to remove it in a separate preparatory patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do ,thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 88e9c56

}

void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {
// The basic block and loop containing the predicated instruction.
auto *PredBB = PredInst->getParent();
Expand Down Expand Up @@ -8522,6 +8476,58 @@ static void addUsersInExitBlock(VPBasicBlock *HeaderVPBB, Loop *OrigLoop,
}
}

/// Feed a resume value for every FOR from the vector loop to the scalar loop,
/// if middle block branches to scalar preheader, by introducing ExtractFromEnd
/// and ResumePhi recipes in each, respectively, and a VPLiveOut which uses the
/// latter and corresponds to the scalar header.
static void addLiveOutsForFirstOrderRecurrences(VPlan &Plan) {
VPRegionBlock *VectorRegion = Plan.getVectorLoopRegion();

// Start by finding out if middle block branches to scalar preheader.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Start by finding out if middle block branches to scalar preheader.
// Start by finding out if middle block branches to scalar preheader, which is not a VPIRBasicBlock, unlike Exit block - the other possible successor of middle block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

// TODO: Should be replaced by
// Plan->getScalarLoopRegion()->getSinglePredecessor() in the future once the
// scalar region is modeled as well.
VPBasicBlock *ScalarPHVPBB = nullptr;
auto *MiddleVPBB = cast<VPBasicBlock>(VectorRegion->getSingleSuccessor());
for (VPBlockBase *Succ : MiddleVPBB->getSuccessors()) {
if (isa<VPIRBasicBlock>(Succ))
continue;
assert(!ScalarPHVPBB && "Two candidates for ScalarPHVPBB?");
ScalarPHVPBB = cast<VPBasicBlock>(Succ);
}
if (!ScalarPHVPBB)
return;

for (auto &H : VectorRegion->getEntryBasicBlock()->phis()) {
auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&H);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (auto &H : VectorRegion->getEntryBasicBlock()->phis()) {
auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&H);
for (auto &HeaderPhi : VectorRegion->getEntryBasicBlock()->phis()) {
auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&HeaderPhi);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed, thanks!

if (!FOR)
continue;

VPBuilder B(ScalarPHVPBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBuilder B(ScalarPHVPBB);
VPBuilder ScalarPHBuilder(ScalarPHVPBB);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed thanks! Also hoisted out defining the builders.

VPBuilder MiddleBuilder(MiddleVPBB);
// Reset insert point so new recipes are inserted before terminator and
// condition, if there is either the former or both.
if (auto *Terminator = MiddleVPBB->getTerminator()) {
auto *Condition = dyn_cast<VPInstruction>(Terminator->getOperand(0));
assert((!Condition || Condition->getParent() == MiddleVPBB) &&
"Condition expected in MiddleVPBB");
MiddleBuilder.setInsertPoint(Condition ? Condition : Terminator);
}

// Extract the resume value and create a new VPLiveOut for it.
auto *Resume = MiddleBuilder.createNaryOp(
VPInstruction::ExtractFromEnd,
{FOR->getBackedgeValue(),
Plan.getOrAddLiveIn(
ConstantInt::get(Plan.getCanonicalIV()->getScalarType(), 1))},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be clearer to pre-generate a VPValue of 1 and reuse it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

{}, "vector.recur.extract");
auto *ResumePhiRecipe =
B.createNaryOp(VPInstruction::ResumePhi, {Resume, FOR->getStartValue()},
{}, "scalar.recur.init");
Plan.addLiveOut(cast<PHINode>(FOR->getUnderlyingInstr()), ResumePhiRecipe);
}
}

VPlanPtr
LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {

Expand Down Expand Up @@ -8686,6 +8692,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
"VPBasicBlock");
RecipeBuilder.fixHeaderPhis();

addLiveOutsForFirstOrderRecurrences(*Plan);

// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.
Expand Down
17 changes: 11 additions & 6 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -682,7 +682,10 @@ class VPBlockBase {
};

/// A value that is used outside the VPlan. The operand of the user needs to be
/// added to the associated LCSSA phi node.
/// added to the associated LCSSA phi node. The incoming block from VPlan is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// added to the associated LCSSA phi node. The incoming block from VPlan is
/// added to the associated phi node. The incoming block from VPlan is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

/// determined by where the VPValue is defined: if it is defined by a recipe
/// outside a region, its parent block is used, otherwise the middle block is
/// used.
class VPLiveOut : public VPUser {
PHINode *Phi;

Expand All @@ -694,11 +697,8 @@ class VPLiveOut : public VPUser {
return U->getVPUserID() == VPUser::VPUserID::LiveOut;
}

/// Fixup the wrapped LCSSA phi node in the unique exit block. This simply
/// means we need to add the appropriate incoming value from the middle
/// block as exiting edges from the scalar epilogue loop (if present) are
/// already in place, and we exit the vector loop exclusively to the middle
/// block.
/// Fixup the wrapped LCSSA phi node. This means we need to add the
/// appropriate incoming value from the precessor Pred.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Fixup the wrapped LCSSA phi node. This means we need to add the
/// appropriate incoming value from the precessor Pred.
/// Fix the wrapped phi node. This means adding an incoming value to exit block phi's from the vector loop via middle block (values from scalar loop already reach these phi's), and updating the value to scalar header phi's from the scalar preheader.

Note that a phi (ResumePhi?) could be added inside middle block for the former case, thereby having VPlan maintain LCSSA form, and ensuring parent of incoming value is always the desired predecessor. Can be done as a follow-up (or preparatory?) patch if preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the wording.

Note that a phi (ResumePhi?) could be added inside middle block for the former case, thereby having VPlan maintain LCSSA form, and ensuring parent of incoming value is always the desired predecessor. Can be done as a follow-up (or preparatory?) patch if preferred.

Would be good as follow-up

void fixPhi(VPlan &Plan, VPTransformState &State);

/// Returns true if the VPLiveOut uses scalars of operand \p Op.
Expand Down Expand Up @@ -1191,6 +1191,11 @@ class VPInstruction : public VPRecipeWithIRFlags {
SLPStore,
ActiveLaneMask,
ExplicitVectorLength,
/// Creates a scalar phi in a leaf VPBB with a single predecessor in VPlan.
/// The first operand is the incoming value from the predecessor in VPlan,
/// the second operand is the incoming value for all other predecessors
/// (which are currently not modeled in VPlan).
ResumePhi,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for renaming, worth updating the title and summary of the patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title + description should be updated now

CalculateTripCountMinusVF,
// Increment the canonical IV separately for each unrolled part.
CanonicalIVIncrementForPart,
Expand Down
41 changes: 37 additions & 4 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -193,11 +193,18 @@ void VPLiveOut::fixPhi(VPlan &Plan, VPTransformState &State) {
auto Lane = vputils::isUniformAfterVectorization(ExitValue)
? VPLane::getFirstLane()
: VPLane::getLastLaneForVF(State.VF);
VPBasicBlock *MiddleVPBB =
VPBasicBlock *PredVPBB =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBasicBlock *PredVPBB =
VPBasicBlock *MiddleVPBB =

cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSingleSuccessor());
BasicBlock *MiddleBB = State.CFG.VPBB2IRBB[MiddleVPBB];
Phi->addIncoming(State.get(ExitValue, VPIteration(State.UF - 1, Lane)),
MiddleBB);
VPRecipeBase *DefRecipe = ExitValue->getDefiningRecipe();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPRecipeBase *DefRecipe = ExitValue->getDefiningRecipe();
auto *ExitVPBB = ExitValue->getDefiningRecipe() ? ExitValue->getDefiningRecipe()->getParent() : nullptr;

can ExitValue be a live in, or can we assert it has a defining recipe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be a live in

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, perhaps a live-in feeding a live-out can be useful, e.g., to indicate if the loop was entered(?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but I think this mostly boils down to not having to unnecessarily restrict the inputs.

if (DefRecipe && !DefRecipe->getParent()->getParent())
PredVPBB = DefRecipe->getParent();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Values leaving the vector loop reach live out phi's in the exit block via middle block.

auto *PredVPBB = !ExitVPBB || ExitVPBB-> getEnclosingLoopRegion() ? MiddleVPBB : ExitVPBB;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks (with Exit -> Exiting)

BasicBlock *PredBB = State.CFG.VPBB2IRBB[PredVPBB];
State.Builder.SetInsertPoint(PredBB, PredBB->getFirstNonPHIIt());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a comment that insertion point is set in PrefBB for potential generation of instructions to extract Lane before feeding it to Phi?

Is this potential post-extract why ResumePhi is considered (maybe!) vector-to-scalar? These extracts best materialize into recipes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment + TODO, thanks!

Is this potential post-extract why ResumePhi is considered (maybe!) vector-to-scalar? These extracts best materialize into recipes.

Yes, we need to select the last lane, which may require creating a new extract, if the VPValue is a vector. Added a TODO to model extracts explicitly.

Value *V = State.get(ExitValue, VPIteration(State.UF - 1, Lane));
if (Phi->getBasicBlockIndex(PredBB) != -1)
Phi->setIncomingValueForBlock(PredBB, V);
else
Phi->addIncoming(V, PredBB);
Comment on lines +210 to +214
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where we need to figure out which predecessor P corresponds to V, and whether Phi already knows about P and should (re)set its incoming for it, or a new incoming should be added for it.

For live outs that correspond to exit block, P is middle block (which can be retrieved directly from Plan as done now, or from State.CFG.ExitBB, or from State.builder which is (still?) set to ExitBB).
For live outs that correspond to scalar header, P is scalar preheader.

Can we check ExitValue's parent to figure out P, in both cases, asserting that its corresponding IRBB is a predecessor of Phi's block?

Phi's of exit block know not of middle block, as exit block (and its phi's) is originally connected to scalar loop only.
Phis of scalar header do know of scalar preheader, as the latter was split from the former during skeleton creation?

fixPhi()'s documentation above needs update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use the defining recipe's parent, if it is outside a region, otherwise use MiddleVPBB

}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
Expand Down Expand Up @@ -303,6 +310,7 @@ bool VPInstruction::canGenerateScalarForFirstLane() const {
case VPInstruction::CanonicalIVIncrementForPart:
case VPInstruction::PtrAdd:
case VPInstruction::ExplicitVectorLength:
case VPInstruction::ResumePhi:
return true;
default:
return false;
Expand Down Expand Up @@ -593,13 +601,35 @@ Value *VPInstruction::generatePerPart(VPTransformState &State, unsigned Part) {
Value *Addend = State.get(getOperand(1), Part, /* IsScalar */ true);
return Builder.CreatePtrAdd(Ptr, Addend, Name);
}
case VPInstruction::ResumePhi: {
if (Part != 0)
return State.get(this, 0, /*IsScalar*/ true);
Value *IncomingFromVPlanPred =
State.get(getOperand(0), Part, /* IsScalar */ true);
Value *IncomingFromOtherPreds =
State.get(getOperand(1), Part, /* IsScalar */ true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it conceivable for the scalar preheader to have a single predecessor, namely, VPlanPred/middle-block? I.e., when trip count is known to be larger that VFxUF and no (other) runtime checks are needed to bypass the vector loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that should be possible, but it should be handled correctly (generating a phi with a single incoming value from the VPlan predecessor)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, the second operand (FOR->getStartValue()) is redundant in this case, but exists - so getOperand(1) works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

auto *NewPhi =
Builder.CreatePHI(IncomingFromOtherPreds->getType(), 2, Name);
BasicBlock *VPlanPred =
State.CFG
.VPBB2IRBB[cast<VPBasicBlock>(getParent()->getSinglePredecessor())];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that since middle block is a VPIRBasicBlock, its corresponding IRBB can be obtained directly via getIRBasicBlock().

NewPhi->addIncoming(IncomingFromVPlanPred, VPlanPred);
for (auto *OtherPred : predecessors(Builder.GetInsertBlock())) {
assert(OtherPred != VPlanPred &&
"VPlan predecessors should not be connected yet");
NewPhi->addIncoming(IncomingFromOtherPreds, OtherPred);
}
Comment on lines +665 to +670
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NewPhi->addIncoming(IncomingFromVPlanPred, VPlanPred);
for (auto *OtherPred : predecessors(Builder.GetInsertBlock())) {
assert(OtherPred != VPlanPred &&
"VPlan predecessors should not be connected yet");
NewPhi->addIncoming(IncomingFromOtherPreds, OtherPred);
}
for (auto *OtherPred : predecessors(Builder.GetInsertBlock())) {
assert(OtherPred != VPlanPred &&
"VPlan predecessors should not be connected yet");
NewPhi->addIncoming(IncomingFromOtherPreds, OtherPred);
}
NewPhi->addIncoming(IncomingFromVPlanPred, VPlanPred);

a bit more logical to first connect all existing predecessors asserting to not be VPlanPred, and then connect the latter(?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but this would require updating a bunch of tests due to different incoming order. Better to be done as follow up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a look at this and this would mean we have different incoming orders for the induction resume values and FOR resume values until we moved induction resume values to also use ResumePhi. WDYT about only adjusting the order once both use ResumePhi?

return NewPhi;
}

default:
llvm_unreachable("Unsupported opcode for instruction");
}
}

bool VPInstruction::isVectorToScalar() const {
return getOpcode() == VPInstruction::ExtractFromEnd ||
getOpcode() == VPInstruction::ResumePhi ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, superseded by isSingleScalar.

getOpcode() == VPInstruction::ComputeReductionResult;
}

Expand Down Expand Up @@ -729,6 +759,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
case VPInstruction::ActiveLaneMask:
O << "active lane mask";
break;
case VPInstruction::ResumePhi:
O << "exit-phi";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
O << "exit-phi";
O << "resume-phi";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

break;
case VPInstruction::ExplicitVectorLength:
O << "EXPLICIT-VECTOR-LENGTH";
break;
Expand Down
12 changes: 3 additions & 9 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -937,22 +937,16 @@ bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan,
Type *IntTy = Plan.getCanonicalIV()->getScalarType();

// Extract the penultimate value of the recurrence and update VPLiveOut
// users of the recurrence splice.
// users of the recurrence splice. Note that the extract of the final value
// used to resume in the scalar loop is created earlier during VPlan
// construction.
auto *Penultimate = cast<VPInstruction>(MiddleBuilder.createNaryOp(
VPInstruction::ExtractFromEnd,
{FOR->getBackedgeValue(),
Plan.getOrAddLiveIn(ConstantInt::get(IntTy, 2))},
{}, "vector.recur.extract.for.phi"));
RecurSplice->replaceUsesWithIf(
Penultimate, [](VPUser &U, unsigned) { return isa<VPLiveOut>(&U); });

// Extract the resume value and create a new VPLiveOut for it.
auto *Resume = MiddleBuilder.createNaryOp(
VPInstruction::ExtractFromEnd,
{FOR->getBackedgeValue(),
Plan.getOrAddLiveIn(ConstantInt::get(IntTy, 1))},
{}, "vector.recur.extract");
Plan.addLiveOut(cast<PHINode>(FOR->getUnderlyingInstr()), Resume);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this live-out corresponds to the scalar header and is fed a value (Resume) created in the middle block, i.e., w/o going through the scalar preheader. It is currently wired explicitly, rather than via VPLiveOut::fixPhi(), which as documented deals only with wiring directly to middle block live-outs that correspond to the exit block.

Worth leaving behind a comment that the s_resume value is created in the middle block and fed into the scalar header via the scalar preheader, at VPlan construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended the comment above, thanks!

}
return true;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,8 @@ define i32 @test_phi_iterator_invalidation(ptr %A, ptr noalias %B) {
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[TMP24]], i32 3
; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1004, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1004, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -51,8 +51,8 @@ define void @firstorderrec(ptr nocapture noundef readonly %x, ptr noalias nocapt
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i8 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ [[DOTPRE]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i8 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ [[DOTPRE]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
Expand Down Expand Up @@ -160,10 +160,10 @@ define void @thirdorderrec(ptr nocapture noundef readonly %x, ptr noalias nocapt
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[SCALAR_RECUR_INIT10:%.*]] = phi i8 [ [[VECTOR_RECUR_EXTRACT9]], [[MIDDLE_BLOCK]] ], [ [[DOTPRE]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT7:%.*]] = phi i8 [ [[VECTOR_RECUR_EXTRACT6]], [[MIDDLE_BLOCK]] ], [ [[DOTPRE44]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i8 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ [[DOTPRE45]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 3, [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i8 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ [[DOTPRE45]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT7:%.*]] = phi i8 [ [[VECTOR_RECUR_EXTRACT6]], [[MIDDLE_BLOCK]] ], [ [[DOTPRE44]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT10:%.*]] = phi i8 [ [[VECTOR_RECUR_EXTRACT9]], [[MIDDLE_BLOCK]] ], [ [[DOTPRE]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
Expand Down
12 changes: 6 additions & 6 deletions llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll
Original file line number Diff line number Diff line change
Expand Up @@ -124,13 +124,13 @@ define i64 @pointer_induction_only(ptr %start, ptr %end) {
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <2 x i64> [[TMP9]], i32 0
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[TMP9]], i32 1
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <2 x i64> [[TMP9]], i32 0
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[START]], [[ENTRY]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[START]], [[ENTRY:%.*]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi ptr [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
Expand Down Expand Up @@ -187,13 +187,13 @@ define i64 @int_and_pointer_iv(ptr %start, i32 %N) {
; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i64> [[TMP5]], i32 2
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i64> [[TMP5]], i32 3
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i64> [[TMP5]], i32 2
; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[START]], [[ENTRY]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
Expand Down
Loading
Loading