Skip to content

[VPlan] Simplify branch on False in VPlan transform (NFC). #140409

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 18 additions & 24 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2379,13 +2379,15 @@ void InnerLoopVectorizer::introduceCheckBlockInVPlan(BasicBlock *CheckIRBB) {
PreVectorPH->swapSuccessors();

// We just connected a new block to the scalar preheader. Update all
// ResumePhis by adding an incoming value for it, replicating the last value.
for (VPRecipeBase &R : *cast<VPBasicBlock>(ScalarPH)) {
auto *ResumePhi = dyn_cast<VPInstruction>(&R);
if (!ResumePhi || ResumePhi->getOpcode() != VPInstruction::ResumePhi)
continue;
// VPPhis by adding an incoming value for it, replicating the last value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: take the number of ScalarPH predecessors once, here, reuse it repeatedly inside the loop, asserting it is one more than the number of ResumePhi's operands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

unsigned NumPredecessors = ScalarPH->getNumPredecessors();
(void)NumPredecessors;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(void)NumPredecessors;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed thanks

for (VPRecipeBase &R : cast<VPBasicBlock>(ScalarPH)->phis()) {
auto *ResumePhi = cast<VPPhi>(&R);
ResumePhi->addOperand(
ResumePhi->getOperand(ResumePhi->getNumOperands() - 1));
assert(ResumePhi->getNumIncoming() == NumPredecessors &&
"must have incoming values for all operands");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(void)NumPredecessors;
for (VPRecipeBase &R : cast<VPBasicBlock>(ScalarPH)->phis()) {
auto *ResumePhi = cast<VPPhi>(&R);
ResumePhi->addOperand(
ResumePhi->getOperand(ResumePhi->getNumOperands() - 1));
assert(ResumePhi->getNumIncoming() == NumPredecessors &&
"must have incoming values for all operands");
for (VPRecipeBase &R : cast<VPBasicBlock>(ScalarPH)->phis()) {
assert(isa<VPPhi>(&R) && "Phi expected to be VPPhi");
assert(cast<VPPhi>(&R)->getNumIncoming() == NumPredecessors - 1 &&
"must have incoming values for all operands");
R->addOperand(R->getOperand(NumPredecessors - 1));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks

}
}

Expand Down Expand Up @@ -2533,7 +2535,8 @@ BasicBlock *InnerLoopVectorizer::emitMemRuntimeChecks(BasicBlock *Bypass) {
static void replaceVPBBWithIRVPBB(VPBasicBlock *VPBB, BasicBlock *IRBB) {
VPIRBasicBlock *IRVPBB = VPBB->getPlan()->createVPIRBasicBlock(IRBB);
for (auto &R : make_early_inc_range(*VPBB)) {
assert(!R.isPhi() && "Tried to move phi recipe to end of block");
assert((IRVPBB->empty() || IRVPBB->back().isPhi() || !R.isPhi()) &&
"Tried to move phi recipe after a non-phi recipe");
R.moveBefore(*IRVPBB, IRVPBB->end());
}

Expand Down Expand Up @@ -7535,10 +7538,7 @@ static void fixReductionScalarResumeWhenVectorizingEpilog(
// created a bc.merge.rdx Phi after the main vector body. Ensure that we carry
// over the incoming values correctly.
using namespace VPlanPatternMatch;
auto IsResumePhi = [](VPUser *U) {
auto *VPI = dyn_cast<VPInstruction>(U);
return VPI && VPI->getOpcode() == VPInstruction::ResumePhi;
};
auto IsResumePhi = [](VPUser *U) { return isa<VPPhi>(U); };
assert(count_if(EpiRedResult->users(), IsResumePhi) == 1 &&
"ResumePhi must have a single user");
auto *EpiResumePhiVPI =
Expand Down Expand Up @@ -8723,9 +8723,8 @@ static VPInstruction *addResumePhiRecipeForInduction(
WideIV->getDebugLoc());
}

auto *ResumePhiRecipe =
ScalarPHBuilder.createNaryOp(VPInstruction::ResumePhi, {EndValue, Start},
WideIV->getDebugLoc(), "bc.resume.val");
auto *ResumePhiRecipe = ScalarPHBuilder.createScalarPhi(
{EndValue, Start}, WideIV->getDebugLoc(), "bc.resume.val");
return ResumePhiRecipe;
}

Expand Down Expand Up @@ -8754,8 +8753,7 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan,
if (VPInstruction *ResumePhi = addResumePhiRecipeForInduction(
WideIVR, VectorPHBuilder, ScalarPHBuilder, TypeInfo,
&Plan.getVectorTripCount())) {
assert(ResumePhi->getOpcode() == VPInstruction::ResumePhi &&
"Expected a ResumePhi");
assert(isa<VPPhi>(ResumePhi) && "Expected a phi");
IVEndValues[WideIVR] = ResumePhi->getOperand(0);
ScalarPhiIRI->addOperand(ResumePhi);
continue;
Expand All @@ -8780,8 +8778,7 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan,
VPInstruction::ExtractLastElement, {ResumeFromVectorLoop}, {},
"vector.recur.extract");
StringRef Name = IsFOR ? "scalar.recur.init" : "bc.merge.rdx";
auto *ResumePhiR = ScalarPHBuilder.createNaryOp(
VPInstruction::ResumePhi,
auto *ResumePhiR = ScalarPHBuilder.createScalarPhi(
{ResumeFromVectorLoop, VectorPhiR->getStartValue()}, {}, Name);
ScalarPhiIRI->addOperand(ResumePhiR);
}
Expand Down Expand Up @@ -9962,9 +9959,7 @@ static void preparePlanForMainVectorLoop(VPlan &MainPlan, VPlan &EpiPlan) {
VPI->setOperand(1, Freeze);
if (UpdateResumePhis)
OrigStart->replaceUsesWithIf(Freeze, [Freeze](VPUser &U, unsigned) {
return Freeze != &U && isa<VPInstruction>(&U) &&
cast<VPInstruction>(&U)->getOpcode() ==
VPInstruction::ResumePhi;
return Freeze != &U && isa<VPPhi>(&U);
});
}
};
Expand All @@ -9977,13 +9972,12 @@ static void preparePlanForMainVectorLoop(VPlan &MainPlan, VPlan &EpiPlan) {
// scalar (which will become vector) epilogue loop we are done. Otherwise
// create it below.
if (any_of(*MainScalarPH, [VectorTC](VPRecipeBase &R) {
return match(&R, m_VPInstruction<VPInstruction::ResumePhi>(
m_Specific(VectorTC), m_SpecificInt(0)));
return match(&R, m_VPInstruction<Instruction::PHI>(m_Specific(VectorTC),
m_SpecificInt(0)));
}))
return;
VPBuilder ScalarPHBuilder(MainScalarPH, MainScalarPH->begin());
ScalarPHBuilder.createNaryOp(
VPInstruction::ResumePhi,
ScalarPHBuilder.createScalarPhi(
{VectorTC, MainPlan.getCanonicalIV()->getStartValue()}, {},
"vec.epilog.resume.val");
}
Expand Down
15 changes: 9 additions & 6 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -887,11 +887,6 @@ class VPInstruction : public VPRecipeWithIRFlags,
SLPStore,
ActiveLaneMask,
ExplicitVectorLength,
/// Creates a scalar phi in a leaf VPBB with a single predecessor in VPlan.
/// The first operand is the incoming value from the predecessor in VPlan,
/// the second operand is the incoming value for all other predecessors
/// (which are currently not modeled in VPlan).
ResumePhi,
CalculateTripCountMinusVF,
// Increment the canonical IV separately for each unrolled part.
CanonicalIVIncrementForPart,
Expand Down Expand Up @@ -1127,6 +1122,10 @@ class VPPhiAccessors {
return getAsRecipe()->getNumOperands();
}

/// Removes the incoming value corresponding to \p IncomingBlock, which must
/// be a predecessor.
void removeIncomingValue(VPBlockBase *IncomingBlock) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void removeIncomingValue(VPBlockBase *IncomingBlock) const;
void removeIncomingValueFor(VPBlockBase *IncomingBlock) const;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks


#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.
void printPhiOperands(raw_ostream &O, VPSlotTracker &SlotTracker) const;
Expand All @@ -1137,11 +1136,15 @@ struct VPPhi : public VPInstruction, public VPPhiAccessors {
VPPhi(ArrayRef<VPValue *> Operands, DebugLoc DL, const Twine &Name = "")
: VPInstruction(Instruction::PHI, Operands, DL, Name) {}

static inline bool classof(const VPRecipeBase *U) {
static inline bool classof(const VPUser *U) {
auto *R = dyn_cast<VPInstruction>(U);
return R && R->getOpcode() == Instruction::PHI;
}

VPPhi *clone() override {
return new VPPhi(operands(), getDebugLoc(), getName());
}

void execute(VPTransformState &State) override;

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
Expand Down
1 change: 0 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,6 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
return inferScalarType(R->getOperand(0));
case VPInstruction::FirstOrderRecurrenceSplice:
case VPInstruction::Not:
case VPInstruction::ResumePhi:
case VPInstruction::CalculateTripCountMinusVF:
case VPInstruction::CanonicalIVIncrementForPart:
case VPInstruction::AnyOf:
Expand Down
83 changes: 40 additions & 43 deletions llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -501,8 +501,10 @@ void VPlanTransforms::prepareForVectorization(
cast<VPBasicBlock>(HeaderVPB),
cast<VPBasicBlock>(LatchVPB), Range);
HandledUncountableEarlyExit = true;
} else {
for (VPRecipeBase &R : EB->phis())
cast<VPIRPhi>(&R)->removeIncomingValue(Pred);
}

cast<VPBasicBlock>(Pred)->getTerminator()->eraseFromParent();
VPBlockUtils::disconnectBlocks(Pred, EB);
}
Expand All @@ -526,56 +528,51 @@ void VPlanTransforms::prepareForVectorization(
VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());

// If needed, add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop. Three cases:
// 1) If we require a scalar epilogue, there is no conditional branch as
// we unconditionally branch to the scalar preheader. Remove the recipes
// from the exit blocks.
// 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
// Thus if tail is to be folded, we know we don't need to run the
// remainder and we can set the condition to true.
// 3) Otherwise, construct a runtime check.

if (!RequiresScalarEpilogueCheck) {
if (auto *LatchExitVPB = MiddleVPBB->getSingleSuccessor())
VPBlockUtils::disconnectBlocks(MiddleVPBB, LatchExitVPB);
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
VPBlockUtils::connectBlocks(Plan.getEntry(), ScalarPH);
Plan.getEntry()->swapSuccessors();

// The exit blocks are unreachable, remove their recipes to make sure no
// users remain that may pessimize transforms.
for (auto *EB : Plan.getExitBlocks()) {
for (VPRecipeBase &R : make_early_inc_range(*EB))
R.eraseFromParent();
}
return;
}

// The connection order corresponds to the operands of the conditional branch,
// with the middle block already connected to the exit block.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above two lines should be discarded? Or updated to end with "if needed"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped by accident, restored, thanks

VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
// Also connect the entry block to the scalar preheader.
// TODO: Also introduce a branch recipe together with the minimum trip count
// check.
VPBlockUtils::connectBlocks(Plan.getEntry(), ScalarPH);
Plan.getEntry()->swapSuccessors();

auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
// Here we use the same DebugLoc as the scalar loop latch terminator instead
// of the corresponding compare because they may have ended up with
// different line numbers and we want to avoid awkward line stepping while
// debugging. Eg. if the compare has got a line number inside the loop.
// If MiddleVPBB has a single successor the original loop exits via the latch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If MiddleVPBB has a single successor the original loop exits via the latch
// If MiddleVPBB has a single successor then the original loop does not exit via the latch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

// and the single successor must be the scalar preheader. There's no need to
// add a runtime check to MiddleVPBB.
if (MiddleVPBB->getNumSuccessors() == 1) {
assert(MiddleVPBB->getSingleSuccessor() == ScalarPH &&
"must have ScalarPH as single successor");
return;
}

assert(MiddleVPBB->getNumSuccessors() == 2 && "must have 2 successors");

// If needed, add a check in the middle block to see if we have completed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If needed, add a check in the middle block to see if we have completed
// Add a check in the middle block to see if we have completed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

// all of the iterations in the first vector loop.
// Three cases:
// 1) If we require a scalar epilogue, the scalar ph must execute. Set the
// condition to false.
// 2) If (N - N%VF) == N, then we *don't* need to run the
// remainder. Thus if tail is to be folded, we know we don't need to run
// the remainder and we can set the condition to true.
// 3) Otherwise, construct a runtime check.

// We use the same DebugLoc as the scalar loop latch terminator instead of
// the corresponding compare because they may have ended up with different
// line numbers and we want to avoid awkward line stepping while debugging.
// E.g., if the compare has got a line number inside the loop.
DebugLoc LatchDL = TheLoop->getLoopLatch()->getTerminator()->getDebugLoc();
VPBuilder Builder(MiddleVPBB);
VPValue *Cmp =
TailFolded
? Plan.getOrAddLiveIn(ConstantInt::getTrue(
IntegerType::getInt1Ty(TripCount->getType()->getContext())))
: Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
&Plan.getVectorTripCount(),
ScalarLatchTerm->getDebugLoc(), "cmp.n");
Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
ScalarLatchTerm->getDebugLoc());
VPValue *Cmp;
if (!RequiresScalarEpilogueCheck)
Cmp = Plan.getOrAddLiveIn(ConstantInt::getFalse(
IntegerType::getInt1Ty(TripCount->getType()->getContext())));
else if (TailFolded)
Cmp = Plan.getOrAddLiveIn(ConstantInt::getTrue(
IntegerType::getInt1Ty(TripCount->getType()->getContext())));
else
Cmp = Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
&Plan.getVectorTripCount(), LatchDL, "cmp.n");
Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp}, LatchDL);
}

void VPlanTransforms::createLoopRegions(VPlan &Plan) {
Expand Down
59 changes: 34 additions & 25 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -733,17 +733,6 @@ Value *VPInstruction::generate(VPTransformState &State) {
Value *Addend = State.get(getOperand(1), VPLane(0));
return Builder.CreatePtrAdd(Ptr, Addend, Name, getGEPNoWrapFlags());
}
case VPInstruction::ResumePhi: {
auto *NewPhi =
Builder.CreatePHI(State.TypeAnalysis.inferScalarType(this), 2, Name);
for (const auto &[IncVPV, PredVPBB] :
zip(operands(), getParent()->getPredecessors())) {
Value *IncV = State.get(IncVPV, /* IsScalar */ true);
BasicBlock *PredBB = State.CFG.VPBB2IRBB.at(cast<VPBasicBlock>(PredVPBB));
NewPhi->addIncoming(IncV, PredBB);
}
return NewPhi;
}
case VPInstruction::AnyOf: {
Value *A = State.get(getOperand(0));
return Builder.CreateOrReduce(A);
Expand Down Expand Up @@ -847,8 +836,7 @@ bool VPInstruction::isVectorToScalar() const {
}

bool VPInstruction::isSingleScalar() const {
return getOpcode() == VPInstruction::ResumePhi ||
getOpcode() == Instruction::PHI;
return getOpcode() == Instruction::PHI;
}

void VPInstruction::execute(VPTransformState &State) {
Expand Down Expand Up @@ -934,7 +922,6 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
case VPInstruction::CanonicalIVIncrementForPart:
case VPInstruction::BranchOnCount:
case VPInstruction::BranchOnCond:
case VPInstruction::ResumePhi:
return true;
case VPInstruction::PtrAdd:
return Op == getOperand(0) || vputils::onlyFirstLaneUsed(this);
Expand Down Expand Up @@ -991,9 +978,6 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
case VPInstruction::ActiveLaneMask:
O << "active lane mask";
break;
case VPInstruction::ResumePhi:
O << "resume-phi";
break;
case VPInstruction::ExplicitVectorLength:
O << "EXPLICIT-VECTOR-LENGTH";
break;
Expand Down Expand Up @@ -1101,21 +1085,36 @@ void VPInstructionWithType::print(raw_ostream &O, const Twine &Indent,

void VPPhi::execute(VPTransformState &State) {
State.setDebugLocFrom(getDebugLoc());
BasicBlock *VectorPH = State.CFG.VPBB2IRBB.at(getIncomingBlock(0));
Value *Start = State.get(getIncomingValue(0), VPLane(0));
PHINode *Phi = State.Builder.CreatePHI(Start->getType(), 2, getName());
Phi->addIncoming(Start, VectorPH);
State.set(this, Phi, VPLane(0));
PHINode *NewPhi = State.Builder.CreatePHI(
State.TypeAnalysis.inferScalarType(this), 2, getName());
unsigned NumIncoming = getNumIncoming();
if (getParent() != getParent()->getPlan()->getScalarPreheader()) {
// TODO: Fixup all incoming values of header phis once recipes defining them
// are introduced.
NumIncoming = 1;
}
for (unsigned Idx = 0; Idx != NumIncoming; ++Idx) {
Value *IncV = State.get(getIncomingValue(Idx), VPLane(0));
BasicBlock *PredBB = State.CFG.VPBB2IRBB.at(getIncomingBlock(Idx));
NewPhi->addIncoming(IncV, PredBB);
}
State.set(this, NewPhi, VPLane(0));
}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPPhi::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {
O << Indent << "EMIT" << (isSingleScalar() ? "-SCALAR" : "") << " ";
printAsOperand(O, SlotTracker);
O << " = phi ";

printPhiOperands(O, SlotTracker);
const VPBasicBlock *Parent = getParent();
if (Parent == Parent->getPlan()->getScalarPreheader()) {
// TODO: Use regular printing for resume-phis as well
O << " = resume-phi ";
printOperands(O, SlotTracker);
} else {
O << " = phi ";
printPhiOperands(O, SlotTracker);
}
}
#endif

Expand Down Expand Up @@ -1186,6 +1185,16 @@ void VPIRPhi::execute(VPTransformState &State) {
State.Builder.SetInsertPoint(Phi->getParent(), std::next(Phi->getIterator()));
}

void VPPhiAccessors::removeIncomingValue(VPBlockBase *IncomingBlock) const {
VPRecipeBase *R = const_cast<VPRecipeBase *>(getAsRecipe());
auto &Preds = R->getParent()->getPredecessors();
assert(R->getNumOperands() == Preds.size() &&
"Number of phi operands must match number of predecessors");
unsigned Position = std::distance(Preds.begin(), find(Preds, IncomingBlock));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this part be getIndexFor(IncomingBlock)? Which is an API of a basic block rather than a phi recipe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split off to 861502304c9fe79c01e3382c52652e87ac03713e, as there are already existing users for this utility, thanks.

R->getOperand(Position)->removeUser(*R);
R->removeOperand(Position);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to keep these two changes atomic, taking care of both ends when removing an edge from def/use graph. Perhaps removeOperandAndUser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, moved the removal also to removeOperand. I kept the name, as removing the operand requires also removing the user. Also documented the behavior.

}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPPhiAccessors::printPhiOperands(raw_ostream &O,
VPSlotTracker &SlotTracker) const {
Expand Down
Loading
Loading