-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LV] Use frozen start value for FindLastIV if needed. #132691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This moves the logic for computing the FindLastIV reduction result to its own opcode. A follow-up patch will update the new opcode to also take the start value, to fix llvm#126836.
Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed.
FindLastIV introduces multiple uses of the start value, where in the original source there was only a single use. Each use of undef may produce a different result, so introducing multiple uses can produce incorrect results when the input is undef/poison. If the start value may be undef or poison, freeze it and use the frozen value, which will be the same at all uses. Alive2 showing the issue: https://alive2.llvm.org/ce/z/mB4Jtb
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesFindLastIV introduces multiple uses of the start value, where in the Each use of undef may produce a different result, so introducing If the start value may be undef or poison, freeze it and use the frozen Alive2 showing the issue: https://alive2.llvm.org/ce/z/mB4Jtb Depends on #132689 Fixes #126836 Patch is 48.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132691.diff 10 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 193f505fb03fe..416a0a70325d1 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -423,7 +423,7 @@ Value *createAnyOfReduction(IRBuilderBase &B, Value *Src,
/// Create a reduction of the given vector \p Src for a reduction of the
/// kind RecurKind::IFindLastIV or RecurKind::FFindLastIV. The reduction
/// operation is described by \p Desc.
-Value *createFindLastIVReduction(IRBuilderBase &B, Value *Src,
+Value *createFindLastIVReduction(IRBuilderBase &B, Value *Src, Value *Start,
const RecurrenceDescriptor &Desc);
/// Create an ordered reduction intrinsic using the given recurrence
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 2e7685254f512..f57d95e7722dc 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -1233,11 +1233,11 @@ Value *llvm::createAnyOfReduction(IRBuilderBase &Builder, Value *Src,
}
Value *llvm::createFindLastIVReduction(IRBuilderBase &Builder, Value *Src,
+ Value *Start,
const RecurrenceDescriptor &Desc) {
assert(RecurrenceDescriptor::isFindLastIVRecurrenceKind(
Desc.getRecurrenceKind()) &&
"Unexpected reduction kind");
- Value *StartVal = Desc.getRecurrenceStartValue();
Value *Sentinel = Desc.getSentinelValue();
Value *MaxRdx = Src->getType()->isVectorTy()
? Builder.CreateIntMaxReduce(Src, true)
@@ -1246,7 +1246,7 @@ Value *llvm::createFindLastIVReduction(IRBuilderBase &Builder, Value *Src,
// reduction is sentinel value.
Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, MaxRdx, Sentinel, "rdx.select.cmp");
- return Builder.CreateSelect(Cmp, MaxRdx, StartVal, "rdx.select");
+ return Builder.CreateSelect(Cmp, MaxRdx, Start, "rdx.select");
}
Value *llvm::getReductionIdentity(Intrinsic::ID RdxID, Type *Ty,
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 92160a421e59c..d5949e3d52918 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7612,7 +7612,8 @@ static void fixReductionScalarResumeWhenVectorizingEpilog(
BasicBlock *BypassBlock) {
auto *EpiRedResult = dyn_cast<VPInstruction>(R);
if (!EpiRedResult ||
- EpiRedResult->getOpcode() != VPInstruction::ComputeReductionResult)
+ (EpiRedResult->getOpcode() != VPInstruction::ComputeReductionResult &&
+ EpiRedResult->getOpcode() != VPInstruction::ComputeFindLastIVResult))
return;
auto *EpiRedHeaderPhi =
@@ -7633,14 +7634,17 @@ static void fixReductionScalarResumeWhenVectorizingEpilog(
} else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
using namespace llvm::PatternMatch;
- Value *Cmp, *OrigResumeV;
+ Value *Cmp, *OrigResumeV, *CmpOp;
bool IsExpectedPattern =
match(MainResumeValue, m_Select(m_OneUse(m_Value(Cmp)),
m_Specific(RdxDesc.getSentinelValue()),
m_Value(OrigResumeV))) &&
- match(Cmp,
- m_SpecificICmp(ICmpInst::ICMP_EQ, m_Specific(OrigResumeV),
- m_Specific(RdxDesc.getRecurrenceStartValue())));
+ (match(Cmp, m_SpecificICmp(ICmpInst::ICMP_EQ, m_Specific(OrigResumeV),
+ m_Value(CmpOp))) &&
+ (match(CmpOp,
+ m_Freeze(m_Specific(RdxDesc.getRecurrenceStartValue()))) ||
+ (CmpOp == RdxDesc.getRecurrenceStartValue() &&
+ isGuaranteedNotToBeUndefOrPoison(CmpOp))));
assert(IsExpectedPattern && "Unexpected reduction resume pattern");
(void)IsExpectedPattern;
MainResumeValue = OrigResumeV;
@@ -9817,8 +9821,10 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
Builder.createSelect(Cond, OrigExitingVPV, PhiR, {}, "", FMFs);
OrigExitingVPV->replaceUsesWithIf(NewExitingVPV, [](VPUser &U, unsigned) {
return isa<VPInstruction>(&U) &&
- cast<VPInstruction>(&U)->getOpcode() ==
- VPInstruction::ComputeReductionResult;
+ (cast<VPInstruction>(&U)->getOpcode() ==
+ VPInstruction::ComputeReductionResult ||
+ cast<VPInstruction>(&U)->getOpcode() ==
+ VPInstruction::ComputeFindLastIVResult);
});
if (CM.usePredicatedReductionSelect(
PhiR->getRecurrenceDescriptor().getOpcode(), PhiTy))
@@ -9861,10 +9867,36 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
// bc.merge.rdx phi nodes, hence it needs to be created unconditionally here
// even for in-loop reductions, until the reduction resume value handling is
// also modeled in VPlan.
+ VPInstruction *FinalReductionResult;
VPBuilder::InsertPointGuard Guard(Builder);
- Builder.setInsertPoint(MiddleVPBB, IP);
- auto *FinalReductionResult = Builder.createNaryOp(
- VPInstruction::ComputeReductionResult, {PhiR, NewExitingVPV}, ExitDL);
+ if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(
+ RdxDesc.getRecurrenceKind())) {
+ VPValue *Start = PhiR->getStartValue();
+ if (!isGuaranteedNotToBeUndefOrPoison(
+ PhiR->getStartValue()->getLiveInIRValue())) {
+ Builder.setInsertPoint(cast<VPBasicBlock>(Plan->getEntry()));
+ Start = Builder.createNaryOp(Instruction::Freeze, {Start}, {}, "fr");
+ }
+ Builder.setInsertPoint(MiddleVPBB, IP);
+ FinalReductionResult =
+ Builder.createNaryOp(VPInstruction::ComputeFindLastIVResult,
+ {PhiR, Start, NewExitingVPV}, ExitDL);
+ // Update all users outside the vector region.
+ for (VPUser *U : to_vector(OrigExitingVPV->users())) {
+ auto *R = cast<VPRecipeBase>(U);
+ if (R->getParent() && R->getParent()->getParent())
+ continue;
+
+ for (unsigned Idx = 0; Idx != R->getNumOperands(); ++Idx) {
+ if (R->getOperand(Idx) == PhiR->getStartValue())
+ R->setOperand(Idx, Start);
+ }
+ }
+ } else {
+ Builder.setInsertPoint(MiddleVPBB, IP);
+ FinalReductionResult = Builder.createNaryOp(
+ VPInstruction::ComputeReductionResult, {PhiR, NewExitingVPV}, ExitDL);
+ }
// Update all users outside the vector region.
OrigExitingVPV->replaceUsesWithIf(
FinalReductionResult, [FinalReductionResult](VPUser &User, unsigned) {
@@ -10353,24 +10385,7 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
VPBasicBlock *Header = VectorLoop->getEntryBasicBlock();
Header->setName("vec.epilog.vector.body");
- // Re-use the trip count and steps expanded for the main loop, as
- // skeleton creation needs it as a value that dominates both the scalar
- // and vector epilogue loops
- // TODO: This is a workaround needed for epilogue vectorization and it
- // should be removed once induction resume value creation is done
- // directly in VPlan.
- for (auto &R : make_early_inc_range(*Plan.getEntry())) {
- auto *ExpandR = dyn_cast<VPExpandSCEVRecipe>(&R);
- if (!ExpandR)
- continue;
- auto *ExpandedVal =
- Plan.getOrAddLiveIn(ExpandedSCEVs.find(ExpandR->getSCEV())->second);
- ExpandR->replaceAllUsesWith(ExpandedVal);
- if (Plan.getTripCount() == ExpandR)
- Plan.resetTripCount(ExpandedVal);
- ExpandR->eraseFromParent();
- }
-
+ DenseMap<Value *, Value *> ToFrozen;
// Ensure that the start values for all header phi recipes are updated before
// vectorizing the epilogue loop.
for (VPRecipeBase &R : Header->phis()) {
@@ -10436,6 +10451,10 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
ResumeV =
Builder.CreateICmpNE(ResumeV, RdxDesc.getRecurrenceStartValue());
} else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK)) {
+ ToFrozen[RdxDesc.getRecurrenceStartValue()] =
+ cast<PHINode>(ResumeV)->getIncomingValueForBlock(
+ EPI.MainLoopIterationCountCheck);
+
// VPReductionPHIRecipe for FindLastIV reductions requires an adjustment
// to the resume value. The resume value is adjusted to the sentinel
// value when the final value from the main vector loop equals the start
@@ -10444,8 +10463,8 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
// variable.
BasicBlock *ResumeBB = cast<Instruction>(ResumeV)->getParent();
IRBuilder<> Builder(ResumeBB, ResumeBB->getFirstNonPHIIt());
- Value *Cmp =
- Builder.CreateICmpEQ(ResumeV, RdxDesc.getRecurrenceStartValue());
+ Value *Cmp = Builder.CreateICmpEQ(
+ ResumeV, ToFrozen[RdxDesc.getRecurrenceStartValue()]);
ResumeV =
Builder.CreateSelect(Cmp, RdxDesc.getSentinelValue(), ResumeV);
}
@@ -10461,6 +10480,31 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
VPValue *StartVal = Plan.getOrAddLiveIn(ResumeV);
cast<VPHeaderPHIRecipe>(&R)->setStartValue(StartVal);
}
+
+ // Re-use the trip count and steps expanded for the main loop, as
+ // skeleton creation needs it as a value that dominates both the scalar
+ // and vector epilogue loops
+ // TODO: This is a workaround needed for epilogue vectorization and it
+ // should be removed once induction resume value creation is done
+ // directly in VPlan.
+ for (auto &R : make_early_inc_range(*Plan.getEntry())) {
+ auto *VPI = dyn_cast<VPInstruction>(&R);
+ if (VPI) {
+ VPI->replaceAllUsesWith(Plan.getOrAddLiveIn(
+ ToFrozen[VPI->getOperand(0)->getLiveInIRValue()]));
+ continue;
+ }
+
+ auto *ExpandR = dyn_cast<VPExpandSCEVRecipe>(&R);
+ if (!ExpandR)
+ continue;
+ auto *ExpandedVal =
+ Plan.getOrAddLiveIn(ExpandedSCEVs.find(ExpandR->getSCEV())->second);
+ ExpandR->replaceAllUsesWith(ExpandedVal);
+ if (Plan.getTripCount() == ExpandR)
+ Plan.resetTripCount(ExpandedVal);
+ ExpandR->eraseFromParent();
+ }
}
// Generate bypass values from the additional bypass block. Note that when the
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 3059b87ae63c8..64e7f2bddb668 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -866,6 +866,7 @@ class VPInstruction : public VPRecipeWithIRFlags,
BranchOnCount,
BranchOnCond,
Broadcast,
+ ComputeFindLastIVResult,
ComputeReductionResult,
// Takes the VPValue to extract from as first operand and the lane or part
// to extract as second operand, counting from the end starting with 1 for
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 38bec733dbf73..24a166bd336d1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -51,6 +51,7 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
switch (Opcode) {
case Instruction::ExtractElement:
+ case Instruction::Freeze:
return inferScalarType(R->getOperand(0));
case Instruction::Select: {
Type *ResTy = inferScalarType(R->getOperand(1));
@@ -66,6 +67,7 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
inferScalarType(R->getOperand(1)) &&
"different types inferred for different operands");
return IntegerType::get(Ctx, 1);
+ case VPInstruction::ComputeFindLastIVResult:
case VPInstruction::ComputeReductionResult: {
auto *PhiR = cast<VPReductionPHIRecipe>(R->getOperand(0));
auto *OrigPhi = cast<PHINode>(PhiR->getUnderlyingValue());
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 8c11d93734667..3a727866a2875 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -216,6 +216,16 @@ using BinaryVPInstruction_match =
BinaryRecipe_match<Op0_t, Op1_t, Opcode, /*Commutative*/ false,
VPInstruction>;
+template <typename Op0_t, typename Op1_t, typename Op2_t, unsigned Opcode,
+ bool Commutative, typename... RecipeTys>
+using TernaryRecipe_match = Recipe_match<std::tuple<Op0_t, Op1_t, Op2_t>,
+ Opcode, Commutative, RecipeTys...>;
+
+template <typename Op0_t, typename Op1_t, typename Op2_t, unsigned Opcode>
+using TernaryVPInstruction_match =
+ TernaryRecipe_match<Op0_t, Op1_t, Op2_t, Opcode, /*Commutative*/ false,
+ VPInstruction>;
+
template <typename Op0_t, typename Op1_t, unsigned Opcode,
bool Commutative = false>
using AllBinaryRecipe_match =
@@ -234,6 +244,13 @@ m_VPInstruction(const Op0_t &Op0, const Op1_t &Op1) {
return BinaryVPInstruction_match<Op0_t, Op1_t, Opcode>(Op0, Op1);
}
+template <unsigned Opcode, typename Op0_t, typename Op1_t, typename Op2_t>
+inline TernaryVPInstruction_match<Op0_t, Op1_t, Op2_t, Opcode>
+m_VPInstruction(const Op0_t &Op0, const Op1_t &Op1, const Op2_t &Op2) {
+ return TernaryVPInstruction_match<Op0_t, Op1_t, Op2_t, Opcode>(
+ {Op0, Op1, Op2});
+}
+
template <typename Op0_t>
inline UnaryVPInstruction_match<Op0_t, VPInstruction::Not>
m_Not(const Op0_t &Op0) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index c7190b3187d94..f78ced71e1260 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -423,6 +423,7 @@ bool VPInstruction::canGenerateScalarForFirstLane() const {
if (isSingleScalar() || isVectorToScalar())
return true;
switch (Opcode) {
+ case Instruction::Freeze:
case Instruction::ICmp:
case Instruction::PHI:
case Instruction::Select:
@@ -474,6 +475,12 @@ Value *VPInstruction::generate(VPTransformState &State) {
Value *Idx = State.get(getOperand(1), /*IsScalar=*/true);
return Builder.CreateExtractElement(Vec, Idx, Name);
}
+ case Instruction::Freeze: {
+ if (State.hasVectorValue(this))
+ return State.get(this);
+ Value *Op = State.get(getOperand(0), true);
+ return Builder.CreateFreeze(Op, Name);
+ }
case Instruction::ICmp: {
bool OnlyFirstLaneUsed = vputils::onlyFirstLaneUsed(this);
Value *A = State.get(getOperand(0), OnlyFirstLaneUsed);
@@ -614,6 +621,28 @@ Value *VPInstruction::generate(VPTransformState &State) {
return Builder.CreateVectorSplat(
State.VF, State.get(getOperand(0), /*IsScalar*/ true), "broadcast");
}
+ case VPInstruction::ComputeFindLastIVResult: {
+ // The recipe's operands are the reduction phi, followed by one operand for
+ // each part of the reduction.
+ unsigned UF = getNumOperands() - 2;
+ Value *ReducedPartRdx = State.get(getOperand(2));
+ for (unsigned Part = 1; Part < UF; ++Part) {
+ ReducedPartRdx = createMinMaxOp(Builder, RecurKind::SMax, ReducedPartRdx,
+ State.get(getOperand(2 + Part)));
+ }
+
+ // FIXME: The cross-recipe dependency on VPReductionPHIRecipe is temporary
+ // and will be removed by breaking up the recipe further.
+ auto *PhiR = cast<VPReductionPHIRecipe>(getOperand(0));
+ // Get its reduction variable descriptor.
+ const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
+ RecurKind RK = RdxDesc.getRecurrenceKind();
+
+ assert(RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK));
+ assert(!PhiR->isInLoop());
+ return createFindLastIVReduction(Builder, ReducedPartRdx,
+ State.get(getOperand(1), true), RdxDesc);
+ }
case VPInstruction::ComputeReductionResult: {
// FIXME: The cross-recipe dependency on VPReductionPHIRecipe is temporary
// and will be removed by breaking up the recipe further.
@@ -623,6 +652,8 @@ Value *VPInstruction::generate(VPTransformState &State) {
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
RecurKind RK = RdxDesc.getRecurrenceKind();
+ assert(!RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK) &&
+ "should be handled by ComputeFindLastIVResult");
Type *PhiTy = OrigPhi->getType();
// The recipe's operands are the reduction phi, followed by one operand for
@@ -658,9 +689,6 @@ Value *VPInstruction::generate(VPTransformState &State) {
if (Op != Instruction::ICmp && Op != Instruction::FCmp)
ReducedPartRdx = Builder.CreateBinOp(
(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");
- else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK))
- ReducedPartRdx =
- createMinMaxOp(Builder, RecurKind::SMax, ReducedPartRdx, RdxPart);
else
ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
}
@@ -669,8 +697,7 @@ Value *VPInstruction::generate(VPTransformState &State) {
// Create the reduction after the loop. Note that inloop reductions create
// the target reduction in the loop using a Reduction recipe.
if ((State.VF.isVector() ||
- RecurrenceDescriptor::isAnyOfRecurrenceKind(RK) ||
- RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK)) &&
+ RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) &&
!PhiR->isInLoop()) {
// TODO: Support in-order reductions based on the recurrence descriptor.
// All ops in the reduction inherit fast-math-flags from the recurrence
@@ -681,9 +708,6 @@ Value *VPInstruction::generate(VPTransformState &State) {
if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK))
ReducedPartRdx =
createAnyOfReduction(Builder, ReducedPartRdx, RdxDesc, OrigPhi);
- else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK))
- ReducedPartRdx =
- createFindLastIVReduction(Builder, ReducedPartRdx, RdxDesc);
else
ReducedPartRdx = createSimpleReduction(Builder, ReducedPartRdx, RK);
@@ -829,6 +853,7 @@ bool VPInstruction::isVectorToScalar() const {
return getOpcode() == VPInstruction::ExtractFromEnd ||
getOpcode() == Instruction::ExtractElement ||
getOpcode() == VPInstruction::FirstActiveLane ||
+ getOpcode() == VPInstruction::ComputeFindLastIVResult ||
getOpcode() == VPInstruction::ComputeReductionResult ||
getOpcode() == VPInstruction::AnyOf;
}
@@ -921,6 +946,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
case Instruction::ICmp:
case Instruction::Select:
case Instruction::Or:
+ case Instruction::Freeze:
// TODO: Cover additional opcodes.
return vputils::onlyFirstLaneUsed(this);
case VPInstruction::ActiveLaneMask:
@@ -933,6 +959,8 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
return true;
case VPInstruction::PtrAdd:
return Op == getOperand(0) || vputils::onlyFirstLaneUsed(this);
+ case VPInstruction::ComputeFindLastIVResult:
+ return Op == getOperand(1);
};
llvm_unreachable("switch should return");
}
@@ -1011,6 +1039,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
case VPInstruction::ExtractFromEnd:
O << "extract-from-end";
break;
+ case VPInstruction::ComputeFindLastIVResult:
+ O << "compute-find-last-iv-result";
+ break;
case VPInstruction::ComputeReductionResult:
O << "compute-reduction-result";
break;
@@ -1571,7 +1602,6 @@ void VPWidenRecipe::execute(VPTransformState &State) {
}
case Instruction::Freeze: {
Value *Op = State.get(getOperand(0));
-
Value *Freeze = Builder.CreateFreeze(Op);
State.set(this, Freeze);
break;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index a36c2aeb3da5c..a513a255344cc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -348,7 +348,9 @@ void UnrollState::unrollBlock(VPBlockBase *VPB) {
// the pa...
[truncated]
|
Makes sense to me. Although I think AnyOf reductions might have the same problem too? With define i32 @select_i32_from_icmp(ptr %v, i32 %a, i32 %b, i64 %n) {
entry:
br label %loop
loop: ; preds = %entry, %loop
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%rdx = phi i32 [ %a, %entry ], [ %sel, %loop ]
%gep.v.iv = getelementptr inbounds i32, ptr %v, i64 %iv
%load.v.iv = load i32, ptr %gep.v.iv, align 4
%cmp.load.iv.3 = icmp eq i32 %load.v.iv, 3
%sel = select i1 %cmp.load.iv.3, i32 %rdx, i32 %b
%iv.next = add nuw nsw i64 %iv, 1
%exit.cond = icmp eq i64 %iv.next, %n
br i1 %exit.cond, label %exit, label %loop
exit: ; preds = %loop
ret i32 %sel
} Becomes define i32 @select_i32_from_icmp(ptr %v, i32 %a, i32 %b, i64 %n) {
entry:
%min.iters.check = icmp ult i64 %n, 4
br i1 %min.iters.check, label %scalar.ph, label %vector.ph
vector.ph: ; preds = %entry
%n.mod.vf = urem i64 %n, 4
%n.vec = sub i64 %n, %n.mod.vf
br label %vector.body
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%vec.phi = phi <4 x i1> [ zeroinitializer, %vector.ph ], [ %5, %vector.body ]
%0 = add i64 %index, 0
%1 = getelementptr inbounds i32, ptr %v, i64 %0
%2 = getelementptr inbounds i32, ptr %1, i32 0
%wide.load = load <4 x i32>, ptr %2, align 4
%3 = icmp eq <4 x i32> %wide.load, splat (i32 3)
%4 = xor <4 x i1> %3, splat (i1 true)
%5 = or <4 x i1> %vec.phi, %4
%index.next = add nuw i64 %index, 4
%6 = icmp eq i64 %index.next, %n.vec
br i1 %6, label %middle.block, label %vector.body, !llvm.loop !0
middle.block: ; preds = %vector.body
%7 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %5)
%8 = freeze i1 %7
%rdx.select = select i1 %8, i32 %b, i32 %a
%cmp.n = icmp eq i64 %n, %n.vec
br i1 %cmp.n, label %exit, label %scalar.ph
scalar.ph: ; preds = %entry, %middle.block
%bc.resume.val = phi i64 [ %n.vec, %middle.block ], [ 0, %entry ]
%bc.merge.rdx = phi i32 [ %rdx.select, %middle.block ], [ %a, %entry ]
br label %loop
loop: ; preds = %scalar.ph, %loop
%iv = phi i64 [ %bc.resume.val, %scalar.ph ], [ %iv.next, %loop ]
%rdx = phi i32 [ %bc.merge.rdx, %scalar.ph ], [ %sel, %loop ]
%gep.v.iv = getelementptr inbounds i32, ptr %v, i64 %iv
%load.v.iv = load i32, ptr %gep.v.iv, align 4
%cmp.load.iv.3 = icmp eq i32 %load.v.iv, 3
%sel = select i1 %cmp.load.iv.3, i32 %rdx, i32 %b
%iv.next = add nuw nsw i64 %iv, 1
%exit.cond = icmp eq i64 %iv.next, %n
br i1 %exit.cond, label %exit, label %loop, !llvm.loop !3
exit: ; preds = %middle.block, %loop
%sel.lcssa = phi i32 [ %sel, %loop ], [ %rdx.select, %middle.block ]
ret i32 %sel.lcssa
} We now have two uses of EDIT: Alive2 seems to confirm the transformation is incorrect: https://alive2.llvm.org/ce/z/XL3Yrk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix @fhahn!
// Update all users outside the vector region. | ||
for (VPUser *U : to_vector(OrigExitingVPV->users())) { | ||
auto *R = cast<VPRecipeBase>(U); | ||
if (R->getParent() && R->getParent()->getParent()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be based on the assumption that the parent's parent is a vector loop region. Is it worth asserting this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to check that the parent's region is the vector loop region, thanks
if (R->getParent() && R->getParent()->getParent()) | ||
continue; | ||
|
||
for (unsigned Idx = 0; Idx != R->getNumOperands(); ++Idx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a commonly-used idiom, something we probably have an equivalent IR helper for. Is it worth adding a helper in VPUser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added VPUser::replaceUsesOfWith in line with User::replaceUsesOfWith
@@ -10448,6 +10451,10 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L, | |||
ResumeV = | |||
Builder.CreateICmpNE(ResumeV, RdxDesc.getRecurrenceStartValue()); | |||
} else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK)) { | |||
ToFrozen[RdxDesc.getRecurrenceStartValue()] = | |||
cast<PHINode>(ResumeV)->getIncomingValueForBlock( | |||
EPI.MainLoopIterationCountCheck); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This confused me at first because there is no obvious indication that EPI.MainLoopIterationCountCheck
is a Block pointer. Doesn't need doing in this PR, but it would be good to rename MainLoopIterationCountCheck
to something like MainLoopIterationCountCheckBlock
.
@@ -10365,24 +10385,7 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L, | |||
VPBasicBlock *Header = VectorLoop->getEntryBasicBlock(); | |||
Header->setName("vec.epilog.vector.body"); | |||
|
|||
// Re-use the trip count and steps expanded for the main loop, as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in theory this code could be moved to the end of the function in a separate NFC patch, with this patch then adding the new bits:
auto *VPI = dyn_cast<VPInstruction>(&R);
if (VPI) {
VPI->replaceAllUsesWith(Plan.getOrAddLiveIn(
ToFrozen[VPI->getOperand(0)->getLiveInIRValue()]));
continue;
}
However, I don't want to create unnecessary burden as I know this is an important fix. I'll leave it up to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it included in the patch for now, could also split it off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. 👍
@@ -474,6 +475,12 @@ Value *VPInstruction::generate(VPTransformState &State) { | |||
Value *Idx = State.get(getOperand(1), /*IsScalar=*/true); | |||
return Builder.CreateExtractElement(Vec, Idx, Name); | |||
} | |||
case Instruction::Freeze: { | |||
if (State.hasVectorValue(this)) | |||
return State.get(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we still generate a freeze instruction if we have a vector value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check was left over from earlier version, removed, thanks
@@ -939,6 +946,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const { | |||
case Instruction::ICmp: | |||
case Instruction::Select: | |||
case Instruction::Or: | |||
case Instruction::Freeze: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also add this opcode to VPInstruction::opcodeMayReadOrWriteFromMemory
and return false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your help!
@@ -6,6 +6,7 @@ define i64 @select_icmp_nuw_nsw(ptr %a, ptr %b, i64 %ii, i64 %n) { | |||
; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[II:%.*]], i64 [[N:%.*]]) { | |||
; CHECK-NEXT: [[ENTRY:.*]]: | |||
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4 | |||
; CHECK-NEXT: [[FR:%.*]] = freeze i64 [[II]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we insert the freeze in vector preheader, or the scalar loop also need the frozen start value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked a bit more and I think the only issue is with epilogue vectorization; Here's a proof where we go via the main vector loop to the scalar remainder loop https://alive2.llvm.org/ce/z/yFSBrq and here where we go from the vector loop to the exit block https://alive2.llvm.org/ce/z/D6Ppzr.
When vectorizing the epilogue, freeze is needed for the start value used in the selects and the resume phi from the main vector loop:
- Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr
- Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v
- Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV
- Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN
The latter 2 show requiring freezing the resume phi. That means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find and other transforms can sink the freeze if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So it's because
[[TMP11:%.*]] = icmp eq i8 [[BC_MERGE_RDX]], [[START]]
passes the undef start value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we reach vec.epilog.ph
without going through the main vector loop, %bc.merge.rdx
will be the start value. If it is not frozen we have 2 uses of undef (both operands of the compare), and each use may have a different concrete value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the incoming value of FindLastIV PHI in epilogue vectorization can be directly broadcasted with the sentinel value to avoid generating this icmp, I'm not sure if AnyOf has a similar approach to bypass the icmp that propagates undef/poison. Therefore, I think continuing with the approach of freezing the start value is reasonable, as it allows both FindLastIV and AnyOf to share the same approach. :)
✅ With the latest revision this PR passed the C/C++ code formatter. |
You can test this locally with the following command:git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll llvm/test/Transforms/LoopVectorize/epilog-iv-select-cmp.ll The following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields In tests, avoid using For example, this is considered a bad practice: define void @fn() {
...
br i1 undef, ...
} Please use the following instead: define void @fn(i1 %cond) {
...
br i1 %cond, ...
} Please refer to the Undefined Behavior Manual for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to current main, so now only the PR changes should be included.
Updated the PR to only freeze when vectorizing the epilogue, as this is the only case where it should be needed
@@ -939,6 +946,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const { | |||
case Instruction::ICmp: | |||
case Instruction::Select: | |||
case Instruction::Or: | |||
case Instruction::Freeze: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
@@ -474,6 +475,12 @@ Value *VPInstruction::generate(VPTransformState &State) { | |||
Value *Idx = State.get(getOperand(1), /*IsScalar=*/true); | |||
return Builder.CreateExtractElement(Vec, Idx, Name); | |||
} | |||
case Instruction::Freeze: { | |||
if (State.hasVectorValue(this)) | |||
return State.get(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check was left over from earlier version, removed, thanks
if (R->getParent() && R->getParent()->getParent()) | ||
continue; | ||
|
||
for (unsigned Idx = 0; Idx != R->getNumOperands(); ++Idx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added VPUser::replaceUsesOfWith in line with User::replaceUsesOfWith
// Update all users outside the vector region. | ||
for (VPUser *U : to_vector(OrigExitingVPV->users())) { | ||
auto *R = cast<VPRecipeBase>(U); | ||
if (R->getParent() && R->getParent()->getParent()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to check that the parent's region is the vector loop region, thanks
@@ -10365,24 +10385,7 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L, | |||
VPBasicBlock *Header = VectorLoop->getEntryBasicBlock(); | |||
Header->setName("vec.epilog.vector.body"); | |||
|
|||
// Re-use the trip count and steps expanded for the main loop, as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it included in the patch for now, could also split it off.
@@ -6,6 +6,7 @@ define i64 @select_icmp_nuw_nsw(ptr %a, ptr %b, i64 %ii, i64 %n) { | |||
; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[II:%.*]], i64 [[N:%.*]]) { | |||
; CHECK-NEXT: [[ENTRY:.*]]: | |||
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4 | |||
; CHECK-NEXT: [[FR:%.*]] = freeze i64 [[II]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked a bit more and I think the only issue is with epilogue vectorization; Here's a proof where we go via the main vector loop to the scalar remainder loop https://alive2.llvm.org/ce/z/yFSBrq and here where we go from the vector loop to the exit block https://alive2.llvm.org/ce/z/D6Ppzr.
When vectorizing the epilogue, freeze is needed for the start value used in the selects and the resume phi from the main vector loop:
- Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr
- Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v
- Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV
- Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN
The latter 2 show requiring freezing the resume phi. That means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find and other transforms can sink the freeze if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update - looks almost ready to go!
@@ -9892,14 +9895,22 @@ void LoopVectorizationPlanner::adjustRecipesForReductions( | |||
// also modeled in VPlan. | |||
VPInstruction *FinalReductionResult; | |||
VPBuilder::InsertPointGuard Guard(Builder); | |||
Builder.setInsertPoint(MiddleVPBB, IP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has this been sunk into the if/else cases when we are inserting into the same location in each case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restored, this was left over after moving the code
auto *R = cast<VPRecipeBase>(U); | ||
if (R->getParent()->getParent() == VectorLoopRegion) | ||
continue; | ||
R->replaceUsesOfWith(PhiR->getStartValue(), Start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to assume that the only users of PhiR->getStartValue()
we need to update are operands of the users of OrigExitingVPV
? I guess for FindLastIV that is true because it's usually going to be a select? Could you add a comment here explaining the logic?
Furthermore, what if PhiR->getStartValue() == Start
already? I guess it doesn't do any harm to call replaceUsesOfWith
- I just wondered if we should do something like:
if (PhiR->getStartValue() != Start)
R->replaceUsesOfWith(PhiR->getStartValue(), Start);
I believe the values could be equal if we managed to prove that Start
is not undef or poison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing something here, isn't Start defined as VPValue *Start = PhiR->getStartValue();
just above? So isn't this always a no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was left over after moving the logic to the epilogue code, removed now, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same doubt. PhiR->getStartValue()
and Start
looks the same.
@@ -1615,6 +1622,7 @@ void VPWidenRecipe::execute(VPTransformState &State) { | |||
} | |||
case Instruction::Freeze: { | |||
Value *Op = State.get(getOperand(0)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Stray new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks!
@@ -1372,6 +1372,7 @@ static bool isDefinedInsideLoopRegions(const VPValue *VPV) { | |||
bool VPValue::isDefinedOutsideLoopRegions() const { | |||
return !isDefinedInsideLoopRegions(this); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Stray new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stripped, thanks
This is a vital patch. I'm sorry for this blatant question, but, is there any chance of landing it this week? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -10377,6 +10380,36 @@ static void preparePlanForMainVectorLoop(VPlan &MainPlan, VPlan &EpiPlan) { | |||
VPInstruction::ResumePhi, | |||
{VectorTC, MainPlan.getCanonicalIV()->getStartValue()}, {}, | |||
"vec.epilog.resume.val"); | |||
|
|||
// When vectorizing the epilogue, FindLastIV reductions can introduce multiple | |||
// uses of undef/poison. If the reduction start value is not guaranteed to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this makes it even wordier, but shouldn't this be If the reduction start value is not guaranteed to **not** be undef or poison
? In other words, if the start value could be either undef or poison we need to freeze it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I update the comment to say If the reduction start value may be undef or poison....
VPValue *OrigStart = VPI->getOperand(1); | ||
if (isGuaranteedNotToBeUndefOrPoison(OrigStart->getLiveInIRValue())) | ||
continue; | ||
VPBuilder Builder(Plan.getEntry()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe hoist this out of the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
// directly in VPlan. | ||
for (auto &R : make_early_inc_range(*Plan.getEntry())) { | ||
auto *VPI = dyn_cast<VPInstruction>(&R); | ||
if (VPI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What guarantee is there that VPI corresponds to the frozen start value? Do we need to check for VPInstructions with opcode Instruction::Freeze? I assume this is supposed to match up with the VPInstructions added by AddFreezeForFindLastIVReductions
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, there can only be freeze VPInstruction in the header, updated to check though
@@ -10365,24 +10385,7 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L, | |||
VPBasicBlock *Header = VectorLoop->getEntryBasicBlock(); | |||
Header->setName("vec.epilog.vector.body"); | |||
|
|||
// Re-use the trip count and steps expanded for the main loop, as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after all current comments are fixed.
BTW, I can help update the llvm-testsuite to cover this case if there isn't already a unit test for it.
@@ -1403,6 +1403,13 @@ void VPValue::replaceUsesWithIf( | |||
} | |||
} | |||
|
|||
void VPUser::replaceUsesOfWith(VPValue *From, VPValue *To) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can remove this since we don't use the function in this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, removed again
@@ -246,6 +246,9 @@ class VPUser { | |||
New->addUser(*this); | |||
} | |||
|
|||
/// Replaces all uses of \p From in the VPUser with \p To. | |||
void replaceUsesOfWith(VPValue *From, VPValue *To); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, removed again
@@ -6,6 +6,7 @@ define i64 @select_icmp_nuw_nsw(ptr %a, ptr %b, i64 %ii, i64 %n) { | |||
; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[II:%.*]], i64 [[N:%.*]]) { | |||
; CHECK-NEXT: [[ENTRY:.*]]: | |||
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 4 | |||
; CHECK-NEXT: [[FR:%.*]] = freeze i64 [[II]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the incoming value of FindLastIV PHI in epilogue vectorization can be directly broadcasted with the sentinel value to avoid generating this icmp, I'm not sure if AnyOf has a similar approach to bypass the icmp that propagates undef/poison. Therefore, I think continuing with the approach of freezing the start value is reasonable, as it allows both FindLastIV and AnyOf to share the same approach. :)
Sorry for the ignorant question in advance - I am not familiar with the mechanics of vectorization. How does this work with FP values? If the undef value happens to be NaN, then do we care that the comparison will return false (regardless of whether the initial value was updated or not by the vector loop)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for fixing this @fhahn.
for (auto &R : make_early_inc_range(*Plan.getEntry())) { | ||
auto *VPI = dyn_cast<VPInstruction>(&R); | ||
if (VPI && VPI->getOpcode() == Instruction::Freeze) { | ||
VPI->replaceAllUsesWith(Plan.getOrAddLiveIn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand what you're doing here, but it looks a bit odd at first glance. You're essentially replacing one freeze in the epilogue entry block with another from the MainLoopIterationCountCheck block, right? Perhaps worth a comment explaining what's happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I also moved the comment about re-using the trip count inside the loop, thanks
This adds missing test coverage for #132691.
…ctorization. This adds missing test coverage for llvm/llvm-project#132691.
…2691) FindLastIV introduces multiple uses of the start value, where in the original source there was only a single use, when the epilogue is vectorized. Each use of undef may produce a different result, so introducing multiple uses can produce incorrect results when the input is undef/poison. If the start value may be undef or poison, freeze it and use the frozen value, which will be the same at all uses. See the following scenarios in Alive2: * Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr * Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v * Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV * Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN The latter 2 show requiring freezing the resume phi. That means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find and other transforms can sink the freeze if needed. Depends on llvm/llvm-project#132689 and llvm/llvm-project#132690. Fixes llvm/llvm-project#126836 PR: llvm/llvm-project#132691
@vzakhari Both AnyOf and FindLastIV do not support floating-point types so far. |
Thank you for the fix! I'd like to backport this fix to the release branch, but I'm not sure how to do it because it seems to depend on some other fixes. I was hoping that someone familiar with this fix could backport it. |
Hi @fhahn, I need to kindly ask you a question. A fix for the issue #126836 has been aimed for LLVM20. Do you have any plans to implement it for the LLVM20 release branch too? It seems the PRs that fix this issue depend too much on the current state of the main branch to make them cherry-pick as-is. |
This adds missing test coverage for llvm/llvm-project#132691.
FindLastIV introduces multiple uses of the start value, where in the original source there was only a single use, when the epilogue is vectorized. Each use of undef may produce a different result, so introducing multiple uses can produce incorrect results when the input is undef/poison. If the start value may be undef or poison, freeze it and use the frozen value, which will be the same at all uses. See the following scenarios in Alive2: * Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr * Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v * Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV * Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN The latter 2 show requiring freezing the resume phi. That means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find and other transforms can sink the freeze if needed. Depends on llvm/llvm-project#132689 and llvm/llvm-project#132690. Fixes llvm/llvm-project#126836 PR: llvm/llvm-project#132691
This adds missing test coverage for llvm#132691.
FindLastIV introduces multiple uses of the start value, where in the original source there was only a single use, when the epilogue is vectorized. Each use of undef may produce a different result, so introducing multiple uses can produce incorrect results when the input is undef/poison. If the start value may be undef or poison, freeze it and use the frozen value, which will be the same at all uses. See the following scenarios in Alive2: * Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr * Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v * Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV * Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN The latter 2 show requiring freezing the resume phi. That means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find and other transforms can sink the freeze if needed. Depends on llvm#132689 and llvm#132690. Fixes llvm#126836 PR: llvm#132691
FindLastIV introduces multiple uses of the start value, where in the
original source there was only a single use.
Each use of undef may produce a different result, so introducing
multiple uses can produce incorrect results when the input is
undef/poison.
If the start value may be undef or poison, freeze it and use the frozen
value, which will be the same at all uses.
Alive2 showing the issue: https://alive2.llvm.org/ce/z/mB4Jtb
Depends on #132689
(included in PR) and #132690
(included in PR)
Fixes #126836