[LV] Use frozen start value for FindLastIV if needed. #132691

Merged
merged 23 commits into from
Apr 4, 2025
Changes from 16 commits
3e42683
[VPlan] Add ComputeFindLastIVResult opcode (NFC).
fhahn Mar 23, 2025
c551166
[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC).
fhahn Mar 23, 2025
4591537
Match
fhahn Mar 23, 2025
ff11744
[LV] Use frozen start value for FindLastIV if needed.
fhahn Mar 22, 2025
ab4681a
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Mar 31, 2025
691d8c3
!fixup address latest comments, thanks
fhahn Mar 31, 2025
cc449a8
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Mar 31, 2025
ab70b60
!fixup limit to epilogue vectorization.
fhahn Mar 31, 2025
10d7b09
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Mar 31, 2025
4e2b58d
!fixup limit to epilogue
fhahn Mar 31, 2025
4fc8fe6
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Mar 31, 2025
094607c
!fixup update tests, unify code.
fhahn Mar 31, 2025
6554b64
!fixup fix formatting.
fhahn Mar 31, 2025
4a8163c
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Apr 1, 2025
099676f
!fixup remove unneeded changes.
fhahn Apr 1, 2025
5979201
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Apr 1, 2025
14362e3
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Apr 2, 2025
3f3306b
!fixup address latest comments, thanks!
fhahn Apr 2, 2025
50598f1
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Apr 2, 2025
db60a65
@fixup remove replaceUsesOfWith again.
fhahn Apr 2, 2025
177002a
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Apr 3, 2025
557a5b9
Merge remote-tracking branch 'origin/main' into findlastiv-poison-safe
fhahn Apr 3, 2025
3df74b7
!fixup address latest comments, reorder to handle more IVs.
fhahn Apr 3, 2025
93 changes: 69 additions & 24 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7656,14 +7656,17 @@ static void fixReductionScalarResumeWhenVectorizingEpilog(
} else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
using namespace llvm::PatternMatch;
Value *Cmp, *OrigResumeV;
Value *Cmp, *OrigResumeV, *CmpOp;
bool IsExpectedPattern =
match(MainResumeValue, m_Select(m_OneUse(m_Value(Cmp)),
m_Specific(RdxDesc.getSentinelValue()),
m_Value(OrigResumeV))) &&
match(Cmp,
m_SpecificICmp(ICmpInst::ICMP_EQ, m_Specific(OrigResumeV),
m_Specific(RdxDesc.getRecurrenceStartValue())));
(match(Cmp, m_SpecificICmp(ICmpInst::ICMP_EQ, m_Specific(OrigResumeV),
m_Value(CmpOp))) &&
(match(CmpOp,
m_Freeze(m_Specific(RdxDesc.getRecurrenceStartValue()))) ||
(CmpOp == RdxDesc.getRecurrenceStartValue() &&
isGuaranteedNotToBeUndefOrPoison(CmpOp))));
assert(IsExpectedPattern && "Unexpected reduction resume pattern");
(void)IsExpectedPattern;
MainResumeValue = OrigResumeV;
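
For readers skimming the hunk above: the relaxed match accepts two shapes for the second compare operand, either a freeze of the recurrence start value or the start value itself when it is known not to be undef or poison. The following is a minimal standalone sketch of that decision only. `ToyNode` and `isExpectedCmpOperand` are hypothetical names for illustration, not LLVM classes or functions, and the real code uses `PatternMatch` and `isGuaranteedNotToBeUndefOrPoison` instead.

```cpp
#include <cassert>

// Toy stand-in for an IR value (hypothetical, not an LLVM class).
struct ToyNode {
  bool IsFreeze = false;              // true if this node models `freeze Operand`
  const ToyNode *Operand = nullptr;   // operand of the freeze, if any
  bool KnownNotUndefOrPoison = false; // models isGuaranteedNotToBeUndefOrPoison
};

// Accept CmpOp if it is `freeze Start`, or if it is Start itself and Start
// is known to be well defined. Anything else is an unexpected pattern.
bool isExpectedCmpOperand(const ToyNode &CmpOp, const ToyNode &Start) {
  assert((!CmpOp.IsFreeze || CmpOp.Operand) && "freeze needs an operand");
  if (CmpOp.IsFreeze && CmpOp.Operand == &Start)
    return true;
  return &CmpOp == &Start && Start.KnownNotUndefOrPoison;
}
```

In this model, an unfrozen start value that may be undef or poison is rejected, which corresponds to the case where the assertion in the patch would fire.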
@@ -10377,6 +10380,36 @@ static void preparePlanForMainVectorLoop(VPlan &MainPlan, VPlan &EpiPlan) {
VPInstruction::ResumePhi,
{VectorTC, MainPlan.getCanonicalIV()->getStartValue()}, {},
"vec.epilog.resume.val");

// When vectorizing the epilogue, FindLastIV reductions can introduce multiple
// uses of undef/poison. If the reduction start value is not guaranteed to be
Contributor

I know this makes it even wordier, but shouldn't this be "If the reduction start value is not guaranteed to **not** be undef or poison"? In other words, if the start value could be either undef or poison, we need to freeze it?

Contributor Author

I updated the comment to say "If the reduction start value may be undef or poison...".

// undef or poison, we need to freeze it and use the frozen start when
// computing the reduction result. We also need to use the frozen value in the
// resume phi generated by the main vector loop, as this is also used to
// compute the reduction result after the epilogue vector loop.
auto AddFreezeForFindLastIVReductions = [](VPlan &Plan,
bool UpdateResumePhis) {
for (VPRecipeBase &R : *Plan.getMiddleBlock()) {
auto *VPI = dyn_cast<VPInstruction>(&R);
if (!VPI || VPI->getOpcode() != VPInstruction::ComputeFindLastIVResult)
continue;
VPValue *OrigStart = VPI->getOperand(1);
if (isGuaranteedNotToBeUndefOrPoison(OrigStart->getLiveInIRValue()))
continue;
VPBuilder Builder(Plan.getEntry());
Contributor

Maybe hoist this out of the loop?

Contributor Author

Done, thanks.

VPInstruction *Freeze =
Builder.createNaryOp(Instruction::Freeze, {OrigStart}, {}, "fr");
VPI->setOperand(1, Freeze);
if (UpdateResumePhis)
OrigStart->replaceUsesWithIf(Freeze, [Freeze](VPUser &U, unsigned) {
return Freeze != &U && isa<VPInstruction>(&U) &&
cast<VPInstruction>(&U)->getOpcode() ==
VPInstruction::ResumePhi;
});
}
};
AddFreezeForFindLastIVReductions(MainPlan, true);
AddFreezeForFindLastIVReductions(EpiPlan, false);
}
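
As a rough illustration of why the patch freezes the start value, the sketch below models the "multiple uses of undef/poison" problem named in the code comment. It is a toy model under stated assumptions, not LLVM code: `ToyValue` and `resumeCompareHits` are hypothetical names, and the counter is just one way to make two reads of an undef value disagree.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (not LLVM code): a value that is either defined or "undef".
// Every read of an undef value may observe a different bit pattern; the
// counter below simply forces consecutive reads to differ.
class ToyValue {
  bool IsUndef = true;
  int8_t Bits = 0;
  mutable int8_t NextUndefRead = 0;

public:
  static ToyValue undef() { return ToyValue(); }
  static ToyValue of(int8_t V) {
    ToyValue T;
    T.IsUndef = false;
    T.Bits = V;
    return T;
  }
  int8_t read() const { return IsUndef ? NextUndefRead++ : Bits; }
  // freeze picks one arbitrary value once; all later reads agree.
  ToyValue freeze() const { return of(read()); }
};

// Use 1: the main loop's result falls back to the start value.
// Use 2: the epilogue's resume block compares that result against start.
// Returns whether the compare sees the two uses as equal.
bool resumeCompareHits(const ToyValue &Start) {
  int8_t MainResult = Start.read();
  int8_t Compared = Start.read();
  return MainResult == Compared;
}
```

With an unfrozen undef start the compare can miss, so the sentinel adjustment in the resume block would be skipped; with a frozen start both uses see the same value.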

/// Prepare \p Plan for vectorizing the epilogue loop. That is, re-use expanded
@@ -10389,24 +10422,7 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
VPBasicBlock *Header = VectorLoop->getEntryBasicBlock();
Header->setName("vec.epilog.vector.body");

// Re-use the trip count and steps expanded for the main loop, as
Contributor

I guess in theory this code could be moved to the end of the function in a separate NFC patch, with this patch then adding the new bits:

    auto *VPI = dyn_cast<VPInstruction>(&R);
    if (VPI) {
      VPI->replaceAllUsesWith(Plan.getOrAddLiveIn(
          ToFrozen[VPI->getOperand(0)->getLiveInIRValue()]));
      continue;
    }

However, I don't want to create an unnecessary burden, as I know this is an important fix. I'll leave it up to you!

Contributor Author

I left it in the patch for now; I could also split it off.

Contributor

That's fine. 👍

// skeleton creation needs it as a value that dominates both the scalar
// and vector epilogue loops
// TODO: This is a workaround needed for epilogue vectorization and it
// should be removed once induction resume value creation is done
// directly in VPlan.
for (auto &R : make_early_inc_range(*Plan.getEntry())) {
auto *ExpandR = dyn_cast<VPExpandSCEVRecipe>(&R);
if (!ExpandR)
continue;
auto *ExpandedVal =
Plan.getOrAddLiveIn(ExpandedSCEVs.find(ExpandR->getSCEV())->second);
ExpandR->replaceAllUsesWith(ExpandedVal);
if (Plan.getTripCount() == ExpandR)
Plan.resetTripCount(ExpandedVal);
ExpandR->eraseFromParent();
}

DenseMap<Value *, Value *> ToFrozen;
// Ensure that the start values for all header phi recipes are updated before
// vectorizing the epilogue loop.
for (VPRecipeBase &R : Header->phis()) {
@@ -10472,6 +10488,10 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
ResumeV =
Builder.CreateICmpNE(ResumeV, RdxDesc.getRecurrenceStartValue());
} else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK)) {
ToFrozen[RdxDesc.getRecurrenceStartValue()] =
cast<PHINode>(ResumeV)->getIncomingValueForBlock(
EPI.MainLoopIterationCountCheck);
Contributor

This confused me at first because there is no obvious indication that EPI.MainLoopIterationCountCheck is a Block pointer. Doesn't need doing in this PR, but it would be good to rename MainLoopIterationCountCheck to something like MainLoopIterationCountCheckBlock.


// VPReductionPHIRecipe for FindLastIV reductions requires an adjustment
// to the resume value. The resume value is adjusted to the sentinel
// value when the final value from the main vector loop equals the start
@@ -10480,8 +10500,8 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
// variable.
BasicBlock *ResumeBB = cast<Instruction>(ResumeV)->getParent();
IRBuilder<> Builder(ResumeBB, ResumeBB->getFirstNonPHIIt());
Value *Cmp =
Builder.CreateICmpEQ(ResumeV, RdxDesc.getRecurrenceStartValue());
Value *Cmp = Builder.CreateICmpEQ(
ResumeV, ToFrozen[RdxDesc.getRecurrenceStartValue()]);
ResumeV =
Builder.CreateSelect(Cmp, RdxDesc.getSentinelValue(), ResumeV);
}
@@ -10497,6 +10517,31 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
VPValue *StartVal = Plan.getOrAddLiveIn(ResumeV);
cast<VPHeaderPHIRecipe>(&R)->setStartValue(StartVal);
}
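
The resume-value adjustment built above (an `icmp eq` against the frozen start followed by a `select` of the sentinel) can be read as the scalar logic sketched here. This is a simplified model assuming an `i8` reduction with sentinel -128, as in the updated tests; `adjustResumeValue` and `finishEpilogue` are hypothetical helper names, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sentinel for a signed i8 FindLastIV reduction; the tests use -128.
constexpr int8_t Sentinel = std::numeric_limits<int8_t>::min();

// Models the icmp eq / select pair in the resume block: if the main vector
// loop produced the start value (nothing was found), the epilogue loop must
// start from the sentinel rather than from the start value.
int8_t adjustResumeValue(int8_t MainResult, int8_t FrozenStart) {
  return MainResult == FrozenStart ? Sentinel : MainResult;
}

// Models the epilogue's final icmp ne / select pair: a reduction that still
// holds the sentinel maps back to the (frozen) start value.
int8_t finishEpilogue(int8_t Rdx, int8_t FrozenStart) {
  return Rdx != Sentinel ? Rdx : FrozenStart;
}
```

Both helpers take the same `FrozenStart`, which mirrors the point of the `ToFrozen` map: the compare in the resume block and the final select need to agree on a single value for the start.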

// Re-use the trip count and steps expanded for the main loop, as
// skeleton creation needs it as a value that dominates both the scalar
// and vector epilogue loops
// TODO: This is a workaround needed for epilogue vectorization and it
// should be removed once induction resume value creation is done
// directly in VPlan.
for (auto &R : make_early_inc_range(*Plan.getEntry())) {
auto *VPI = dyn_cast<VPInstruction>(&R);
if (VPI) {
Contributor

What guarantee is there that VPI corresponds to the frozen start value? Do we need to check for VPInstructions with opcode Instruction::Freeze? I assume this is supposed to match up with the VPInstructions added by AddFreezeForFindLastIVReductions above?

Contributor Author

At the moment, the only VPInstructions in the header can be freezes, but I updated the code to check for that anyway.

VPI->replaceAllUsesWith(Plan.getOrAddLiveIn(
Contributor

I think I understand what you're doing here, but it looks a bit odd at first glance. You're essentially replacing one freeze in the epilogue entry block with another from the MainLoopIterationCountCheck block, right? Perhaps worth a comment explaining what's happening?

Contributor Author

Done. I also moved the comment about re-using the trip count inside the loop, thanks.

ToFrozen[VPI->getOperand(0)->getLiveInIRValue()]));
continue;
}

auto *ExpandR = dyn_cast<VPExpandSCEVRecipe>(&R);
if (!ExpandR)
continue;
auto *ExpandedVal =
Plan.getOrAddLiveIn(ExpandedSCEVs.find(ExpandR->getSCEV())->second);
ExpandR->replaceAllUsesWith(ExpandedVal);
if (Plan.getTripCount() == ExpandR)
Plan.resetTripCount(ExpandedVal);
ExpandR->eraseFromParent();
}
}

// Generate bypass values from the additional bypass block. Note that when the
7 changes: 7 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1403,6 +1403,13 @@ void VPValue::replaceUsesWithIf(
}
}

void VPUser::replaceUsesOfWith(VPValue *From, VPValue *To) {
Contributor

Maybe we can remove this, since we don't use the function in this patch.

Contributor Author

Yep, removed again.

for (unsigned Idx = 0; Idx != getNumOperands(); ++Idx) {
if (getOperand(Idx) == From)
setOperand(Idx, To);
}
}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPValue::printAsOperand(raw_ostream &OS, VPSlotTracker &Tracker) const {
OS << Tracker.getOrCreateName(this);
7 changes: 7 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -423,6 +423,7 @@ bool VPInstruction::canGenerateScalarForFirstLane() const {
if (isSingleScalar() || isVectorToScalar())
return true;
switch (Opcode) {
case Instruction::Freeze:
case Instruction::ICmp:
case Instruction::PHI:
case Instruction::Select:
@@ -474,6 +475,10 @@ Value *VPInstruction::generate(VPTransformState &State) {
Value *Idx = State.get(getOperand(1), /*IsScalar=*/true);
return Builder.CreateExtractElement(Vec, Idx, Name);
}
case Instruction::Freeze: {
Value *Op = State.get(getOperand(0), vputils::onlyFirstLaneUsed(this));
return Builder.CreateFreeze(Op, Name);
}
case Instruction::ICmp: {
bool OnlyFirstLaneUsed = vputils::onlyFirstLaneUsed(this);
Value *A = State.get(getOperand(0), OnlyFirstLaneUsed);
@@ -909,6 +914,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
return false;
switch (getOpcode()) {
case Instruction::ExtractElement:
case Instruction::Freeze:
case Instruction::ICmp:
case Instruction::Select:
case VPInstruction::AnyOf:
@@ -941,6 +947,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
case Instruction::ICmp:
case Instruction::Select:
case Instruction::Or:
case Instruction::Freeze:
Contributor

We should probably also add this opcode to VPInstruction::opcodeMayReadOrWriteFromMemory and return false.

Contributor Author

Done, thanks.

// TODO: Cover additional opcodes.
return vputils::onlyFirstLaneUsed(this);
case VPInstruction::ActiveLaneMask:
3 changes: 3 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -246,6 +246,9 @@ class VPUser {
New->addUser(*this);
}

/// Replaces all uses of \p From in the VPUser with \p To.
void replaceUsesOfWith(VPValue *From, VPValue *To);
Contributor

Ditto.

Contributor Author

Yep, removed again.


typedef SmallVectorImpl<VPValue *>::iterator operand_iterator;
typedef SmallVectorImpl<VPValue *>::const_iterator const_operand_iterator;
typedef iterator_range<operand_iterator> operand_range;
@@ -9,6 +9,7 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[TMP0]] to i32
; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i32 [[TMP1]], 1
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP2]], 8
; CHECK-NEXT: [[FR:%.*]] = freeze i8 [[START]]
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH:.*]], label %[[VECTOR_MAIN_LOOP_ITER_CHECK:.*]]
; CHECK: [[VECTOR_MAIN_LOOP_ITER_CHECK]]:
; CHECK-NEXT: [[MIN_ITERS_CHECK1:%.*]] = icmp ult i32 [[TMP2]], 32
@@ -42,7 +43,7 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK-NEXT: [[RDX_MINMAX:%.*]] = call <16 x i8> @llvm.smax.v16i8(<16 x i8> [[TMP10]], <16 x i8> [[TMP11]])
; CHECK-NEXT: [[TMP13:%.*]] = call i8 @llvm.vector.reduce.smax.v16i8(<16 x i8> [[RDX_MINMAX]])
; CHECK-NEXT: [[RDX_SELECT_CMP12:%.*]] = icmp ne i8 [[TMP13]], -128
; CHECK-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP12]], i8 [[TMP13]], i8 [[START]]
; CHECK-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP12]], i8 [[TMP13]], i8 [[FR]]
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]]
; CHECK: [[VEC_EPILOG_ITER_CHECK]]:
@@ -53,8 +54,8 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK: [[VEC_EPILOG_PH]]:
; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8 [ [[TMP3]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i8 [ [[RDX_SELECT]], %[[VEC_EPILOG_ITER_CHECK]] ], [ [[START]], %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i8 [[BC_MERGE_RDX]], [[START]]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i8 [ [[RDX_SELECT]], %[[VEC_EPILOG_ITER_CHECK]] ], [ [[FR]], %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i8 [[BC_MERGE_RDX]], [[FR]]
; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[TMP14]], i8 -128, i8 [[BC_MERGE_RDX]]
; CHECK-NEXT: [[N_MOD_VF4:%.*]] = urem i32 [[TMP2]], 8
; CHECK-NEXT: [[N_VEC5:%.*]] = sub i32 [[TMP2]], [[N_MOD_VF4]]
@@ -82,7 +83,7 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK: [[VEC_EPILOG_MIDDLE_BLOCK]]:
; CHECK-NEXT: [[TMP22:%.*]] = call i8 @llvm.vector.reduce.smax.v8i8(<8 x i8> [[TMP20]])
; CHECK-NEXT: [[RDX_SELECT_CMP14:%.*]] = icmp ne i8 [[TMP22]], -128
; CHECK-NEXT: [[RDX_SELECT15:%.*]] = select i1 [[RDX_SELECT_CMP14]], i8 [[TMP22]], i8 [[START]]
; CHECK-NEXT: [[RDX_SELECT15:%.*]] = select i1 [[RDX_SELECT_CMP14]], i8 [[TMP22]], i8 [[FR]]
; CHECK-NEXT: [[CMP_N16:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC5]]
; CHECK-NEXT: br i1 [[CMP_N16]], label %[[EXIT]], label %[[VEC_EPILOG_SCALAR_PH]]
; CHECK: [[VEC_EPILOG_SCALAR_PH]]:
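
The middle-block CHECK lines above (`llvm.vector.reduce.smax`, then `icmp ne ... -128`, then a `select` that now falls back to `[[FR]]`) correspond roughly to the scalar computation sketched below. This is only a model for reading the test output: `computeFindLastIVResult` here is a hypothetical free function named after the VPlan opcode, and the lane values are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Sentinel for a signed i8 FindLastIV reduction; the CHECK lines use -128.
constexpr int8_t Sentinel = std::numeric_limits<int8_t>::min();

// Scalar stand-in for the signed-max vector reduce followed by the
// icmp ne / select pair: lanes that never matched still hold the sentinel,
// so an all-sentinel vector falls back to the frozen start value.
int8_t computeFindLastIVResult(const std::vector<int8_t> &Lanes,
                               int8_t FrozenStart) {
  assert(!Lanes.empty() && "expected at least one vector lane");
  int8_t Max = Sentinel;
  for (int8_t Lane : Lanes)
    Max = std::max(Max, Lane);
  return Max != Sentinel ? Max : FrozenStart;
}
```

For example, lanes `{-128, 5, -128, 2}` give 5, while four sentinel lanes give back the frozen start.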
9 changes: 5 additions & 4 deletions llvm/test/Transforms/LoopVectorize/epilog-iv-select-cmp.ll
@@ -217,6 +217,7 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[TMP0]] to i32
; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i32 [[TMP1]], 1
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP2]], 4
; CHECK-NEXT: [[FR:%.*]] = freeze i8 [[START]]
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH:.*]], label %[[VECTOR_MAIN_LOOP_ITER_CHECK:.*]]
; CHECK: [[VECTOR_MAIN_LOOP_ITER_CHECK]]:
; CHECK-NEXT: [[MIN_ITERS_CHECK1:%.*]] = icmp ult i32 [[TMP2]], 4
@@ -243,7 +244,7 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: [[TMP10:%.*]] = call i8 @llvm.vector.reduce.smax.v4i8(<4 x i8> [[TMP8]])
; CHECK-NEXT: [[RDX_SELECT_CMP:%.*]] = icmp ne i8 [[TMP10]], -128
; CHECK-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i8 [[TMP10]], i8 [[START]]
; CHECK-NEXT: [[RDX_SELECT:%.*]] = select i1 [[RDX_SELECT_CMP]], i8 [[TMP10]], i8 [[FR]]
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]]
; CHECK: [[VEC_EPILOG_ITER_CHECK]]:
@@ -254,8 +255,8 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK: [[VEC_EPILOG_PH]]:
; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8 [ [[TMP3]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i8 [ [[RDX_SELECT]], %[[VEC_EPILOG_ITER_CHECK]] ], [ [[START]], %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i8 [[BC_MERGE_RDX]], [[START]]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i8 [ [[RDX_SELECT]], %[[VEC_EPILOG_ITER_CHECK]] ], [ [[FR]], %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i8 [[BC_MERGE_RDX]], [[FR]]
; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i8 -128, i8 [[BC_MERGE_RDX]]
; CHECK-NEXT: [[N_MOD_VF2:%.*]] = urem i32 [[TMP2]], 4
; CHECK-NEXT: [[N_VEC3:%.*]] = sub i32 [[TMP2]], [[N_MOD_VF2]]
@@ -283,7 +284,7 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK: [[VEC_EPILOG_MIDDLE_BLOCK]]:
; CHECK-NEXT: [[TMP19:%.*]] = call i8 @llvm.vector.reduce.smax.v4i8(<4 x i8> [[TMP17]])
; CHECK-NEXT: [[RDX_SELECT_CMP12:%.*]] = icmp ne i8 [[TMP19]], -128
; CHECK-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_SELECT_CMP12]], i8 [[TMP19]], i8 [[START]]
; CHECK-NEXT: [[RDX_SELECT13:%.*]] = select i1 [[RDX_SELECT_CMP12]], i8 [[TMP19]], i8 [[FR]]
; CHECK-NEXT: [[CMP_N14:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC3]]
; CHECK-NEXT: br i1 [[CMP_N14]], label %[[EXIT]], label %[[VEC_EPILOG_SCALAR_PH]]
; CHECK: [[VEC_EPILOG_SCALAR_PH]]: