Skip to content

[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. #95305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Sep 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8184,10 +8184,12 @@ createWidenInductionRecipes(PHINode *Phi, Instruction *PhiOrTrunc,
VPValue *Step =
vputils::getOrCreateVPValueForSCEVExpr(Plan, IndDesc.getStep(), SE);
if (auto *TruncI = dyn_cast<TruncInst>(PhiOrTrunc)) {
return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, IndDesc, TruncI);
return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, &Plan.getVF(),
IndDesc, TruncI);
}
assert(isa<PHINode>(PhiOrTrunc) && "must be a phi node here");
return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, IndDesc);
return new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, &Plan.getVF(),
IndDesc);
}

VPHeaderPHIRecipe *VPRecipeBuilder::tryToOptimizeInductionPHI(
Expand Down
26 changes: 22 additions & 4 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -921,11 +921,11 @@ VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE,
void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
Value *CanonicalIVStartValue,
VPTransformState &State) {
Type *TCTy = TripCountV->getType();
// Check if the backedge taken count is needed, and if so build it.
if (BackedgeTakenCount && BackedgeTakenCount->getNumUsers()) {
IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
auto *TCMO = Builder.CreateSub(TripCountV,
ConstantInt::get(TripCountV->getType(), 1),
auto *TCMO = Builder.CreateSub(TripCountV, ConstantInt::get(TCTy, 1),
"trip.count.minus.1");
BackedgeTakenCount->setUnderlyingValue(TCMO);
}
Expand All @@ -935,8 +935,17 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
// FIXME: Model VF * UF computation completely in VPlan.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to handle this FIXME now that VF is assigned a VPValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, would either require introducing a UF placeholder or introducing during explicit interleaving, either way probably better as follow-up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so FIXME remains until a symbolic UF VPValue placeholder is introduced, as follow-up, along with a VPInstruction multiplying it with VF (introduced here), possibly subject to constant folding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

assert(VFxUF.getNumUsers() && "VFxUF expected to always have users");
VFxUF.setUnderlyingValue(
createStepForVF(Builder, TripCountV->getType(), State.VF, State.UF));
if (VF.getNumUsers()) {
Value *RuntimeVF = getRuntimeVF(Builder, TCTy, State.VF);
VF.setUnderlyingValue(RuntimeVF);
VFxUF.setUnderlyingValue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given RuntimeVF is only non-null if VF.getNumUsers() != 0 wouldn't it be neater to simply fold this into the if (VF.getNumUsers()) { block? i.e.

  if (VF.getNumUsers()) {
    RuntimeVF = createStepForVF(Builder, TripCountV->getType(), State.VF, 1);
    VF.setUnderlyingValue(RuntimeVF);
    VFxUF.setUnderlyingValue(
        State.UF > 1 ? Builder.CreateMul(
                           VF.getLiveInIRValue(),
                           ConstantInt::get(TripCountV->getType(), State.UF))
                     : VF.getLiveInIRValue());
  } else {
    VFxUF.setUnderlyingValue(
        createStepForVF(Builder, TripCountV->getType(), State.VF, State.UF));
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted, thanks!

State.UF > 1
? Builder.CreateMul(RuntimeVF, ConstantInt::get(TCTy, State.UF))
: RuntimeVF);
} else {
VFxUF.setUnderlyingValue(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VFxUF is set regardless of having users or not. Be consistent?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see earlier that

... Code for VF is only generated if there are users of VF, to avoid unnecessary test changes.
Code for VF must be generated regardless of direct users, issues is its position and possible repetition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, VFxUF is used to increment the canonical induction, which is present in almost all cases, but could also be done only if users exist.

... Code for VF is only generated if there are users of VF, to avoid unnecessary test changes.
Code for VF must be generated regardless of direct users, issues is its position and possible repetition?

There are cases where VF separately isn't used, only as part of VFxUF. If only VFxUF is used, the constant multiply of VF * UF is folded, hence always generating VF separately would lead to additional test changes. Some alternatives are mentioned in my latest comment above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, VFxUF is used to increment the canonical induction, which is present in almost all cases, but could also be done only if users exist.

Exceptional use-less cases are loops whose trip count is VFxUF - where optimizeForVFAndUF() discards the canonical IV's increment by VFxUF?

So one way to improve consistency, w/o changing too many tests, would be to (also) set VFxUF only if used? Although logically it should always be used - to set the vector trip count(?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, it is always used (added assert in 1a5a1e9).

createStepForVF(Builder, TCTy, State.VF, State.UF));
}

// When vectorizing the epilogue loop, the canonical induction start value
// needs to be changed from zero to the value after the main vector loop.
Expand Down Expand Up @@ -1099,6 +1108,12 @@ InstructionCost VPlan::cost(ElementCount VF, VPCostContext &Ctx) {
void VPlan::printLiveIns(raw_ostream &O) const {
VPSlotTracker SlotTracker(this);

if (VF.getNumUsers() > 0) {
O << "\nLive-in ";
VF.printAsOperand(O, SlotTracker);
O << " = VF";
}

if (VFxUF.getNumUsers() > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this condition be an assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will adjust separately

O << "\nLive-in ";
VFxUF.printAsOperand(O, SlotTracker);
Expand Down Expand Up @@ -1241,6 +1256,7 @@ VPlan *VPlan::duplicate() {
NewPlan->getOrAddLiveIn(OldLiveIn->getLiveInIRValue());
}
Old2NewVPValues[&VectorTripCount] = &NewPlan->VectorTripCount;
Old2NewVPValues[&VF] = &NewPlan->VF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must be honest it's not obvious to me why we need a specialised version for the UF=1 case. Why can't we use VFxUF always given VFx1 is a valid use case? It seems to add a bit of extra complexity. I understand in a loop we may want to calculate both a runtime VF (for a second part or something) and a runtime VF x UF (for a canonical induction variable, etc), but if we've filled out Old2NewVPValues[&VFxUF] with both VFx1 and VFxUF we can just query accordingly, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ut if we've filled out Old2NewVPValues[&VFxUF] with both VFx1 and VFxUF we can just query accordingly, right?

I am not sure what filling them out accordingly means here, perhaps using a pair instead of 2 separate fields?

There are different places that need both VF and VFxUF and to serve them we would need different values I think, but I might be missing something from your suggestion?

Old2NewVPValues[&VFxUF] = &NewPlan->VFxUF;
if (BackedgeTakenCount) {
NewPlan->BackedgeTakenCount = new VPValue();
Expand Down Expand Up @@ -1551,6 +1567,8 @@ void VPSlotTracker::assignName(const VPValue *V) {
}

void VPSlotTracker::assignNames(const VPlan &Plan) {
if (Plan.VF.getNumUsers() > 0)
assignName(&Plan.VF);
if (Plan.VFxUF.getNumUsers() > 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this condition be an assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will adjust separately

assignName(&Plan.VFxUF);
assignName(&Plan.VectorTripCount);
Expand Down
19 changes: 15 additions & 4 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -1855,25 +1855,27 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {

public:
VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
const InductionDescriptor &IndDesc)
VPValue *VF, const InductionDescriptor &IndDesc)
: VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV, Start), IV(IV),
Trunc(nullptr), IndDesc(IndDesc) {
addOperand(Step);
addOperand(VF);
}

VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
const InductionDescriptor &IndDesc,
VPValue *VF, const InductionDescriptor &IndDesc,
TruncInst *Trunc)
: VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc, Start),
IV(IV), Trunc(Trunc), IndDesc(IndDesc) {
addOperand(Step);
addOperand(VF);
}

~VPWidenIntOrFpInductionRecipe() override = default;

VPWidenIntOrFpInductionRecipe *clone() override {
return new VPWidenIntOrFpInductionRecipe(IV, getStartValue(),
getStepValue(), IndDesc, Trunc);
return new VPWidenIntOrFpInductionRecipe(
IV, getStartValue(), getStepValue(), getVFValue(), IndDesc, Trunc);
}

VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
Expand Down Expand Up @@ -1906,6 +1908,9 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
VPValue *getStepValue() { return getOperand(1); }
const VPValue *getStepValue() const { return getOperand(1); }

VPValue *getVFValue() { return getOperand(2); }
const VPValue *getVFValue() const { return getOperand(2); }

/// Returns the first defined value as TruncInst, if it is one or nullptr
/// otherwise.
TruncInst *getTruncInst() { return Trunc; }
Expand Down Expand Up @@ -3376,6 +3381,9 @@ class VPlan {
/// Represents the vector trip count.
VPValue VectorTripCount;

/// Represents the vectorization factor of the loop.
VPValue VF;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPValue VF;
/// Represents the vectorization factor of the loop.
VPValue VF;

(belongs to the loop Region rather than VPlan itself)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added thanks!

VF may also be referenced outside the loop region, so probably should be defined at the top-level and used by the region?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is a follow-up thought, hence originally in brackets)
Regions (and blocks in general) currently model the HCFG only, leaving the def-use graph of values to recipes. A loop region has a canonical IV recipe which typically uses VFxUF as the canonical step controlling the loop. The latter could be a Mul recipe whose operands provide VF and UF, and a loop region could provide getVF() and getUF() to ease their retrieval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think ideally the loop region would be a user of both VF, UF (and possibly VF x UF).


/// Represents the loop-invariant VF * UF of the vector loop region.
VPValue VFxUF;

Expand Down Expand Up @@ -3471,6 +3479,9 @@ class VPlan {
/// The vector trip count.
VPValue &getVectorTripCount() { return VectorTripCount; }

/// Returns the VF of the vector loop region.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pointer rather than reference, as in VFxUF, null is not used/returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted to return by reference, thanks!

VPValue &getVF() { return VF; };

/// Returns VF * UF of the vector loop region.
VPValue &getVFxUF() { return VFxUF; }

Expand Down
17 changes: 6 additions & 11 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1416,14 +1416,6 @@ static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {
: ConstantFP::get(Ty, C);
}

static Value *getRuntimeVFAsFloat(IRBuilderBase &B, Type *FTy,
ElementCount VF) {
assert(FTy->isFloatingPointTy() && "Expected floating point type!");
Type *IntTy = IntegerType::get(FTy->getContext(), FTy->getScalarSizeInBits());
Value *RuntimeVF = getRuntimeVF(B, IntTy, VF);
return B.CreateUIToFP(RuntimeVF, FTy);
}

void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Int or FP induction being replicated.");

Expand Down Expand Up @@ -1481,11 +1473,11 @@ void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
// Multiply the vectorization factor by the step using integer or
// floating-point arithmetic as appropriate.
Type *StepType = Step->getType();
Value *RuntimeVF;
Value *RuntimeVF = State.get(getVFValue(), {0, 0});
if (Step->getType()->isFloatingPointTy())
RuntimeVF = getRuntimeVFAsFloat(Builder, StepType, State.VF);
RuntimeVF = Builder.CreateUIToFP(RuntimeVF, StepType);
else
RuntimeVF = getRuntimeVF(Builder, StepType, State.VF);
RuntimeVF = Builder.CreateZExtOrTrunc(RuntimeVF, StepType);
Value *Mul = Builder.CreateBinOp(MulOp, Step, RuntimeVF);

// Create a vector splat to use in the induction update.
Expand Down Expand Up @@ -1539,6 +1531,9 @@ void VPWidenIntOrFpInductionRecipe::print(raw_ostream &O, const Twine &Indent,

O << ", ";
getStepValue()->printAsOperand(O, SlotTracker);

O << ", ";
getVFValue()->printAsOperand(O, SlotTracker);
}
#endif

Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,8 @@ void VPlanTransforms::VPInstructionsToVPRecipes(
VPValue *Start = Plan->getOrAddLiveIn(II->getStartValue());
VPValue *Step =
vputils::getOrCreateVPValueForSCEVExpr(*Plan, II->getStep(), SE);
NewRecipe = new VPWidenIntOrFpInductionRecipe(Phi, Start, Step, *II);
NewRecipe = new VPWidenIntOrFpInductionRecipe(Phi, Start, Step,
&Plan->getVF(), *II);
} else {
assert(isa<VPInstruction>(&Ingredient) &&
"only VPInstructions expected here");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 4 x i64> [[TMP7]], zeroinitializer
; CHECK-NEXT: [[TMP9:%.*]] = mul <vscale x 4 x i64> [[TMP8]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP9]]
; CHECK-NEXT: [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP11:%.*]] = mul i64 [[TMP10]], 4
; CHECK-NEXT: [[TMP12:%.*]] = mul i64 1, [[TMP11]]
; CHECK-NEXT: [[TMP12:%.*]] = mul i64 1, [[TMP6]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP12]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[VAL]], i64 0
Expand Down Expand Up @@ -112,9 +110,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 4 x i64> [[TMP7]], zeroinitializer
; CHECK-NEXT: [[TMP9:%.*]] = mul <vscale x 4 x i64> [[TMP8]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP9]]
; CHECK-NEXT: [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP11:%.*]] = mul i64 [[TMP10]], 4
; CHECK-NEXT: [[TMP12:%.*]] = mul i64 1, [[TMP11]]
; CHECK-NEXT: [[TMP12:%.*]] = mul i64 1, [[TMP6]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP12]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[VAL]], i64 0
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -142,9 +142,7 @@ define void @sdiv_feeding_gep_predicated(ptr %dst, i32 %x, i64 %M, i64 %conv6, i
; CHECK-NEXT: [[TMP16:%.*]] = add <vscale x 2 x i64> [[TMP15]], zeroinitializer
; CHECK-NEXT: [[TMP17:%.*]] = mul <vscale x 2 x i64> [[TMP16]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i64> zeroinitializer, [[TMP17]]
; CHECK-NEXT: [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP19:%.*]] = mul i64 [[TMP18]], 2
; CHECK-NEXT: [[TMP20:%.*]] = mul i64 1, [[TMP19]]
; CHECK-NEXT: [[TMP20:%.*]] = mul i64 1, [[TMP9]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP20]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[M]], i64 0
Expand Down Expand Up @@ -271,9 +269,7 @@ define void @udiv_urem_feeding_gep(i64 %x, ptr %dst, i64 %N) {
; CHECK-NEXT: [[TMP16:%.*]] = add <vscale x 2 x i64> [[TMP15]], zeroinitializer
; CHECK-NEXT: [[TMP17:%.*]] = mul <vscale x 2 x i64> [[TMP16]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i64> zeroinitializer, [[TMP17]]
; CHECK-NEXT: [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP19:%.*]] = mul i64 [[TMP18]], 2
; CHECK-NEXT: [[TMP20:%.*]] = mul i64 1, [[TMP19]]
; CHECK-NEXT: [[TMP20:%.*]] = mul i64 1, [[TMP9]]
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP20]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[MUL_2_I]], i64 0
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,7 @@ define void @foo() {
; CHECK-NEXT: [[TMP5:%.*]] = add <vscale x 4 x i64> [[TMP4]], zeroinitializer
; CHECK-NEXT: [[TMP6:%.*]] = mul <vscale x 4 x i64> [[TMP5]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> zeroinitializer, [[TMP6]]
; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 4
; CHECK-NEXT: [[TMP9:%.*]] = mul i64 1, [[TMP8]]
; CHECK-NEXT: [[TMP9:%.*]] = mul i64 1, [[TMP19]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP9]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,9 +32,7 @@ define void @test_no_scalarization(ptr %a, ptr noalias %b, i32 %idx, i32 %n) #0
; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 2 x i32> [[TMP8]], zeroinitializer
; CHECK-NEXT: [[TMP10:%.*]] = mul <vscale x 2 x i32> [[TMP9]], shufflevector (<vscale x 2 x i32> insertelement (<vscale x 2 x i32> poison, i32 1, i64 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i32> [[DOTSPLAT]], [[TMP10]]
; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[TMP12:%.*]] = mul i32 [[TMP11]], 2
; CHECK-NEXT: [[TMP13:%.*]] = mul i32 1, [[TMP12]]
; CHECK-NEXT: [[TMP13:%.*]] = mul i32 1, [[TMP7]]
; CHECK-NEXT: [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 2 x i32> poison, i32 [[TMP13]], i64 0
; CHECK-NEXT: [[DOTSPLAT2:%.*]] = shufflevector <vscale x 2 x i32> [[DOTSPLATINSERT1]], <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -293,10 +293,9 @@ define void @gather_nxv4i32_ind64_stride2(ptr noalias nocapture %a, ptr noalias
; CHECK-NEXT: [[DOTNEG:%.*]] = mul nsw i64 [[TMP2]], -8
; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], [[DOTNEG]]
; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP3]], 2
; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i64 [[TMP3]], 3
; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP6]], 2
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP7]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,14 +18,14 @@ define void @induction_i7(ptr %dst) #0 {
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]]
; CHECK-NEXT: [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i7
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
; CHECK-NEXT: [[TMP40:%.*]] = mul i64 [[TMP4]], 2
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP40]], 2
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 2 x i8> @llvm.stepvector.nxv2i8()
; CHECK-NEXT: [[TMP7:%.*]] = trunc <vscale x 2 x i8> [[TMP6]] to <vscale x 2 x i7>
; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 2 x i7> [[TMP7]], zeroinitializer
; CHECK-NEXT: [[TMP9:%.*]] = mul <vscale x 2 x i7> [[TMP8]], shufflevector (<vscale x 2 x i7> insertelement (<vscale x 2 x i7> poison, i7 1, i64 0), <vscale x 2 x i7> poison, <vscale x 2 x i32> zeroinitializer)
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i7> zeroinitializer, [[TMP9]]
; CHECK-NEXT: [[TMP10:%.*]] = call i7 @llvm.vscale.i7()
; CHECK-NEXT: [[TMP11:%.*]] = mul i7 [[TMP10]], 2
; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP40]] to i7
; CHECK-NEXT: [[TMP12:%.*]] = mul i7 1, [[TMP11]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i7> poison, i7 [[TMP12]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i7> [[DOTSPLATINSERT]], <vscale x 2 x i7> poison, <vscale x 2 x i32> zeroinitializer
Expand Down Expand Up @@ -92,14 +92,14 @@ define void @induction_i3_zext(ptr %dst) #0 {
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]]
; CHECK-NEXT: [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i3
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
; CHECK-NEXT: [[TMP40:%.*]] = mul i64 [[TMP4]], 2
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP40]], 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. I guess I was expecting that since this patch only creates a runtime VF on demand that total lines of IR would only decrease, rather than increase which is happening here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this was showing up as net increase was that the test case originally didn't check all instructions in vector.ph; now that it does we safe vscale & mul but need a trunc, so it is neutral overall in this case.

; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 2 x i8> @llvm.stepvector.nxv2i8()
; CHECK-NEXT: [[TMP7:%.*]] = trunc <vscale x 2 x i8> [[TMP6]] to <vscale x 2 x i3>
; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 2 x i3> [[TMP7]], zeroinitializer
; CHECK-NEXT: [[TMP9:%.*]] = mul <vscale x 2 x i3> [[TMP8]], shufflevector (<vscale x 2 x i3> insertelement (<vscale x 2 x i3> poison, i3 1, i64 0), <vscale x 2 x i3> poison, <vscale x 2 x i32> zeroinitializer)
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i3> zeroinitializer, [[TMP9]]
; CHECK-NEXT: [[TMP10:%.*]] = call i3 @llvm.vscale.i3()
; CHECK-NEXT: [[TMP11:%.*]] = mul i3 [[TMP10]], 2
; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP40]] to i3
; CHECK-NEXT: [[TMP12:%.*]] = mul i3 1, [[TMP11]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i3> poison, i3 [[TMP12]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i3> [[DOTSPLATINSERT]], <vscale x 2 x i3> poison, <vscale x 2 x i32> zeroinitializer
Expand Down
4 changes: 1 addition & 3 deletions llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,7 @@ define void @cond_ind64(ptr noalias nocapture %a, ptr noalias nocapture readonly
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = shl i64 [[TMP4]], 2
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP8:%.*]] = shl i64 [[TMP7]], 2
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP8]], i64 0
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP5]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
Expand Down
Loading