-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. #95305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
7ed9803
dbdd4f3
031e6b2
907c19f
77c1e61
7659683
4d495d0
4c0467c
2eaf251
9e95c10
4d5f2e5
4ea3c6f
ee8409c
330dde5
b8bcc2f
2b1bf7d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -921,11 +921,11 @@ VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE, | |
void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV, | ||
Value *CanonicalIVStartValue, | ||
VPTransformState &State) { | ||
Type *TCTy = TripCountV->getType(); | ||
// Check if the backedge taken count is needed, and if so build it. | ||
if (BackedgeTakenCount && BackedgeTakenCount->getNumUsers()) { | ||
IRBuilder<> Builder(State.CFG.PrevBB->getTerminator()); | ||
auto *TCMO = Builder.CreateSub(TripCountV, | ||
ConstantInt::get(TripCountV->getType(), 1), | ||
auto *TCMO = Builder.CreateSub(TripCountV, ConstantInt::get(TCTy, 1), | ||
"trip.count.minus.1"); | ||
BackedgeTakenCount->setUnderlyingValue(TCMO); | ||
} | ||
|
@@ -935,8 +935,17 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV, | |
IRBuilder<> Builder(State.CFG.PrevBB->getTerminator()); | ||
// FIXME: Model VF * UF computation completely in VPlan. | ||
assert(VFxUF.getNumUsers() && "VFxUF expected to always have users"); | ||
VFxUF.setUnderlyingValue( | ||
createStepForVF(Builder, TripCountV->getType(), State.VF, State.UF)); | ||
if (VF.getNumUsers()) { | ||
Value *RuntimeVF = getRuntimeVF(Builder, TCTy, State.VF); | ||
VF.setUnderlyingValue(RuntimeVF); | ||
VFxUF.setUnderlyingValue( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given RuntimeVF is only non-null if
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adjusted, thanks! |
||
State.UF > 1 | ||
? Builder.CreateMul(RuntimeVF, ConstantInt::get(TCTy, State.UF)) | ||
: RuntimeVF); | ||
} else { | ||
VFxUF.setUnderlyingValue( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. VFxUF is set regardless of having users or not. Be consistent? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see earlier that
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes,
There are cases where VF separately isn't used, only as part of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Exceptional use-less cases are loops whose trip count is VFxUF - where optimizeForVFAndUF() discards the canonical IV's increment by VFxUF? So one way to improve consistency, w/o changing too many tests, would be to (also) set VFxUF only if used? Although logically it should always be used - to set the vector trip count(?). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At the moment, it is always used (added assert in 1a5a1e9). |
||
createStepForVF(Builder, TCTy, State.VF, State.UF)); | ||
} | ||
|
||
// When vectorizing the epilogue loop, the canonical induction start value | ||
// needs to be changed from zero to the value after the main vector loop. | ||
|
@@ -1099,6 +1108,12 @@ InstructionCost VPlan::cost(ElementCount VF, VPCostContext &Ctx) { | |
void VPlan::printLiveIns(raw_ostream &O) const { | ||
VPSlotTracker SlotTracker(this); | ||
|
||
if (VF.getNumUsers() > 0) { | ||
O << "\nLive-in "; | ||
VF.printAsOperand(O, SlotTracker); | ||
O << " = VF"; | ||
} | ||
|
||
if (VFxUF.getNumUsers() > 0) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this condition be an assert? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, will adjust separately |
||
O << "\nLive-in "; | ||
VFxUF.printAsOperand(O, SlotTracker); | ||
|
@@ -1241,6 +1256,7 @@ VPlan *VPlan::duplicate() { | |
NewPlan->getOrAddLiveIn(OldLiveIn->getLiveInIRValue()); | ||
} | ||
Old2NewVPValues[&VectorTripCount] = &NewPlan->VectorTripCount; | ||
Old2NewVPValues[&VF] = &NewPlan->VF; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I must be honest it's not obvious to me why we need a specialised version for the UF=1 case. Why can't we use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am not sure what filling them out accordingly means here, perhaps using a pair instead of 2 separate fields? There are different places that need both VF and VFxUF and to serve them we would need different values I think, but I might be missing something from your suggestion? |
||
Old2NewVPValues[&VFxUF] = &NewPlan->VFxUF; | ||
if (BackedgeTakenCount) { | ||
NewPlan->BackedgeTakenCount = new VPValue(); | ||
|
@@ -1551,6 +1567,8 @@ void VPSlotTracker::assignName(const VPValue *V) { | |
} | ||
|
||
void VPSlotTracker::assignNames(const VPlan &Plan) { | ||
if (Plan.VF.getNumUsers() > 0) | ||
assignName(&Plan.VF); | ||
if (Plan.VFxUF.getNumUsers() > 0) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this condition be an assert? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, will adjust separately |
||
assignName(&Plan.VFxUF); | ||
assignName(&Plan.VectorTripCount); | ||
|
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -1855,25 +1855,27 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe { | |||||||
|
||||||||
public: | ||||||||
VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step, | ||||||||
const InductionDescriptor &IndDesc) | ||||||||
VPValue *VF, const InductionDescriptor &IndDesc) | ||||||||
: VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV, Start), IV(IV), | ||||||||
Trunc(nullptr), IndDesc(IndDesc) { | ||||||||
addOperand(Step); | ||||||||
addOperand(VF); | ||||||||
} | ||||||||
|
||||||||
VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step, | ||||||||
const InductionDescriptor &IndDesc, | ||||||||
VPValue *VF, const InductionDescriptor &IndDesc, | ||||||||
TruncInst *Trunc) | ||||||||
: VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc, Start), | ||||||||
IV(IV), Trunc(Trunc), IndDesc(IndDesc) { | ||||||||
addOperand(Step); | ||||||||
addOperand(VF); | ||||||||
} | ||||||||
|
||||||||
~VPWidenIntOrFpInductionRecipe() override = default; | ||||||||
|
||||||||
VPWidenIntOrFpInductionRecipe *clone() override { | ||||||||
return new VPWidenIntOrFpInductionRecipe(IV, getStartValue(), | ||||||||
getStepValue(), IndDesc, Trunc); | ||||||||
return new VPWidenIntOrFpInductionRecipe( | ||||||||
IV, getStartValue(), getStepValue(), getVFValue(), IndDesc, Trunc); | ||||||||
} | ||||||||
|
||||||||
VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC) | ||||||||
|
@@ -1906,6 +1908,9 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe { | |||||||
VPValue *getStepValue() { return getOperand(1); } | ||||||||
const VPValue *getStepValue() const { return getOperand(1); } | ||||||||
|
||||||||
VPValue *getVFValue() { return getOperand(2); } | ||||||||
const VPValue *getVFValue() const { return getOperand(2); } | ||||||||
|
||||||||
/// Returns the first defined value as TruncInst, if it is one or nullptr | ||||||||
/// otherwise. | ||||||||
TruncInst *getTruncInst() { return Trunc; } | ||||||||
|
@@ -3376,6 +3381,9 @@ class VPlan { | |||||||
/// Represents the vector trip count. | ||||||||
VPValue VectorTripCount; | ||||||||
|
||||||||
/// Represents the vectorization factor of the loop. | ||||||||
VPValue VF; | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
(belongs to the loop Region rather than VPlan itself) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added thanks! VF may also be referenced outside the loop region, so probably should be defined at the top-level and used by the region? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (This is a follow-up thought, hence originally in brackets) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, I think ideally the loop region would be a user of both VF, UF (and possibly VF x UF). |
||||||||
|
||||||||
/// Represents the loop-invariant VF * UF of the vector loop region. | ||||||||
VPValue VFxUF; | ||||||||
|
||||||||
|
@@ -3471,6 +3479,9 @@ class VPlan { | |||||||
/// The vector trip count. | ||||||||
VPValue &getVectorTripCount() { return VectorTripCount; } | ||||||||
|
||||||||
/// Returns the VF of the vector loop region. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why pointer rather than reference, as in VFxUF, null is not used/returned. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adjusted to return by reference, thanks! |
||||||||
VPValue &getVF() { return VF; }; | ||||||||
|
||||||||
/// Returns VF * UF of the vector loop region. | ||||||||
VPValue &getVFxUF() { return VFxUF; } | ||||||||
|
||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,14 +18,14 @@ define void @induction_i7(ptr %dst) #0 { | |
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]] | ||
; CHECK-NEXT: [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i7 | ||
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64() | ||
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4 | ||
; CHECK-NEXT: [[TMP40:%.*]] = mul i64 [[TMP4]], 2 | ||
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP40]], 2 | ||
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 2 x i8> @llvm.stepvector.nxv2i8() | ||
; CHECK-NEXT: [[TMP7:%.*]] = trunc <vscale x 2 x i8> [[TMP6]] to <vscale x 2 x i7> | ||
; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 2 x i7> [[TMP7]], zeroinitializer | ||
; CHECK-NEXT: [[TMP9:%.*]] = mul <vscale x 2 x i7> [[TMP8]], shufflevector (<vscale x 2 x i7> insertelement (<vscale x 2 x i7> poison, i7 1, i64 0), <vscale x 2 x i7> poison, <vscale x 2 x i32> zeroinitializer) | ||
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i7> zeroinitializer, [[TMP9]] | ||
; CHECK-NEXT: [[TMP10:%.*]] = call i7 @llvm.vscale.i7() | ||
; CHECK-NEXT: [[TMP11:%.*]] = mul i7 [[TMP10]], 2 | ||
; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP40]] to i7 | ||
; CHECK-NEXT: [[TMP12:%.*]] = mul i7 1, [[TMP11]] | ||
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i7> poison, i7 [[TMP12]], i64 0 | ||
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i7> [[DOTSPLATINSERT]], <vscale x 2 x i7> poison, <vscale x 2 x i32> zeroinitializer | ||
|
@@ -92,14 +92,14 @@ define void @induction_i3_zext(ptr %dst) #0 { | |
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 64, [[N_MOD_VF]] | ||
; CHECK-NEXT: [[IND_END:%.*]] = trunc i64 [[N_VEC]] to i3 | ||
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64() | ||
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4 | ||
; CHECK-NEXT: [[TMP40:%.*]] = mul i64 [[TMP4]], 2 | ||
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP40]], 2 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is interesting. I guess I was expecting that since this patch only creates a runtime VF on demand that total lines of IR would only decrease, rather than increase which is happening here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The reason this was showing up as net increase was that the test case originally didn't check all instructions in vector.ph; now that it does we safe vscale & mul but need a trunc, so it is neutral overall in this case. |
||
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 2 x i8> @llvm.stepvector.nxv2i8() | ||
; CHECK-NEXT: [[TMP7:%.*]] = trunc <vscale x 2 x i8> [[TMP6]] to <vscale x 2 x i3> | ||
; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 2 x i3> [[TMP7]], zeroinitializer | ||
; CHECK-NEXT: [[TMP9:%.*]] = mul <vscale x 2 x i3> [[TMP8]], shufflevector (<vscale x 2 x i3> insertelement (<vscale x 2 x i3> poison, i3 1, i64 0), <vscale x 2 x i3> poison, <vscale x 2 x i32> zeroinitializer) | ||
; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i3> zeroinitializer, [[TMP9]] | ||
; CHECK-NEXT: [[TMP10:%.*]] = call i3 @llvm.vscale.i3() | ||
; CHECK-NEXT: [[TMP11:%.*]] = mul i3 [[TMP10]], 2 | ||
; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP40]] to i3 | ||
; CHECK-NEXT: [[TMP12:%.*]] = mul i3 1, [[TMP11]] | ||
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i3> poison, i3 [[TMP12]], i64 0 | ||
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i3> [[DOTSPLATINSERT]], <vscale x 2 x i3> poison, <vscale x 2 x i32> zeroinitializer | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to handle this FIXME now that VF is assigned a VPValue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, would either require introducing a
UF
placeholder or introducing during explicit interleaving, either way probably better as follow-up?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so FIXME remains until a symbolic UF VPValue placeholder is introduced, as follow-up, along with a VPInstruction multiplying it with VF (introduced here), possibly subject to constant folding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep