[CostModel] Add a DstTy to getShuffleCost #141634
base: main
@llvm/pr-subscribers-backend-systemz @llvm/pr-subscribers-backend-hexagon @llvm/pr-subscribers-backend-powerpc

Author: David Green (davemgreen)

Changes

A shuffle takes two input vectors and a mask and produces a new vector of type <MaskElts x SrcEltTy>. Historically, getShuffleCost has assumed that SrcTy and DstTy are the same, although that assumption has been relaxed in recent years. If the Tp passed to getShuffleCost is the SrcTy, the DstTy can be calculated from the number of mask elements and the source element type, but the Mask is not always provided and Tp is not reliably the SrcTy. This has led to situations, notably in the SLP vectorizer but also in the generic cost routines, where assumptions about how vectors will be legalized are built into the generic cost routines. One example is whether vectors will be widened or promoted: the cost modelling assumes widening, whereas the default lowering promotes integer vectors.

This patch is a first step towards improving that. It originally tried to alter more of the cost model, but that quickly became too many changes at once, so this patch only plumbs a DstTy parameter into getShuffleCost so that DstTy and SrcTy can be reliably distinguished. The callers of getShuffleCost have been updated to pass a more accurate DstTy. Otherwise the patch aims to be largely non-functional: SrcTy remains the primary type used in the shuffle cost routines, and DstTy is only used where the destination type was used in the past (for example, for SK_InsertSubvector).

Some asserts have been added to check for consistent values when a Mask and a DstTy are both provided to getShuffleCost. Some of them took a while to get right, and some calls that do not pass a mask might still be incorrect. Hopefully this will provide a useful base for modelling more shuffles that change vector size.

Patch is 103.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141634.diff

25 Files Affected:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 3f639138d8b75..ac9fde06d691b 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1360,16 +1360,17 @@ class TargetTransformInfo {
const SmallBitVector &OpcodeMask,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const;
- /// \return The cost of a shuffle instruction of kind Kind and of type Tp.
- /// The exact mask may be passed as Mask, or else the array will be empty.
- /// The index and subtype parameters are used by the subvector insertion and
- /// extraction shuffle kinds to show the insert/extract point and the type of
- /// the subvector being inserted/extracted. The operands of the shuffle can be
- /// passed through \p Args, which helps improve the cost estimation in some
- /// cases, like in broadcast loads.
- /// NOTE: For subvector extractions Tp represents the source type.
+ /// \return The cost of a shuffle instruction of kind Kind with inputs of type
+ /// SrcTy, producing a vector of type DstTy. The exact mask may be passed as
+ /// Mask, or else the array will be empty. The index and subtype parameters
+ /// are used by the subvector insertion and extraction shuffle kinds to show
+ /// the insert/extract point and the type of the subvector being
+ /// inserted/extracted. The operands of the shuffle can be passed through \p
+ /// Args, which helps improve the cost estimation in some cases, like in
+ /// broadcast loads.
InstructionCost
- getShuffleCost(ShuffleKind Kind, VectorType *Tp, ArrayRef<int> Mask = {},
+ getShuffleCost(ShuffleKind Kind, VectorType *DstTy, VectorType *SrcTy,
+ ArrayRef<int> Mask = {},
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
int Index = 0, VectorType *SubTp = nullptr,
ArrayRef<const Value *> Args = {},
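The revised doc comment describes a shuffle as taking inputs of type SrcTy and producing a vector of type DstTy. As a rough illustration, the sketch below models a fixed-width vector with a hypothetical `ToyVecTy` (not LLVM's `VectorType`), derives the result type as the PR description states (one lane per mask element, keeping the source element type), and reports whether the shuffle keeps, grows or shrinks the lane count. `deriveDstTy` and `classifySizeChange` are illustrative names only, not LLVM APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a fixed-width vector type <NumElts x EltTy>;
// this is not LLVM's VectorType.
struct ToyVecTy {
  unsigned NumElts;
  std::string EltTy; // e.g. "i32", "float"
  bool operator==(const ToyVecTy &O) const {
    return NumElts == O.NumElts && EltTy == O.EltTy;
  }
};

// A shufflevector result has one lane per mask element and keeps the source
// element type: <MaskElts x SrcEltTy>.
ToyVecTy deriveDstTy(const ToyVecTy &SrcTy, const std::vector<int> &Mask) {
  return ToyVecTy{static_cast<unsigned>(Mask.size()), SrcTy.EltTy};
}

enum class SizeChange { Same, Widening, Narrowing };

// Compares lane counts only; a shuffle never changes the element type.
SizeChange classifySizeChange(const ToyVecTy &SrcTy, const ToyVecTy &DstTy) {
  if (DstTy.NumElts == SrcTy.NumElts)
    return SizeChange::Same;
  return DstTy.NumElts > SrcTy.NumElts ? SizeChange::Widening
                                       : SizeChange::Narrowing;
}
```

For a `<4 x i32>` source, the mask `{0, 1}` yields `<2 x i32>` (narrowing), while `{0, 4, 1, 5, 2, 6, 3, 7}` interleaves both inputs into `<8 x i32>` (widening). With an empty mask there is nothing to derive the result from, which is the case an explicit DstTy parameter covers.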
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index a80b4c5179bad..b226b6e9e129f 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -711,9 +711,9 @@ class TargetTransformInfoImplBase {
}
virtual InstructionCost
- getShuffleCost(TTI::ShuffleKind Kind, VectorType *Ty, ArrayRef<int> Mask,
- TTI::TargetCostKind CostKind, int Index, VectorType *SubTp,
- ArrayRef<const Value *> Args = {},
+ getShuffleCost(TTI::ShuffleKind Kind, VectorType *DstTy, VectorType *SrcTy,
+ ArrayRef<int> Mask, TTI::TargetCostKind CostKind, int Index,
+ VectorType *SubTp, ArrayRef<const Value *> Args = {},
const Instruction *CxtI = nullptr) const {
return 1;
}
@@ -1545,13 +1545,14 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
return 0;
if (Shuffle->isExtractSubvectorMask(SubIndex))
- return TargetTTI->getShuffleCost(TTI::SK_ExtractSubvector, VecSrcTy,
- Mask, CostKind, SubIndex, VecTy,
- Operands, Shuffle);
+ return TargetTTI->getShuffleCost(TTI::SK_ExtractSubvector, VecTy,
+ VecSrcTy, Mask, CostKind, SubIndex,
+ VecTy, Operands, Shuffle);
if (Shuffle->isInsertSubvectorMask(NumSubElts, SubIndex))
return TargetTTI->getShuffleCost(
- TTI::SK_InsertSubvector, VecTy, Mask, CostKind, SubIndex,
+ TTI::SK_InsertSubvector, VecTy, VecSrcTy, Mask, CostKind,
+ SubIndex,
FixedVectorType::get(VecTy->getScalarType(), NumSubElts),
Operands, Shuffle);
@@ -1580,21 +1581,24 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
return TargetTTI->getShuffleCost(
IsUnary ? TTI::SK_PermuteSingleSrc : TTI::SK_PermuteTwoSrc, VecTy,
- AdjustMask, CostKind, 0, nullptr, Operands, Shuffle);
+ VecTy, AdjustMask, CostKind, 0, nullptr, Operands, Shuffle);
}
// Narrowing shuffle - perform shuffle at original wider width and
// then extract the lower elements.
+ // FIXME: This can assume widening, which is not true of all vector
+ // architectures (and is not even the default).
AdjustMask.append(NumSubElts - Mask.size(), PoisonMaskElem);
InstructionCost ShuffleCost = TargetTTI->getShuffleCost(
IsUnary ? TTI::SK_PermuteSingleSrc : TTI::SK_PermuteTwoSrc,
- VecSrcTy, AdjustMask, CostKind, 0, nullptr, Operands, Shuffle);
+ VecSrcTy, VecSrcTy, AdjustMask, CostKind, 0, nullptr, Operands,
+ Shuffle);
SmallVector<int, 16> ExtractMask(Mask.size());
std::iota(ExtractMask.begin(), ExtractMask.end(), 0);
return ShuffleCost + TargetTTI->getShuffleCost(
- TTI::SK_ExtractSubvector, VecSrcTy,
+ TTI::SK_ExtractSubvector, VecTy, VecSrcTy,
ExtractMask, CostKind, 0, VecTy, {}, Shuffle);
}
@@ -1602,40 +1606,44 @@ class TargetTransformInfoImplCRTPBase : public TargetTransformInfoImplBase {
return 0;
if (Shuffle->isReverse())
- return TargetTTI->getShuffleCost(TTI::SK_Reverse, VecTy, Mask, CostKind,
- 0, nullptr, Operands, Shuffle);
+ return TargetTTI->getShuffleCost(TTI::SK_Reverse, VecTy, VecSrcTy, Mask,
+ CostKind, 0, nullptr, Operands,
+ Shuffle);
if (Shuffle->isSelect())
- return TargetTTI->getShuffleCost(TTI::SK_Select, VecTy, Mask, CostKind,
- 0, nullptr, Operands, Shuffle);
+ return TargetTTI->getShuffleCost(TTI::SK_Select, VecTy, VecSrcTy, Mask,
+ CostKind, 0, nullptr, Operands,
+ Shuffle);
if (Shuffle->isTranspose())
- return TargetTTI->getShuffleCost(TTI::SK_Transpose, VecTy, Mask,
- CostKind, 0, nullptr, Operands,
+ return TargetTTI->getShuffleCost(TTI::SK_Transpose, VecTy, VecSrcTy,
+ Mask, CostKind, 0, nullptr, Operands,
Shuffle);
if (Shuffle->isZeroEltSplat())
- return TargetTTI->getShuffleCost(TTI::SK_Broadcast, VecTy, Mask,
- CostKind, 0, nullptr, Operands,
+ return TargetTTI->getShuffleCost(TTI::SK_Broadcast, VecTy, VecSrcTy,
+ Mask, CostKind, 0, nullptr, Operands,
Shuffle);
if (Shuffle->isSingleSource())
- return TargetTTI->getShuffleCost(TTI::SK_PermuteSingleSrc, VecTy, Mask,
- CostKind, 0, nullptr, Operands,
- Shuffle);
+ return TargetTTI->getShuffleCost(TTI::SK_PermuteSingleSrc, VecTy,
+ VecSrcTy, Mask, CostKind, 0, nullptr,
+ Operands, Shuffle);
if (Shuffle->isInsertSubvectorMask(NumSubElts, SubIndex))
return TargetTTI->getShuffleCost(
- TTI::SK_InsertSubvector, VecTy, Mask, CostKind, SubIndex,
+ TTI::SK_InsertSubvector, VecTy, VecSrcTy, Mask, CostKind, SubIndex,
FixedVectorType::get(VecTy->getScalarType(), NumSubElts), Operands,
Shuffle);
if (Shuffle->isSplice(SubIndex))
- return TargetTTI->getShuffleCost(TTI::SK_Splice, VecTy, Mask, CostKind,
- SubIndex, nullptr, Operands, Shuffle);
+ return TargetTTI->getShuffleCost(TTI::SK_Splice, VecTy, VecSrcTy, Mask,
+ CostKind, SubIndex, nullptr, Operands,
+ Shuffle);
- return TargetTTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, Mask,
- CostKind, 0, nullptr, Operands, Shuffle);
+ return TargetTTI->getShuffleCost(TTI::SK_PermuteTwoSrc, VecTy, VecSrcTy,
+ Mask, CostKind, 0, nullptr, Operands,
+ Shuffle);
}
case Instruction::ExtractElement: {
auto *EEI = dyn_cast<ExtractElementInst>(U);
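In the TargetTransformInfoImpl.h changes, a narrowing shuffle is costed as a full-width permute of VecSrcTy followed by an SK_ExtractSubvector whose DstTy is now VecTy. A minimal sketch of how the two masks relate, assuming plain `std::vector<int>` masks and a hypothetical `planNarrowingShuffle` helper; only the value of `PoisonMaskElem` (-1) mirrors LLVM.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// LLVM marks poison lanes in a shuffle mask with -1 (PoisonMaskElem).
constexpr int PoisonMaskElem = -1;

// Hypothetical container for the two masks used to cost a narrowing shuffle.
struct NarrowingPlan {
  std::vector<int> AdjustMask;  // full-width permute of the source vectors
  std::vector<int> ExtractMask; // low lanes taken from the permute result
};

NarrowingPlan planNarrowingShuffle(const std::vector<int> &Mask,
                                   unsigned NumSrcElts) {
  assert(Mask.size() < NumSrcElts && "expected a narrowing shuffle");
  NarrowingPlan Plan;
  // Pad the original mask with poison lanes up to the source width.
  Plan.AdjustMask = Mask;
  Plan.AdjustMask.resize(NumSrcElts, PoisonMaskElem);
  // Then extract lanes [0, Mask.size()) of the full-width result.
  Plan.ExtractMask.resize(Mask.size());
  std::iota(Plan.ExtractMask.begin(), Plan.ExtractMask.end(), 0);
  return Plan;
}
```

As the added FIXME notes, this decomposition assumes the narrow result is legalized by widening, which not every target does.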
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index ff8778168686d..1d32b3190445b 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -329,11 +329,11 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// Cost the call + mask.
auto Cost =
thisT()->getCallInstrCost(nullptr, RetTy, ICA.getArgTypes(), CostKind);
- if (VD->isMasked())
- Cost += thisT()->getShuffleCost(
- TargetTransformInfo::SK_Broadcast,
- VectorType::get(IntegerType::getInt1Ty(Ctx), VF), {}, CostKind, 0,
- nullptr, {});
+ if (VD->isMasked()) {
+ auto VecTy = VectorType::get(IntegerType::getInt1Ty(Ctx), VF);
+ Cost += thisT()->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy,
+ VecTy, {}, CostKind, 0, nullptr, {});
+ }
// Lowering to a library call (with output pointers) may require us to emit
// reloads for the results.
@@ -1101,11 +1101,11 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
TTI::ShuffleKind improveShuffleKindFromMask(TTI::ShuffleKind Kind,
ArrayRef<int> Mask,
- VectorType *Ty, int &Index,
+ VectorType *SrcTy, int &Index,
VectorType *&SubTy) const {
if (Mask.empty())
return Kind;
- int NumSrcElts = Ty->getElementCount().getKnownMinValue();
+ int NumSrcElts = SrcTy->getElementCount().getKnownMinValue();
switch (Kind) {
case TTI::SK_PermuteSingleSrc: {
if (ShuffleVectorInst::isReverseMask(Mask, NumSrcElts))
@@ -1116,7 +1116,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
return TTI::SK_Broadcast;
if (ShuffleVectorInst::isExtractSubvectorMask(Mask, NumSrcElts, Index) &&
(Index + Mask.size()) <= (size_t)NumSrcElts) {
- SubTy = FixedVectorType::get(Ty->getElementType(), Mask.size());
+ SubTy = FixedVectorType::get(SrcTy->getElementType(), Mask.size());
return TTI::SK_ExtractSubvector;
}
break;
@@ -1127,7 +1127,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
Mask, NumSrcElts, NumSubElts, Index)) {
if (Index + NumSubElts > NumSrcElts)
return Kind;
- SubTy = FixedVectorType::get(Ty->getElementType(), NumSubElts);
+ SubTy = FixedVectorType::get(SrcTy->getElementType(), NumSubElts);
return TTI::SK_InsertSubvector;
}
if (ShuffleVectorInst::isSelectMask(Mask, NumSrcElts))
@@ -1151,13 +1151,13 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
}
InstructionCost
- getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, ArrayRef<int> Mask,
- TTI::TargetCostKind CostKind, int Index, VectorType *SubTp,
- ArrayRef<const Value *> Args = {},
+ getShuffleCost(TTI::ShuffleKind Kind, VectorType *DstTy, VectorType *SrcTy,
+ ArrayRef<int> Mask, TTI::TargetCostKind CostKind, int Index,
+ VectorType *SubTp, ArrayRef<const Value *> Args = {},
const Instruction *CxtI = nullptr) const override {
- switch (improveShuffleKindFromMask(Kind, Mask, Tp, Index, SubTp)) {
+ switch (improveShuffleKindFromMask(Kind, Mask, SrcTy, Index, SubTp)) {
case TTI::SK_Broadcast:
- if (auto *FVT = dyn_cast<FixedVectorType>(Tp))
+ if (auto *FVT = dyn_cast<FixedVectorType>(SrcTy))
return getBroadcastShuffleOverhead(FVT, CostKind);
return InstructionCost::getInvalid();
case TTI::SK_Select:
@@ -1166,14 +1166,14 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
case TTI::SK_Transpose:
case TTI::SK_PermuteSingleSrc:
case TTI::SK_PermuteTwoSrc:
- if (auto *FVT = dyn_cast<FixedVectorType>(Tp))
+ if (auto *FVT = dyn_cast<FixedVectorType>(SrcTy))
return getPermuteShuffleOverhead(FVT, CostKind);
return InstructionCost::getInvalid();
case TTI::SK_ExtractSubvector:
- return getExtractSubvectorOverhead(Tp, CostKind, Index,
+ return getExtractSubvectorOverhead(SrcTy, CostKind, Index,
cast<FixedVectorType>(SubTp));
case TTI::SK_InsertSubvector:
- return getInsertSubvectorOverhead(Tp, CostKind, Index,
+ return getInsertSubvectorOverhead(DstTy, CostKind, Index,
cast<FixedVectorType>(SubTp));
}
llvm_unreachable("Unknown TTI::ShuffleKind");
@@ -1910,6 +1910,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
return BaseT::getIntrinsicInstrCost(ICA, CostKind);
unsigned Index = cast<ConstantInt>(Args[1])->getZExtValue();
return thisT()->getShuffleCost(TTI::SK_ExtractSubvector,
+ cast<VectorType>(RetTy),
cast<VectorType>(Args[0]->getType()), {},
CostKind, Index, cast<VectorType>(RetTy));
}
@@ -1920,17 +1921,18 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
return BaseT::getIntrinsicInstrCost(ICA, CostKind);
unsigned Index = cast<ConstantInt>(Args[2])->getZExtValue();
return thisT()->getShuffleCost(
- TTI::SK_InsertSubvector, cast<VectorType>(Args[0]->getType()), {},
- CostKind, Index, cast<VectorType>(Args[1]->getType()));
+ TTI::SK_InsertSubvector, cast<VectorType>(RetTy),
+ cast<VectorType>(Args[0]->getType()), {}, CostKind, Index,
+ cast<VectorType>(Args[1]->getType()));
}
case Intrinsic::vector_reverse: {
- return thisT()->getShuffleCost(TTI::SK_Reverse,
+ return thisT()->getShuffleCost(TTI::SK_Reverse, cast<VectorType>(RetTy),
cast<VectorType>(Args[0]->getType()), {},
CostKind, 0, cast<VectorType>(RetTy));
}
case Intrinsic::vector_splice: {
unsigned Index = cast<ConstantInt>(Args[2])->getZExtValue();
- return thisT()->getShuffleCost(TTI::SK_Splice,
+ return thisT()->getShuffleCost(TTI::SK_Splice, cast<VectorType>(RetTy),
cast<VectorType>(Args[0]->getType()), {},
CostKind, Index, cast<VectorType>(RetTy));
}
@@ -2376,8 +2378,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
CostKind, 1, nullptr, nullptr);
Cost += thisT()->getVectorInstrCost(Instruction::InsertElement, SearchTy,
CostKind, 0, nullptr, nullptr);
- Cost += thisT()->getShuffleCost(TTI::SK_Broadcast, SearchTy, std::nullopt,
- CostKind, 0, nullptr);
+ Cost += thisT()->getShuffleCost(TTI::SK_Broadcast, SearchTy, SearchTy,
+ std::nullopt, CostKind, 0, nullptr);
Cost += thisT()->getCmpSelInstrCost(BinaryOperator::ICmp, SearchTy, RetTy,
CmpInst::ICMP_EQ, CostKind);
Cost +=
@@ -2956,8 +2958,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
while (NumVecElts > MVTLen) {
NumVecElts /= 2;
VectorType *SubTy = FixedVectorType::get(ScalarTy, NumVecElts);
- ShuffleCost += thisT()->getShuffleCost(TTI::SK_ExtractSubvector, Ty, {},
- CostKind, NumVecElts, SubTy);
+ ShuffleCost += thisT()->getShuffleCost(
+ TTI::SK_ExtractSubvector, SubTy, Ty, {}, CostKind, NumVecElts, SubTy);
ArithCost += thisT()->getArithmeticInstrCost(Opcode, SubTy, CostKind);
Ty = SubTy;
++LongVectorCount;
@@ -2973,7 +2975,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// By default reductions need one shuffle per reduction level.
ShuffleCost +=
NumReduxLevels * thisT()->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,
- {}, CostKind, 0, Ty);
+ Ty, {}, CostKind, 0, Ty);
ArithCost +=
NumReduxLevels * thisT()->getArithmeticInstrCost(Opcode, Ty, CostKind);
return ShuffleCost + ArithCost +
@@ -3047,8 +3049,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
NumVecElts /= 2;
auto *SubTy = FixedVectorType::get(ScalarTy, NumVecElts);
- ShuffleCost += thisT()->getShuffleCost(TTI::SK_ExtractSubvector, Ty, {},
- CostKind, NumVecElts, SubTy);
+ ShuffleCost += thisT()->getShuffleCost(
+ TTI::SK_ExtractSubvector, SubTy, Ty, {}, CostKind, NumVecElts, SubTy);
IntrinsicCostAttributes Attrs(IID, SubTy, {SubTy, SubTy}, FMF);
MinMaxCost += getIntrinsicInstrCost(Attrs, CostKind);
@@ -3064,7 +3066,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// architecture-dependent length.
ShuffleCost +=
NumReduxLevels * thisT()->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,
- {}, CostKind, 0, Ty);
+ Ty, {}, CostKind, 0, Ty);
IntrinsicCostAttributes Attrs(IID, Ty, {Ty, Ty}, FMF);
MinMaxCost += NumReduxLevels * getIntrinsicInstrCost(Attrs, CostKind);
// The last min/max should be in vector registers and we counted it above.
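In the BasicTTIImpl.h reduction routines, each step of the splitting loop now passes the half-width SubTy as DstTy and the current Ty as SrcTy, with the halved NumVecElts as the index (the start of the upper half). The sketch below, assuming lane counts stand in for the vector types, records the (DstTy, SrcTy, Index) triple each iteration would query; `ExtractCall` and `splitToLegalWidth` are hypothetical names, and MVTLen is treated simply as the target's legal lane count.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one SK_ExtractSubvector query, by lane count.
struct ExtractCall {
  unsigned DstElts; // half-width SubTy (the extracted upper half)
  unsigned SrcElts; // current Ty being split
  unsigned Index;   // first lane of the upper half
};

// Follows the shape of the `while (NumVecElts > MVTLen)` loop: halve the
// vector until it fits the legal width MVTLen.
std::vector<ExtractCall> splitToLegalWidth(unsigned NumVecElts,
                                           unsigned MVTLen) {
  std::vector<ExtractCall> Calls;
  while (NumVecElts > MVTLen) {
    unsigned SrcElts = NumVecElts;
    NumVecElts /= 2;
    Calls.push_back({NumVecElts, SrcElts, NumVecElts});
  }
  return Calls;
}
```

For a 16-lane vector and a legal width of 4 lanes, the loop queries an extract from 16 to 8 lanes at index 8, then from 8 to 4 lanes at index 4.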
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 0f857399660fe..b8b8a131c8448 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -980,11 +980,16 @@ InstructionCost TargetTransformInfo::getAltInstrCost(
}
InstructionCost TargetTransformInfo::getShuffleCost(
- ShuffleKind Kind, VectorType *Ty, ArrayRef<int> Mask,
+ ShuffleKind Kind, VectorType *DstTy, VectorType *SrcTy, ArrayRef<int> Mask,
TTI::TargetCostKind CostKind, int Index, VectorType *SubTp,
ArrayRef<const Value *> Args, const Instruction *CxtI) const {
- InstructionCost Cost = TTIImpl->getShuffleCost(Kind, Ty, Mask, CostKind,
- Index, SubTp, Args, CxtI);
+ assert((Mask.empty() || DstTy->isScalableTy() ||
+ Mask.size() == DstTy->getElementCount().getKnownMinValue()) &&
+ "Expected the Mask to match the return size if given");
+ assert(SrcTy->getScalarType() == DstTy->getScalarType() &&
+ "Expected the same scalar types");
+ InstructionCost Cost = TTIImpl->getShuffleCost(
+ Kind, DstTy, SrcTy, Mask, CostKind, Index, SubTp, Args, CxtI);
assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;
}
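The two asserts added to TargetTransformInfo::getShuffleCost can be read as one consistency predicate over the arguments. A toy version follows, assuming a simplified vector type with a known-minimum lane count, a scalable flag and an element type name; `ToyScalableVecTy` and `shuffleArgsConsistent` are hypothetical, and comparing element types by name stands in for LLVM's `getScalarType()` comparison.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified vector type: <vscale x MinElts x EltTy> when
// Scalable is set, otherwise <MinElts x EltTy>.
struct ToyScalableVecTy {
  unsigned MinElts;
  std::string EltTy;
  bool Scalable;
};

// Same conditions as the new asserts: a non-empty mask must have one entry
// per (known-minimum) lane of DstTy unless DstTy is scalable, and the
// shuffle must not change the element type.
bool shuffleArgsConsistent(const ToyScalableVecTy &DstTy,
                           const ToyScalableVecTy &SrcTy,
                           const std::vector<int> &Mask) {
  bool MaskMatchesDst =
      Mask.empty() || DstTy.Scalable || Mask.size() == DstTy.MinElts;
  bool SameScalarTy = SrcTy.EltTy == DstTy.EltTy;
  return MaskMatchesDst && SameScalarTy;
}
```

A narrowing `<4 x i32>` to `<2 x i32>` query with mask `{0, 1}` passes, a three-element mask against a two-lane DstTy fails, and changing the element type fails regardless of the mask.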
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 97e4993d52b4f..2a8f307307be3 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -5439,19 +5439,25 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
return Cost;
}
-InstructionCost AArch64TTIImpl::getShuffleCost(
- TTI::ShuffleKind Kind, VectorType *Tp, ArrayRef<int> Mask,
- TTI::TargetCostKind CostKind, int Index, VectorType *SubTp,
- ArrayRef<const Value *> Args, const Instruction *CxtI) co...
[truncated]
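Across the call sites in this excerpt, mask-less calls map the old single type argument onto the new pair consistently. An SK_ExtractSubvector call that used to pass the wide source as Tp now passes the subvector as DstTy and the wide vector as SrcTy. An SK_InsertSubvector call passes the vector being inserted into (which is also the result type) in both slots, and the remaining mask-less calls, such as the SK_Broadcast and SK_PermuteSingleSrc queries in BasicTTIImpl.h, pass the same type twice. The sketch below encodes that rule over a hypothetical `ToyFixedTy`; `toDstSrcPair` is illustrative only, is not part of the patch, and does not cover calls that have a mask, where DstTy should follow the mask instead.

```cpp
#include <cassert>
#include <utility>

// Hypothetical mirror of the shuffle kinds (LLVM spells them TTI::SK_*).
enum class ShuffleKind {
  Broadcast, Reverse, Select, Transpose, InsertSubvector,
  ExtractSubvector, PermuteSingleSrc, PermuteTwoSrc, Splice
};

// Hypothetical fixed-width vector type: lane count and element width.
struct ToyFixedTy {
  unsigned NumElts;
  unsigned EltBits;
  bool operator==(const ToyFixedTy &O) const {
    return NumElts == O.NumElts && EltBits == O.EltBits;
  }
};

// Maps an old-style mask-less call (Kind, Tp, SubTp) to {DstTy, SrcTy}.
std::pair<ToyFixedTy, ToyFixedTy> toDstSrcPair(ShuffleKind Kind, ToyFixedTy Tp,
                                               const ToyFixedTy *SubTp) {
  if (Kind == ShuffleKind::ExtractSubvector) {
    // Old Tp was the wide source; the result is the extracted subvector.
    assert(SubTp && "ExtractSubvector needs the subvector type");
    return {*SubTp, Tp};
  }
  // InsertSubvector: Tp was the vector inserted into, which is also the
  // result. Other kinds: assumed to be same-size shuffles.
  return {Tp, Tp};
}
```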
ArithCost += thisT()->getArithmeticInstrCost(Opcode, SubTy, CostKind);
Ty = SubTy;
++LongVectorCount;
@@ -2973,7 +2975,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// By default reductions need one shuffle per reduction level.
ShuffleCost +=
NumReduxLevels * thisT()->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,
- {}, CostKind, 0, Ty);
+ Ty, {}, CostKind, 0, Ty);
ArithCost +=
NumReduxLevels * thisT()->getArithmeticInstrCost(Opcode, Ty, CostKind);
return ShuffleCost + ArithCost +
@@ -3047,8 +3049,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
NumVecElts /= 2;
auto *SubTy = FixedVectorType::get(ScalarTy, NumVecElts);
- ShuffleCost += thisT()->getShuffleCost(TTI::SK_ExtractSubvector, Ty, {},
- CostKind, NumVecElts, SubTy);
+ ShuffleCost += thisT()->getShuffleCost(
+ TTI::SK_ExtractSubvector, SubTy, Ty, {}, CostKind, NumVecElts, SubTy);
IntrinsicCostAttributes Attrs(IID, SubTy, {SubTy, SubTy}, FMF);
MinMaxCost += getIntrinsicInstrCost(Attrs, CostKind);
@@ -3064,7 +3066,7 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// architecture-dependent length.
ShuffleCost +=
NumReduxLevels * thisT()->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,
- {}, CostKind, 0, Ty);
+ Ty, {}, CostKind, 0, Ty);
IntrinsicCostAttributes Attrs(IID, Ty, {Ty, Ty}, FMF);
MinMaxCost += NumReduxLevels * getIntrinsicInstrCost(Attrs, CostKind);
// The last min/max should be in vector registers and we counted it above.
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 0f857399660fe..b8b8a131c8448 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -980,11 +980,16 @@ InstructionCost TargetTransformInfo::getAltInstrCost(
}
InstructionCost TargetTransformInfo::getShuffleCost(
- ShuffleKind Kind, VectorType *Ty, ArrayRef<int> Mask,
+ ShuffleKind Kind, VectorType *DstTy, VectorType *SrcTy, ArrayRef<int> Mask,
TTI::TargetCostKind CostKind, int Index, VectorType *SubTp,
ArrayRef<const Value *> Args, const Instruction *CxtI) const {
- InstructionCost Cost = TTIImpl->getShuffleCost(Kind, Ty, Mask, CostKind,
- Index, SubTp, Args, CxtI);
+ assert((Mask.empty() || DstTy->isScalableTy() ||
+ Mask.size() == DstTy->getElementCount().getKnownMinValue()) &&
+ "Expected the Mask to match the return size if given");
+ assert(SrcTy->getScalarType() == DstTy->getScalarType() &&
+ "Expected the same scalar types");
+ InstructionCost Cost = TTIImpl->getShuffleCost(
+ Kind, DstTy, SrcTy, Mask, CostKind, Index, SubTp, Args, CxtI);
assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;
}
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 97e4993d52b4f..2a8f307307be3 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -5439,19 +5439,25 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
return Cost;
}
-InstructionCost AArch64TTIImpl::getShuffleCost(
- TTI::ShuffleKind Kind, VectorType *Tp, ArrayRef<int> Mask,
- TTI::TargetCostKind CostKind, int Index, VectorType *SubTp,
- ArrayRef<const Value *> Args, const Instruction *CxtI) co...
[truncated]
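The reduction hunks above now pass the narrower SubTy as DstTy and the wider Ty as SrcTy for each SK_ExtractSubvector step. The following is a minimal, hypothetical sketch (not LLVM code) of the sequence that the halving loop produces. It assumes fixed-width vectors with a power-of-two element count and represents each type only by its element count, since the scalar type never changes. `ExtractStep` and `reductionExtractSteps` are illustrative names, and `MVTLen` stands for the legal vector length, as in the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one SK_ExtractSubvector call made by the halving
// loop. Types are reduced to element counts; the scalar type is shared.
struct ExtractStep {
  unsigned DstElts; // DstTy: the extracted half (also passed as SubTp)
  unsigned SrcElts; // SrcTy: the wider vector being split
  unsigned Index;   // first element of the extracted (upper) half
};

// Hypothetical helper mirroring the loop shape in the generic reduction
// cost: while the vector is wider than the legal length, split it in half.
std::vector<ExtractStep> reductionExtractSteps(unsigned NumVecElts,
                                               unsigned MVTLen) {
  std::vector<ExtractStep> Steps;
  while (NumVecElts > MVTLen) {
    unsigned SrcElts = NumVecElts;
    NumVecElts /= 2;
    // DstTy is the new, narrower type; SrcTy is the type before splitting.
    Steps.push_back({NumVecElts, SrcElts, NumVecElts});
  }
  return Steps;
}
```

For instance, `reductionExtractSteps(16, 4)` returns two steps: extract `<8 x T>` from `<16 x T>` at index 8, then `<4 x T>` from `<8 x T>` at index 4. After the loop, the remaining SK_PermuteSingleSrc shuffles operate on the legal-width type, which is why those calls pass `Ty` for both DstTy and SrcTy.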
/// Mask, or else the array will be empty. The index and subtype parameters
/// are used by the subvector insertion and extraction shuffle kinds to show
/// the insert/extract point and the type of the subvector being
/// inserted/extracted. The operands of the shuffle can be passed through \p
Update the description to explain that SubTp should only be used for SK_InsertSubvector?
I've updated the comment (but it might not be true quite yet as we still set and use the SubTp for ExtractSubRegs at the moment).
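The reviewer's suggestion follows from how the callers in the diff fill in the three types. Below is a hypothetical sketch, assuming each type is reduced to its element count and that all three share a scalar type. For extraction, DstTy and SubTp are the same narrow type, so SubTp adds no information. For insertion, SubTp is the only place the inserted type appears. The names `SubvectorKind`, `ShuffleTypes`, `subvectorShuffleTypes` and `overheadElts` are illustrative and not part of TTI.

```cpp
#include <cassert>

// Hypothetical enum covering the two TTI subvector shuffle kinds.
enum class SubvectorKind { Extract, Insert };

// Hypothetical bundle of the types a caller passes, as element counts.
struct ShuffleTypes {
  unsigned DstElts;   // DstTy: the type the shuffle produces
  unsigned SrcElts;   // SrcTy: the (first) input vector
  unsigned SubTpElts; // SubTp: the subvector inserted or extracted
};

// Types passed for a subvector shuffle, following the vector_extract and
// vector_insert call sites in the diff.
ShuffleTypes subvectorShuffleTypes(SubvectorKind Kind, unsigned BaseElts,
                                   unsigned SubElts) {
  if (Kind == SubvectorKind::Extract)
    // The result is the narrow subvector, so DstTy and SubTp coincide.
    return {SubElts, BaseElts, SubElts};
  // Inserting into a base vector produces a vector of the base type.
  return {BaseElts, BaseElts, SubElts};
}

// The type the generic BasicTTIImplBase code computes the overhead on:
// SrcTy for extraction, DstTy for insertion.
unsigned overheadElts(SubvectorKind Kind, const ShuffleTypes &T) {
  return Kind == SubvectorKind::Extract ? T.SrcElts : T.DstElts;
}
```

For an `<8 x T>` base and a `<4 x T>` subvector, extraction gives DstTy = 4, SrcTy = 8, SubTp = 4 elements, and insertion gives DstTy = 8, SrcTy = 8, SubTp = 4. In both cases `overheadElts` returns 8. The generic code still costs the wide vector, now reached through SrcTy for extraction and DstTy for insertion.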
A shuffle takes two input vectors and a mask and produces a new vector of type <MaskElts x SrcEltTy>. Historically, getShuffleCost has assumed that SrcTy and DstTy are the same, although that assumption has been relaxed in recent years. If the Tp passed to getShuffleCost is the SrcTy, the DstTy can be calculated from the number of mask elements and the source element type, but the Mask is not always provided and Tp is not reliably the SrcTy. This has led to situations, notably in the SLP vectorizer but also in the generic cost routines, where assumptions about how vectors will be legalized are built into the generic cost code. For example, the cost model assumes that vectors will be widened, whereas the default lowering promotes integer vectors.

This patch starts to improve that. It originally tried to change more of the cost model, but that quickly became too many changes at once, so this patch just plumbs a DstTy into getShuffleCost so that DstTy and SrcTy can be reliably distinguished. The callers of getShuffleCost have been updated to pass a more accurate DstTy where possible. Otherwise the patch tries to be non-functional: SrcTy remains the primary type used in the shuffle cost routines, and DstTy is only used where the destination type was already used before (for example, for InsertSubvector).

Some asserts have been added to check that the values are consistent when both a Mask and a DstTy are provided to getShuffleCost. Some of them took a while to get right, and some calls that do not pass a mask might still provide an incorrect DstTy. Hopefully this will provide a useful base for modelling more shuffles that change the vector size.
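The description states that a masked shuffle produces `<MaskElts x SrcEltTy>`, and the diff adds asserts that a supplied Mask and DstTy agree. Below is a minimal sketch of both rules. It uses a hypothetical simplified `VecTy` (scalar type name, known-minimum element count, scalable flag) rather than LLVM's `VectorType`. `dstFromMask` and `shuffleTypesConsistent` are illustrative names, and `dstFromMask` assumes a fixed-width result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified vector type (not llvm::VectorType): a scalar type
// name, the known minimum element count, and whether the vector is scalable.
struct VecTy {
  std::string Scalar;
  unsigned MinElts;
  bool Scalable;
};

// A masked shuffle yields <Mask.size() x SrcEltTy>. This sketch assumes a
// fixed-width result, so Scalable is always false.
VecTy dstFromMask(const VecTy &Src, const std::vector<int> &Mask) {
  return {Src.Scalar, static_cast<unsigned>(Mask.size()), false};
}

// Mirrors the two asserts added to TargetTransformInfo::getShuffleCost:
// a non-empty Mask must have one entry per DstTy element (checked only for
// fixed-width DstTy), and SrcTy and DstTy must share a scalar type.
bool shuffleTypesConsistent(const VecTy &Dst, const VecTy &Src,
                            const std::vector<int> &Mask) {
  bool MaskMatches =
      Mask.empty() || Dst.Scalable || Mask.size() == Dst.MinElts;
  bool SameScalar = Src.Scalar == Dst.Scalar;
  return MaskMatches && SameScalar;
}
```

For example, interleaving two `<4 x i32>` inputs with the 8-entry mask `{0, 4, 1, 5, 2, 6, 3, 7}` gives an `<8 x i32>` DstTy. Passing the `<4 x i32>` SrcTy as the destination type for that mask makes `shuffleTypesConsistent` return false, which is the kind of Tp ambiguity a separate DstTy parameter is meant to remove.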