[VectorCombine] Pull out TargetCostKind argument to allow globally set cost kind value #118652

Merged
merged 2 commits into llvm:main from vector-combine-costkinds on Dec 9, 2024

Conversation

RKSimon (Collaborator) commented Dec 4, 2024

Don't use TCK_RecipThroughput independently in every VectorCombine fold.

Some prep work to allow a potential future patch to use VectorCombine to optimise for code size for -Os/Oz builds (setting TCK_CodeSize instead of TCK_RecipThroughput).

There's still more cleanup to do as a lot of get*Cost calls are relying on the default TargetCostKind value (usually TCK_RecipThroughput but not always).
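
For illustration, here is a minimal, hypothetical sketch of what the follow-up described above might look like: pick the cost kind once in VectorCombine::run() based on the function's size attributes and thread it through the folds. This is not part of this patch (which keeps TCK_RecipThroughput everywhere); `F`, `MadeChange`, and the loop structure stand in for the surrounding run() context.

```cpp
// Hypothetical follow-up sketch, not this patch: choose one TargetCostKind
// for the whole run and hand it to every fold that now takes a CostKind.
TargetTransformInfo::TargetCostKind CostKind =
    (F.hasMinSize() || F.hasOptSize())
        ? TargetTransformInfo::TCK_CodeSize         // -Os / -Oz builds
        : TargetTransformInfo::TCK_RecipThroughput; // default speed tuning

bool MadeChange = false;
for (BasicBlock &BB : F) {
  for (Instruction &I : BB) {
    MadeChange |= vectorizeLoadInsert(I, CostKind);
    MadeChange |= foldExtractExtract(I, CostKind);
    // ...the remaining folds receive the same CostKind...
  }
}
```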

llvmbot (Member) commented Dec 4, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Simon Pilgrim (RKSimon)

Changes

Don't use TCK_RecipThroughput independently in every VectorCombine fold.

Some prep work to allow a potential future patch to use VectorCombine to optimise for code size for -Os/Oz builds (setting TCK_CodeSize instead of TCK_RecipThroughput).

There's still more cleanup to do as a lot of get*Cost calls are relying on the default TargetCostKind value (usually TCK_RecipThroughput but not always).


Patch is 29.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118652.diff

1 file affected:

  • (modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+99-99)
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index b9caf8c0df9be1..385b2d1e802a81 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -91,38 +91,40 @@ class VectorCombine {
   // TODO: Direct calls from the top-level "run" loop use a plain "Instruction"
   //       parameter. That should be updated to specific sub-classes because the
   //       run loop was changed to dispatch on opcode.
-  bool vectorizeLoadInsert(Instruction &I);
-  bool widenSubvectorLoad(Instruction &I);
+  bool vectorizeLoadInsert(Instruction &I, TTI::TargetCostKind CostKind);
+  bool widenSubvectorLoad(Instruction &I, TTI::TargetCostKind CostKind);
   ExtractElementInst *getShuffleExtract(ExtractElementInst *Ext0,
                                         ExtractElementInst *Ext1,
+                                        TTI::TargetCostKind CostKind,
                                         unsigned PreferredExtractIndex) const;
   bool isExtractExtractCheap(ExtractElementInst *Ext0, ExtractElementInst *Ext1,
                              const Instruction &I,
                              ExtractElementInst *&ConvertToShuffle,
+                             TTI::TargetCostKind CostKind,
                              unsigned PreferredExtractIndex);
   void foldExtExtCmp(ExtractElementInst *Ext0, ExtractElementInst *Ext1,
                      Instruction &I);
   void foldExtExtBinop(ExtractElementInst *Ext0, ExtractElementInst *Ext1,
                        Instruction &I);
-  bool foldExtractExtract(Instruction &I);
-  bool foldInsExtFNeg(Instruction &I);
-  bool foldInsExtVectorToShuffle(Instruction &I);
-  bool foldBitcastShuffle(Instruction &I);
-  bool scalarizeBinopOrCmp(Instruction &I);
-  bool scalarizeVPIntrinsic(Instruction &I);
-  bool foldExtractedCmps(Instruction &I);
+  bool foldExtractExtract(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldInsExtFNeg(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldInsExtVectorToShuffle(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldBitcastShuffle(Instruction &I, TTI::TargetCostKind CostKind);
+  bool scalarizeBinopOrCmp(Instruction &I, TTI::TargetCostKind CostKind);
+  bool scalarizeVPIntrinsic(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldExtractedCmps(Instruction &I, TTI::TargetCostKind CostKind);
   bool foldSingleElementStore(Instruction &I);
-  bool scalarizeLoadExtract(Instruction &I);
-  bool foldPermuteOfBinops(Instruction &I);
-  bool foldShuffleOfBinops(Instruction &I);
-  bool foldShuffleOfCastops(Instruction &I);
-  bool foldShuffleOfShuffles(Instruction &I);
-  bool foldShuffleOfIntrinsics(Instruction &I);
-  bool foldShuffleToIdentity(Instruction &I);
+  bool scalarizeLoadExtract(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldPermuteOfBinops(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleOfBinops(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleOfCastops(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleOfShuffles(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleOfIntrinsics(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleToIdentity(Instruction &I, TTI::TargetCostKind CostKind);
   bool foldShuffleFromReductions(Instruction &I);
-  bool foldCastFromReductions(Instruction &I);
+  bool foldCastFromReductions(Instruction &I, TTI::TargetCostKind CostKind);
   bool foldSelectShuffle(Instruction &I, bool FromReduction = false);
-  bool shrinkType(Instruction &I);
+  bool shrinkType(Instruction &I, TTI::TargetCostKind CostKind);
 
   void replaceValue(Value &Old, Value &New) {
     Old.replaceAllUsesWith(&New);
@@ -172,7 +174,8 @@ static bool canWidenLoad(LoadInst *Load, const TargetTransformInfo &TTI) {
   return true;
 }
 
-bool VectorCombine::vectorizeLoadInsert(Instruction &I) {
+bool VectorCombine::vectorizeLoadInsert(Instruction &I,
+                                        TTI::TargetCostKind CostKind) {
   // Match insert into fixed vector of scalar value.
   // TODO: Handle non-zero insert index.
   Value *Scalar;
@@ -249,7 +252,6 @@ bool VectorCombine::vectorizeLoadInsert(Instruction &I) {
   InstructionCost OldCost =
       TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment, AS);
   APInt DemandedElts = APInt::getOneBitSet(MinVecNumElts, 0);
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   OldCost +=
       TTI.getScalarizationOverhead(MinVecTy, DemandedElts,
                                    /* Insert */ true, HasExtract, CostKind);
@@ -293,7 +295,7 @@ bool VectorCombine::vectorizeLoadInsert(Instruction &I) {
 /// If we are loading a vector and then inserting it into a larger vector with
 /// undefined elements, try to load the larger vector and eliminate the insert.
 /// This removes a shuffle in IR and may allow combining of other loaded values.
-bool VectorCombine::widenSubvectorLoad(Instruction &I) {
+bool VectorCombine::widenSubvectorLoad(Instruction &I, TTI::TargetCostKind CostKind) {
   // Match subvector insert of fixed vector.
   auto *Shuf = cast<ShuffleVectorInst>(&I);
   if (!Shuf->isIdentityWithPadding())
@@ -329,11 +331,11 @@ bool VectorCombine::widenSubvectorLoad(Instruction &I) {
   // undef value is 0. We could add that cost if the cost model accurately
   // reflects the real cost of that operation.
   InstructionCost OldCost =
-      TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment, AS);
+      TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment, AS, CostKind);
 
   // New pattern: load PtrOp
   InstructionCost NewCost =
-      TTI.getMemoryOpCost(Instruction::Load, Ty, Alignment, AS);
+      TTI.getMemoryOpCost(Instruction::Load, Ty, Alignment, AS, CostKind);
 
   // We can aggressively convert to the vector form because the backend can
   // invert this transform if it does not result in a performance win.
@@ -353,6 +355,7 @@ bool VectorCombine::widenSubvectorLoad(Instruction &I) {
 /// followed by extract from a different index.
 ExtractElementInst *VectorCombine::getShuffleExtract(
     ExtractElementInst *Ext0, ExtractElementInst *Ext1,
+    TTI::TargetCostKind CostKind,
     unsigned PreferredExtractIndex = InvalidIndex) const {
   auto *Index0C = dyn_cast<ConstantInt>(Ext0->getIndexOperand());
   auto *Index1C = dyn_cast<ConstantInt>(Ext1->getIndexOperand());
@@ -366,7 +369,6 @@ ExtractElementInst *VectorCombine::getShuffleExtract(
     return nullptr;
 
   Type *VecTy = Ext0->getVectorOperand()->getType();
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   assert(VecTy == Ext1->getVectorOperand()->getType() && "Need matching types");
   InstructionCost Cost0 =
       TTI.getVectorInstrCost(*Ext0, VecTy, CostKind, Index0);
@@ -405,6 +407,7 @@ bool VectorCombine::isExtractExtractCheap(ExtractElementInst *Ext0,
                                           ExtractElementInst *Ext1,
                                           const Instruction &I,
                                           ExtractElementInst *&ConvertToShuffle,
+                                          TTI::TargetCostKind CostKind,
                                           unsigned PreferredExtractIndex) {
   auto *Ext0IndexC = dyn_cast<ConstantInt>(Ext0->getIndexOperand());
   auto *Ext1IndexC = dyn_cast<ConstantInt>(Ext1->getIndexOperand());
@@ -436,7 +439,6 @@ bool VectorCombine::isExtractExtractCheap(ExtractElementInst *Ext0,
   // both sequences.
   unsigned Ext0Index = Ext0IndexC->getZExtValue();
   unsigned Ext1Index = Ext1IndexC->getZExtValue();
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
 
   InstructionCost Extract0Cost =
       TTI.getVectorInstrCost(*Ext0, VecTy, CostKind, Ext0Index);
@@ -475,7 +477,8 @@ bool VectorCombine::isExtractExtractCheap(ExtractElementInst *Ext0,
               !Ext1->hasOneUse() * Extract1Cost;
   }
 
-  ConvertToShuffle = getShuffleExtract(Ext0, Ext1, PreferredExtractIndex);
+  ConvertToShuffle =
+      getShuffleExtract(Ext0, Ext1, CostKind, PreferredExtractIndex);
   if (ConvertToShuffle) {
     if (IsBinOp && DisableBinopExtractShuffle)
       return true;
@@ -589,7 +592,8 @@ void VectorCombine::foldExtExtBinop(ExtractElementInst *Ext0,
 }
 
 /// Match an instruction with extracted vector operands.
-bool VectorCombine::foldExtractExtract(Instruction &I) {
+bool VectorCombine::foldExtractExtract(Instruction &I,
+                                       TTI::TargetCostKind CostKind) {
   // It is not safe to transform things like div, urem, etc. because we may
   // create undefined behavior when executing those on unknown vector elements.
   if (!isSafeToSpeculativelyExecute(&I))
@@ -621,7 +625,8 @@ bool VectorCombine::foldExtractExtract(Instruction &I) {
           m_InsertElt(m_Value(), m_Value(), m_ConstantInt(InsertIndex)));
 
   ExtractElementInst *ExtractToChange;
-  if (isExtractExtractCheap(Ext0, Ext1, I, ExtractToChange, InsertIndex))
+  if (isExtractExtractCheap(Ext0, Ext1, I, ExtractToChange, CostKind,
+                            InsertIndex))
     return false;
 
   if (ExtractToChange) {
@@ -648,7 +653,8 @@ bool VectorCombine::foldExtractExtract(Instruction &I) {
 
 /// Try to replace an extract + scalar fneg + insert with a vector fneg +
 /// shuffle.
-bool VectorCombine::foldInsExtFNeg(Instruction &I) {
+bool VectorCombine::foldInsExtFNeg(Instruction &I,
+                                   TTI::TargetCostKind CostKind) {
   // Match an insert (op (extract)) pattern.
   Value *DestVec;
   uint64_t Index;
@@ -683,7 +689,6 @@ bool VectorCombine::foldInsExtFNeg(Instruction &I) {
   Mask[Index] = Index + NumElts;
 
   Type *ScalarTy = VecTy->getScalarType();
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   InstructionCost OldCost =
       TTI.getArithmeticInstrCost(Instruction::FNeg, ScalarTy) +
       TTI.getVectorInstrCost(I, VecTy, CostKind, Index);
@@ -712,7 +717,8 @@ bool VectorCombine::foldInsExtFNeg(Instruction &I) {
 /// If this is a bitcast of a shuffle, try to bitcast the source vector to the
 /// destination type followed by shuffle. This can enable further transforms by
 /// moving bitcasts or shuffles together.
-bool VectorCombine::foldBitcastShuffle(Instruction &I) {
+bool VectorCombine::foldBitcastShuffle(Instruction &I,
+                                       TTI::TargetCostKind CostKind) {
   Value *V0, *V1;
   ArrayRef<int> Mask;
   if (!match(&I, m_BitCast(m_OneUse(
@@ -772,21 +778,20 @@ bool VectorCombine::foldBitcastShuffle(Instruction &I) {
   unsigned NumOps = IsUnary ? 1 : 2;
 
   // The new shuffle must not cost more than the old shuffle.
-  TargetTransformInfo::TargetCostKind CK =
-      TargetTransformInfo::TCK_RecipThroughput;
   TargetTransformInfo::ShuffleKind SK =
       IsUnary ? TargetTransformInfo::SK_PermuteSingleSrc
               : TargetTransformInfo::SK_PermuteTwoSrc;
 
   InstructionCost DestCost =
-      TTI.getShuffleCost(SK, NewShuffleTy, NewMask, CK) +
+      TTI.getShuffleCost(SK, NewShuffleTy, NewMask, CostKind) +
       (NumOps * TTI.getCastInstrCost(Instruction::BitCast, NewShuffleTy, SrcTy,
                                      TargetTransformInfo::CastContextHint::None,
-                                     CK));
+                                     CostKind));
   InstructionCost SrcCost =
-      TTI.getShuffleCost(SK, SrcTy, Mask, CK) +
+      TTI.getShuffleCost(SK, SrcTy, Mask, CostKind) +
       TTI.getCastInstrCost(Instruction::BitCast, DestTy, OldShuffleTy,
-                           TargetTransformInfo::CastContextHint::None, CK);
+                           TargetTransformInfo::CastContextHint::None,
+                           CostKind);
   if (DestCost > SrcCost || !DestCost.isValid())
     return false;
 
@@ -802,7 +807,8 @@ bool VectorCombine::foldBitcastShuffle(Instruction &I) {
 /// VP Intrinsics whose vector operands are both splat values may be simplified
 /// into the scalar version of the operation and the result splatted. This
 /// can lead to scalarization down the line.
-bool VectorCombine::scalarizeVPIntrinsic(Instruction &I) {
+bool VectorCombine::scalarizeVPIntrinsic(Instruction &I,
+                                         TTI::TargetCostKind CostKind) {
   if (!isa<VPIntrinsic>(I))
     return false;
   VPIntrinsic &VPI = cast<VPIntrinsic>(I);
@@ -841,7 +847,6 @@ bool VectorCombine::scalarizeVPIntrinsic(Instruction &I) {
   // Calculate cost of splatting both operands into vectors and the vector
   // intrinsic
   VectorType *VecTy = cast<VectorType>(VPI.getType());
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   SmallVector<int> Mask;
   if (auto *FVTy = dyn_cast<FixedVectorType>(VecTy))
     Mask.resize(FVTy->getNumElements(), 0);
@@ -923,7 +928,8 @@ bool VectorCombine::scalarizeVPIntrinsic(Instruction &I) {
 
 /// Match a vector binop or compare instruction with at least one inserted
 /// scalar operand and convert to scalar binop/cmp followed by insertelement.
-bool VectorCombine::scalarizeBinopOrCmp(Instruction &I) {
+bool VectorCombine::scalarizeBinopOrCmp(Instruction &I,
+                                        TTI::TargetCostKind CostKind) {
   CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
   Value *Ins0, *Ins1;
   if (!match(&I, m_BinOp(m_Value(Ins0), m_Value(Ins1))) &&
@@ -1003,7 +1009,6 @@ bool VectorCombine::scalarizeBinopOrCmp(Instruction &I) {
 
   // Get cost estimate for the insert element. This cost will factor into
   // both sequences.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   InstructionCost InsertCost = TTI.getVectorInstrCost(
       Instruction::InsertElement, VecTy, CostKind, Index);
   InstructionCost OldCost =
@@ -1052,7 +1057,8 @@ bool VectorCombine::scalarizeBinopOrCmp(Instruction &I) {
 /// Try to combine a scalar binop + 2 scalar compares of extracted elements of
 /// a vector into vector operations followed by extract. Note: The SLP pass
 /// may miss this pattern because of implementation problems.
-bool VectorCombine::foldExtractedCmps(Instruction &I) {
+bool VectorCombine::foldExtractedCmps(Instruction &I,
+                                      TTI::TargetCostKind CostKind) {
   auto *BI = dyn_cast<BinaryOperator>(&I);
 
   // We are looking for a scalar binop of booleans.
@@ -1080,7 +1086,7 @@ bool VectorCombine::foldExtractedCmps(Instruction &I) {
 
   auto *Ext0 = cast<ExtractElementInst>(I0);
   auto *Ext1 = cast<ExtractElementInst>(I1);
-  ExtractElementInst *ConvertToShuf = getShuffleExtract(Ext0, Ext1);
+  ExtractElementInst *ConvertToShuf = getShuffleExtract(Ext0, Ext1, CostKind);
   if (!ConvertToShuf)
     return false;
   assert((ConvertToShuf == Ext0 || ConvertToShuf == Ext1) &&
@@ -1089,13 +1095,12 @@ bool VectorCombine::foldExtractedCmps(Instruction &I) {
   // The original scalar pattern is:
   // binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
   CmpInst::Predicate Pred = P0;
-  unsigned CmpOpcode = CmpInst::isFPPredicate(Pred) ? Instruction::FCmp
-                                                    : Instruction::ICmp;
+  unsigned CmpOpcode =
+      CmpInst::isFPPredicate(Pred) ? Instruction::FCmp : Instruction::ICmp;
   auto *VecTy = dyn_cast<FixedVectorType>(X->getType());
   if (!VecTy)
     return false;
 
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   InstructionCost Ext0Cost =
                       TTI.getVectorInstrCost(*Ext0, VecTy, CostKind, Index0),
                   Ext1Cost =
@@ -1331,7 +1336,8 @@ bool VectorCombine::foldSingleElementStore(Instruction &I) {
 }
 
 /// Try to scalarize vector loads feeding extractelement instructions.
-bool VectorCombine::scalarizeLoadExtract(Instruction &I) {
+bool VectorCombine::scalarizeLoadExtract(Instruction &I,
+                                         TTI::TargetCostKind CostKind) {
   Value *Ptr;
   if (!match(&I, m_Load(m_Value(Ptr))))
     return false;
@@ -1386,7 +1392,6 @@ bool VectorCombine::scalarizeLoadExtract(Instruction &I) {
     }
 
     auto *Index = dyn_cast<ConstantInt>(UI->getOperand(1));
-    TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
     OriginalCost +=
         TTI.getVectorInstrCost(Instruction::ExtractElement, VecTy, CostKind,
                                Index ? Index->getZExtValue() : -1);
@@ -1428,7 +1433,8 @@ bool VectorCombine::scalarizeLoadExtract(Instruction &I) {
 
 /// Try to convert "shuffle (binop (shuffle, shuffle)), undef"
 ///           -->  "binop (shuffle), (shuffle)".
-bool VectorCombine::foldPermuteOfBinops(Instruction &I) {
+bool VectorCombine::foldPermuteOfBinops(Instruction &I,
+                                        TTI::TargetCostKind CostKind) {
   BinaryOperator *BinOp;
   ArrayRef<int> OuterMask;
   if (!match(&I,
@@ -1480,8 +1486,6 @@ bool VectorCombine::foldPermuteOfBinops(Instruction &I) {
   }
 
   // Try to merge shuffles across the binop if the new shuffles are not costly.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-
   InstructionCost OldCost =
       TTI.getArithmeticInstrCost(Opcode, BinOpTy, CostKind) +
       TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, BinOpTy,
@@ -1523,7 +1527,8 @@ bool VectorCombine::foldPermuteOfBinops(Instruction &I) {
 }
 
 /// Try to convert "shuffle (binop), (binop)" into "binop (shuffle), (shuffle)".
-bool VectorCombine::foldShuffleOfBinops(Instruction &I) {
+bool VectorCombine::foldShuffleOfBinops(Instruction &I,
+                                        TTI::TargetCostKind CostKind) {
   BinaryOperator *B0, *B1;
   ArrayRef<int> OldMask;
   if (!match(&I, m_Shuffle(m_OneUse(m_BinOp(B0)), m_OneUse(m_BinOp(B1)),
@@ -1575,8 +1580,6 @@ bool VectorCombine::foldShuffleOfBinops(Instruction &I) {
   }
 
   // Try to replace a binop with a shuffle if the shuffle is not costly.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-
   InstructionCost OldCost =
       TTI.getArithmeticInstrCost(B0->getOpcode(), BinOpTy, CostKind) +
       TTI.getArithmeticInstrCost(B1->getOpcode(), BinOpTy, CostKind) +
@@ -1612,7 +1615,8 @@ bool VectorCombine::foldShuffleOfBinops(Instruction &I) {
 
 /// Try to convert "shuffle (castop), (castop)" with a shared castop operand
 /// into "castop (shuffle)".
-bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
+bool VectorCombine::foldShuffleOfCastops(Instruction &I,
+                                         TTI::TargetCostKind CostKind) {
   Value *V0, *V1;
   ArrayRef<int> OldMask;
   if (!match(&I, m_Shuffle(m_Value(V0), m_Value(V1), m_Mask(OldMask))))
@@ -1672,8 +1676,6 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
       FixedVectorType::get(CastSrcTy->getScalarType(), NewMask.size());
 
   // Try to replace a castop with a shuffle if the shuffle is not costly.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-
   InstructionCost CostC0 =
       TTI.getCastInstrCost(C0->getOpcode(), CastDstTy, CastSrcTy,
                            TTI::CastContextHint::None, CostKind);
@@ -1717,7 +1719,8 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
 
 /// Try to convert "shuffle (shuffle x, undef), (shuffle y, undef)"
 /// into "shuffle x, y".
-bool VectorCombine::foldShuffleOfShuffles(Instruction &I) {
+bool VectorCombine::foldShuffleOfShuffles(Instruction &I,
+                                          TTI::TargetCostKind CostKind) {
   Value *V0, *V1;
   UndefValue *U0, *U1;
   ArrayRef<int> OuterMask, InnerMask0, InnerMask1;
@@ -1767,8 +1770,6 @@ bool VectorCombine::foldShuffleOfShuffles(Instruction &I) {
   }
 
   // Try to merge the shuffles if the new shuffle is not costly.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-
   InstructionCost InnerCost0 =
       TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, ShuffleSrcTy,
                          InnerMask0, CostKind, 0, nullptr, {V0, U0}, ShufI0);
@@ -1807,7 +1808,8 @@ bool VectorCombine::foldShuffleOfShuffles(Instruction &I) {
 
 /// Try to convert
 /// "shuffle (intrinsic), (intrinsic)" into "intrinsic (shuffle), (shuffle)".
-bool VectorCombine::foldShuffleOfIntrinsics(Instruction &I) {
+bool VectorCombine::foldShuffleOfIntrinsics(Instruction &I,
+                                            TTI::TargetCostKind CostKind) {
   Value *V0, *V1;
   ArrayRef<int> OldMask;
   if (!match(&I, m_Shuffle(m_OneUse(m_Value(V0)), m_OneUse(m_Value(V1)),
@@ -1837,12 +1839,10 @@ bool VectorCombine::foldShuffleOfIntrinsics(Instruction...
[truncated]

llvmbot (Member) commented Dec 4, 2024

@llvm/pr-subscribers-vectorizers


github-actions bot commented Dec 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

…t cost kind value

Some prep work to allow a future patch to potentially use VectorCombine to target code size for -Os/Oz builds (setting TCK_CodeSize instead of TCK_RecipThroughput).

There's still more cleanup to do as a lot of get*Cost calls are relying on the default TargetCostKind value (usually TCK_RecipThroughput but not always).
RKSimon force-pushed the vector-combine-costkinds branch from 094e288 to ce7152a on December 4, 2024 at 15:25.
davemgreen (Collaborator) left a comment

Sounds OK to me. Did you consider making it a class member, to avoid all the extra args to the fold functions?

RKSimon (Collaborator, Author) commented Dec 9, 2024

Sounds OK to me. Did you consider making it a class member, to avoid all the extra args to the fold functions?

That would be easy enough to do - I wasn't sure if we needed the flexibility of a per-function call instead or not though.
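
For reference, a rough sketch of the class-member alternative raised in review; this is hypothetical and not what was merged here (the merged patch threads CostKind through as an explicit parameter):

```cpp
// Hypothetical alternative: fix the cost kind at construction time and let
// the fold helpers read the member, keeping their one-argument signatures.
class VectorCombine {
public:
  VectorCombine(Function &F, const TargetTransformInfo &TTI,
                TargetTransformInfo::TargetCostKind CostKind)
      : F(F), TTI(TTI), CostKind(CostKind) {}

  bool run();

private:
  Function &F;
  const TargetTransformInfo &TTI;
  const TargetTransformInfo::TargetCostKind CostKind; // e.g. TCK_CodeSize for -Os/-Oz

  // Each fold uses the CostKind member in its TTI.get*Cost(...) calls.
  bool foldExtractExtract(Instruction &I);
  bool foldBitcastShuffle(Instruction &I);
};
```

The trade-off noted above is flexibility: a member fixes one cost kind per VectorCombine instance, while an explicit parameter leaves room for individual fold calls to use a different cost kind if that is ever needed.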

davemgreen (Collaborator) left a comment

Thanks. LGTM

RKSimon merged commit 7831c5e into llvm:main on Dec 9, 2024 (6 of 8 checks passed).
RKSimon deleted the vector-combine-costkinds branch on December 9, 2024 at 12:05.