[VectorCombine] Pull out TargetCostKind argument to allow globally set cost kind value #118652

Merged
merged 2 commits into llvm:main from vector-combine-costkinds on Dec 9, 2024

Conversation

RKSimon (Collaborator) commented Dec 4, 2024

Don't use TCK_RecipThroughput independently in every VectorCombine fold.

Some prep work to allow a potential future patch to use VectorCombine to optimise for code size for -Os/Oz builds (setting TCK_CodeSize instead of TCK_RecipThroughput).

There's still more cleanup to do as a lot of get*Cost calls are relying on the default TargetCostKind value (usually TCK_RecipThroughput but not always).
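
For illustration, here is a minimal, hypothetical sketch of what the follow-up described above might look like: pick the cost kind once in VectorCombine::run() based on the function's size attributes and thread it through the folds. This is not part of this patch (which keeps TCK_RecipThroughput everywhere); `F`, `MadeChange`, and the loop structure stand in for the surrounding run() context.

```cpp
// Hypothetical follow-up sketch, not this patch: choose one TargetCostKind
// for the whole run and hand it to every fold that now takes a CostKind.
TargetTransformInfo::TargetCostKind CostKind =
    (F.hasMinSize() || F.hasOptSize())
        ? TargetTransformInfo::TCK_CodeSize         // -Os / -Oz builds
        : TargetTransformInfo::TCK_RecipThroughput; // default speed tuning

bool MadeChange = false;
for (BasicBlock &BB : F) {
  for (Instruction &I : BB) {
    MadeChange |= vectorizeLoadInsert(I, CostKind);
    MadeChange |= foldExtractExtract(I, CostKind);
    // ...the remaining folds receive the same CostKind...
  }
}
```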

llvmbot (Member) commented Dec 4, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Simon Pilgrim (RKSimon)

Changes

Don't use TCK_RecipThroughput independently in every VectorCombine fold.

Some prep work to allow a potential future patch to use VectorCombine to optimise for code size for -Os/Oz builds (setting TCK_CodeSize instead of TCK_RecipThroughput).

There's still more cleanup to do as a lot of get*Cost calls are relying on the default TargetCostKind value (usually TCK_RecipThroughput but not always).


Patch is 29.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118652.diff

1 file affected:

  • (modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+99-99)
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index b9caf8c0df9be1..385b2d1e802a81 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -91,38 +91,40 @@ class VectorCombine {
   // TODO: Direct calls from the top-level "run" loop use a plain "Instruction"
   //       parameter. That should be updated to specific sub-classes because the
   //       run loop was changed to dispatch on opcode.
-  bool vectorizeLoadInsert(Instruction &I);
-  bool widenSubvectorLoad(Instruction &I);
+  bool vectorizeLoadInsert(Instruction &I, TTI::TargetCostKind CostKind);
+  bool widenSubvectorLoad(Instruction &I, TTI::TargetCostKind CostKind);
   ExtractElementInst *getShuffleExtract(ExtractElementInst *Ext0,
                                         ExtractElementInst *Ext1,
+                                        TTI::TargetCostKind CostKind,
                                         unsigned PreferredExtractIndex) const;
   bool isExtractExtractCheap(ExtractElementInst *Ext0, ExtractElementInst *Ext1,
                              const Instruction &I,
                              ExtractElementInst *&ConvertToShuffle,
+                             TTI::TargetCostKind CostKind,
                              unsigned PreferredExtractIndex);
   void foldExtExtCmp(ExtractElementInst *Ext0, ExtractElementInst *Ext1,
                      Instruction &I);
   void foldExtExtBinop(ExtractElementInst *Ext0, ExtractElementInst *Ext1,
                        Instruction &I);
-  bool foldExtractExtract(Instruction &I);
-  bool foldInsExtFNeg(Instruction &I);
-  bool foldInsExtVectorToShuffle(Instruction &I);
-  bool foldBitcastShuffle(Instruction &I);
-  bool scalarizeBinopOrCmp(Instruction &I);
-  bool scalarizeVPIntrinsic(Instruction &I);
-  bool foldExtractedCmps(Instruction &I);
+  bool foldExtractExtract(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldInsExtFNeg(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldInsExtVectorToShuffle(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldBitcastShuffle(Instruction &I, TTI::TargetCostKind CostKind);
+  bool scalarizeBinopOrCmp(Instruction &I, TTI::TargetCostKind CostKind);
+  bool scalarizeVPIntrinsic(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldExtractedCmps(Instruction &I, TTI::TargetCostKind CostKind);
   bool foldSingleElementStore(Instruction &I);
-  bool scalarizeLoadExtract(Instruction &I);
-  bool foldPermuteOfBinops(Instruction &I);
-  bool foldShuffleOfBinops(Instruction &I);
-  bool foldShuffleOfCastops(Instruction &I);
-  bool foldShuffleOfShuffles(Instruction &I);
-  bool foldShuffleOfIntrinsics(Instruction &I);
-  bool foldShuffleToIdentity(Instruction &I);
+  bool scalarizeLoadExtract(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldPermuteOfBinops(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleOfBinops(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleOfCastops(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleOfShuffles(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleOfIntrinsics(Instruction &I, TTI::TargetCostKind CostKind);
+  bool foldShuffleToIdentity(Instruction &I, TTI::TargetCostKind CostKind);
   bool foldShuffleFromReductions(Instruction &I);
-  bool foldCastFromReductions(Instruction &I);
+  bool foldCastFromReductions(Instruction &I, TTI::TargetCostKind CostKind);
   bool foldSelectShuffle(Instruction &I, bool FromReduction = false);
-  bool shrinkType(Instruction &I);
+  bool shrinkType(Instruction &I, TTI::TargetCostKind CostKind);
 
   void replaceValue(Value &Old, Value &New) {
     Old.replaceAllUsesWith(&New);
@@ -172,7 +174,8 @@ static bool canWidenLoad(LoadInst *Load, const TargetTransformInfo &TTI) {
   return true;
 }
 
-bool VectorCombine::vectorizeLoadInsert(Instruction &I) {
+bool VectorCombine::vectorizeLoadInsert(Instruction &I,
+                                        TTI::TargetCostKind CostKind) {
   // Match insert into fixed vector of scalar value.
   // TODO: Handle non-zero insert index.
   Value *Scalar;
@@ -249,7 +252,6 @@ bool VectorCombine::vectorizeLoadInsert(Instruction &I) {
   InstructionCost OldCost =
       TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment, AS);
   APInt DemandedElts = APInt::getOneBitSet(MinVecNumElts, 0);
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   OldCost +=
       TTI.getScalarizationOverhead(MinVecTy, DemandedElts,
                                    /* Insert */ true, HasExtract, CostKind);
@@ -293,7 +295,7 @@ bool VectorCombine::vectorizeLoadInsert(Instruction &I) {
 /// If we are loading a vector and then inserting it into a larger vector with
 /// undefined elements, try to load the larger vector and eliminate the insert.
 /// This removes a shuffle in IR and may allow combining of other loaded values.
-bool VectorCombine::widenSubvectorLoad(Instruction &I) {
+bool VectorCombine::widenSubvectorLoad(Instruction &I, TTI::TargetCostKind CostKind) {
   // Match subvector insert of fixed vector.
   auto *Shuf = cast<ShuffleVectorInst>(&I);
   if (!Shuf->isIdentityWithPadding())
@@ -329,11 +331,11 @@ bool VectorCombine::widenSubvectorLoad(Instruction &I) {
   // undef value is 0. We could add that cost if the cost model accurately
   // reflects the real cost of that operation.
   InstructionCost OldCost =
-      TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment, AS);
+      TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment, AS, CostKind);
 
   // New pattern: load PtrOp
   InstructionCost NewCost =
-      TTI.getMemoryOpCost(Instruction::Load, Ty, Alignment, AS);
+      TTI.getMemoryOpCost(Instruction::Load, Ty, Alignment, AS, CostKind);
 
   // We can aggressively convert to the vector form because the backend can
   // invert this transform if it does not result in a performance win.
@@ -353,6 +355,7 @@ bool VectorCombine::widenSubvectorLoad(Instruction &I) {
 /// followed by extract from a different index.
 ExtractElementInst *VectorCombine::getShuffleExtract(
     ExtractElementInst *Ext0, ExtractElementInst *Ext1,
+    TTI::TargetCostKind CostKind,
     unsigned PreferredExtractIndex = InvalidIndex) const {
   auto *Index0C = dyn_cast<ConstantInt>(Ext0->getIndexOperand());
   auto *Index1C = dyn_cast<ConstantInt>(Ext1->getIndexOperand());
@@ -366,7 +369,6 @@ ExtractElementInst *VectorCombine::getShuffleExtract(
     return nullptr;
 
   Type *VecTy = Ext0->getVectorOperand()->getType();
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   assert(VecTy == Ext1->getVectorOperand()->getType() && "Need matching types");
   InstructionCost Cost0 =
       TTI.getVectorInstrCost(*Ext0, VecTy, CostKind, Index0);
@@ -405,6 +407,7 @@ bool VectorCombine::isExtractExtractCheap(ExtractElementInst *Ext0,
                                           ExtractElementInst *Ext1,
                                           const Instruction &I,
                                           ExtractElementInst *&ConvertToShuffle,
+                                          TTI::TargetCostKind CostKind,
                                           unsigned PreferredExtractIndex) {
   auto *Ext0IndexC = dyn_cast<ConstantInt>(Ext0->getIndexOperand());
   auto *Ext1IndexC = dyn_cast<ConstantInt>(Ext1->getIndexOperand());
@@ -436,7 +439,6 @@ bool VectorCombine::isExtractExtractCheap(ExtractElementInst *Ext0,
   // both sequences.
   unsigned Ext0Index = Ext0IndexC->getZExtValue();
   unsigned Ext1Index = Ext1IndexC->getZExtValue();
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
 
   InstructionCost Extract0Cost =
       TTI.getVectorInstrCost(*Ext0, VecTy, CostKind, Ext0Index);
@@ -475,7 +477,8 @@ bool VectorCombine::isExtractExtractCheap(ExtractElementInst *Ext0,
               !Ext1->hasOneUse() * Extract1Cost;
   }
 
-  ConvertToShuffle = getShuffleExtract(Ext0, Ext1, PreferredExtractIndex);
+  ConvertToShuffle =
+      getShuffleExtract(Ext0, Ext1, CostKind, PreferredExtractIndex);
   if (ConvertToShuffle) {
     if (IsBinOp && DisableBinopExtractShuffle)
       return true;
@@ -589,7 +592,8 @@ void VectorCombine::foldExtExtBinop(ExtractElementInst *Ext0,
 }
 
 /// Match an instruction with extracted vector operands.
-bool VectorCombine::foldExtractExtract(Instruction &I) {
+bool VectorCombine::foldExtractExtract(Instruction &I,
+                                       TTI::TargetCostKind CostKind) {
   // It is not safe to transform things like div, urem, etc. because we may
   // create undefined behavior when executing those on unknown vector elements.
   if (!isSafeToSpeculativelyExecute(&I))
@@ -621,7 +625,8 @@ bool VectorCombine::foldExtractExtract(Instruction &I) {
           m_InsertElt(m_Value(), m_Value(), m_ConstantInt(InsertIndex)));
 
   ExtractElementInst *ExtractToChange;
-  if (isExtractExtractCheap(Ext0, Ext1, I, ExtractToChange, InsertIndex))
+  if (isExtractExtractCheap(Ext0, Ext1, I, ExtractToChange, CostKind,
+                            InsertIndex))
     return false;
 
   if (ExtractToChange) {
@@ -648,7 +653,8 @@ bool VectorCombine::foldExtractExtract(Instruction &I) {
 
 /// Try to replace an extract + scalar fneg + insert with a vector fneg +
 /// shuffle.
-bool VectorCombine::foldInsExtFNeg(Instruction &I) {
+bool VectorCombine::foldInsExtFNeg(Instruction &I,
+                                   TTI::TargetCostKind CostKind) {
   // Match an insert (op (extract)) pattern.
   Value *DestVec;
   uint64_t Index;
@@ -683,7 +689,6 @@ bool VectorCombine::foldInsExtFNeg(Instruction &I) {
   Mask[Index] = Index + NumElts;
 
   Type *ScalarTy = VecTy->getScalarType();
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   InstructionCost OldCost =
       TTI.getArithmeticInstrCost(Instruction::FNeg, ScalarTy) +
       TTI.getVectorInstrCost(I, VecTy, CostKind, Index);
@@ -712,7 +717,8 @@ bool VectorCombine::foldInsExtFNeg(Instruction &I) {
 /// If this is a bitcast of a shuffle, try to bitcast the source vector to the
 /// destination type followed by shuffle. This can enable further transforms by
 /// moving bitcasts or shuffles together.
-bool VectorCombine::foldBitcastShuffle(Instruction &I) {
+bool VectorCombine::foldBitcastShuffle(Instruction &I,
+                                       TTI::TargetCostKind CostKind) {
   Value *V0, *V1;
   ArrayRef<int> Mask;
   if (!match(&I, m_BitCast(m_OneUse(
@@ -772,21 +778,20 @@ bool VectorCombine::foldBitcastShuffle(Instruction &I) {
   unsigned NumOps = IsUnary ? 1 : 2;
 
   // The new shuffle must not cost more than the old shuffle.
-  TargetTransformInfo::TargetCostKind CK =
-      TargetTransformInfo::TCK_RecipThroughput;
   TargetTransformInfo::ShuffleKind SK =
       IsUnary ? TargetTransformInfo::SK_PermuteSingleSrc
               : TargetTransformInfo::SK_PermuteTwoSrc;
 
   InstructionCost DestCost =
-      TTI.getShuffleCost(SK, NewShuffleTy, NewMask, CK) +
+      TTI.getShuffleCost(SK, NewShuffleTy, NewMask, CostKind) +
       (NumOps * TTI.getCastInstrCost(Instruction::BitCast, NewShuffleTy, SrcTy,
                                      TargetTransformInfo::CastContextHint::None,
-                                     CK));
+                                     CostKind));
   InstructionCost SrcCost =
-      TTI.getShuffleCost(SK, SrcTy, Mask, CK) +
+      TTI.getShuffleCost(SK, SrcTy, Mask, CostKind) +
       TTI.getCastInstrCost(Instruction::BitCast, DestTy, OldShuffleTy,
-                           TargetTransformInfo::CastContextHint::None, CK);
+                           TargetTransformInfo::CastContextHint::None,
+                           CostKind);
   if (DestCost > SrcCost || !DestCost.isValid())
     return false;
 
@@ -802,7 +807,8 @@ bool VectorCombine::foldBitcastShuffle(Instruction &I) {
 /// VP Intrinsics whose vector operands are both splat values may be simplified
 /// into the scalar version of the operation and the result splatted. This
 /// can lead to scalarization down the line.
-bool VectorCombine::scalarizeVPIntrinsic(Instruction &I) {
+bool VectorCombine::scalarizeVPIntrinsic(Instruction &I,
+                                         TTI::TargetCostKind CostKind) {
   if (!isa<VPIntrinsic>(I))
     return false;
   VPIntrinsic &VPI = cast<VPIntrinsic>(I);
@@ -841,7 +847,6 @@ bool VectorCombine::scalarizeVPIntrinsic(Instruction &I) {
   // Calculate cost of splatting both operands into vectors and the vector
   // intrinsic
   VectorType *VecTy = cast<VectorType>(VPI.getType());
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   SmallVector<int> Mask;
   if (auto *FVTy = dyn_cast<FixedVectorType>(VecTy))
     Mask.resize(FVTy->getNumElements(), 0);
@@ -923,7 +928,8 @@ bool VectorCombine::scalarizeVPIntrinsic(Instruction &I) {
 
 /// Match a vector binop or compare instruction with at least one inserted
 /// scalar operand and convert to scalar binop/cmp followed by insertelement.
-bool VectorCombine::scalarizeBinopOrCmp(Instruction &I) {
+bool VectorCombine::scalarizeBinopOrCmp(Instruction &I,
+                                        TTI::TargetCostKind CostKind) {
   CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
   Value *Ins0, *Ins1;
   if (!match(&I, m_BinOp(m_Value(Ins0), m_Value(Ins1))) &&
@@ -1003,7 +1009,6 @@ bool VectorCombine::scalarizeBinopOrCmp(Instruction &I) {
 
   // Get cost estimate for the insert element. This cost will factor into
   // both sequences.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   InstructionCost InsertCost = TTI.getVectorInstrCost(
       Instruction::InsertElement, VecTy, CostKind, Index);
   InstructionCost OldCost =
@@ -1052,7 +1057,8 @@ bool VectorCombine::scalarizeBinopOrCmp(Instruction &I) {
 /// Try to combine a scalar binop + 2 scalar compares of extracted elements of
 /// a vector into vector operations followed by extract. Note: The SLP pass
 /// may miss this pattern because of implementation problems.
-bool VectorCombine::foldExtractedCmps(Instruction &I) {
+bool VectorCombine::foldExtractedCmps(Instruction &I,
+                                      TTI::TargetCostKind CostKind) {
   auto *BI = dyn_cast<BinaryOperator>(&I);
 
   // We are looking for a scalar binop of booleans.
@@ -1080,7 +1086,7 @@ bool VectorCombine::foldExtractedCmps(Instruction &I) {
 
   auto *Ext0 = cast<ExtractElementInst>(I0);
   auto *Ext1 = cast<ExtractElementInst>(I1);
-  ExtractElementInst *ConvertToShuf = getShuffleExtract(Ext0, Ext1);
+  ExtractElementInst *ConvertToShuf = getShuffleExtract(Ext0, Ext1, CostKind);
   if (!ConvertToShuf)
     return false;
   assert((ConvertToShuf == Ext0 || ConvertToShuf == Ext1) &&
@@ -1089,13 +1095,12 @@ bool VectorCombine::foldExtractedCmps(Instruction &I) {
   // The original scalar pattern is:
   // binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
   CmpInst::Predicate Pred = P0;
-  unsigned CmpOpcode = CmpInst::isFPPredicate(Pred) ? Instruction::FCmp
-                                                    : Instruction::ICmp;
+  unsigned CmpOpcode =
+      CmpInst::isFPPredicate(Pred) ? Instruction::FCmp : Instruction::ICmp;
   auto *VecTy = dyn_cast<FixedVectorType>(X->getType());
   if (!VecTy)
     return false;
 
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
   InstructionCost Ext0Cost =
                       TTI.getVectorInstrCost(*Ext0, VecTy, CostKind, Index0),
                   Ext1Cost =
@@ -1331,7 +1336,8 @@ bool VectorCombine::foldSingleElementStore(Instruction &I) {
 }
 
 /// Try to scalarize vector loads feeding extractelement instructions.
-bool VectorCombine::scalarizeLoadExtract(Instruction &I) {
+bool VectorCombine::scalarizeLoadExtract(Instruction &I,
+                                         TTI::TargetCostKind CostKind) {
   Value *Ptr;
   if (!match(&I, m_Load(m_Value(Ptr))))
     return false;
@@ -1386,7 +1392,6 @@ bool VectorCombine::scalarizeLoadExtract(Instruction &I) {
     }
 
     auto *Index = dyn_cast<ConstantInt>(UI->getOperand(1));
-    TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
     OriginalCost +=
         TTI.getVectorInstrCost(Instruction::ExtractElement, VecTy, CostKind,
                                Index ? Index->getZExtValue() : -1);
@@ -1428,7 +1433,8 @@ bool VectorCombine::scalarizeLoadExtract(Instruction &I) {
 
 /// Try to convert "shuffle (binop (shuffle, shuffle)), undef"
 ///           -->  "binop (shuffle), (shuffle)".
-bool VectorCombine::foldPermuteOfBinops(Instruction &I) {
+bool VectorCombine::foldPermuteOfBinops(Instruction &I,
+                                        TTI::TargetCostKind CostKind) {
   BinaryOperator *BinOp;
   ArrayRef<int> OuterMask;
   if (!match(&I,
@@ -1480,8 +1486,6 @@ bool VectorCombine::foldPermuteOfBinops(Instruction &I) {
   }
 
   // Try to merge shuffles across the binop if the new shuffles are not costly.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-
   InstructionCost OldCost =
       TTI.getArithmeticInstrCost(Opcode, BinOpTy, CostKind) +
       TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, BinOpTy,
@@ -1523,7 +1527,8 @@ bool VectorCombine::foldPermuteOfBinops(Instruction &I) {
 }
 
 /// Try to convert "shuffle (binop), (binop)" into "binop (shuffle), (shuffle)".
-bool VectorCombine::foldShuffleOfBinops(Instruction &I) {
+bool VectorCombine::foldShuffleOfBinops(Instruction &I,
+                                        TTI::TargetCostKind CostKind) {
   BinaryOperator *B0, *B1;
   ArrayRef<int> OldMask;
   if (!match(&I, m_Shuffle(m_OneUse(m_BinOp(B0)), m_OneUse(m_BinOp(B1)),
@@ -1575,8 +1580,6 @@ bool VectorCombine::foldShuffleOfBinops(Instruction &I) {
   }
 
   // Try to replace a binop with a shuffle if the shuffle is not costly.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-
   InstructionCost OldCost =
       TTI.getArithmeticInstrCost(B0->getOpcode(), BinOpTy, CostKind) +
       TTI.getArithmeticInstrCost(B1->getOpcode(), BinOpTy, CostKind) +
@@ -1612,7 +1615,8 @@ bool VectorCombine::foldShuffleOfBinops(Instruction &I) {
 
 /// Try to convert "shuffle (castop), (castop)" with a shared castop operand
 /// into "castop (shuffle)".
-bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
+bool VectorCombine::foldShuffleOfCastops(Instruction &I,
+                                         TTI::TargetCostKind CostKind) {
   Value *V0, *V1;
   ArrayRef<int> OldMask;
   if (!match(&I, m_Shuffle(m_Value(V0), m_Value(V1), m_Mask(OldMask))))
@@ -1672,8 +1676,6 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
       FixedVectorType::get(CastSrcTy->getScalarType(), NewMask.size());
 
   // Try to replace a castop with a shuffle if the shuffle is not costly.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-
   InstructionCost CostC0 =
       TTI.getCastInstrCost(C0->getOpcode(), CastDstTy, CastSrcTy,
                            TTI::CastContextHint::None, CostKind);
@@ -1717,7 +1719,8 @@ bool VectorCombine::foldShuffleOfCastops(Instruction &I) {
 
 /// Try to convert "shuffle (shuffle x, undef), (shuffle y, undef)"
 /// into "shuffle x, y".
-bool VectorCombine::foldShuffleOfShuffles(Instruction &I) {
+bool VectorCombine::foldShuffleOfShuffles(Instruction &I,
+                                          TTI::TargetCostKind CostKind) {
   Value *V0, *V1;
   UndefValue *U0, *U1;
   ArrayRef<int> OuterMask, InnerMask0, InnerMask1;
@@ -1767,8 +1770,6 @@ bool VectorCombine::foldShuffleOfShuffles(Instruction &I) {
   }
 
   // Try to merge the shuffles if the new shuffle is not costly.
-  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
-
   InstructionCost InnerCost0 =
       TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, ShuffleSrcTy,
                          InnerMask0, CostKind, 0, nullptr, {V0, U0}, ShufI0);
@@ -1807,7 +1808,8 @@ bool VectorCombine::foldShuffleOfShuffles(Instruction &I) {
 
 /// Try to convert
 /// "shuffle (intrinsic), (intrinsic)" into "intrinsic (shuffle), (shuffle)".
-bool VectorCombine::foldShuffleOfIntrinsics(Instruction &I) {
+bool VectorCombine::foldShuffleOfIntrinsics(Instruction &I,
+                                            TTI::TargetCostKind CostKind) {
   Value *V0, *V1;
   ArrayRef<int> OldMask;
   if (!match(&I, m_Shuffle(m_OneUse(m_Value(V0)), m_OneUse(m_Value(V1)),
@@ -1837,12 +1839,10 @@ bool VectorCombine::foldShuffleOfIntrinsics(Instruction...
[truncated]

llvmbot (Member) commented Dec 4, 2024

@llvm/pr-subscribers-vectorizers


github-actions bot commented Dec 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

…t cost kind value

Some prep work to allow a future patch to potentially use VectorCombine to target code size for -Os/Oz builds (setting TCK_CodeSize instead of TCK_RecipThroughput).

There's still more cleanup to do as a lot of get*Cost calls are relying on the default TargetCostKind value (usually TCK_RecipThroughput but not always).
RKSimon force-pushed the vector-combine-costkinds branch from 094e288 to ce7152a on December 4, 2024 at 15:25.
davemgreen (Collaborator) left a comment

Sounds OK to me. Did you consider making it a class member, to avoid all the extra args to the fold functions?

RKSimon (Collaborator, Author) commented Dec 9, 2024

Sounds OK to me. Did you consider making it a class member, to avoid all the extra args to the fold functions?

That would be easy enough to do - I wasn't sure if we needed the flexibility of a per-function call instead or not though.
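
For reference, a rough sketch of the class-member alternative raised in review; this is hypothetical and not what was merged here (the merged patch threads CostKind through as an explicit parameter):

```cpp
// Hypothetical alternative: fix the cost kind at construction time and let
// the fold helpers read the member, keeping their one-argument signatures.
class VectorCombine {
public:
  VectorCombine(Function &F, const TargetTransformInfo &TTI,
                TargetTransformInfo::TargetCostKind CostKind)
      : F(F), TTI(TTI), CostKind(CostKind) {}

  bool run();

private:
  Function &F;
  const TargetTransformInfo &TTI;
  const TargetTransformInfo::TargetCostKind CostKind; // e.g. TCK_CodeSize for -Os/-Oz

  // Each fold uses the CostKind member in its TTI.get*Cost(...) calls.
  bool foldExtractExtract(Instruction &I);
  bool foldBitcastShuffle(Instruction &I);
};
```

The trade-off noted above is flexibility: a member fixes one cost kind per VectorCombine instance, while an explicit parameter leaves room for individual fold calls to use a different cost kind if that is ever needed.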

davemgreen (Collaborator) left a comment

Thanks. LGTM

RKSimon merged commit 7831c5e into llvm:main on Dec 9, 2024 (6 of 8 checks passed).
RKSimon deleted the vector-combine-costkinds branch on December 9, 2024 at 12:05.