Skip to content

[VPlan] Add opcode to create step for wide inductions. #119284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Apr 14, 2025
Merged

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Dec 9, 2024

This patch adds a WideIVStep opcode that can be used to create a vector with the steps to increment a wide induction. The opcode has 3 operands

  • the vector step
  • the scale of the vector step

The opcode is later converted into a sequence of recipes that convert the scale and step to the target type, if needed, and then multiply vector step by scale.

This simplifies code that needs to materialize step vectors, e.g. replacing wide IVs as follow up to
#108378 with an increment of the wide IV step.

@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

This patch adds a WideIVStep opcode that can be used to create a vector with the steps to increment a wide induction. The opcode has 3 operands

  • the vector step
  • the scale of the vector step
  • a constant indicating the target type of the VPInstruction (this is working around having explicit types for VPInstructions, we could also introduce a dedicated recipe, at the cost of a lot more scaffolding)

The opcode is later converted into a sequence of recipes that convert the scale and step to the target type, if needed, and then multiply vector step by scale.

This simplifies code that needs to materialize step vectors, e.g. replacing wide IVs as follow up to
#108378 with an increment of the wide IV step.


Full diff: https://github.com/llvm/llvm-project/pull/119284.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+2-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+52-11)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+7-25)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e1d828f038f9a2..d3cd1a9b128048 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1220,6 +1220,7 @@ class VPInstruction : public VPRecipeWithIRFlags,
     CalculateTripCountMinusVF,
     // Increment the canonical IV separately for each unrolled part.
     CanonicalIVIncrementForPart,
+    WideIVStep,
     BranchOnCount,
     BranchOnCond,
     ComputeReductionResult,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5903ad29af7602..8bf9b5194932b4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -661,7 +661,8 @@ bool VPInstruction::isFPMathOp() const {
   return Opcode == Instruction::FAdd || Opcode == Instruction::FMul ||
          Opcode == Instruction::FNeg || Opcode == Instruction::FSub ||
          Opcode == Instruction::FDiv || Opcode == Instruction::FRem ||
-         Opcode == Instruction::FCmp || Opcode == Instruction::Select;
+         Opcode == Instruction::FCmp || Opcode == Instruction::Select ||
+         Opcode == VPInstruction::WideIVStep;
 }
 #endif
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 6d77173735c9b8..41ed8b65b00fd6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1821,20 +1821,61 @@ void VPlanTransforms::createInterleaveGroups(
 }
 
 void VPlanTransforms::convertToConcreteRecipes(VPlan &Plan) {
+  Type *CanonicalIVType = Plan.getCanonicalIV()->getScalarType();
+  VPTypeAnalysis TypeInfo(CanonicalIVType);
+
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_deep(Plan.getEntry()))) {
-    for (VPRecipeBase &R : make_early_inc_range(VPBB->phis())) {
-      if (!isa<VPCanonicalIVPHIRecipe, VPEVLBasedIVPHIRecipe>(&R))
+    for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
+      if (isa<VPCanonicalIVPHIRecipe, VPEVLBasedIVPHIRecipe>(&R)) {
+        auto *PhiR = cast<VPHeaderPHIRecipe>(&R);
+        StringRef Name =
+            isa<VPCanonicalIVPHIRecipe>(PhiR) ? "index" : "evl.based.iv";
+        auto *ScalarR = new VPScalarPHIRecipe(PhiR->getStartValue(),
+                                              PhiR->getBackedgeValue(),
+                                              PhiR->getDebugLoc(), Name);
+        ScalarR->insertBefore(PhiR);
+        PhiR->replaceAllUsesWith(ScalarR);
+        PhiR->eraseFromParent();
         continue;
-      auto *PhiR = cast<VPHeaderPHIRecipe>(&R);
-      StringRef Name =
-          isa<VPCanonicalIVPHIRecipe>(PhiR) ? "index" : "evl.based.iv";
-      auto *ScalarR =
-          new VPScalarPHIRecipe(PhiR->getStartValue(), PhiR->getBackedgeValue(),
-                                PhiR->getDebugLoc(), Name);
-      ScalarR->insertBefore(PhiR);
-      PhiR->replaceAllUsesWith(ScalarR);
-      PhiR->eraseFromParent();
+      }
+
+      auto *VPI = dyn_cast<VPInstruction>(&R);
+      if (VPI && VPI->getOpcode() == VPInstruction::WideIVStep) {
+        VPBuilder Builder(VPI->getParent(), VPI->getIterator());
+        VPValue *VectorStep = VPI->getOperand(0);
+        Type *IVTy = TypeInfo.inferScalarType(VPI->getOperand(2));
+        if (TypeInfo.inferScalarType(VectorStep) != IVTy) {
+          Instruction::CastOps CastOp = IVTy->isFloatingPointTy()
+                                            ? Instruction::UIToFP
+                                            : Instruction::Trunc;
+          VectorStep = Builder.createWidenCast(CastOp, VectorStep, IVTy);
+        }
+
+        VPValue *ScalarStep = VPI->getOperand(1);
+        auto *ConstStep =
+            ScalarStep->isLiveIn()
+                ? dyn_cast<ConstantInt>(ScalarStep->getLiveInIRValue())
+                : nullptr;
+        if (!ConstStep || ConstStep->getValue() != 1) {
+          if (TypeInfo.inferScalarType(ScalarStep) != IVTy) {
+            ScalarStep =
+                Builder.createWidenCast(Instruction::Trunc, ScalarStep, IVTy);
+          }
+
+          std::optional<FastMathFlags> FMFs;
+          if (IVTy->isFloatingPointTy())
+            FMFs = VPI->getFastMathFlags();
+
+          unsigned MulOpc =
+              IVTy->isFloatingPointTy() ? Instruction::FMul : Instruction::Mul;
+          VPInstruction *Mul = Builder.createNaryOp(
+              MulOpc, {VectorStep, ScalarStep}, FMFs, R.getDebugLoc());
+          VectorStep = Mul;
+        }
+        VPI->replaceAllUsesWith(VectorStep);
+        VPI->eraseFromParent();
+      }
     }
   }
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index ff6c9295ee2057..7c1bb98c1a021f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -155,33 +155,15 @@ void UnrollState::unrollWidenInductionByUF(
   if (isa_and_present<FPMathOperator>(ID.getInductionBinOp()))
     FMFs = ID.getInductionBinOp()->getFastMathFlags();
 
-  VPValue *VectorStep = &Plan.getVF();
-  VPBuilder Builder(PH);
-  if (TypeInfo.inferScalarType(VectorStep) != IVTy) {
-    Instruction::CastOps CastOp =
-        IVTy->isFloatingPointTy() ? Instruction::UIToFP : Instruction::Trunc;
-    VectorStep = Builder.createWidenCast(CastOp, VectorStep, IVTy);
-    ToSkip.insert(VectorStep->getDefiningRecipe());
-  }
-
   VPValue *ScalarStep = IV->getStepValue();
-  auto *ConstStep = ScalarStep->isLiveIn()
-                        ? dyn_cast<ConstantInt>(ScalarStep->getLiveInIRValue())
-                        : nullptr;
-  if (!ConstStep || ConstStep->getValue() != 1) {
-    if (TypeInfo.inferScalarType(ScalarStep) != IVTy) {
-      ScalarStep =
-          Builder.createWidenCast(Instruction::Trunc, ScalarStep, IVTy);
-      ToSkip.insert(ScalarStep->getDefiningRecipe());
-    }
+  VPBuilder Builder(PH);
+  VPInstruction *VectorStep =
+      Builder.createNaryOp(VPInstruction::WideIVStep,
+                           {&Plan.getVF(), ScalarStep,
+                            Plan.getOrAddLiveIn(Constant::getNullValue(IVTy))},
+                           FMFs, IV->getDebugLoc());
 
-    unsigned MulOpc =
-        IVTy->isFloatingPointTy() ? Instruction::FMul : Instruction::Mul;
-    VPInstruction *Mul = Builder.createNaryOp(MulOpc, {VectorStep, ScalarStep},
-                                              FMFs, IV->getDebugLoc());
-    VectorStep = Mul;
-    ToSkip.insert(Mul);
-  }
+  ToSkip.insert(VectorStep);
 
   // Now create recipes to compute the induction steps for part 1 .. UF. Part 0
   // remains the header phi. Parts > 0 are computed by adding Step to the

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return true for WideIVStep in VPInstruction::onlyFirstLaneUsed if both inputs are scalar?

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I was able to use this in #118638 but it looks like there's difference with how this widens the VF + step before multiplying, as opposed to after e.g.:

--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -20,9 +20,9 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 8 x i64> @llvm.stepvector.nxv8i64()
 ; CHECK-NEXT:    [[TMP7:%.*]] = mul <vscale x 8 x i64> [[TMP8]], splat (i64 1)
 ; CHECK-NEXT:    [[INDUCTION:%.*]] = add <vscale x 8 x i64> zeroinitializer, [[TMP7]]
-; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 1, [[TMP6]]
-; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[TMP12]], i64 0
+; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[TMP6]], i64 0
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[DOTSPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP9:%.*]] = mul <vscale x 8 x i64> [[DOTSPLAT]], splat (i64 1)
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]

Does this still work if the createWidenCasts are replaced with createScalarCast and then splatted afterwards?

Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests or make it NFC

Comment on lines 1843 to 1844
auto *VPI = dyn_cast<VPInstruction>(&R);
if (VPI && VPI->getOpcode() == VPInstruction::WideIVStep) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *VPI = dyn_cast<VPInstruction>(&R);
if (VPI && VPI->getOpcode() == VPInstruction::WideIVStep) {
if (auto *VPI = dyn_cast<VPInstruction>(&R); VPI && VPI->getOpcode() == VPInstruction::WideIVStep) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *VPI = dyn_cast<VPInstruction>(&R);
if (VPI && VPI->getOpcode() == VPInstruction::WideIVStep) {
auto *VPI = dyn_cast<VPInstruction>(&R);
if (!VPI || VPI->getOpcode()!= VPInstruction::WideIVStep)
continue;

sinking the original early continue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use pattern matching and have early continue, thanks!

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return true for WideIVStep in VPInstruction::onlyFirstLaneUsed if both inputs are scalar?

I haven't checked if there are any such cases, are there by the uses in the follow-up PR?

@lukel97
Copy link
Contributor

lukel97 commented Dec 17, 2024

Do we need to return true for WideIVStep in VPInstruction::onlyFirstLaneUsed if both inputs are scalar?

I haven't checked if there are any such cases, are there by the uses in the follow-up PR?

In the follow-up it truncates the Step itself with a VPScalarCastRecipe, and when the scalar cast was used by WideIVStep it caused this assertion to trigger:

Value *VPScalarCastRecipe ::generate(VPTransformState &State) {
assert(vputils::onlyFirstLaneUsed(this) &&
"Codegen only implemented for first lane.");
switch (Opcode) {

I guess WidenIntOrFpInduction does its own truncation of the step because it wants to reuse the truncation when generating the widened start value too, not just the widened step.

But I was able to work around it by just passing the non-truncated step to WideIVStep, and that seemed to work fine. So not a blocking comment!

Here's the commit where I plugged it in for reference: lukel97@6f85b48

ScalarStep->isLiveIn()
? dyn_cast<ConstantInt>(ScalarStep->getLiveInIRValue())
: nullptr;
if (!ConstStep || ConstStep->getValue() != 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this redundant case be folded away by some simplifyRecipes before reaching convertToConcrete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to do so, this requires running simplification after unrolling (WideIVStep is introduced there) which can also help remove redundant SCALAR-STEPS and possibly others, put up #123655 to do that separately. (And included it in this PR for now)

Comment on lines 1872 to 1873
VPInstruction *Mul = Builder.createNaryOp(
MulOpc, {VectorStep, ScalarStep}, FMFs, R.getDebugLoc());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This recipe essentially represents a multiplication producing a product of desired type, with potential type conversion or casts of its operands introduced later when lowered, including UIToFP and truncation only.

What are the assumptions about supported operand and result types, e.g., can both operands be FPs but desired product be integer.

Is this folding of casts into "generalized" operations desired in general, beyond multiplication? In the long-run should VPlan represent casts consistently, either (all) explicit or (all) delayed, at suitable VPlan transformation times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment VPWidenIntOrFpInductionRecipe already takes care of the casts early on and there are other cases where modeling the casts implicitly early on helps to allow more accurate cost-modeling (.e.g #113903)

Comment on lines 1862 to 1863
ScalarStep =
Builder.createWidenCast(Instruction::Trunc, ScalarStep, IVTy);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should UIToFP be considered here too instead of Trunc if IVTy is float, or is ScalarStep expected to be float whenever IVTy is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The step type can only be float if the induction itself is float

Type *IVTy = TypeInfo.inferScalarType(VPI->getOperand(2));
if (TypeInfo.inferScalarType(VectorStep) != IVTy) {
Instruction::CastOps CastOp = IVTy->isFloatingPointTy()
? Instruction::UIToFP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VectorStep assumed unsigned, if integer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, but trunc should preserve the sign

Comment on lines 1843 to 1844
auto *VPI = dyn_cast<VPInstruction>(&R);
if (VPI && VPI->getOpcode() == VPInstruction::WideIVStep) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *VPI = dyn_cast<VPInstruction>(&R);
if (VPI && VPI->getOpcode() == VPInstruction::WideIVStep) {
auto *VPI = dyn_cast<VPInstruction>(&R);
if (!VPI || VPI->getOpcode()!= VPInstruction::WideIVStep)
continue;

sinking the original early continue?

if (TypeInfo.inferScalarType(VectorStep) != IVTy) {
Instruction::CastOps CastOp = IVTy->isFloatingPointTy()
? Instruction::UIToFP
: Instruction::Trunc;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VectorStep assumed integer, if IVTy is? I.e., FPToUI not considered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 20, 2025
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part
where only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
llvm#119284.
fhahn added 3 commits January 21, 2025 11:13
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 3 operands
* the vector step
* the scale of the vector step
* a constant indicating the target type of the VPInstruction (this is
  working around having explicit types for VPInstructions, we could also
  introduce a dedicated recipe, at the cost of a lot more scaffolding)

The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.

This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as follow up to
llvm#108378 with an increment of
the wide IV step.
Copy link

github-actions bot commented Jan 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn
Copy link
Contributor Author

fhahn commented Feb 5, 2025

I think I forgot to mentioned this earlier, I split off the change to run recipe simplification late to #123655. The commit is included here.

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Mar 3, 2025
…uctionRecipe

Split off from llvm#118638, this adds a new VPInstruction for integer step vectors (0,1,2,...), so that we can eventually model all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan.

The type of the element is specified through a sentinel value as is done in llvm#119284.

This is then used by VPWidenIntOrFpInductionRecipe, where we add it just before execution in convertToConcreteRecipes. We need a dummy placeholder operand so we have somewhere to pass it, but this should go away when #llvm#118638 lands.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 4, 2025
…(NFC)

There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* llvm#129508
* llvm#119284
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 4, 2025
…(NFC)

There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* llvm#129508
* llvm#119284
fhahn added a commit that referenced this pull request Mar 25, 2025
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
#119284.

PR: #123655
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 25, 2025
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
llvm/llvm-project#119284.

PR: llvm/llvm-project#123655
fhahn added a commit that referenced this pull request Apr 10, 2025
…NFC) (#129706)

There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* #129508
* #119284

PR: #129706
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 10, 2025
…ScalarCast(NFC) (#129706)

There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* llvm/llvm-project#129508
* llvm/llvm-project#119284

PR: llvm/llvm-project#129706
YonahGoldberg pushed a commit to YonahGoldberg/llvm-project that referenced this pull request Apr 10, 2025
…NFC) (llvm#129706)

There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* llvm#129508
* llvm#119284

PR: llvm#129706
Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping :)

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some nits

@@ -2381,6 +2379,7 @@ void VPlanTransforms::convertToConcreteRecipes(VPlan &Plan,
VPTypeAnalysis &TypeInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just pass in the canonical IV and construct the TypeInfo in convertToConcreteRecipes, since I think it'll be invalid once convertToConcreteRecipes returns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think it would be clearer to pass the type analysis interface wise, but undone for now, until your patch lands adding support for invalidating entries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I can break that off into a separate patch if you'd like

@fhahn fhahn merged commit 54b33eb into llvm:main Apr 14, 2025
7 of 11 checks passed
@fhahn fhahn deleted the wide-iv-step branch April 14, 2025 21:20
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 14, 2025
…19284)

This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands
* the vector step
* the scale of the vector step

The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.

This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as follow up to
llvm/llvm-project#108378 with an increment of
the wide IV step.

PR: llvm/llvm-project#119284
@kazutakahirata
Copy link
Contributor

I've landed 888b3ed to fix a warning from this PR. Thanks!

var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…NFC) (llvm#129706)

There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* llvm#129508
* llvm#119284

PR: llvm#129706
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
This patch adds a WideIVStep opcode that can be used to create a vector
with the steps to increment a wide induction. The opcode has 2 operands
* the vector step
* the scale of the vector step

The opcode is later converted into a sequence of recipes that convert
the scale and step to the target type, if needed, and then multiply
vector step by scale.

This simplifies code that needs to materialize step vectors, e.g.
replacing wide IVs as follow up to
llvm#108378 with an increment of
the wide IV step.

PR: llvm#119284
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants