[LV] Support binary and unary operations with EVL-vectorization #93854

Merged
merged 15 commits into from
Sep 6, 2024
69 changes: 66 additions & 3 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -923,6 +923,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPWidenCastSC:
case VPRecipeBase::VPWidenGEPSC:
case VPRecipeBase::VPWidenSC:
case VPRecipeBase::VPWidenEVLSC:
case VPRecipeBase::VPWidenSelectSC:
case VPRecipeBase::VPBlendSC:
case VPRecipeBase::VPPredInstPHISC:
@@ -1107,6 +1108,7 @@ class VPRecipeWithIRFlags : public VPSingleDefRecipe {
static inline bool classof(const VPRecipeBase *R) {
return R->getVPDefID() == VPRecipeBase::VPInstructionSC ||
R->getVPDefID() == VPRecipeBase::VPWidenSC ||
R->getVPDefID() == VPRecipeBase::VPWidenEVLSC ||
R->getVPDefID() == VPRecipeBase::VPWidenGEPSC ||
R->getVPDefID() == VPRecipeBase::VPWidenCastSC ||
R->getVPDefID() == VPRecipeBase::VPReplicateSC ||
@@ -1410,11 +1412,16 @@ class VPInstruction : public VPRecipeWithIRFlags {
class VPWidenRecipe : public VPRecipeWithIRFlags {
unsigned Opcode;

protected:
template <typename IterT>
VPWidenRecipe(unsigned VPDefOpcode, Instruction &I,
iterator_range<IterT> Operands)
: VPRecipeWithIRFlags(VPDefOpcode, Operands, I), Opcode(I.getOpcode()) {}

public:
template <typename IterT>
VPWidenRecipe(Instruction &I, iterator_range<IterT> Operands)
-: VPRecipeWithIRFlags(VPDef::VPWidenSC, Operands, I),
-Opcode(I.getOpcode()) {}
+: VPWidenRecipe(VPDef::VPWidenSC, I, Operands) {}

Contributor

This should probably have a custom classof so that isa<VPWidenRecipe> returns true for both VPWidenRecipe and VPWidenEVLRecipe?

Contributor Author

Why? Was the goal of having dedicated EVL recipes to prevent treating them as non-EVL recipes?
Should VPWidenLoad also return true for VPWidenLoadEVL?

Contributor

The main benefit of having a shared base class is that analyses don't have to handle every recipe individually when it makes sense to treat them alike.

I think analyses that apply to VPWidenRecipe should also conservatively apply to VPWidenEVLRecipe, as the latter can only operate on fewer values. If that's not sound, we probably shouldn't inherit from VPWidenRecipe without also implementing the corresponding isa relationship.

VPWidenLoad/VPWidenLoadEVL only share VPWidenMemoryRecipe as a common base class, for which all VPWiden[Load|Store][|EVL] recipes return true.
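
The isa relationship suggested here can be sketched outside of LLVM. The following is a minimal, self-contained illustration that uses a hand-rolled stand-in for `isa<>` instead of `llvm/Support/Casting.h`; `RecipeID`, `RecipeBase`, `WidenRecipe`, `WidenEVLRecipe`, `WidenCastRecipe`, `isaRecipe` and `castRecipe` are all hypothetical names, and only the shape of `classof` follows the suggestion.

```cpp
#include <cassert>

// Hypothetical stand-in for the recipe IDs; not the real VPDef enum.
enum class RecipeID { Widen, WidenEVL, WidenCast };

struct RecipeBase {
  explicit RecipeBase(RecipeID ID) : ID(ID) {}
  virtual ~RecipeBase() = default;
  RecipeID getID() const { return ID; }

private:
  RecipeID ID;
};

struct WidenRecipe : RecipeBase {
  WidenRecipe() : RecipeBase(RecipeID::Widen) {}
  // The base class accepts its own ID and the ID of the EVL subclass, so
  // code written against WidenRecipe also sees the EVL variant.
  static bool classof(const RecipeBase *R) {
    return R->getID() == RecipeID::Widen || R->getID() == RecipeID::WidenEVL;
  }

protected:
  // Lets a subclass pass its own ID up to the base.
  explicit WidenRecipe(RecipeID ID) : RecipeBase(ID) {}
};

struct WidenEVLRecipe : WidenRecipe {
  WidenEVLRecipe() : WidenRecipe(RecipeID::WidenEVL) {}
  // The subclass stays strict: a plain WidenRecipe is not an EVL recipe.
  static bool classof(const RecipeBase *R) {
    return R->getID() == RecipeID::WidenEVL;
  }
};

struct WidenCastRecipe : RecipeBase {
  WidenCastRecipe() : RecipeBase(RecipeID::WidenCast) {}
  static bool classof(const RecipeBase *R) {
    return R->getID() == RecipeID::WidenCast;
  }
};

// Simplified stand-in for llvm::isa<>.
template <typename To> bool isaRecipe(const RecipeBase *R) {
  return To::classof(R);
}

// Simplified stand-in for llvm::cast<>: checks the kind, then downcasts.
template <typename To> const To *castRecipe(const RecipeBase *R) {
  assert(isaRecipe<To>(R) && "castRecipe<> on an incompatible recipe");
  return static_cast<const To *>(R);
}
```

With this shape, a check for `WidenRecipe` succeeds for both kinds, while a check for `WidenEVLRecipe` succeeds only for the EVL variant.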

Contributor Author

I assume that you really meant to introduce a base class for VPWidenRecipe and VPWidenEVLRecipe that returns true for both of them, right? In that case the class hierarchy would be similar to VPWiden[Load|Store][|EVL].
If so, the same should apply to all future EVL recipes:

           VPSomeRecipeBase
           /             \
VPSomeRecipe          VPSomeEVLRecipe
VPSomeRecipeBase::classof(...) { return Opcode == VPSomeRecipeSC || Opcode == VPSomeEVLRecipeSC; }

Contributor

Ideally VPWidenRecipe could serve as such a base class as mentioned above, unless there's a compelling reason not to. That way we automatically benefit from all folds and analyses already implemented for VPWidenRecipe.

Contributor Author @nikolaypanchenko, Aug 7, 2024

I'm not sure I understand the added hierarchy then. If the goal is to allow reuse of existing code, then what is different about VPWidenStoreEVL or VPWidenLoadEVL, given that they won't be treated as VPWidenStore or VPWidenLoad respectively:

isa<VPWidenMemoryRecipe>(EVLStore) => true
isa<VPWidenStoreEVLRecipe>(EVLStore) => true
isa<VPWidenStoreRecipe>(EVLStore) => false

Anyway, I'm OK with extending classof.

Contributor

Thanks for the update. Adjusting VPWidenStoreEVLRecipe/VPWidenLoadEVLRecipe would probably make sense separately, although the reasoning may be a bit more complicated.


~VPWidenRecipe() override = default;

@@ -1424,7 +1431,15 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {
return R;
}

-VP_CLASSOF_IMPL(VPDef::VPWidenSC)
+static inline bool classof(const VPRecipeBase *R) {
+return R->getVPDefID() == VPRecipeBase::VPWidenSC ||
+R->getVPDefID() == VPRecipeBase::VPWidenEVLSC;
+}
+
+static inline bool classof(const VPUser *U) {
+auto *R = dyn_cast<VPRecipeBase>(U);
+return R && classof(R);
+}

/// Produce a widened instruction using the opcode and operands of the recipe,
/// processing State.VF elements.
@@ -1443,6 +1458,54 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {
#endif
};

/// A recipe for widening operations with vector-predication intrinsics with
/// explicit vector length (EVL).
class VPWidenEVLRecipe : public VPWidenRecipe {
using VPRecipeWithIRFlags::transferFlags;

public:
template <typename IterT>
VPWidenEVLRecipe(Instruction &I, iterator_range<IterT> Operands, VPValue &EVL)
: VPWidenRecipe(VPDef::VPWidenEVLSC, I, Operands) {
addOperand(&EVL);
}
VPWidenEVLRecipe(VPWidenRecipe &W, VPValue &EVL)
: VPWidenEVLRecipe(*W.getUnderlyingInstr(), W.operands(), EVL) {
transferFlags(W);
}

~VPWidenEVLRecipe() override = default;

VPWidenRecipe *clone() override final {
llvm_unreachable("VPWidenEVLRecipe cannot be cloned");
return nullptr;
}

VP_CLASSOF_IMPL(VPDef::VPWidenEVLSC);

VPValue *getEVL() { return getOperand(getNumOperands() - 1); }
const VPValue *getEVL() const { return getOperand(getNumOperands() - 1); }

/// Produce a vp-intrinsic using the opcode and operands of the recipe,
/// processing EVL elements.
void execute(VPTransformState &State) override final;

/// Returns true if the recipe only uses the first lane of operand \p Op.
bool onlyFirstLaneUsed(const VPValue *Op) const override {
assert(is_contained(operands(), Op) &&
"Op must be an operand of the recipe");
// EVL in that recipe is always the last operand, thus any use before means

Contributor

It would be good to post a follow-up patch that makes the verifier enforce that EVL is only used as the last operand of the various recipes.

Contributor Author

Sure. I will create a PR with the verification I added and removed previously.

Contributor

Great, thanks!

// the VPValue should be vectorized.
return getEVL() == Op;
}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override final;
#endif
};
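
The "EVL is always the last operand" convention that `getEVL()` and `onlyFirstLaneUsed()` rely on, and the kind of verifier check requested in the review, can be sketched with a small stand-alone model. `Value`, `EVLRecipe` and `evlOnlyLastOperand` are hypothetical names for illustration; this is not the VPlan verifier.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for VPValue.
struct Value {
  int Id;
};

// Hypothetical recipe model: the EVL is appended after the regular operands.
struct EVLRecipe {
  std::vector<const Value *> Operands;

  EVLRecipe(std::vector<const Value *> Ops, const Value &EVL)
      : Operands(std::move(Ops)) {
    Operands.push_back(&EVL);
  }

  // By convention the EVL is the last operand.
  const Value *getEVL() const { return Operands.back(); }

  // Only the EVL is consumed as a scalar; every other operand is a vector.
  bool onlyFirstLaneUsed(const Value *Op) const {
    assert(std::find(Operands.begin(), Operands.end(), Op) != Operands.end() &&
           "Op must be an operand of the recipe");
    return Op == getEVL();
  }
};

// Verifier-style check (an assumption about what such a check could look
// like): the EVL must be the last operand and must not appear before it.
bool evlOnlyLastOperand(const EVLRecipe &R, const Value *EVL) {
  if (R.Operands.empty() || R.Operands.back() != EVL)
    return false;
  auto LastIt = R.Operands.end() - 1;
  return std::find(R.Operands.begin(), LastIt, EVL) == LastIt;
}
```

In this model, a recipe that also takes the EVL as a regular operand would still report `onlyFirstLaneUsed(EVL)` as true, which is the case a verifier check would need to reject.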

/// VPWidenCastRecipe is a recipe to create vector cast instructions.
class VPWidenCastRecipe : public VPRecipeWithIRFlags {
/// Cast instruction opcode.
5 changes: 3 additions & 2 deletions llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -263,8 +263,9 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
VPWidenCanonicalIVRecipe>([this](const VPRecipeBase *R) {
return inferScalarType(R->getOperand(0));
})
-.Case<VPBlendRecipe, VPInstruction, VPWidenRecipe, VPReplicateRecipe,
-VPWidenCallRecipe, VPWidenMemoryRecipe, VPWidenSelectRecipe>(
+.Case<VPBlendRecipe, VPInstruction, VPWidenRecipe, VPWidenEVLRecipe,
+VPReplicateRecipe, VPWidenCallRecipe, VPWidenMemoryRecipe,
+VPWidenSelectRecipe>(
[this](const auto *R) { return inferScalarTypeForRecipe(R); })
.Case<VPInterleaveRecipe>([V](const VPInterleaveRecipe *R) {
// TODO: Use info from interleave group.
52 changes: 52 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -24,6 +24,7 @@
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
#include "llvm/IR/VectorBuilder.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
@@ -74,6 +75,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
case VPWidenLoadSC:
case VPWidenPHISC:
case VPWidenSC:
case VPWidenEVLSC:
case VPWidenSelectSC: {
const Instruction *I =
dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -114,6 +116,7 @@ bool VPRecipeBase::mayReadFromMemory() const {
case VPWidenIntOrFpInductionSC:
case VPWidenPHISC:
case VPWidenSC:
case VPWidenEVLSC:
case VPWidenSelectSC: {
const Instruction *I =
dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -164,6 +167,7 @@ bool VPRecipeBase::mayHaveSideEffects() const {
case VPWidenPHISC:
case VPWidenPointerInductionSC:
case VPWidenSC:
case VPWidenEVLSC:
case VPWidenSelectSC: {
const Instruction *I =
dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -1262,6 +1266,45 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
}
}

void VPWidenEVLRecipe::execute(VPTransformState &State) {
unsigned Opcode = getOpcode();
// TODO: Support other opcodes
if (!Instruction::isBinaryOp(Opcode) && !Instruction::isUnaryOp(Opcode))
llvm_unreachable("Unsupported opcode in VPWidenEVLRecipe::execute");

State.setDebugLocFrom(getDebugLoc());
assert(State.UF == 1 && "Expected only UF == 1 when vectorizing with "

Contributor

Sorry, I don't know much about the vectorizer's design. Why is UF forced to be 1?

Contributor Author

This applies only to EVL vectorization. For now it's easier not to handle multiple get.vector.length calls, where one or more could return 0, so that the instructions that follow would process 0 elements. Another worry I have is that we would have to emit extra checks for a zero EVL when an instruction ignores it, like RISC-V's vmv.x.s, if it's generated within a loop.
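
The zero-EVL concern can be illustrated with a small simulation of an EVL-driven loop. This is a sketch under a simplifying assumption: the hypothetical `getVectorLength` below always returns `min(Remaining, VF)`, whereas a real target may choose a different value. `evlPerPart` is likewise a hypothetical helper that records the EVL each unrolled part would receive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a get.vector.length query. Assumption: it returns
// min(Remaining, VF); real targets are free to pick other values.
std::size_t getVectorLength(std::size_t Remaining, std::size_t VF) {
  return std::min(Remaining, VF);
}

// Records, for every vector iteration, the EVL seen by each unrolled part
// when N elements are processed with vectorization factor VF and unroll
// factor UF.
std::vector<std::vector<std::size_t>> evlPerPart(std::size_t N, std::size_t VF,
                                                 std::size_t UF) {
  assert(VF > 0 && UF > 0 && "VF and UF must be positive");
  std::vector<std::vector<std::size_t>> Trace;
  std::size_t Done = 0;
  while (Done < N) {
    std::vector<std::size_t> Parts;
    for (std::size_t Part = 0; Part < UF; ++Part) {
      // Each part asks for its own EVL based on what is left at that point.
      std::size_t EVL = getVectorLength(N - Done, VF);
      Parts.push_back(EVL);
      Done += EVL;
    }
    Trace.push_back(Parts);
  }
  return Trace;
}
```

Under these assumptions, 10 elements with VF = 4 and UF = 1 never produce a zero EVL, while UF = 2 makes the second part of the last iteration run with an EVL of 0.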

"explicit vector length.");
VPValue *Op0 = getOperand(0);

assert(State.get(Op0, 0)->getType()->isVectorTy() &&
"VPWidenEVLRecipe should not be used for scalars");

VPValue *EVL = getEVL();
Value *EVLArg = State.get(EVL, 0, /*NeedsScalar=*/true);
IRBuilderBase &BuilderIR = State.Builder;
VectorBuilder Builder(BuilderIR);
Value *Mask = BuilderIR.CreateVectorSplat(State.VF, BuilderIR.getTrue());

SmallVector<Value *, 4> Ops;
for (unsigned I = 0, E = getNumOperands() - 1; I < E; ++I) {
VPValue *VPOp = getOperand(I);
Ops.push_back(State.get(VPOp, 0));
}

Builder.setMask(Mask).setEVL(EVLArg);
Value *VPInst =
Builder.createVectorInstruction(Opcode, Ops[0]->getType(), Ops, "vp.op");
// Currently vp-intrinsics only accept FMF flags.
// TODO: Enable other flags when support is added.
if (isa<FPMathOperator>(VPInst))
setFlags(cast<Instruction>(VPInst));

State.set(this, VPInst, 0);
State.addMetadata(VPInst,
dyn_cast_or_null<Instruction>(getUnderlyingValue()));
}
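
As a rough model of what the emitted vp-intrinsic computes with an all-true mask and an explicit vector length, the sketch below applies a binary operation lane by lane. `Lane` and `vpBinaryOp` are hypothetical names; an empty `std::optional` stands in for a lane whose result is unspecified (lanes at or past EVL, or lanes disabled by the mask).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// An empty optional models a lane whose result is unspecified.
using Lane = std::optional<int>;

// Hypothetical scalar model of a predicated binary operation: lane I is
// computed only if I < EVL and Mask[I] is set.
std::vector<Lane> vpBinaryOp(const std::function<int(int, int)> &Op,
                             const std::vector<int> &A,
                             const std::vector<int> &B,
                             const std::vector<bool> &Mask, std::size_t EVL) {
  assert(A.size() == B.size() && A.size() == Mask.size() &&
         "operands and mask must have the same number of lanes");
  assert(EVL <= A.size() && "EVL must not exceed the vector length");
  std::vector<Lane> Result(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    if (I < EVL && Mask[I])
      Result[I] = Op(A[I], B[I]);
  return Result;
}
```

With an all-true mask, as built in `execute` above via the vector splat of `true`, the EVL alone decides which lanes are active in this model.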

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPWidenRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {
@@ -1271,6 +1314,15 @@ void VPWidenRecipe::print(raw_ostream &O, const Twine &Indent,
printFlags(O);
printOperands(O, SlotTracker);
}

void VPWidenEVLRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {
O << Indent << "WIDEN-VP ";
printAsOperand(O, SlotTracker);
O << " = " << Instruction::getOpcodeName(getOpcode());
printFlags(O);
printOperands(O, SlotTracker);
}
#endif

void VPWidenCastRecipe::execute(VPTransformState &State) {
101 changes: 60 additions & 41 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -13,6 +13,7 @@

#include "VPlanTransforms.h"
#include "VPRecipeBuilder.h"
#include "VPlan.h"
#include "VPlanAnalysis.h"
#include "VPlanCFG.h"
#include "VPlanDominatorTree.h"
@@ -21,6 +22,7 @@
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/TypeSwitch.h"
#include "llvm/Analysis/IVDescriptors.h"
#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/Intrinsics.h"
@@ -1315,6 +1317,63 @@ void VPlanTransforms::addActiveLaneMask(
HeaderMask->replaceAllUsesWith(LaneMask);
}

/// Replace recipes with their EVL variants.
static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
SmallVector<VPValue *> HeaderMasks = collectAllHeaderMasks(Plan);
for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) {
for (VPUser *U : collectUsersRecursively(HeaderMask)) {
auto *CurRecipe = dyn_cast<VPRecipeBase>(U);
if (!CurRecipe)
continue;
auto GetNewMask = [&](VPValue *OrigMask) -> VPValue * {
assert(OrigMask && "Unmasked recipe when folding tail");
return HeaderMask == OrigMask ? nullptr : OrigMask;
};

VPRecipeBase *NewRecipe =
TypeSwitch<VPRecipeBase *, VPRecipeBase *>(CurRecipe)
.Case<VPWidenLoadRecipe>([&](VPWidenLoadRecipe *L) {
VPValue *NewMask = GetNewMask(L->getMask());
return new VPWidenLoadEVLRecipe(*L, EVL, NewMask);
})
.Case<VPWidenStoreRecipe>([&](VPWidenStoreRecipe *S) {
VPValue *NewMask = GetNewMask(S->getMask());
return new VPWidenStoreEVLRecipe(*S, EVL, NewMask);
})
.Case<VPWidenRecipe>([&](VPWidenRecipe *W) -> VPRecipeBase * {
unsigned Opcode = W->getOpcode();
if (!Instruction::isBinaryOp(Opcode) &&
!Instruction::isUnaryOp(Opcode))
return nullptr;
return new VPWidenEVLRecipe(*W, EVL);
})
.Case<VPReductionRecipe>([&](VPReductionRecipe *Red) {
VPValue *NewMask = GetNewMask(Red->getCondOp());
return new VPReductionEVLRecipe(*Red, EVL, NewMask);
})
.Default([&](VPRecipeBase *R) { return nullptr; });

if (!NewRecipe)
continue;

[[maybe_unused]] unsigned NumDefVal = NewRecipe->getNumDefinedValues();
assert(NumDefVal == CurRecipe->getNumDefinedValues() &&
"New recipe must define the same number of values as the "
"original.");
assert(
NumDefVal <= 1 &&
"Only supports recipes with a single definition or without users.");
NewRecipe->insertBefore(CurRecipe);
if (isa<VPSingleDefRecipe, VPWidenLoadEVLRecipe>(NewRecipe)) {
VPValue *CurVPV = CurRecipe->getVPSingleValue();
CurVPV->replaceAllUsesWith(NewRecipe->getVPSingleValue());
}
CurRecipe->eraseFromParent();
}
recursivelyDeleteDeadRecipes(HeaderMask);
}
}
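
The per-kind rewrite above follows a simple rule: a mask that is exactly the header mask is dropped because the EVL already covers the tail, any other mask is kept, and unsupported recipes are left untouched. The sketch below models that rule with plain structs and a `switch` instead of `TypeSwitch`; `Mask`, `Kind`, `Recipe`, `EVLRecipe`, `getNewMask` and `toEVL` are hypothetical names for illustration, not the VPlan classes.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for a mask value and the recipe kinds handled here.
struct Mask {
  std::string Name;
};
enum class Kind { Load, Store, Widen, Other };

struct Recipe {
  Kind K;
  const Mask *M;        // Mask operand; unused for Widen and Other.
  bool IsBinaryOrUnary; // Only meaningful for Kind::Widen.
};

struct EVLRecipe {
  Kind K;
  const Mask *M; // nullptr means no mask besides the EVL.
};

// Mirrors the GetNewMask lambda: the header mask is subsumed by the EVL.
const Mask *getNewMask(const Mask *HeaderMask, const Mask *OrigMask) {
  assert(OrigMask && "Unmasked recipe when folding tail");
  return OrigMask == HeaderMask ? nullptr : OrigMask;
}

// Returns the EVL variant of R, or nullptr if R should be left as is.
std::unique_ptr<EVLRecipe> toEVL(const Recipe &R, const Mask *HeaderMask) {
  switch (R.K) {
  case Kind::Load:
  case Kind::Store:
    return std::make_unique<EVLRecipe>(
        EVLRecipe{R.K, getNewMask(HeaderMask, R.M)});
  case Kind::Widen:
    // Only binary and unary operations are converted.
    if (!R.IsBinaryOrUnary)
      return nullptr;
    return std::make_unique<EVLRecipe>(EVLRecipe{R.K, nullptr});
  case Kind::Other:
    return nullptr;
  }
  return nullptr;
}
```

Returning `nullptr` for unsupported kinds matches the `Default` case above, where the caller skips the recipe.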

/// Add a VPEVLBasedIVPHIRecipe and related recipes to \p Plan and
/// replaces all uses except the canonical IV increment of
/// VPCanonicalIVPHIRecipe with a VPEVLBasedIVPHIRecipe. VPCanonicalIVPHIRecipe
@@ -1384,48 +1443,8 @@ bool VPlanTransforms::tryAddExplicitVectorLength(VPlan &Plan) {
NextEVLIV->insertBefore(CanonicalIVIncrement);
EVLPhi->addOperand(NextEVLIV);

-for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) {
-for (VPUser *U : collectUsersRecursively(HeaderMask)) {
-VPRecipeBase *NewRecipe = nullptr;
-auto *CurRecipe = dyn_cast<VPRecipeBase>(U);
-if (!CurRecipe)
-continue;
-
-auto GetNewMask = [&](VPValue *OrigMask) -> VPValue * {
-assert(OrigMask && "Unmasked recipe when folding tail");
-return HeaderMask == OrigMask ? nullptr : OrigMask;
-};
-if (auto *MemR = dyn_cast<VPWidenMemoryRecipe>(CurRecipe)) {
-VPValue *NewMask = GetNewMask(MemR->getMask());
-if (auto *L = dyn_cast<VPWidenLoadRecipe>(MemR))
-NewRecipe = new VPWidenLoadEVLRecipe(*L, *VPEVL, NewMask);
-else if (auto *S = dyn_cast<VPWidenStoreRecipe>(MemR))
-NewRecipe = new VPWidenStoreEVLRecipe(*S, *VPEVL, NewMask);
-else
-llvm_unreachable("unsupported recipe");
-} else if (auto *RedR = dyn_cast<VPReductionRecipe>(CurRecipe)) {
-NewRecipe = new VPReductionEVLRecipe(*RedR, *VPEVL,
-GetNewMask(RedR->getCondOp()));
-}
+transformRecipestoEVLRecipes(Plan, *VPEVL);

-if (NewRecipe) {
-[[maybe_unused]] unsigned NumDefVal = NewRecipe->getNumDefinedValues();
-assert(NumDefVal == CurRecipe->getNumDefinedValues() &&
-"New recipe must define the same number of values as the "
-"original.");
-assert(
-NumDefVal <= 1 &&
-"Only supports recipes with a single definition or without users.");
-NewRecipe->insertBefore(CurRecipe);
-if (isa<VPSingleDefRecipe, VPWidenLoadEVLRecipe>(NewRecipe)) {
-VPValue *CurVPV = CurRecipe->getVPSingleValue();
-CurVPV->replaceAllUsesWith(NewRecipe->getVPSingleValue());
-}
-CurRecipe->eraseFromParent();
-}
-}
-recursivelyDeleteDeadRecipes(HeaderMask);
-}
// Replace all uses of VPCanonicalIVPHIRecipe by
// VPEVLBasedIVPHIRecipe except for the canonical IV increment.
CanonicalIVPHI->replaceAllUsesWith(EVLPhi);
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -356,6 +356,7 @@ class VPDef {
VPWidenStoreEVLSC,
VPWidenStoreSC,
VPWidenSC,
VPWidenEVLSC,
VPWidenSelectSC,
VPBlendSC,
// START: Phi-like recipes. Need to be kept together.