-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[mlir][Vector] Add utility for computing scalable value bounds #83876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-mlir-vector @llvm/pr-subscribers-mlir Author: Benjamin Maxwell (MacDue) ChangesThis adds a new API built with the The result is an The API is defined as follows: FailureOr<ConstantOrScalableBound>
vector::computeScalableBound(Value value, std::optional<int64_t> dim,
unsigned vscaleMin, unsigned vscaleMax,
presburger::BoundType boundType); Note: We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors). Patch is 23.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83876.diff 6 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index f6b03a0f2c8007..635d609d0b3e71 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -9,6 +9,7 @@
#ifndef MLIR_DIALECT_VECTOR_UTILS_VECTORUTILS_H_
#define MLIR_DIALECT_VECTOR_UTILS_VECTORUTILS_H_
+#include "mlir/Analysis/Presburger/IntegerRelation.h"
#include "mlir/Dialect/Utils/IndexingUtils.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/BuiltinAttributes.h"
@@ -98,6 +99,34 @@ bool isContiguousSlice(MemRefType memrefType, VectorType vectorType);
std::optional<StaticTileOffsetRange>
createUnrollIterator(VectorType vType, int64_t targetRank = 1);
+struct ConstantOrScalableBound {
+ AffineMap map;
+
+ struct BoundSize {
+ int64_t baseSize{0};
+ bool scalable{false};
+ };
+
+ /// Get the (possibly) scalable size of the bound, returns failure if the
+ /// bound cannot be represented as a single quantity.
+ FailureOr<BoundSize> getSize() const;
+};
+
+/// Computes a (possibly) scalable bound for a given value. This is similar to
+/// `ValueBoundsConstraintSet::computeConstantBound()`, but uses knowledge of
+/// the range of vscale to compute either a constant bound, an expression in
+/// terms of vscale, or failure if no bound can be computed.
+///
+/// The resulting `AffineMap` will always take at most one parameter, vscale,
+/// and return a single result, which is the bound of `value`.
+///
+/// Note: `vscaleMin` must be `<=` to `vscaleMax`. If `vscaleMin` ==
+/// `vscaleMax`, the resulting bound (if found), will be constant.
+FailureOr<ConstantOrScalableBound>
+computeScalableBound(Value value, std::optional<int64_t> dim,
+ unsigned vscaleMin, unsigned vscaleMax,
+ presburger::BoundType boundType);
+
} // namespace vector
/// Constructs a permutation map of invariant memref indices to vector
diff --git a/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h b/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
index 28dadfb9ecf868..6d0e16bf215f8a 100644
--- a/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
+++ b/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
@@ -265,10 +265,28 @@ class ValueBoundsConstraintSet {
ValueBoundsConstraintSet(MLIRContext *ctx);
+ /// A callback to allow injecting custom value bounds constraints.
+ /// It takes the current value, the dim (or kIndexValue), and a reference to
+ /// the constraints set.
+ using PopulateCustomValueBoundsFn =
+ function_ref<void(Value, int64_t, ValueBoundsConstraintSet &)>;
+
+ /// Populates the constraint set for a value/map without actually computing
+ /// the bound.
+ int64_t populateConstraintsSet(
+ Value value, std::optional<int64_t> dim = std::nullopt,
+ PopulateCustomValueBoundsFn customValueBounds = nullptr,
+ StopConditionFn stopCondition = nullptr);
+ int64_t populateConstraintsSet(
+ AffineMap map, ValueDimList mapOperands,
+ PopulateCustomValueBoundsFn customValueBounds = nullptr,
+ StopConditionFn stopCondition = nullptr, int64_t *posOut = nullptr);
+
/// Iteratively process all elements on the worklist until an index-typed
/// value or shaped value meets `stopCondition`. Such values are not processed
/// any further.
- void processWorklist(StopConditionFn stopCondition);
+ void processWorklist(StopConditionFn stopCondition,
+ PopulateCustomValueBoundsFn customValueBounds = nullptr);
/// Bound the given column in the underlying constraint set by the given
/// expression.
diff --git a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
index d613672608c3ad..77b2ad6ff540ce 100644
--- a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+++ b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
@@ -24,6 +24,7 @@
#include "mlir/IR/IntegerSet.h"
#include "mlir/IR/Operation.h"
#include "mlir/IR/TypeUtilities.h"
+#include "mlir/Interfaces/ValueBoundsOpInterface.h"
#include "mlir/Support/LLVM.h"
#include "mlir/Support/MathExtras.h"
@@ -300,3 +301,132 @@ vector::createUnrollIterator(VectorType vType, int64_t targetRank) {
shapeToUnroll = shapeToUnroll.slice(0, firstScalableDim);
return StaticTileOffsetRange(shapeToUnroll, /*unrollStep=*/1);
}
+
+FailureOr<vector::ConstantOrScalableBound::BoundSize>
+vector::ConstantOrScalableBound::getSize() const {
+ if (map.isSingleConstant())
+ return BoundSize{map.getSingleConstantResult(), /*scalable=*/false};
+ if (map.getNumResults() != 1 || map.getNumInputs() != 1)
+ return failure();
+ auto binop = dyn_cast<AffineBinaryOpExpr>(map.getResult(0));
+ if (!binop || binop.getKind() != AffineExprKind::Mul)
+ return failure();
+ auto matchConstant = [&](AffineExpr expr, int64_t &constant) -> bool {
+ if (auto cst = dyn_cast<AffineConstantExpr>(expr)) {
+ constant = cst.getValue();
+ return true;
+ }
+ return false;
+ };
+ // Match `s0 * cst` or `cst * s0`:
+ int64_t cst = 0;
+ auto lhs = binop.getLHS();
+ auto rhs = binop.getRHS();
+ if ((matchConstant(lhs, cst) && isa<AffineSymbolExpr>(rhs)) ||
+ (matchConstant(rhs, cst) && isa<AffineSymbolExpr>(lhs))) {
+ return BoundSize{cst, /*scalable=*/true};
+ }
+ return failure();
+}
+
+namespace {
+struct ScalableValueBoundsConstraintSet : public ValueBoundsConstraintSet {
+ using ValueBoundsConstraintSet::ValueBoundsConstraintSet;
+
+ static Operation *getOwnerOfValue(Value value) {
+ if (auto bbArg = dyn_cast<BlockArgument>(value))
+ return bbArg.getOwner()->getParentOp();
+ return value.getDefiningOp();
+ }
+
+ static FailureOr<AffineMap>
+ computeScalableBound(Value value, std::optional<int64_t> dim,
+ unsigned vscaleMin, unsigned vscaleMax,
+ presburger::BoundType boundType) {
+ using namespace presburger;
+
+ assert(vscaleMin <= vscaleMax);
+ ScalableValueBoundsConstraintSet cstr(value.getContext());
+
+ Value vscale;
+ int64_t pos = cstr.populateConstraintsSet(
+ value, dim,
+ /* Custom vscale value bounds */
+ [&vscale, vscaleMin, vscaleMax](Value value, int64_t dim,
+ ValueBoundsConstraintSet &cstr) {
+ if (dim != ValueBoundsConstraintSet::kIndexValue)
+ return;
+ if (isa_and_present<vector::VectorScaleOp>(getOwnerOfValue(value))) {
+ if (vscale) {
+ // All copies of vscale are equivalent.
+ cstr.bound(value) == cstr.getExpr(vscale);
+ } else {
+ // We know vscale is confined to [vscaleMin, vscaleMax].
+ cstr.bound(value) >= vscaleMin;
+ cstr.bound(value) <= vscaleMax;
+ vscale = value;
+ }
+ }
+ },
+ /* Stop condition */
+ [](auto, auto) {
+ // Keep adding constraints till the worklist is empty.
+ return false;
+ });
+
+ // Project out all variables apart from the first vscale.
+ cstr.projectOut([&](ValueDim p) { return p.first != vscale; });
+
+ assert(cstr.cstr.getNumDimAndSymbolVars() ==
+ cstr.positionToValueDim.size() &&
+ "inconsistent mapping state");
+
+ for (int64_t i = 0; i < cstr.cstr.getNumDimAndSymbolVars(); ++i) {
+ if (i == pos)
+ continue;
+ if (cstr.positionToValueDim[i] !=
+ ValueDim(vscale, ValueBoundsConstraintSet::kIndexValue)) {
+ return failure();
+ }
+ }
+
+ SmallVector<AffineMap, 1> lowerBound(1), upperBound(1);
+ cstr.cstr.getSliceBounds(pos, 1, value.getContext(), &lowerBound,
+ &upperBound,
+ /*closedUB=*/true);
+
+ auto invalidBound = [](auto &bound) {
+ return !bound[0] || bound[0].getNumResults() != 1;
+ };
+
+ AffineMap bound = [&] {
+ if (boundType == BoundType::EQ && !invalidBound(lowerBound) &&
+ lowerBound[0] == lowerBound[0]) {
+ return lowerBound[0];
+ } else if (boundType == BoundType::LB && !invalidBound(lowerBound)) {
+ return lowerBound[0];
+ } else if (boundType == BoundType::UB && !invalidBound(upperBound)) {
+ return upperBound[0];
+ }
+ return AffineMap{};
+ }();
+
+ if (!bound)
+ return failure();
+
+ return bound;
+ }
+};
+
+} // namespace
+
+FailureOr<vector::ConstantOrScalableBound>
+vector::computeScalableBound(Value value, std::optional<int64_t> dim,
+ unsigned vscaleMin, unsigned vscaleMax,
+ presburger::BoundType boundType) {
+ auto bound = ScalableValueBoundsConstraintSet::computeScalableBound(
+ value, dim, vscaleMin, vscaleMax, boundType);
+ if (failed(bound))
+ return failure();
+ return ConstantOrScalableBound{*bound};
+}
diff --git a/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp b/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
index 85abc2df894797..ac4e3b935a0542 100644
--- a/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
+++ b/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
@@ -191,7 +191,9 @@ static Operation *getOwnerOfValue(Value value) {
return value.getDefiningOp();
}
-void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
+void ValueBoundsConstraintSet::processWorklist(
+ StopConditionFn stopCondition,
+ PopulateCustomValueBoundsFn customValueBounds) {
while (!worklist.empty()) {
int64_t pos = worklist.front();
worklist.pop();
@@ -215,8 +217,11 @@ void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
if (stopCondition(value, maybeDim))
continue;
- // Query `ValueBoundsOpInterface` for constraints. New items may be added to
- // the worklist.
+ // 1. Query `customValueBounds` for constraints (if provided).
+ if (customValueBounds)
+ customValueBounds(value, dim, *this);
+
+ // 2. Query `ValueBoundsOpInterface` for constraints.
auto valueBoundsOp =
dyn_cast<ValueBoundsOpInterface>(getOwnerOfValue(value));
if (valueBoundsOp) {
@@ -228,6 +233,8 @@ void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
continue;
}
+ // Steps 1 and 2 above may add new items to the worklist.
+
// If the op does not implement `ValueBoundsOpInterface`, check if it
// implements the `DestinationStyleOpInterface`. OpResults of such ops are
// tied to OpOperands. Tied values have the same shape.
@@ -471,55 +478,92 @@ FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
closedUB);
}
+FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
+ presburger::BoundType type, AffineMap map, ArrayRef<Value> operands,
+ StopConditionFn stopCondition, bool closedUB) {
+ ValueDimList valueDims;
+ for (Value v : operands) {
+ assert(v.getType().isIndex() && "expected index type");
+ valueDims.emplace_back(v, std::nullopt);
+ }
+ return computeConstantBound(type, map, valueDims, stopCondition, closedUB);
+}
+
FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
presburger::BoundType type, AffineMap map, ValueDimList operands,
StopConditionFn stopCondition, bool closedUB) {
assert(map.getNumResults() == 1 && "expected affine map with one result");
ValueBoundsConstraintSet cstr(map.getContext());
- int64_t pos = cstr.insert(/*isSymbol=*/false);
+
+ int64_t pos = 0;
+ if (stopCondition) {
+ cstr.populateConstraintsSet(map, operands, nullptr, stopCondition, &pos);
+ } else {
+ // No stop condition specified: Keep adding constraints until a bound could
+ // be computed.
+ cstr.populateConstraintsSet(
+ map, operands, nullptr,
+ [&](Value v, std::optional<int64_t> dim) {
+ return cstr.cstr.getConstantBound64(type, pos).has_value();
+ },
+ &pos);
+ }
+ // Compute constant bound for `valueDim`.
+ int64_t ubAdjustment = closedUB ? 0 : 1;
+ if (auto bound = cstr.cstr.getConstantBound64(type, pos))
+ return type == BoundType::UB ? *bound + ubAdjustment : *bound;
+ return failure();
+}
+
+int64_t ValueBoundsConstraintSet::populateConstraintsSet(
+ Value value, std::optional<int64_t> dim,
+ PopulateCustomValueBoundsFn customValueBounds,
+ StopConditionFn stopCondition) {
+#ifndef NDEBUG
+ assertValidValueDim(value, dim);
+#endif // NDEBUG
+
+ AffineMap map =
+ AffineMap::get(/*dimCount=*/1, /*symbolCount=*/0,
+ Builder(value.getContext()).getAffineDimExpr(0));
+ return populateConstraintsSet(map, {{value, dim}}, customValueBounds,
+ stopCondition);
+}
+
+int64_t ValueBoundsConstraintSet::populateConstraintsSet(
+ AffineMap map, ValueDimList operands,
+ PopulateCustomValueBoundsFn customValueBounds,
+ StopConditionFn stopCondition, int64_t *posOut) {
+ assert(map.getNumResults() == 1 && "expected affine map with one result");
+ int64_t pos = insert(/*isSymbol=*/false);
+ if (posOut)
+ *posOut = pos;
// Add map and operands to the constraint set. Dimensions are converted to
// symbols. All operands are added to the worklist.
auto mapper = [&](std::pair<Value, std::optional<int64_t>> v) {
- return cstr.getExpr(v.first, v.second);
+ return getExpr(v.first, v.second);
};
SmallVector<AffineExpr> dimReplacements = llvm::to_vector(
llvm::map_range(ArrayRef(operands).take_front(map.getNumDims()), mapper));
SmallVector<AffineExpr> symReplacements = llvm::to_vector(
llvm::map_range(ArrayRef(operands).drop_front(map.getNumDims()), mapper));
- cstr.addBound(
+ addBound(
presburger::BoundType::EQ, pos,
map.getResult(0).replaceDimsAndSymbols(dimReplacements, symReplacements));
// Process the backward slice of `operands` (i.e., reverse use-def chain)
// until `stopCondition` is met.
if (stopCondition) {
- cstr.processWorklist(stopCondition);
+ processWorklist(stopCondition, customValueBounds);
} else {
- // No stop condition specified: Keep adding constraints until a bound could
- // be computed.
- cstr.processWorklist(
- /*stopCondition=*/[&](Value v, std::optional<int64_t> dim) {
- return cstr.cstr.getConstantBound64(type, pos).has_value();
- });
+ // No stop condition specified: Keep adding constraints until the worklist
+ // is empty.
+ processWorklist([](Value v, std::optional<int64_t> dim) { return false; },
+ customValueBounds);
}
- // Compute constant bound for `valueDim`.
- int64_t ubAdjustment = closedUB ? 0 : 1;
- if (auto bound = cstr.cstr.getConstantBound64(type, pos))
- return type == BoundType::UB ? *bound + ubAdjustment : *bound;
- return failure();
-}
-
-FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
- presburger::BoundType type, AffineMap map, ArrayRef<Value> operands,
- StopConditionFn stopCondition, bool closedUB) {
- ValueDimList valueDims;
- for (Value v : operands) {
- assert(v.getType().isIndex() && "expected index type");
- valueDims.emplace_back(v, std::nullopt);
- }
- return computeConstantBound(type, map, valueDims, stopCondition, closedUB);
+ return pos;
}
FailureOr<int64_t>
diff --git a/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir b/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir
new file mode 100644
index 00000000000000..2afc4db874b73e
--- /dev/null
+++ b/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir
@@ -0,0 +1,137 @@
+// RUN: mlir-opt %s -test-affine-reify-value-bounds -cse -verify-diagnostics \
+// RUN: -verify-diagnostics -split-input-file | FileCheck %s
+
+#fixedDim0Map = affine_map<(d0)[s0] -> (-d0 + 32400, s0)>
+#fixedDim1Map = affine_map<(d0)[s0] -> (-d0 + 16, s0)>
+
+// Here the upper bound for min_i is 4 x vscale, as we know 4 x vscale is
+// always less than 32400. The bound for min_j is 16 as at vscale > 4,
+// 4 x vscale will be > 16, so the value will be clamped at 16.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_0:.*]] = affine_map<()[s0] -> (s0 * 4)>
+
+// CHECK-LABEL: @fixed_size_loop_nest
+// CHECK-DAG: %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_0]]()[%vscale]
+// CHECK-DAG: %[[C16:.*]] = arith.constant 16 : index
+// CHECK: "test.some_use"(%[[SCALABLE_BOUND]], %[[C16]]) : (index, index) -> ()
+func.func @fixed_size_loop_nest() {
+ %c16 = arith.constant 16 : index
+ %c32400 = arith.constant 32400 : index
+ %c4 = arith.constant 4 : index
+ %c0 = arith.constant 0 : index
+ %vscale = vector.vscale
+ %c4_vscale = arith.muli %vscale, %c4 : index
+ scf.for %i = %c0 to %c32400 step %c4_vscale {
+ %min_i = affine.min #fixedDim0Map(%i)[%c4_vscale]
+ scf.for %j = %c0 to %c16 step %c4_vscale {
+ %min_j = affine.min #fixedDim1Map(%j)[%c4_vscale]
+ %bound_i = "test.reify_scalable_bound"(%min_i) {type = "UB"} : (index) -> index
+ %bound_j = "test.reify_scalable_bound"(%min_j) {type = "UB"} : (index) -> index
+ "test.some_use"(%bound_i, %bound_j) : (index, index) -> ()
+ }
+ }
+ return
+}
+
+// -----
+
+#dynamicDim0Map = affine_map<(d0, d1)[s0] -> (-d0 + d1, s0)>
+#dynamicDim1Map = affine_map<(d0, d1)[s0] -> (-d0 + d1, s0)>
+
+// Here upper bounds for both min_i and min_j are both 4 x vscale, as we know
+// that is always the largest value they could take. As if `dim < 4 x vscale`
+// then 4 x vscale is an overestimate, and if `dim > 4 x vscale` then the min
+// will be clamped to 4 x vscale.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_1:.*]] = affine_map<()[s0] -> (s0 * 4)>
+
+// CHECK-LABEL: @dynamic_size_loop_nest
+// CHECK: %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_1]]()[%vscale]
+// CHECK: "test.some_use"(%[[SCALABLE_BOUND]], %[[SCALABLE_BOUND]]) : (index, index) -> ()
+func.func @dynamic_size_loop_nest(%dim0: index, %dim1: index) {
+ %c4 = arith.constant 4 : index
+ %c0 = arith.constant 0 : index
+ %vscale = vector.vscale
+ %c4_vscale = arith.muli %vscale, %c4 : index
+ scf.for %i = %c0 to %dim0 step %c4_vscale {
+ %min_i = affine.min #dynamicDim0Map(%i)[%c4_vscale, %dim0]
+ scf.for %j = %c0 to %dim1 step %c4_vscale {
+ %min_j = affine.min #dynamicDim1Map(%j)[%c4_vscale, %dim1]
+ %bound_i = "test.reify_scalable_bound"(%min_i) {type = "UB"} : (index) -> index
+ %bound_j = "test.reify_scalable_bound"(%min_j) {type = "UB"} : (index) -> index
+ "test.some_use"(%bound_i, %bound_j) : (index, index) -> ()
+ }
+ }
+ return
+}
+
+// -----
+
+// Here the upper bound is just a value + a constant.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_2:.*]] = affine_map<()[s0] -> (s0 + 8)>
+
+// CHECK-LABEL: @add_to_vscale
+// CHECK: %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_2]]()[%vscale]
+// CHECK: "test.some_use"(%[[SCALABLE_BOUND]]) : (index) -> ()
+func.func @add_to_vscale() {
+ %vscale = vector.vscale
+ %c8 = arith.constant 8 : index
+ %vscale_plus_c8 = arith.addi %vscale, %c8 : index
+ %bound = "test.reify_scalable_bound"(%vscale_plus_c8) {type = "UB"} : (index) -> index
+ "test.some_use"(%bound) : (index) -> ()
+ return
+}
+
+// -----
+
+// Here we know vscale is always 2 so we get a constant upper bound.
+
+// CHECK-LABEL: @vscale_fixed_size
+// CHECK: %[[C2:.*]] = arith.constant 2 : index
+// CHECK: "test.some_use"(%[[C2]]) : (index) -> ()
+func.func @vscale_fixed_size() {
+ %vscale = vector.vscale
+ %bound = "test.reify_scalable_bound"(%vscale) {type = "UB", vscale_min = 2, vscale_max = 2} : (index) -> index
+ "test.some_use"(%bound) : (index) -> ()
+ return
+}
+
+// -----
+
+// Here we don't know the upper bound (%a is underspecified)
+
+func.func @unknown_bound(%a: index) {
+ %vscale = vector.vscale
+ %vscale_plus_a = arith.muli %vscale, %a : index
+ // expected-error @below{{could not reify bound}}
+ %bound = "test.reify_scalable_bound"(%vscale_plus_a) {type = "UB"} : (index) -> index
+ "test.some_use"(%bound) : (index) -> ()
+ return
+}
+
+// -----
+
+// Here we have two vscale values (that have not been CSE'd), but they should
+// still be treated as equivalent.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_3:.*]] = affine_map<()[s0] -> (s0 * 6)>
+
+// CHECK-LABEL: @duplicate_vscale_values
+// CHECK: %[...
[truncated]
|
@matthias-springer I think you implemented ValueBoundsConstraintSet in the past, could you help review ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, left a few minor comments, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more comments from me. I'm not familiar with this area, hence these are mostly requests for more clarification. Thanks!
} | ||
|
||
namespace { | ||
struct ScalableValueBoundsConstraintSet : public ValueBoundsConstraintSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to see if we can reuse more functionality from ValueBoundsConstraintSet
, so less has to be reimplemented here.
One thing that stands out to me in this PR is that vector.vscale
does not implement ValueBoundsOpInterface
.
Here's an idea that could work:
ScalableValueBoundsConstraintSet
defines 3 fields:vscaleMin
,vscaleMax
andvscale
.vscaleMin
andvscaleMax
are initialized in the constructor ofScalableValueBoundsConstraintSet
.vscale
is first seenvscale
SSA value. "empty" value in the beginning.- The
ValueBoundsOpInterface::populateBoundsForIndexValue
implementation dyn_castscstr
toScalableValueBoundsConstraintSet &
. If that succeeds, we can getvscaleMin
,vscaleMax
andvscale
and implement the same logic that you have here without thePopulateCustomValueBoundsFn
callback. - At that point, we may just be able to call
ValueBoundsConstraintSet::computeDependentBound
, which gives you a bound in which only certain SSA values (vscale
) are allowed to appear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think ValueBoundsConstraintSet::computeDependentBound()
does exactly what I want here. First, I'm not sure I want to stop at the first vscale
found, also computeDependentBound()
does not project out the valueDim
, which in my experiments gave affine_maps
that take two input symbols (presumably one for vscale
and the valueDim
), which did not usefully describe the bound.
Only once I eliminated all variables except vscale
did I get the bounds I was interested in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think the other points above could work 👍)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into the above, and I don't really think it helps much with reuse, and is a little complex to setup. To be able to dyn_cast
the ValueBoundConstraintSet
we'd need to implement LLVM RTTI for it, and then it still only makes sense to use the helper on ScalableValueBoundsConstraintSet
to compute the bound. The other variants construct a non-scalable ValueBoundsConstraintSet
internally, so would not be useful for computing bounds dependent on vscale
.
And like I mentioned before none of the existing bounds functions do exactly what I want here, so I think I'd still need computeScalableBound()
(and it's accompanying helpers).
I think it could be done, but I think (right now) all it'd really eliminate is the PopulateCustomValueBoundsFn
callback.
Also, in the current implementation ScalableValueBoundsConstraintSet
is implementation detail, that's not exposed in any header files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give it a try? I was mainly worried about adding new API (PopulateCustomValueBoundsFn
) to ValueBoundsOpInterface
when it is not needed.
Setting up the RTTI should be pretty easy, you can copy this: 6a43523
It would indeed only make sense for ScalableValueBoundsConstraintSet
, but I don't see a problem with that.
I don't think
ValueBoundsConstraintSet::computeDependentBound()
does exactly what I want here. First, I'm not sure I want to stop at the firstvscale
found, alsocomputeDependentBound()
does not project out thevalueDim
, which in my experiments gaveaffine_maps
that take two input symbols (presumably one forvscale
and thevalueDim
), which did not usefully describe the bound.Only once I eliminated all variables except
vscale
did I get the bounds I was interested in.
At the moment, stopFunction
controls how far the traversal goes and also what SSA values to project out. It looks like you need some more fine-grained control here: computeScalableBound
adds constraints for vscale
, but it does not project vscale
out. I don't really mind having a bit of duplicated functionality in computeScalableBound
; as you said it's quite isolated. I can look into adding the missing configuration options in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now reworked things things as suggested above (which has removed the PopulateCustomValueBoundsFn
callback).
@@ -300,3 +301,132 @@ vector::createUnrollIterator(VectorType vType, int64_t targetRank) { | |||
shapeToUnroll = shapeToUnroll.slice(0, firstScalableDim); | |||
return StaticTileOffsetRange(shapeToUnroll, /*unrollStep=*/1); | |||
} | |||
|
|||
FailureOr<vector::ConstantOrScalableBound::BoundSize> | |||
vector::ConstantOrScalableBound::getSize() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need all this pattern matching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for converting the affine_map
to a single quantity. In my mind computeScalableBound()
is just like computeConstantBound()
, but it supports scalability. So in many cases you can convert the bound to a single (possibly scalable) value (which is simply represented with a int64_t
and a scalable flag, like it is in the vector type).
mlir/include/mlir/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.h
Outdated
Show resolved
Hide resolved
|
||
auto loc = op->getLoc(); | ||
auto reifiedScalable = | ||
vector::ScalableValueBoundsConstraintSet::computeScalableBound( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this test functionality belongs to the vector
dialect. But then we would also have to duplicate some functionality there. I'll leave it up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to just leave this here for now, it is slightly awkwardly placed, but still convenient for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for implementing this Ben!
Few more nits, but feel free to ignore - mostly me trying to grasp few more bits.
using RTTIExtends::StopConditionFn; | ||
|
||
/// A thin wrapper over an `AffineMap` which can represent a constant bound, | ||
/// or a scalable bound (in terms of vscale). The `AffineMap` will always |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] vscale -> vector.vscale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm referring to the concept of vscale
, not the vector dialect operation.
This adds a new API built with the `ValueBoundsConstraintSet` to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale. The result is an `AffineMap` will always take at most one parameter, vscale, and return a single result, which is the bound of `value`. The API is defined as follows: ```c++ FailureOr<ConstantOrScalableBound> vector::computeScalableBound(Value value, std::optional<int64_t> dim, unsigned vscaleMin, unsigned vscaleMax, presburger::BoundType boundType); ``` Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap` with a utility for converting the bound to a single quantity (i.e. a size and scalable flag). We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors). Squashed commits: Fix `affine_map` in dynamic dim test Require vscale min/max to be explicit in tests Rewrite to use RTTI and vscale impl of `ValueBoundsOpInterface` Expose computeScalableBound() `stopCondition` Hide ValueBoundsConstraintSet methods on ScalableValueBoundsConstraintSet
Thanks a lot for the reviews! Big thanks to Matthias too (as the expert here) :) I'll land this tomorrow if there's no further comments. |
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
…83876) This adds a new API built with the `ValueBoundsConstraintSet` to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale. The result is an `AffineMap` that will always take at most one parameter, vscale, and returns a single result, which is the bound of `value`. The API is defined as follows: ```c++ FailureOr<ConstantOrScalableBound> vector::ScalableValueBoundsConstraintSet::computeScalableBound( Value value, std::optional<int64_t> dim, unsigned vscaleMin, unsigned vscaleMax, presburger::BoundType boundType, bool closedUB = true, StopConditionFn stopCondition = nullptr); ``` Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap` with a utility for converting the bound to a single quantity (i.e. a size and scalable flag). We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations. Squashed: Review fixups Review fixups (round 2) CI fix Add `iree-llvmcpu-stack-allocation-assumed-vscale` option This allows changing the value of vscale used when checking if (scalable) allocations are within the stack limit. Internal suggestion from @ banach-space.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
#16869) This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1` (by default), since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations. ci-extra: build_test_all_arm64
This adds a new API built with the
ValueBoundsConstraintSet
to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale.The result is an
AffineMap
that will always take at most one parameter, vscale, and returns a single result, which is the bound ofvalue
.The API is defined as follows:
Note:
ConstantOrScalableBound
is a thin wrapper over theAffineMap
with a utility for converting the bound to a single quantity (i.e. a size and scalable flag).We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).