[mlir][Vector] Add utility for computing scalable value bounds #83876

MacDue · 2024-03-04T17:36:16Z

This adds a new API built with the ValueBoundsConstraintSet to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale.

The result is an AffineMap that will always take at most one parameter, vscale, and returns a single result, which is the bound of value.

The API is defined as follows:

FailureOr<ConstantOrScalableBound>
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
  Value value, std::optional<int64_t> dim,
  unsigned vscaleMin, unsigned vscaleMax,
  presburger::BoundType boundType, 
  bool closedUB = true,
  StopConditionFn stopCondition = nullptr);

Note: ConstantOrScalableBound is a thin wrapper over the AffineMap with a utility for converting the bound to a single quantity (i.e. a size and scalable flag).

We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).

llvmbot · 2024-03-04T17:36:47Z

@llvm/pr-subscribers-mlir-vector
@llvm/pr-subscribers-mlir-affine

@llvm/pr-subscribers-mlir

Author: Benjamin Maxwell (MacDue)

Changes

This adds a new API built with the ValueBoundsConstraintSet to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale.

The result is an AffineMap will always take at most one parameter, vscale, and return a single result, which is the bound of value.

The API is defined as follows:

FailureOr&lt;ConstantOrScalableBound&gt;
vector::computeScalableBound(Value value, std::optional&lt;int64_t&gt; dim,
                             unsigned vscaleMin, unsigned vscaleMax,
                             presburger::BoundType boundType);

Note: ConstantOrScalableBound is a thin wrapper over the AffineMap with a utility for converting the bound to a single quantity (i.e. a size and scalable flag).

We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).

Patch is 23.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83876.diff

6 Files Affected:

(modified) mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h (+29)
(modified) mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h (+19-1)
(modified) mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp (+130)
(modified) mlir/lib/Interfaces/ValueBoundsOpInterface.cpp (+73-29)
(added) mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir (+137)
(modified) mlir/test/lib/Dialect/Affine/TestReifyValueBounds.cpp (+29-1)

diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index f6b03a0f2c8007..635d609d0b3e71 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -9,6 +9,7 @@
 #ifndef MLIR_DIALECT_VECTOR_UTILS_VECTORUTILS_H_
 #define MLIR_DIALECT_VECTOR_UTILS_VECTORUTILS_H_
 
+#include "mlir/Analysis/Presburger/IntegerRelation.h"
 #include "mlir/Dialect/Utils/IndexingUtils.h"
 #include "mlir/Dialect/Vector/IR/VectorOps.h"
 #include "mlir/IR/BuiltinAttributes.h"
@@ -98,6 +99,34 @@ bool isContiguousSlice(MemRefType memrefType, VectorType vectorType);
 std::optional<StaticTileOffsetRange>
 createUnrollIterator(VectorType vType, int64_t targetRank = 1);
 
+struct ConstantOrScalableBound {
+  AffineMap map;
+
+  struct BoundSize {
+    int64_t baseSize{0};
+    bool scalable{false};
+  };
+
+  /// Get the (possibly) scalable size of the bound, returns failure if the
+  /// bound cannot be represented as a single quantity.
+  FailureOr<BoundSize> getSize() const;
+};
+
+/// Computes a (possibly) scalable bound for a given value. This is similar to
+/// `ValueBoundsConstraintSet::computeConstantBound()`, but uses knowledge of
+/// the range of vscale to compute either a constant bound, an expression in
+/// terms of vscale, or failure if no bound can be computed.
+///
+/// The resulting `AffineMap` will always take at most one parameter, vscale,
+/// and return a single result, which is the bound of `value`.
+///
+/// Note: `vscaleMin` must be `<=` to `vscaleMax`. If `vscaleMin` ==
+/// `vscaleMax`, the resulting bound (if found), will be constant.
+FailureOr<ConstantOrScalableBound>
+computeScalableBound(Value value, std::optional<int64_t> dim,
+                     unsigned vscaleMin, unsigned vscaleMax,
+                     presburger::BoundType boundType);
+
 } // namespace vector
 
 /// Constructs a permutation map of invariant memref indices to vector
diff --git a/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h b/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
index 28dadfb9ecf868..6d0e16bf215f8a 100644
--- a/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
+++ b/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
@@ -265,10 +265,28 @@ class ValueBoundsConstraintSet {
 
   ValueBoundsConstraintSet(MLIRContext *ctx);
 
+  /// A callback to allow injecting custom value bounds constraints.
+  /// It takes the current value, the dim (or kIndexValue), and a reference to
+  /// the constraints set.
+  using PopulateCustomValueBoundsFn =
+      function_ref<void(Value, int64_t, ValueBoundsConstraintSet &)>;
+
+  /// Populates the constraint set for a value/map without actually computing
+  /// the bound.
+  int64_t populateConstraintsSet(
+      Value value, std::optional<int64_t> dim = std::nullopt,
+      PopulateCustomValueBoundsFn customValueBounds = nullptr,
+      StopConditionFn stopCondition = nullptr);
+  int64_t populateConstraintsSet(
+      AffineMap map, ValueDimList mapOperands,
+      PopulateCustomValueBoundsFn customValueBounds = nullptr,
+      StopConditionFn stopCondition = nullptr, int64_t *posOut = nullptr);
+
   /// Iteratively process all elements on the worklist until an index-typed
   /// value or shaped value meets `stopCondition`. Such values are not processed
   /// any further.
-  void processWorklist(StopConditionFn stopCondition);
+  void processWorklist(StopConditionFn stopCondition,
+                       PopulateCustomValueBoundsFn customValueBounds = nullptr);
 
   /// Bound the given column in the underlying constraint set by the given
   /// expression.
diff --git a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
index d613672608c3ad..77b2ad6ff540ce 100644
--- a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+++ b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
@@ -24,6 +24,7 @@
 #include "mlir/IR/IntegerSet.h"
 #include "mlir/IR/Operation.h"
 #include "mlir/IR/TypeUtilities.h"
+#include "mlir/Interfaces/ValueBoundsOpInterface.h"
 #include "mlir/Support/LLVM.h"
 #include "mlir/Support/MathExtras.h"
 
@@ -300,3 +301,132 @@ vector::createUnrollIterator(VectorType vType, int64_t targetRank) {
   shapeToUnroll = shapeToUnroll.slice(0, firstScalableDim);
   return StaticTileOffsetRange(shapeToUnroll, /*unrollStep=*/1);
 }
+
+FailureOr<vector::ConstantOrScalableBound::BoundSize>
+vector::ConstantOrScalableBound::getSize() const {
+  if (map.isSingleConstant())
+    return BoundSize{map.getSingleConstantResult(), /*scalable=*/false};
+  if (map.getNumResults() != 1 || map.getNumInputs() != 1)
+    return failure();
+  auto binop = dyn_cast<AffineBinaryOpExpr>(map.getResult(0));
+  if (!binop || binop.getKind() != AffineExprKind::Mul)
+    return failure();
+  auto matchConstant = [&](AffineExpr expr, int64_t &constant) -> bool {
+    if (auto cst = dyn_cast<AffineConstantExpr>(expr)) {
+      constant = cst.getValue();
+      return true;
+    }
+    return false;
+  };
+  // Match `s0 * cst` or `cst * s0`:
+  int64_t cst = 0;
+  auto lhs = binop.getLHS();
+  auto rhs = binop.getRHS();
+  if ((matchConstant(lhs, cst) && isa<AffineSymbolExpr>(rhs)) ||
+      (matchConstant(rhs, cst) && isa<AffineSymbolExpr>(lhs))) {
+    return BoundSize{cst, /*scalable=*/true};
+  }
+  return failure();
+}
+
+namespace {
+struct ScalableValueBoundsConstraintSet : public ValueBoundsConstraintSet {
+  using ValueBoundsConstraintSet::ValueBoundsConstraintSet;
+
+  static Operation *getOwnerOfValue(Value value) {
+    if (auto bbArg = dyn_cast<BlockArgument>(value))
+      return bbArg.getOwner()->getParentOp();
+    return value.getDefiningOp();
+  }
+
+  static FailureOr<AffineMap>
+  computeScalableBound(Value value, std::optional<int64_t> dim,
+                       unsigned vscaleMin, unsigned vscaleMax,
+                       presburger::BoundType boundType) {
+    using namespace presburger;
+
+    assert(vscaleMin <= vscaleMax);
+    ScalableValueBoundsConstraintSet cstr(value.getContext());
+
+    Value vscale;
+    int64_t pos = cstr.populateConstraintsSet(
+        value, dim,
+        /* Custom vscale value bounds */
+        [&vscale, vscaleMin, vscaleMax](Value value, int64_t dim,
+                                        ValueBoundsConstraintSet &cstr) {
+          if (dim != ValueBoundsConstraintSet::kIndexValue)
+            return;
+          if (isa_and_present<vector::VectorScaleOp>(getOwnerOfValue(value))) {
+            if (vscale) {
+              // All copies of vscale are equivalent.
+              cstr.bound(value) == cstr.getExpr(vscale);
+            } else {
+              // We know vscale is confined to [vscaleMin, vscaleMax].
+              cstr.bound(value) >= vscaleMin;
+              cstr.bound(value) <= vscaleMax;
+              vscale = value;
+            }
+          }
+        },
+        /* Stop condition */
+        [](auto, auto) {
+          // Keep adding constraints till the worklist is empty.
+          return false;
+        });
+
+    // Project out all variables apart from the first vscale.
+    cstr.projectOut([&](ValueDim p) { return p.first != vscale; });
+
+    assert(cstr.cstr.getNumDimAndSymbolVars() ==
+               cstr.positionToValueDim.size() &&
+           "inconsistent mapping state");
+
+    for (int64_t i = 0; i < cstr.cstr.getNumDimAndSymbolVars(); ++i) {
+      if (i == pos)
+        continue;
+      if (cstr.positionToValueDim[i] !=
+          ValueDim(vscale, ValueBoundsConstraintSet::kIndexValue)) {
+        return failure();
+      }
+    }
+
+    SmallVector<AffineMap, 1> lowerBound(1), upperBound(1);
+    cstr.cstr.getSliceBounds(pos, 1, value.getContext(), &lowerBound,
+                             &upperBound,
+                             /*closedUB=*/true);
+
+    auto invalidBound = [](auto &bound) {
+      return !bound[0] || bound[0].getNumResults() != 1;
+    };
+
+    AffineMap bound = [&] {
+      if (boundType == BoundType::EQ && !invalidBound(lowerBound) &&
+          lowerBound[0] == lowerBound[0]) {
+        return lowerBound[0];
+      } else if (boundType == BoundType::LB && !invalidBound(lowerBound)) {
+        return lowerBound[0];
+      } else if (boundType == BoundType::UB && !invalidBound(upperBound)) {
+        return upperBound[0];
+      }
+      return AffineMap{};
+    }();
+
+    if (!bound)
+      return failure();
+
+    return bound;
+  }
+};
+
+} // namespace
+
+FailureOr<vector::ConstantOrScalableBound>
+vector::computeScalableBound(Value value, std::optional<int64_t> dim,
+                             unsigned vscaleMin, unsigned vscaleMax,
+                             presburger::BoundType boundType) {
+  auto bound = ScalableValueBoundsConstraintSet::computeScalableBound(
+      value, dim, vscaleMin, vscaleMax, boundType);
+  if (failed(bound))
+    return failure();
+  return ConstantOrScalableBound{*bound};
+}
diff --git a/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp b/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
index 85abc2df894797..ac4e3b935a0542 100644
--- a/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
+++ b/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
@@ -191,7 +191,9 @@ static Operation *getOwnerOfValue(Value value) {
   return value.getDefiningOp();
 }
 
-void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
+void ValueBoundsConstraintSet::processWorklist(
+    StopConditionFn stopCondition,
+    PopulateCustomValueBoundsFn customValueBounds) {
   while (!worklist.empty()) {
     int64_t pos = worklist.front();
     worklist.pop();
@@ -215,8 +217,11 @@ void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
     if (stopCondition(value, maybeDim))
       continue;
 
-    // Query `ValueBoundsOpInterface` for constraints. New items may be added to
-    // the worklist.
+    // 1. Query `customValueBounds` for constraints (if provided).
+    if (customValueBounds)
+      customValueBounds(value, dim, *this);
+
+    // 2. Query `ValueBoundsOpInterface` for constraints.
     auto valueBoundsOp =
         dyn_cast<ValueBoundsOpInterface>(getOwnerOfValue(value));
     if (valueBoundsOp) {
@@ -228,6 +233,8 @@ void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
       continue;
     }
 
+    // Steps 1 and 2 above may add new items to the worklist.
+
     // If the op does not implement `ValueBoundsOpInterface`, check if it
     // implements the `DestinationStyleOpInterface`. OpResults of such ops are
     // tied to OpOperands. Tied values have the same shape.
@@ -471,55 +478,92 @@ FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
                               closedUB);
 }
 
+FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
+    presburger::BoundType type, AffineMap map, ArrayRef<Value> operands,
+    StopConditionFn stopCondition, bool closedUB) {
+  ValueDimList valueDims;
+  for (Value v : operands) {
+    assert(v.getType().isIndex() && "expected index type");
+    valueDims.emplace_back(v, std::nullopt);
+  }
+  return computeConstantBound(type, map, valueDims, stopCondition, closedUB);
+}
+
 FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
     presburger::BoundType type, AffineMap map, ValueDimList operands,
     StopConditionFn stopCondition, bool closedUB) {
   assert(map.getNumResults() == 1 && "expected affine map with one result");
   ValueBoundsConstraintSet cstr(map.getContext());
-  int64_t pos = cstr.insert(/*isSymbol=*/false);
+
+  int64_t pos = 0;
+  if (stopCondition) {
+    cstr.populateConstraintsSet(map, operands, nullptr, stopCondition, &pos);
+  } else {
+    // No stop condition specified: Keep adding constraints until a bound could
+    // be computed.
+    cstr.populateConstraintsSet(
+        map, operands, nullptr,
+        [&](Value v, std::optional<int64_t> dim) {
+          return cstr.cstr.getConstantBound64(type, pos).has_value();
+        },
+        &pos);
+  }
+  // Compute constant bound for `valueDim`.
+  int64_t ubAdjustment = closedUB ? 0 : 1;
+  if (auto bound = cstr.cstr.getConstantBound64(type, pos))
+    return type == BoundType::UB ? *bound + ubAdjustment : *bound;
+  return failure();
+}
+
+int64_t ValueBoundsConstraintSet::populateConstraintsSet(
+    Value value, std::optional<int64_t> dim,
+    PopulateCustomValueBoundsFn customValueBounds,
+    StopConditionFn stopCondition) {
+#ifndef NDEBUG
+  assertValidValueDim(value, dim);
+#endif // NDEBUG
+
+  AffineMap map =
+      AffineMap::get(/*dimCount=*/1, /*symbolCount=*/0,
+                     Builder(value.getContext()).getAffineDimExpr(0));
+  return populateConstraintsSet(map, {{value, dim}}, customValueBounds,
+                                stopCondition);
+}
+
+int64_t ValueBoundsConstraintSet::populateConstraintsSet(
+    AffineMap map, ValueDimList operands,
+    PopulateCustomValueBoundsFn customValueBounds,
+    StopConditionFn stopCondition, int64_t *posOut) {
+  assert(map.getNumResults() == 1 && "expected affine map with one result");
+  int64_t pos = insert(/*isSymbol=*/false);
+  if (posOut)
+    *posOut = pos;
 
   // Add map and operands to the constraint set. Dimensions are converted to
   // symbols. All operands are added to the worklist.
   auto mapper = [&](std::pair<Value, std::optional<int64_t>> v) {
-    return cstr.getExpr(v.first, v.second);
+    return getExpr(v.first, v.second);
   };
   SmallVector<AffineExpr> dimReplacements = llvm::to_vector(
       llvm::map_range(ArrayRef(operands).take_front(map.getNumDims()), mapper));
   SmallVector<AffineExpr> symReplacements = llvm::to_vector(
       llvm::map_range(ArrayRef(operands).drop_front(map.getNumDims()), mapper));
-  cstr.addBound(
+  addBound(
       presburger::BoundType::EQ, pos,
       map.getResult(0).replaceDimsAndSymbols(dimReplacements, symReplacements));
 
   // Process the backward slice of `operands` (i.e., reverse use-def chain)
   // until `stopCondition` is met.
   if (stopCondition) {
-    cstr.processWorklist(stopCondition);
+    processWorklist(stopCondition, customValueBounds);
   } else {
-    // No stop condition specified: Keep adding constraints until a bound could
-    // be computed.
-    cstr.processWorklist(
-        /*stopCondition=*/[&](Value v, std::optional<int64_t> dim) {
-          return cstr.cstr.getConstantBound64(type, pos).has_value();
-        });
+    // No stop condition specified: Keep adding constraints until the worklist
+    // is empty.
+    processWorklist([](Value v, std::optional<int64_t> dim) { return false; },
+                    customValueBounds);
   }
 
-  // Compute constant bound for `valueDim`.
-  int64_t ubAdjustment = closedUB ? 0 : 1;
-  if (auto bound = cstr.cstr.getConstantBound64(type, pos))
-    return type == BoundType::UB ? *bound + ubAdjustment : *bound;
-  return failure();
-}
-
-FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
-    presburger::BoundType type, AffineMap map, ArrayRef<Value> operands,
-    StopConditionFn stopCondition, bool closedUB) {
-  ValueDimList valueDims;
-  for (Value v : operands) {
-    assert(v.getType().isIndex() && "expected index type");
-    valueDims.emplace_back(v, std::nullopt);
-  }
-  return computeConstantBound(type, map, valueDims, stopCondition, closedUB);
+  return pos;
 }
 
 FailureOr<int64_t>
diff --git a/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir b/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir
new file mode 100644
index 00000000000000..2afc4db874b73e
--- /dev/null
+++ b/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir
@@ -0,0 +1,137 @@
+// RUN: mlir-opt %s -test-affine-reify-value-bounds -cse -verify-diagnostics \
+// RUN:   -verify-diagnostics -split-input-file | FileCheck %s
+
+#fixedDim0Map = affine_map<(d0)[s0] -> (-d0 + 32400, s0)>
+#fixedDim1Map = affine_map<(d0)[s0] -> (-d0 + 16, s0)>
+
+// Here the upper bound for min_i is 4 x vscale, as we know 4 x vscale is
+// always less than 32400. The bound for min_j is 16 as at vscale > 4,
+// 4 x vscale will be > 16, so the value will be clamped at 16.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_0:.*]] = affine_map<()[s0] -> (s0 * 4)>
+
+// CHECK-LABEL: @fixed_size_loop_nest
+//   CHECK-DAG:   %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_0]]()[%vscale]
+//   CHECK-DAG:   %[[C16:.*]] = arith.constant 16 : index
+//       CHECK:   "test.some_use"(%[[SCALABLE_BOUND]], %[[C16]]) : (index, index) -> ()
+func.func @fixed_size_loop_nest() {
+  %c16 = arith.constant 16 : index
+  %c32400 = arith.constant 32400 : index
+  %c4 = arith.constant 4 : index
+  %c0 = arith.constant 0 : index
+  %vscale = vector.vscale
+  %c4_vscale = arith.muli %vscale, %c4 : index
+  scf.for %i = %c0 to %c32400 step %c4_vscale {
+    %min_i = affine.min #fixedDim0Map(%i)[%c4_vscale]
+    scf.for %j = %c0 to %c16 step %c4_vscale {
+      %min_j = affine.min #fixedDim1Map(%j)[%c4_vscale]
+      %bound_i = "test.reify_scalable_bound"(%min_i) {type = "UB"} : (index) -> index
+      %bound_j = "test.reify_scalable_bound"(%min_j) {type = "UB"} : (index) -> index
+      "test.some_use"(%bound_i, %bound_j) : (index, index) -> ()
+    }
+  }
+  return
+}
+
+// -----
+
+#dynamicDim0Map = affine_map<(d0, d1)[s0] -> (-d0 + d1, s0)>
+#dynamicDim1Map = affine_map<(d0, d1)[s0] -> (-d0 + d1, s0)>
+
+// Here upper bounds for both min_i and min_j are both 4 x vscale, as we know
+// that is always the largest value they could take. As if `dim < 4 x vscale`
+// then 4 x vscale is an overestimate, and if `dim > 4 x vscale` then the min
+// will be clamped to 4 x vscale.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_1:.*]] = affine_map<()[s0] -> (s0 * 4)>
+
+// CHECK-LABEL: @dynamic_size_loop_nest
+//       CHECK:   %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_1]]()[%vscale]
+//       CHECK:   "test.some_use"(%[[SCALABLE_BOUND]], %[[SCALABLE_BOUND]]) : (index, index) -> ()
+func.func @dynamic_size_loop_nest(%dim0: index, %dim1: index) {
+  %c4 = arith.constant 4 : index
+  %c0 = arith.constant 0 : index
+  %vscale = vector.vscale
+  %c4_vscale = arith.muli %vscale, %c4 : index
+  scf.for %i = %c0 to %dim0 step %c4_vscale {
+    %min_i = affine.min #dynamicDim0Map(%i)[%c4_vscale, %dim0]
+    scf.for %j = %c0 to %dim1 step %c4_vscale {
+      %min_j = affine.min #dynamicDim1Map(%j)[%c4_vscale, %dim1]
+      %bound_i = "test.reify_scalable_bound"(%min_i) {type = "UB"} : (index) -> index
+      %bound_j = "test.reify_scalable_bound"(%min_j) {type = "UB"} : (index) -> index
+      "test.some_use"(%bound_i, %bound_j) : (index, index) -> ()
+    }
+  }
+  return
+}
+
+// -----
+
+// Here the upper bound is just a value + a constant.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_2:.*]] = affine_map<()[s0] -> (s0 + 8)>
+
+// CHECK-LABEL: @add_to_vscale
+//       CHECK:   %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_2]]()[%vscale]
+//       CHECK:   "test.some_use"(%[[SCALABLE_BOUND]]) : (index) -> ()
+func.func @add_to_vscale() {
+  %vscale = vector.vscale
+  %c8 = arith.constant 8 : index
+  %vscale_plus_c8 = arith.addi %vscale, %c8 : index
+  %bound = "test.reify_scalable_bound"(%vscale_plus_c8) {type = "UB"} : (index) -> index
+  "test.some_use"(%bound) : (index) -> ()
+  return
+}
+
+// -----
+
+// Here we know vscale is always 2 so we get a constant upper bound.
+
+// CHECK-LABEL: @vscale_fixed_size
+//       CHECK:   %[[C2:.*]] = arith.constant 2 : index
+//       CHECK:   "test.some_use"(%[[C2]]) : (index) -> ()
+func.func @vscale_fixed_size() {
+  %vscale = vector.vscale
+  %bound = "test.reify_scalable_bound"(%vscale) {type = "UB", vscale_min = 2, vscale_max = 2} : (index) -> index
+  "test.some_use"(%bound) : (index) -> ()
+  return
+}
+
+// -----
+
+// Here we don't know the upper bound (%a is underspecified)
+
+func.func @unknown_bound(%a: index) {
+  %vscale = vector.vscale
+  %vscale_plus_a = arith.muli %vscale, %a : index
+  // expected-error @below{{could not reify bound}}
+  %bound = "test.reify_scalable_bound"(%vscale_plus_a) {type = "UB"} : (index) -> index
+  "test.some_use"(%bound) : (index) -> ()
+  return
+}
+
+// -----
+
+// Here we have two vscale values (that have not been CSE'd), but they should
+// still be treated as equivalent.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_3:.*]] = affine_map<()[s0] -> (s0 * 6)>
+
+// CHECK-LABEL: @duplicate_vscale_values
+//       CHECK:   %[...
[truncated]

hanhanW · 2024-03-06T17:41:31Z

@matthias-springer I think you implemented ValueBoundsConstraintSet in the past, could you help review ?

banach-space

Makes sense, left a few minor comments, thanks!

mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir

mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h

mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir

banach-space

Few more comments from me. I'm not familiar with this area, hence these are mostly requests for more clarification. Thanks!

mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h

mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp

mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h

mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir

mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h

matthias-springer · 2024-03-08T13:30:53Z

mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp

+}
+
+namespace {
+struct ScalableValueBoundsConstraintSet : public ValueBoundsConstraintSet {


I'm trying to see if we can reuse more functionality from ValueBoundsConstraintSet, so less has to be reimplemented here.

One thing that stands out to me in this PR is that vector.vscale does not implement ValueBoundsOpInterface.

Here's an idea that could work:

ScalableValueBoundsConstraintSet defines 3 fields: vscaleMin, vscaleMax and vscale.

vscaleMin and vscaleMax are initialized in the constructor of ScalableValueBoundsConstraintSet.

vscale is first seen vscale SSA value. "empty" value in the beginning.

The ValueBoundsOpInterface::populateBoundsForIndexValue implementation dyn_casts cstr to ScalableValueBoundsConstraintSet &. If that succeeds, we can get vscaleMin, vscaleMax and vscale and implement the same logic that you have here without the PopulateCustomValueBoundsFn callback.

At that point, we may just be able to call ValueBoundsConstraintSet::computeDependentBound, which gives you a bound in which only certain SSA values (vscale) are allowed to appear.

I don't think ValueBoundsConstraintSet::computeDependentBound() does exactly what I want here. First, I'm not sure I want to stop at the first vscale found, also computeDependentBound() does not project out the valueDim, which in my experiments gave affine_maps that take two input symbols (presumably one for vscale and the valueDim), which did not usefully describe the bound.

Only once I eliminated all variables except vscale did I get the bounds I was interested in.

(I think the other points above could work 👍)

I looked into the above, and I don't really think it helps much with reuse, and is a little complex to setup. To be able to dyn_cast the ValueBoundConstraintSet we'd need to implement LLVM RTTI for it, and then it still only makes sense to use the helper on ScalableValueBoundsConstraintSet to compute the bound. The other variants construct a non-scalable ValueBoundsConstraintSet internally, so would not be useful for computing bounds dependent on vscale.

And like I mentioned before none of the existing bounds functions do exactly what I want here, so I think I'd still need computeScalableBound() (and it's accompanying helpers).

I think it could be done, but I think (right now) all it'd really eliminate is the PopulateCustomValueBoundsFn callback.

Also, in the current implementation ScalableValueBoundsConstraintSet is implementation detail, that's not exposed in any header files.

Can we give it a try? I was mainly worried about adding new API (PopulateCustomValueBoundsFn) to ValueBoundsOpInterface when it is not needed.

Setting up the RTTI should be pretty easy, you can copy this: 6a43523

It would indeed only make sense for ScalableValueBoundsConstraintSet, but I don't see a problem with that.

I don't think ValueBoundsConstraintSet::computeDependentBound() does exactly what I want here. First, I'm not sure I want to stop at the first vscale found, also computeDependentBound() does not project out the valueDim, which in my experiments gave affine_maps that take two input symbols (presumably one for vscale and the valueDim), which did not usefully describe the bound.

Only once I eliminated all variables except vscale did I get the bounds I was interested in.

At the moment, stopFunction controls how far the traversal goes and also what SSA values to project out. It looks like you need some more fine-grained control here: computeScalableBound adds constraints for vscale, but it does not project vscale out. I don't really mind having a bit of duplicated functionality in computeScalableBound; as you said it's quite isolated. I can look into adding the missing configuration options in a follow-up.

I've now reworked things things as suggested above (which has removed the PopulateCustomValueBoundsFn callback).

matthias-springer · 2024-03-08T13:31:42Z

mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp

@@ -300,3 +301,132 @@ vector::createUnrollIterator(VectorType vType, int64_t targetRank) {
  shapeToUnroll = shapeToUnroll.slice(0, firstScalableDim);
  return StaticTileOffsetRange(shapeToUnroll, /*unrollStep=*/1);
 }
+
+FailureOr<vector::ConstantOrScalableBound::BoundSize>
+vector::ConstantOrScalableBound::getSize() const {


Why do you need all this pattern matching?

It's for converting the affine_map to a single quantity. In my mind computeScalableBound() is just like computeConstantBound(), but it supports scalability. So in many cases you can convert the bound to a single (possibly scalable) value (which is simply represented with a int64_t and a scalable flag, like it is in the vector type).

mlir/include/mlir/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.h

mlir/test/Dialect/Vector/test-scalable-bounds.mlir

mlir/include/mlir/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.h

matthias-springer · 2024-03-18T02:44:59Z

mlir/test/lib/Dialect/Affine/TestReifyValueBounds.cpp

+
+        auto loc = op->getLoc();
+        auto reifiedScalable =
+            vector::ScalableValueBoundsConstraintSet::computeScalableBound(


It feels like this test functionality belongs to the vector dialect. But then we would also have to duplicate some functionality there. I'll leave it up to you.

I'd prefer to just leave this here for now, it is slightly awkwardly placed, but still convenient for testing.

banach-space

LGTM, thanks for implementing this Ben!

Few more nits, but feel free to ignore - mostly me trying to grasp few more bits.

banach-space · 2024-03-19T15:47:41Z

mlir/include/mlir/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.h

+  using RTTIExtends::StopConditionFn;
+
+  /// A thin wrapper over an `AffineMap` which can represent a constant bound,
+  /// or a scalable bound (in terms of vscale). The `AffineMap` will always


[nit] vscale -> vector.vscale

Here I'm referring to the concept of vscale, not the vector dialect operation.

mlir/lib/Dialect/Vector/IR/ValueBoundsOpInterfaceImpl.cpp

mlir/test/Dialect/Vector/test-scalable-bounds.mlir

mlir/lib/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.cpp

This adds a new API built with the `ValueBoundsConstraintSet` to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale. The result is an `AffineMap` will always take at most one parameter, vscale, and return a single result, which is the bound of `value`. The API is defined as follows: ```c++ FailureOr<ConstantOrScalableBound> vector::computeScalableBound(Value value, std::optional<int64_t> dim, unsigned vscaleMin, unsigned vscaleMax, presburger::BoundType boundType); ``` Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap` with a utility for converting the bound to a single quantity (i.e. a size and scalable flag). We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors). Squashed commits: Fix `affine_map` in dynamic dim test Require vscale min/max to be explicit in tests Rewrite to use RTTI and vscale impl of `ValueBoundsOpInterface` Expose computeScalableBound() `stopCondition` Hide ValueBoundsConstraintSet methods on ScalableValueBoundsConstraintSet

MacDue · 2024-03-20T20:48:26Z

Thanks a lot for the reviews! Big thanks to Matthias too (as the expert here) :)

I'll land this tomorrow if there's no further comments.

This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.

…83876) This adds a new API built with the `ValueBoundsConstraintSet` to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale. The result is an `AffineMap` that will always take at most one parameter, vscale, and returns a single result, which is the bound of `value`. The API is defined as follows: ```c++ FailureOr<ConstantOrScalableBound> vector::ScalableValueBoundsConstraintSet::computeScalableBound( Value value, std::optional<int64_t> dim, unsigned vscaleMin, unsigned vscaleMax, presburger::BoundType boundType, bool closedUB = true, StopConditionFn stopCondition = nullptr); ``` Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap` with a utility for converting the bound to a single quantity (i.e. a size and scalable flag). We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).

This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.

This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations. Squashed: Review fixups Review fixups (round 2) CI fix Add `iree-llvmcpu-stack-allocation-assumed-vscale` option This allows changing the value of vscale used when checking if (scalable) allocations are within the stack limit. Internal suggestion from @ banach-space.

This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.

#16869) This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1` (by default), since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations. ci-extra: build_test_all_arm64

MacDue requested review from c-rhodes and banach-space March 4, 2024 17:36

MacDue requested review from dcaballe, nicolasvasilache and matthias-springer as code owners March 4, 2024 17:36

llvmbot added mlir:affine mlir:vectorops mlir mlir:vector labels Mar 4, 2024

MacDue requested a review from hanhanW March 5, 2024 17:10

banach-space reviewed Mar 7, 2024

View reviewed changes

mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir Outdated Show resolved Hide resolved

banach-space reviewed Mar 8, 2024

View reviewed changes

matthias-springer reviewed Mar 8, 2024

View reviewed changes

matthias-springer approved these changes Mar 18, 2024

View reviewed changes

banach-space approved these changes Mar 19, 2024

View reviewed changes

matthias-springer mentioned this pull request Mar 20, 2024

[mlir][SCF] ValueBoundsConstraintSet: Support scf.if (branches) #85895

Merged

MacDue added 2 commits March 20, 2024 10:18

Fixups

8aba696

MacDue force-pushed the scalable_bounds branch from e59bc9c to 8aba696 Compare March 20, 2024 20:37

MacDue merged commit 2861856 into llvm:main Mar 21, 2024

MacDue deleted the scalable_bounds branch March 21, 2024 14:19

MacDue mentioned this pull request Mar 21, 2024

[Codegen] Support scalability when hoisting (and checking) allocations iree-org/iree#16869

Merged

[mlir][Vector] Add utility for computing scalable value bounds #83876

[mlir][Vector] Add utility for computing scalable value bounds #83876

Uh oh!

Conversation

MacDue commented Mar 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hanhanW commented Mar 6, 2024

Uh oh!

banach-space left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

banach-space left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MacDue Mar 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MacDue Mar 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

banach-space left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MacDue commented Mar 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MacDue commented Mar 4, 2024 •

edited

Loading

llvmbot commented Mar 4, 2024 •

edited

Loading

MacDue Mar 8, 2024 •

edited

Loading

MacDue Mar 8, 2024 •

edited

Loading

MacDue commented Mar 20, 2024 •

edited

Loading