Skip to content

[mlir][Vector] Add utility for computing scalable value bounds #83876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 21, 2024

Conversation

MacDue
Copy link
Member

@MacDue MacDue commented Mar 4, 2024

This adds a new API built with the ValueBoundsConstraintSet to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale.

The result is an AffineMap that will always take at most one parameter, vscale, and returns a single result, which is the bound of value.

The API is defined as follows:

FailureOr<ConstantOrScalableBound>
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
  Value value, std::optional<int64_t> dim,
  unsigned vscaleMin, unsigned vscaleMax,
  presburger::BoundType boundType, 
  bool closedUB = true,
  StopConditionFn stopCondition = nullptr);

Note: ConstantOrScalableBound is a thin wrapper over the AffineMap with a utility for converting the bound to a single quantity (i.e. a size and scalable flag).

We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).

@llvmbot
Copy link
Member

llvmbot commented Mar 4, 2024

@llvm/pr-subscribers-mlir-vector
@llvm/pr-subscribers-mlir-affine

@llvm/pr-subscribers-mlir

Author: Benjamin Maxwell (MacDue)

Changes

This adds a new API built with the ValueBoundsConstraintSet to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale.

The result is an AffineMap will always take at most one parameter, vscale, and return a single result, which is the bound of value.

The API is defined as follows:

FailureOr&lt;ConstantOrScalableBound&gt;
vector::computeScalableBound(Value value, std::optional&lt;int64_t&gt; dim,
                             unsigned vscaleMin, unsigned vscaleMax,
                             presburger::BoundType boundType);

Note: ConstantOrScalableBound is a thin wrapper over the AffineMap with a utility for converting the bound to a single quantity (i.e. a size and scalable flag).

We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).


Patch is 23.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83876.diff

6 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h (+29)
  • (modified) mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h (+19-1)
  • (modified) mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp (+130)
  • (modified) mlir/lib/Interfaces/ValueBoundsOpInterface.cpp (+73-29)
  • (added) mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir (+137)
  • (modified) mlir/test/lib/Dialect/Affine/TestReifyValueBounds.cpp (+29-1)
diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index f6b03a0f2c8007..635d609d0b3e71 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -9,6 +9,7 @@
 #ifndef MLIR_DIALECT_VECTOR_UTILS_VECTORUTILS_H_
 #define MLIR_DIALECT_VECTOR_UTILS_VECTORUTILS_H_
 
+#include "mlir/Analysis/Presburger/IntegerRelation.h"
 #include "mlir/Dialect/Utils/IndexingUtils.h"
 #include "mlir/Dialect/Vector/IR/VectorOps.h"
 #include "mlir/IR/BuiltinAttributes.h"
@@ -98,6 +99,34 @@ bool isContiguousSlice(MemRefType memrefType, VectorType vectorType);
 std::optional<StaticTileOffsetRange>
 createUnrollIterator(VectorType vType, int64_t targetRank = 1);
 
+struct ConstantOrScalableBound {
+  AffineMap map;
+
+  struct BoundSize {
+    int64_t baseSize{0};
+    bool scalable{false};
+  };
+
+  /// Get the (possibly) scalable size of the bound, returns failure if the
+  /// bound cannot be represented as a single quantity.
+  FailureOr<BoundSize> getSize() const;
+};
+
+/// Computes a (possibly) scalable bound for a given value. This is similar to
+/// `ValueBoundsConstraintSet::computeConstantBound()`, but uses knowledge of
+/// the range of vscale to compute either a constant bound, an expression in
+/// terms of vscale, or failure if no bound can be computed.
+///
+/// The resulting `AffineMap` will always take at most one parameter, vscale,
+/// and return a single result, which is the bound of `value`.
+///
+/// Note: `vscaleMin` must be `<=` to `vscaleMax`. If `vscaleMin` ==
+/// `vscaleMax`, the resulting bound (if found), will be constant.
+FailureOr<ConstantOrScalableBound>
+computeScalableBound(Value value, std::optional<int64_t> dim,
+                     unsigned vscaleMin, unsigned vscaleMax,
+                     presburger::BoundType boundType);
+
 } // namespace vector
 
 /// Constructs a permutation map of invariant memref indices to vector
diff --git a/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h b/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
index 28dadfb9ecf868..6d0e16bf215f8a 100644
--- a/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
+++ b/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
@@ -265,10 +265,28 @@ class ValueBoundsConstraintSet {
 
   ValueBoundsConstraintSet(MLIRContext *ctx);
 
+  /// A callback to allow injecting custom value bounds constraints.
+  /// It takes the current value, the dim (or kIndexValue), and a reference to
+  /// the constraints set.
+  using PopulateCustomValueBoundsFn =
+      function_ref<void(Value, int64_t, ValueBoundsConstraintSet &)>;
+
+  /// Populates the constraint set for a value/map without actually computing
+  /// the bound.
+  int64_t populateConstraintsSet(
+      Value value, std::optional<int64_t> dim = std::nullopt,
+      PopulateCustomValueBoundsFn customValueBounds = nullptr,
+      StopConditionFn stopCondition = nullptr);
+  int64_t populateConstraintsSet(
+      AffineMap map, ValueDimList mapOperands,
+      PopulateCustomValueBoundsFn customValueBounds = nullptr,
+      StopConditionFn stopCondition = nullptr, int64_t *posOut = nullptr);
+
   /// Iteratively process all elements on the worklist until an index-typed
   /// value or shaped value meets `stopCondition`. Such values are not processed
   /// any further.
-  void processWorklist(StopConditionFn stopCondition);
+  void processWorklist(StopConditionFn stopCondition,
+                       PopulateCustomValueBoundsFn customValueBounds = nullptr);
 
   /// Bound the given column in the underlying constraint set by the given
   /// expression.
diff --git a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
index d613672608c3ad..77b2ad6ff540ce 100644
--- a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+++ b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
@@ -24,6 +24,7 @@
 #include "mlir/IR/IntegerSet.h"
 #include "mlir/IR/Operation.h"
 #include "mlir/IR/TypeUtilities.h"
+#include "mlir/Interfaces/ValueBoundsOpInterface.h"
 #include "mlir/Support/LLVM.h"
 #include "mlir/Support/MathExtras.h"
 
@@ -300,3 +301,132 @@ vector::createUnrollIterator(VectorType vType, int64_t targetRank) {
   shapeToUnroll = shapeToUnroll.slice(0, firstScalableDim);
   return StaticTileOffsetRange(shapeToUnroll, /*unrollStep=*/1);
 }
+
+FailureOr<vector::ConstantOrScalableBound::BoundSize>
+vector::ConstantOrScalableBound::getSize() const {
+  if (map.isSingleConstant())
+    return BoundSize{map.getSingleConstantResult(), /*scalable=*/false};
+  if (map.getNumResults() != 1 || map.getNumInputs() != 1)
+    return failure();
+  auto binop = dyn_cast<AffineBinaryOpExpr>(map.getResult(0));
+  if (!binop || binop.getKind() != AffineExprKind::Mul)
+    return failure();
+  auto matchConstant = [&](AffineExpr expr, int64_t &constant) -> bool {
+    if (auto cst = dyn_cast<AffineConstantExpr>(expr)) {
+      constant = cst.getValue();
+      return true;
+    }
+    return false;
+  };
+  // Match `s0 * cst` or `cst * s0`:
+  int64_t cst = 0;
+  auto lhs = binop.getLHS();
+  auto rhs = binop.getRHS();
+  if ((matchConstant(lhs, cst) && isa<AffineSymbolExpr>(rhs)) ||
+      (matchConstant(rhs, cst) && isa<AffineSymbolExpr>(lhs))) {
+    return BoundSize{cst, /*scalable=*/true};
+  }
+  return failure();
+}
+
+namespace {
+struct ScalableValueBoundsConstraintSet : public ValueBoundsConstraintSet {
+  using ValueBoundsConstraintSet::ValueBoundsConstraintSet;
+
+  static Operation *getOwnerOfValue(Value value) {
+    if (auto bbArg = dyn_cast<BlockArgument>(value))
+      return bbArg.getOwner()->getParentOp();
+    return value.getDefiningOp();
+  }
+
+  static FailureOr<AffineMap>
+  computeScalableBound(Value value, std::optional<int64_t> dim,
+                       unsigned vscaleMin, unsigned vscaleMax,
+                       presburger::BoundType boundType) {
+    using namespace presburger;
+
+    assert(vscaleMin <= vscaleMax);
+    ScalableValueBoundsConstraintSet cstr(value.getContext());
+
+    Value vscale;
+    int64_t pos = cstr.populateConstraintsSet(
+        value, dim,
+        /* Custom vscale value bounds */
+        [&vscale, vscaleMin, vscaleMax](Value value, int64_t dim,
+                                        ValueBoundsConstraintSet &cstr) {
+          if (dim != ValueBoundsConstraintSet::kIndexValue)
+            return;
+          if (isa_and_present<vector::VectorScaleOp>(getOwnerOfValue(value))) {
+            if (vscale) {
+              // All copies of vscale are equivalent.
+              cstr.bound(value) == cstr.getExpr(vscale);
+            } else {
+              // We know vscale is confined to [vscaleMin, vscaleMax].
+              cstr.bound(value) >= vscaleMin;
+              cstr.bound(value) <= vscaleMax;
+              vscale = value;
+            }
+          }
+        },
+        /* Stop condition */
+        [](auto, auto) {
+          // Keep adding constraints till the worklist is empty.
+          return false;
+        });
+
+    // Project out all variables apart from the first vscale.
+    cstr.projectOut([&](ValueDim p) { return p.first != vscale; });
+
+    assert(cstr.cstr.getNumDimAndSymbolVars() ==
+               cstr.positionToValueDim.size() &&
+           "inconsistent mapping state");
+
+    for (int64_t i = 0; i < cstr.cstr.getNumDimAndSymbolVars(); ++i) {
+      if (i == pos)
+        continue;
+      if (cstr.positionToValueDim[i] !=
+          ValueDim(vscale, ValueBoundsConstraintSet::kIndexValue)) {
+        return failure();
+      }
+    }
+
+    SmallVector<AffineMap, 1> lowerBound(1), upperBound(1);
+    cstr.cstr.getSliceBounds(pos, 1, value.getContext(), &lowerBound,
+                             &upperBound,
+                             /*closedUB=*/true);
+
+    auto invalidBound = [](auto &bound) {
+      return !bound[0] || bound[0].getNumResults() != 1;
+    };
+
+    AffineMap bound = [&] {
+      if (boundType == BoundType::EQ && !invalidBound(lowerBound) &&
+          lowerBound[0] == lowerBound[0]) {
+        return lowerBound[0];
+      } else if (boundType == BoundType::LB && !invalidBound(lowerBound)) {
+        return lowerBound[0];
+      } else if (boundType == BoundType::UB && !invalidBound(upperBound)) {
+        return upperBound[0];
+      }
+      return AffineMap{};
+    }();
+
+    if (!bound)
+      return failure();
+
+    return bound;
+  }
+};
+
+} // namespace
+
+FailureOr<vector::ConstantOrScalableBound>
+vector::computeScalableBound(Value value, std::optional<int64_t> dim,
+                             unsigned vscaleMin, unsigned vscaleMax,
+                             presburger::BoundType boundType) {
+  auto bound = ScalableValueBoundsConstraintSet::computeScalableBound(
+      value, dim, vscaleMin, vscaleMax, boundType);
+  if (failed(bound))
+    return failure();
+  return ConstantOrScalableBound{*bound};
+}
diff --git a/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp b/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
index 85abc2df894797..ac4e3b935a0542 100644
--- a/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
+++ b/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
@@ -191,7 +191,9 @@ static Operation *getOwnerOfValue(Value value) {
   return value.getDefiningOp();
 }
 
-void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
+void ValueBoundsConstraintSet::processWorklist(
+    StopConditionFn stopCondition,
+    PopulateCustomValueBoundsFn customValueBounds) {
   while (!worklist.empty()) {
     int64_t pos = worklist.front();
     worklist.pop();
@@ -215,8 +217,11 @@ void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
     if (stopCondition(value, maybeDim))
       continue;
 
-    // Query `ValueBoundsOpInterface` for constraints. New items may be added to
-    // the worklist.
+    // 1. Query `customValueBounds` for constraints (if provided).
+    if (customValueBounds)
+      customValueBounds(value, dim, *this);
+
+    // 2. Query `ValueBoundsOpInterface` for constraints.
     auto valueBoundsOp =
         dyn_cast<ValueBoundsOpInterface>(getOwnerOfValue(value));
     if (valueBoundsOp) {
@@ -228,6 +233,8 @@ void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
       continue;
     }
 
+    // Steps 1 and 2 above may add new items to the worklist.
+
     // If the op does not implement `ValueBoundsOpInterface`, check if it
     // implements the `DestinationStyleOpInterface`. OpResults of such ops are
     // tied to OpOperands. Tied values have the same shape.
@@ -471,55 +478,92 @@ FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
                               closedUB);
 }
 
+FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
+    presburger::BoundType type, AffineMap map, ArrayRef<Value> operands,
+    StopConditionFn stopCondition, bool closedUB) {
+  ValueDimList valueDims;
+  for (Value v : operands) {
+    assert(v.getType().isIndex() && "expected index type");
+    valueDims.emplace_back(v, std::nullopt);
+  }
+  return computeConstantBound(type, map, valueDims, stopCondition, closedUB);
+}
+
 FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
     presburger::BoundType type, AffineMap map, ValueDimList operands,
     StopConditionFn stopCondition, bool closedUB) {
   assert(map.getNumResults() == 1 && "expected affine map with one result");
   ValueBoundsConstraintSet cstr(map.getContext());
-  int64_t pos = cstr.insert(/*isSymbol=*/false);
+
+  int64_t pos = 0;
+  if (stopCondition) {
+    cstr.populateConstraintsSet(map, operands, nullptr, stopCondition, &pos);
+  } else {
+    // No stop condition specified: Keep adding constraints until a bound could
+    // be computed.
+    cstr.populateConstraintsSet(
+        map, operands, nullptr,
+        [&](Value v, std::optional<int64_t> dim) {
+          return cstr.cstr.getConstantBound64(type, pos).has_value();
+        },
+        &pos);
+  }
+  // Compute constant bound for `valueDim`.
+  int64_t ubAdjustment = closedUB ? 0 : 1;
+  if (auto bound = cstr.cstr.getConstantBound64(type, pos))
+    return type == BoundType::UB ? *bound + ubAdjustment : *bound;
+  return failure();
+}
+
+int64_t ValueBoundsConstraintSet::populateConstraintsSet(
+    Value value, std::optional<int64_t> dim,
+    PopulateCustomValueBoundsFn customValueBounds,
+    StopConditionFn stopCondition) {
+#ifndef NDEBUG
+  assertValidValueDim(value, dim);
+#endif // NDEBUG
+
+  AffineMap map =
+      AffineMap::get(/*dimCount=*/1, /*symbolCount=*/0,
+                     Builder(value.getContext()).getAffineDimExpr(0));
+  return populateConstraintsSet(map, {{value, dim}}, customValueBounds,
+                                stopCondition);
+}
+
+int64_t ValueBoundsConstraintSet::populateConstraintsSet(
+    AffineMap map, ValueDimList operands,
+    PopulateCustomValueBoundsFn customValueBounds,
+    StopConditionFn stopCondition, int64_t *posOut) {
+  assert(map.getNumResults() == 1 && "expected affine map with one result");
+  int64_t pos = insert(/*isSymbol=*/false);
+  if (posOut)
+    *posOut = pos;
 
   // Add map and operands to the constraint set. Dimensions are converted to
   // symbols. All operands are added to the worklist.
   auto mapper = [&](std::pair<Value, std::optional<int64_t>> v) {
-    return cstr.getExpr(v.first, v.second);
+    return getExpr(v.first, v.second);
   };
   SmallVector<AffineExpr> dimReplacements = llvm::to_vector(
       llvm::map_range(ArrayRef(operands).take_front(map.getNumDims()), mapper));
   SmallVector<AffineExpr> symReplacements = llvm::to_vector(
       llvm::map_range(ArrayRef(operands).drop_front(map.getNumDims()), mapper));
-  cstr.addBound(
+  addBound(
       presburger::BoundType::EQ, pos,
       map.getResult(0).replaceDimsAndSymbols(dimReplacements, symReplacements));
 
   // Process the backward slice of `operands` (i.e., reverse use-def chain)
   // until `stopCondition` is met.
   if (stopCondition) {
-    cstr.processWorklist(stopCondition);
+    processWorklist(stopCondition, customValueBounds);
   } else {
-    // No stop condition specified: Keep adding constraints until a bound could
-    // be computed.
-    cstr.processWorklist(
-        /*stopCondition=*/[&](Value v, std::optional<int64_t> dim) {
-          return cstr.cstr.getConstantBound64(type, pos).has_value();
-        });
+    // No stop condition specified: Keep adding constraints until the worklist
+    // is empty.
+    processWorklist([](Value v, std::optional<int64_t> dim) { return false; },
+                    customValueBounds);
   }
 
-  // Compute constant bound for `valueDim`.
-  int64_t ubAdjustment = closedUB ? 0 : 1;
-  if (auto bound = cstr.cstr.getConstantBound64(type, pos))
-    return type == BoundType::UB ? *bound + ubAdjustment : *bound;
-  return failure();
-}
-
-FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
-    presburger::BoundType type, AffineMap map, ArrayRef<Value> operands,
-    StopConditionFn stopCondition, bool closedUB) {
-  ValueDimList valueDims;
-  for (Value v : operands) {
-    assert(v.getType().isIndex() && "expected index type");
-    valueDims.emplace_back(v, std::nullopt);
-  }
-  return computeConstantBound(type, map, valueDims, stopCondition, closedUB);
+  return pos;
 }
 
 FailureOr<int64_t>
diff --git a/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir b/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir
new file mode 100644
index 00000000000000..2afc4db874b73e
--- /dev/null
+++ b/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir
@@ -0,0 +1,137 @@
+// RUN: mlir-opt %s -test-affine-reify-value-bounds -cse -verify-diagnostics \
+// RUN:   -verify-diagnostics -split-input-file | FileCheck %s
+
+#fixedDim0Map = affine_map<(d0)[s0] -> (-d0 + 32400, s0)>
+#fixedDim1Map = affine_map<(d0)[s0] -> (-d0 + 16, s0)>
+
+// Here the upper bound for min_i is 4 x vscale, as we know 4 x vscale is
+// always less than 32400. The bound for min_j is 16 as at vscale > 4,
+// 4 x vscale will be > 16, so the value will be clamped at 16.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_0:.*]] = affine_map<()[s0] -> (s0 * 4)>
+
+// CHECK-LABEL: @fixed_size_loop_nest
+//   CHECK-DAG:   %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_0]]()[%vscale]
+//   CHECK-DAG:   %[[C16:.*]] = arith.constant 16 : index
+//       CHECK:   "test.some_use"(%[[SCALABLE_BOUND]], %[[C16]]) : (index, index) -> ()
+func.func @fixed_size_loop_nest() {
+  %c16 = arith.constant 16 : index
+  %c32400 = arith.constant 32400 : index
+  %c4 = arith.constant 4 : index
+  %c0 = arith.constant 0 : index
+  %vscale = vector.vscale
+  %c4_vscale = arith.muli %vscale, %c4 : index
+  scf.for %i = %c0 to %c32400 step %c4_vscale {
+    %min_i = affine.min #fixedDim0Map(%i)[%c4_vscale]
+    scf.for %j = %c0 to %c16 step %c4_vscale {
+      %min_j = affine.min #fixedDim1Map(%j)[%c4_vscale]
+      %bound_i = "test.reify_scalable_bound"(%min_i) {type = "UB"} : (index) -> index
+      %bound_j = "test.reify_scalable_bound"(%min_j) {type = "UB"} : (index) -> index
+      "test.some_use"(%bound_i, %bound_j) : (index, index) -> ()
+    }
+  }
+  return
+}
+
+// -----
+
+#dynamicDim0Map = affine_map<(d0, d1)[s0] -> (-d0 + d1, s0)>
+#dynamicDim1Map = affine_map<(d0, d1)[s0] -> (-d0 + d1, s0)>
+
+// Here upper bounds for both min_i and min_j are both 4 x vscale, as we know
+// that is always the largest value they could take. As if `dim < 4 x vscale`
+// then 4 x vscale is an overestimate, and if `dim > 4 x vscale` then the min
+// will be clamped to 4 x vscale.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_1:.*]] = affine_map<()[s0] -> (s0 * 4)>
+
+// CHECK-LABEL: @dynamic_size_loop_nest
+//       CHECK:   %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_1]]()[%vscale]
+//       CHECK:   "test.some_use"(%[[SCALABLE_BOUND]], %[[SCALABLE_BOUND]]) : (index, index) -> ()
+func.func @dynamic_size_loop_nest(%dim0: index, %dim1: index) {
+  %c4 = arith.constant 4 : index
+  %c0 = arith.constant 0 : index
+  %vscale = vector.vscale
+  %c4_vscale = arith.muli %vscale, %c4 : index
+  scf.for %i = %c0 to %dim0 step %c4_vscale {
+    %min_i = affine.min #dynamicDim0Map(%i)[%c4_vscale, %dim0]
+    scf.for %j = %c0 to %dim1 step %c4_vscale {
+      %min_j = affine.min #dynamicDim1Map(%j)[%c4_vscale, %dim1]
+      %bound_i = "test.reify_scalable_bound"(%min_i) {type = "UB"} : (index) -> index
+      %bound_j = "test.reify_scalable_bound"(%min_j) {type = "UB"} : (index) -> index
+      "test.some_use"(%bound_i, %bound_j) : (index, index) -> ()
+    }
+  }
+  return
+}
+
+// -----
+
+// Here the upper bound is just a value + a constant.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_2:.*]] = affine_map<()[s0] -> (s0 + 8)>
+
+// CHECK-LABEL: @add_to_vscale
+//       CHECK:   %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_2]]()[%vscale]
+//       CHECK:   "test.some_use"(%[[SCALABLE_BOUND]]) : (index) -> ()
+func.func @add_to_vscale() {
+  %vscale = vector.vscale
+  %c8 = arith.constant 8 : index
+  %vscale_plus_c8 = arith.addi %vscale, %c8 : index
+  %bound = "test.reify_scalable_bound"(%vscale_plus_c8) {type = "UB"} : (index) -> index
+  "test.some_use"(%bound) : (index) -> ()
+  return
+}
+
+// -----
+
+// Here we know vscale is always 2 so we get a constant upper bound.
+
+// CHECK-LABEL: @vscale_fixed_size
+//       CHECK:   %[[C2:.*]] = arith.constant 2 : index
+//       CHECK:   "test.some_use"(%[[C2]]) : (index) -> ()
+func.func @vscale_fixed_size() {
+  %vscale = vector.vscale
+  %bound = "test.reify_scalable_bound"(%vscale) {type = "UB", vscale_min = 2, vscale_max = 2} : (index) -> index
+  "test.some_use"(%bound) : (index) -> ()
+  return
+}
+
+// -----
+
+// Here we don't know the upper bound (%a is underspecified)
+
+func.func @unknown_bound(%a: index) {
+  %vscale = vector.vscale
+  %vscale_plus_a = arith.muli %vscale, %a : index
+  // expected-error @below{{could not reify bound}}
+  %bound = "test.reify_scalable_bound"(%vscale_plus_a) {type = "UB"} : (index) -> index
+  "test.some_use"(%bound) : (index) -> ()
+  return
+}
+
+// -----
+
+// Here we have two vscale values (that have not been CSE'd), but they should
+// still be treated as equivalent.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_3:.*]] = affine_map<()[s0] -> (s0 * 6)>
+
+// CHECK-LABEL: @duplicate_vscale_values
+//       CHECK:   %[...
[truncated]

@MacDue MacDue requested a review from hanhanW March 5, 2024 17:10
@hanhanW
Copy link
Contributor

hanhanW commented Mar 6, 2024

@matthias-springer I think you implemented ValueBoundsConstraintSet in the past, could you help review ?

Copy link
Contributor

@banach-space banach-space left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, left a few minor comments, thanks!

Copy link
Contributor

@banach-space banach-space left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments from me. I'm not familiar with this area, hence these are mostly requests for more clarification. Thanks!

}

namespace {
struct ScalableValueBoundsConstraintSet : public ValueBoundsConstraintSet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to see if we can reuse more functionality from ValueBoundsConstraintSet, so less has to be reimplemented here.

One thing that stands out to me in this PR is that vector.vscale does not implement ValueBoundsOpInterface.

Here's an idea that could work:

  • ScalableValueBoundsConstraintSet defines 3 fields: vscaleMin, vscaleMax and vscale.
  • vscaleMin and vscaleMax are initialized in the constructor of ScalableValueBoundsConstraintSet.
  • vscale is first seen vscale SSA value. "empty" value in the beginning.
  • The ValueBoundsOpInterface::populateBoundsForIndexValue implementation dyn_casts cstr to ScalableValueBoundsConstraintSet &. If that succeeds, we can get vscaleMin, vscaleMax and vscale and implement the same logic that you have here without the PopulateCustomValueBoundsFn callback.
  • At that point, we may just be able to call ValueBoundsConstraintSet::computeDependentBound, which gives you a bound in which only certain SSA values (vscale) are allowed to appear.

Copy link
Member Author

@MacDue MacDue Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think ValueBoundsConstraintSet::computeDependentBound() does exactly what I want here. First, I'm not sure I want to stop at the first vscale found, also computeDependentBound() does not project out the valueDim, which in my experiments gave affine_maps that take two input symbols (presumably one for vscale and the valueDim), which did not usefully describe the bound.

Only once I eliminated all variables except vscale did I get the bounds I was interested in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think the other points above could work 👍)

Copy link
Member Author

@MacDue MacDue Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into the above, and I don't really think it helps much with reuse, and is a little complex to setup. To be able to dyn_cast the ValueBoundConstraintSet we'd need to implement LLVM RTTI for it, and then it still only makes sense to use the helper on ScalableValueBoundsConstraintSet to compute the bound. The other variants construct a non-scalable ValueBoundsConstraintSet internally, so would not be useful for computing bounds dependent on vscale.

And like I mentioned before none of the existing bounds functions do exactly what I want here, so I think I'd still need computeScalableBound() (and it's accompanying helpers).

I think it could be done, but I think (right now) all it'd really eliminate is the PopulateCustomValueBoundsFn callback.

Also, in the current implementation ScalableValueBoundsConstraintSet is implementation detail, that's not exposed in any header files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give it a try? I was mainly worried about adding new API (PopulateCustomValueBoundsFn) to ValueBoundsOpInterface when it is not needed.

Setting up the RTTI should be pretty easy, you can copy this: 6a43523

It would indeed only make sense for ScalableValueBoundsConstraintSet, but I don't see a problem with that.

I don't think ValueBoundsConstraintSet::computeDependentBound() does exactly what I want here. First, I'm not sure I want to stop at the first vscale found, also computeDependentBound() does not project out the valueDim, which in my experiments gave affine_maps that take two input symbols (presumably one for vscale and the valueDim), which did not usefully describe the bound.

Only once I eliminated all variables except vscale did I get the bounds I was interested in.

At the moment, stopFunction controls how far the traversal goes and also what SSA values to project out. It looks like you need some more fine-grained control here: computeScalableBound adds constraints for vscale, but it does not project vscale out. I don't really mind having a bit of duplicated functionality in computeScalableBound; as you said it's quite isolated. I can look into adding the missing configuration options in a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now reworked things things as suggested above (which has removed the PopulateCustomValueBoundsFn callback).

@@ -300,3 +301,132 @@ vector::createUnrollIterator(VectorType vType, int64_t targetRank) {
shapeToUnroll = shapeToUnroll.slice(0, firstScalableDim);
return StaticTileOffsetRange(shapeToUnroll, /*unrollStep=*/1);
}

FailureOr<vector::ConstantOrScalableBound::BoundSize>
vector::ConstantOrScalableBound::getSize() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need all this pattern matching?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for converting the affine_map to a single quantity. In my mind computeScalableBound() is just like computeConstantBound(), but it supports scalability. So in many cases you can convert the bound to a single (possibly scalable) value (which is simply represented with a int64_t and a scalable flag, like it is in the vector type).


auto loc = op->getLoc();
auto reifiedScalable =
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this test functionality belongs to the vector dialect. But then we would also have to duplicate some functionality there. I'll leave it up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to just leave this here for now, it is slightly awkwardly placed, but still convenient for testing.

Copy link
Contributor

@banach-space banach-space left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for implementing this Ben!

Few more nits, but feel free to ignore - mostly me trying to grasp few more bits.

using RTTIExtends::StopConditionFn;

/// A thin wrapper over an `AffineMap` which can represent a constant bound,
/// or a scalable bound (in terms of vscale). The `AffineMap` will always
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] vscale -> vector.vscale

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm referring to the concept of vscale, not the vector dialect operation.

MacDue added 2 commits March 20, 2024 10:18
This adds a new API built with the `ValueBoundsConstraintSet` to compute
the bounds of possibly scalable quantities. It uses knowledge of the
range of vscale (which is defined by the target architecture), to solve
for the bound as either a constant or an expression in terms of vscale.

The result is an `AffineMap` will always take at most one parameter,
vscale, and return a single result, which is the bound of `value`.

The API is defined as follows:

```c++
FailureOr<ConstantOrScalableBound>
vector::computeScalableBound(Value value, std::optional<int64_t> dim,
                             unsigned vscaleMin, unsigned vscaleMax,
                             presburger::BoundType boundType);
```

Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap`
with a utility for converting the bound to a single quantity (i.e. a
size and scalable flag).

We believe this API could prove useful downstream in IREE (which uses
a similar analysis to hoist allocas, which currently fails for scalable
vectors).

Squashed commits:

Fix `affine_map` in dynamic dim test

Require vscale min/max to be explicit in tests

Rewrite to use RTTI and vscale impl of `ValueBoundsOpInterface`

Expose computeScalableBound() `stopCondition`

Hide ValueBoundsConstraintSet methods on ScalableValueBoundsConstraintSet
@MacDue
Copy link
Member Author

MacDue commented Mar 20, 2024

Thanks a lot for the reviews! Big thanks to Matthias too (as the expert here) :)

I'll land this tomorrow if there's no further comments.

@MacDue MacDue merged commit 2861856 into llvm:main Mar 21, 2024
@MacDue MacDue deleted the scalable_bounds branch March 21, 2024 14:19
MacDue added a commit to MacDue/iree that referenced this pull request Mar 21, 2024
This makes use of recent work (llvm/llvm-project#83876)
to support scalability in value bounds analysis. With these changes
scalable allocations (i.e. allocations that scale with vscale), can
now be hoisted and pass stack limit checking.

Hoisting scalable allocations requires knowledge of the range of vscale,
which is taken from the target (using the maximum range defined by the
ISA). The stack limit checking just assumes `vscale = 1`, since this
avoids needing to look at the target (and always resolves to a constant
bound), and still prevents excessive allocations.
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
…83876)

This adds a new API built with the `ValueBoundsConstraintSet` to compute
the bounds of possibly scalable quantities. It uses knowledge of the
range of vscale (which is defined by the target architecture), to solve
for the bound as either a constant or an expression in terms of vscale.

The result is an `AffineMap` that will always take at most one
parameter, vscale, and returns a single result, which is the bound of
`value`.

The API is defined as follows:

```c++
FailureOr<ConstantOrScalableBound>
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
  Value value, std::optional<int64_t> dim,
  unsigned vscaleMin, unsigned vscaleMax,
  presburger::BoundType boundType, 
  bool closedUB = true,
  StopConditionFn stopCondition = nullptr);
```

Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap`
with a utility for converting the bound to a single quantity (i.e. a
size and scalable flag).

We believe this API could prove useful downstream in IREE (which uses a
similar analysis to hoist allocas, which currently fails for scalable
vectors).
MacDue added a commit to MacDue/iree that referenced this pull request Mar 26, 2024
This makes use of recent work (llvm/llvm-project#83876)
to support scalability in value bounds analysis. With these changes
scalable allocations (i.e. allocations that scale with vscale), can
now be hoisted and pass stack limit checking.

Hoisting scalable allocations requires knowledge of the range of vscale,
which is taken from the target (using the maximum range defined by the
ISA). The stack limit checking just assumes `vscale = 1`, since this
avoids needing to look at the target (and always resolves to a constant
bound), and still prevents excessive allocations.
MacDue added a commit to MacDue/iree that referenced this pull request Mar 27, 2024
This makes use of recent work (llvm/llvm-project#83876)
to support scalability in value bounds analysis. With these changes
scalable allocations (i.e. allocations that scale with vscale), can
now be hoisted and pass stack limit checking.

Hoisting scalable allocations requires knowledge of the range of vscale,
which is taken from the target (using the maximum range defined by the
ISA). The stack limit checking just assumes `vscale = 1`, since this
avoids needing to look at the target (and always resolves to a constant
bound), and still prevents excessive allocations.
MacDue added a commit to MacDue/iree that referenced this pull request Apr 2, 2024
This makes use of recent work (llvm/llvm-project#83876)
to support scalability in value bounds analysis. With these changes
scalable allocations (i.e. allocations that scale with vscale), can
now be hoisted and pass stack limit checking.

Hoisting scalable allocations requires knowledge of the range of vscale,
which is taken from the target (using the maximum range defined by the
ISA). The stack limit checking just assumes `vscale = 1`, since this
avoids needing to look at the target (and always resolves to a constant
bound), and still prevents excessive allocations.
MacDue added a commit to MacDue/iree that referenced this pull request Apr 2, 2024
This makes use of recent work (llvm/llvm-project#83876)
to support scalability in value bounds analysis. With these changes
scalable allocations (i.e. allocations that scale with vscale), can
now be hoisted and pass stack limit checking.

Hoisting scalable allocations requires knowledge of the range of vscale,
which is taken from the target (using the maximum range defined by the
ISA). The stack limit checking just assumes `vscale = 1`, since this
avoids needing to look at the target (and always resolves to a constant
bound), and still prevents excessive allocations.

Squashed:

Review fixups

Review fixups (round 2)

CI fix

Add `iree-llvmcpu-stack-allocation-assumed-vscale` option

This allows changing the value of vscale used when checking if
(scalable) allocations are within the stack limit. Internal suggestion
from @ banach-space.
MacDue added a commit to MacDue/iree that referenced this pull request Apr 3, 2024
This makes use of recent work (llvm/llvm-project#83876)
to support scalability in value bounds analysis. With these changes
scalable allocations (i.e. allocations that scale with vscale), can
now be hoisted and pass stack limit checking.

Hoisting scalable allocations requires knowledge of the range of vscale,
which is taken from the target (using the maximum range defined by the
ISA). The stack limit checking just assumes `vscale = 1`, since this
avoids needing to look at the target (and always resolves to a constant
bound), and still prevents excessive allocations.
MacDue added a commit to MacDue/iree that referenced this pull request Apr 3, 2024
This makes use of recent work (llvm/llvm-project#83876)
to support scalability in value bounds analysis. With these changes
scalable allocations (i.e. allocations that scale with vscale), can
now be hoisted and pass stack limit checking.

Hoisting scalable allocations requires knowledge of the range of vscale,
which is taken from the target (using the maximum range defined by the
ISA). The stack limit checking just assumes `vscale = 1`, since this
avoids needing to look at the target (and always resolves to a constant
bound), and still prevents excessive allocations.
banach-space pushed a commit to iree-org/iree that referenced this pull request Apr 4, 2024
#16869)

This makes use of recent work
(llvm/llvm-project#83876) to support scalability
in value bounds analysis. With these changes scalable allocations (i.e.
allocations that scale with vscale), can now be hoisted and pass stack
limit checking.

Hoisting scalable allocations requires knowledge of the range of vscale,
which is taken from the target (using the maximum range defined by the
ISA). The stack limit checking just assumes `vscale = 1` (by default),
since this avoids needing to look at the target (and always resolves to
a constant bound), and still prevents excessive allocations.

ci-extra: build_test_all_arm64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants