[mlir][tensor] Add e2e test for tensor.pack with dynamic tile sizes #115698

Conversation
Adds an end-to-end test for `tensor.pack` with dynamic inner tile sizes. While relatively simple (e.g., no vectorization), this example required a few non-trivial fixes in handling `tensor.pack`: * llvm#114315, llvm#114559, llvm#113108. The end goal for this test is to incrementally increase its complexity and to work towards scalable tile sizes.
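As an illustration of the semantics being exercised here (not part of the patch itself), the pack performed in this test can be modelled in NumPy: with `inner_dims_pos = [0, 1]` and `inner_tiles = [8, 1]`, each source dimension is padded up to a multiple of its tile size and split into an (outer, tile) pair, with the tile dims moved innermost. A minimal sketch, where the helper name `pack_2d` is made up for illustration:

```python
import numpy as np

def pack_2d(a, tiles, pad):
    """Model of tensor.pack with inner_dims_pos = [0, 1]: pad each dim up
    to a multiple of its tile size, split dim d into (outer_d, tile_d),
    and move the tile dims innermost."""
    t0, t1 = tiles
    # Pad up to the next multiple of the tile sizes, using `pad` as fill.
    a = np.pad(a, ((0, -a.shape[0] % t0), (0, -a.shape[1] % t1)),
               constant_values=pad)
    o0, o1 = a.shape[0] // t0, a.shape[1] // t1
    # (o0*t0, o1*t1) -> (o0, t0, o1, t1) -> (o0, o1, t0, t1)
    return a.reshape(o0, t0, o1, t1).transpose(0, 2, 1, 3)

# Same input as the test below: A[i, j] = 1 + i + 7*j, shape 7x16.
A = 1 + np.arange(7)[:, None] + 7 * np.arange(16)[None, :]
packed = pack_2d(A, (8, 1), 123)
print(packed.shape)                 # (1, 16, 8, 1)
print(packed[0, 0, :, 0].tolist())  # [1, 2, 3, 4, 5, 6, 7, 123]
```

This reproduces the shape and the padded first tile checked by the FileCheck lines in the test.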
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Changes: Adds an end-to-end test for `tensor.pack` with dynamic inner tile sizes. The end goal for this test is to incrementally increase its complexity. Full diff: https://github.com/llvm/llvm-project/pull/115698.diff (1 file affected):
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir
new file mode 100644
index 00000000000000..bec1b9a4e9d82f
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir
@@ -0,0 +1,97 @@
+// DEFINE: %{compile} = mlir-opt %s \
+// DEFINE: -transform-interpreter -test-transform-dialect-erase-schedule |\
+// DEFINE: mlir-opt --test-linalg-transform-patterns="test-generalize-tensor-pack"\
+// DEFINE: --test-transform-dialect-erase-schedule \
+// DEFINE: -one-shot-bufferize="bufferize-function-boundaries" \
+// DEFINE: -buffer-deallocation-pipeline="private-function-dynamic-ownership" \
+// DEFINE: -cse -canonicalize -test-lower-to-llvm -o %t
+// DEFINE: %{entry_point} = main
+// DEFINE: %{run} = mlir-cpu-runner %t -e %{entry_point} -entry-point-result=void \
+// DEFINE: -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils
+
+// RUN: rm -f %t && %{compile} && %{run} | FileCheck %s
+
+/// End-to-end test for tensor.pack where one of the inner tile sizes is
+/// dynamic.
+///
+/// Note, ATM this is a relatively simple example, with no vectorization and
+/// the dynamic tile size being a compile-time constant. The intention is to
+/// incrementally expand the config to something much more complex.
+
+func.func @main() {
+ // Allocate and initialise the inputs
+ %A_alloc = tensor.empty() : tensor<7x16xi32>
+
+ %A = arith.constant dense<[
+ [ 1, 8, 15, 22, 29, 36, 43, 50, 57, 64, 71, 78, 85, 92, 99 , 106],
+ [ 2, 9, 16, 23, 30, 37, 44, 51, 58, 65, 72, 79, 86, 93, 100, 107],
+ [ 3, 10, 17, 24, 31, 38, 45, 52, 59, 66, 73, 80, 87, 94, 101, 108],
+ [ 4, 11, 18, 25, 32, 39, 46, 53, 60, 67, 74, 81, 88, 95, 102, 109],
+ [ 5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96, 103, 110],
+ [ 6, 13, 20, 27, 34, 41, 48, 55, 62, 69, 76, 83, 90, 97, 104, 111],
+ [ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98, 105, 112]
+ ]> : tensor<7x16xi32>
+
+ func.call @pack(%A) : (tensor<7x16xi32>) -> ()
+
+ return
+}
+
+func.func private @pack(%A: tensor<7x16xi32>) {
+ %c1 = arith.constant 1 : index
+ %pad_val = arith.constant 123 : i32
+
+ // Dynamic tile size
+ %tile_size = arith.constant 8 : index
+ %A_pack_empty = tensor.empty(%c1, %tile_size) : tensor<?x16x?x1xi32>
+
+ %A_pack = tensor.pack %A
+ padding_value(%pad_val : i32)
+ inner_dims_pos = [0, 1]
+ inner_tiles = [%tile_size, 1]
+ into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>
+ %A_cast = tensor.cast %A_pack : tensor<?x16x?x1xi32> to tensor<*xi32>
+
+ // Print the results
+ // CHECK: Unranked Memref base@ = 0{{.*}} rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
+ // Tile 1: (8 x 1)
+ // CHECK-NEXT: 1
+ // CHECK-NEXT: 2
+ // CHECK-NEXT: 3
+ // CHECK-NEXT: 4
+ // CHECK-NEXT: 5
+ // CHECK-NEXT: 6
+ // CHECK-NEXT: 7
+ // Expect pad value after 7 elements
+ // CHECK-NEXT: 123
+ // Tile 2: (8 x 1)
+ // CHECK-NEXT: 8
+ // CHECK-NEXT: 9
+ // CHECK-NEXT: 10
+ // CHECK-NEXT: 11
+ // CHECK-NEXT: 12
+ // CHECK-NEXT: 13
+ // CHECK-NEXT: 14
+ // Expect pad value after further 7 elements
+ // CHECK-NEXT: 123
+ // Tile 3: (8 x 1)
+ // CHECK-NEXT: 15
+ // CHECK-NEXT: 16
+ // ...
+ call @printMemrefI32(%A_cast) : (tensor<*xi32>) -> ()
+
+ return
+}
+
+module @transforms attributes { transform.with_named_sequence } {
+ transform.named_sequence @__transform_main(%module: !transform.any_op {transform.readonly}) {
+ %pack = transform.structured.match ops{["tensor.pack"]} in %module : (!transform.any_op) -> !transform.any_op
+
+ %tiled_linalg_op_p, %loops:2 = transform.structured.tile_using_for %pack tile_sizes [1, 1]
+ : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
+
+ transform.yield
+ }
+}
+
+func.func private @printMemrefI32(%ptr : tensor<*xi32>)
%pad_val = arith.constant 123 : i32

// Dynamic tile size
%tile_size = arith.constant 8 : index

To be truly dynamic, could we pass tile_size as an arg to @pack?

That's the next step ;-)

The PRs that I am listing in the summary were specifically for the case when the tile size is an SSA value that's a known compile-time constant. See e.g. this: #114315

So yes, you are right, this is not truly dynamic. But it's enough of a complication to have revealed numerous issues.
into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>
%A_cast = tensor.cast %A_pack : tensor<?x16x?x1xi32> to tensor<*xi32>

// Print the results

Please ignore if this ask is not appealing - could you please add a commented section showing what the padded, tiled version looks like?

You really need to run this stuff and check yourself - it's so cool and illustrative! :) (as in, I encourage you to experiment, but I also acknowledge your comment as a valid request)

Here are the first two 8x1 tiles:
Unranked Memref base@ = 0x125f07e80 rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
[[[[1],
[2],
[3],
[4],
[5],
[6],
[7],
[123]],
[[8],
[9],
[10],
[11],
[12],
[13],
[14],
[123]],
That's very similar to the check-lines, but I avoided the brackets which are a bit annoying to match. Would you still find it helpful as a comment?
Thanks.
// Unranked Memref base@ = 0x__ rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
// [[ [ [1], [2], [3], [4], [5], [6], [7], [123] ],
// [ [8], [9], [10], [11], [12], [13], [14], [123] ],
// ... ... .....
// [ [106], [107], ... [123] ],
// ]]
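As a side check (mine, not part of the PR), the `sizes`/`strides` in the dump are consistent with a contiguous row-major buffer; the strides can be recomputed from the sizes:

```python
def row_major_strides(sizes):
    """Strides (in elements) of a contiguous row-major buffer:
    each stride is the product of all sizes to its right."""
    strides = [1] * len(sizes)
    for d in range(len(sizes) - 2, -1, -1):
        strides[d] = strides[d + 1] * sizes[d + 1]
    return strides

print(row_major_strides([1, 16, 8, 1]))  # [128, 8, 1, 1]
```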
/// Note, ATM this is a relatively simple example, with no vectorization and
/// the dynamic tile size being a compile-time constant. The intention is to
/// incrementally expand the config to something much more complex.

Very nicely written test. Learnt some new stuff here.

Thanks. LGTM.
This PR simply creates `populateGeneralizePatterns` to collect the following patterns:
* `GeneralizeOuterUnitDimsPackOpPattern`,
* `GeneralizeOuterUnitDimsUnPackOpPattern` (left as a TODO),

and wraps the newly created `populate*` method into a new Transform Dialect Op: `apply_patterns.linalg.generalize_pack_unpack`. This allows creating more involved e2e compilation pipelines for `tensor.pack` and `tensor.unpack` Ops. I will require this Op in a forthcoming PR that will build on top of llvm#115698.

No new tests are added. Instead, I've re-used an existing test: "generalize-tensor-pack.mlir".
This PR introduces `populateGeneralizePatterns`, which collects the following patterns:
* `GeneralizeOuterUnitDimsPackOpPattern`,
* `GeneralizeOuterUnitDimsUnPackOpPattern` (currently a TODO).

These patterns are wrapped in a new Transform Dialect Op: `apply_patterns.linalg.generalize_pack_unpack`. This Op facilitates creating more involved end-to-end compilation pipelines for `tensor.pack` and `tensor.unpack` operations. It will be required in an upcoming PR building on top of #115698.

No new tests are added in this PR. Instead, existing tests from "generalize-tensor-pack.mlir" are reused. To achieve this:
* I've updated the test to use `transform.apply_patterns.linalg.generalize_pack_unpack` instead of the flag `--test-linalg-transform-patterns="test-generalize-tensor-pack"`, avoiding artificial tests solely for the TD Op.
* The TD sequence is saved to a new file, "generalize_pack.mlir", and pre-loaded using the option `--transform-preload-library='transform-library-paths=%p/td/generalize_pack.mlir'`. This avoids duplicating the sequence for every "split" in the input file.
* Added "lit.local.cfg" to exclude the "test/Dialect/Linalg/td" directory from test discovery, ensuring "generalize_pack.mlir" is not treated as a test file.
Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes. While relatively simple (e.g., no vectorization), this example required a few fixes in handling `tensor.unpack` (and similar fixes for `tensor.pack` before that): llvm#119379, llvm#121393, llvm#121400. The end goal for this test is to incrementally increase its complexity and to work towards scalable tile sizes. Note, this PR complements llvm#115698, in which a similar test for `tensor.pack` was added.
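As an aside, the inverse operation can be modelled the same way (the helper name `unpack_2d` is hypothetical, for illustration only): interleave the (outer, tile) dimension pairs back together, then drop the padding.

```python
import numpy as np

def unpack_2d(packed, out_shape):
    """Model of tensor.unpack with inner_dims_pos = [0, 1]: move the inner
    tiles back into place, then slice off the padding."""
    o0, o1, t0, t1 = packed.shape
    full = packed.transpose(0, 2, 1, 3).reshape(o0 * t0, o1 * t1)
    return full[:out_shape[0], :out_shape[1]]

# Round-trip check: pack a 7x16 array into (1, 16, 8, 1) tiles by hand,
# then unpack it back to the original shape.
a = np.arange(7 * 16).reshape(7, 16)
padded = np.pad(a, ((0, 1), (0, 0)), constant_values=-1)    # pad rows 7 -> 8
packed = padded.reshape(1, 8, 16, 1).transpose(0, 2, 1, 3)  # (1, 16, 8, 1)
print(np.array_equal(unpack_2d(packed, (7, 16)), a))        # True
```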
Adds an end-to-end test for `tensor.pack` with dynamic inner tile sizes.

While relatively simple (e.g., no vectorization), this example required a few non-trivial fixes in handling `tensor.pack`:
* `GeneralizeOuterUnitDimsPackOpPattern` #114315,
* [mlir][tensor] Improve `FoldTensorCastProducerOp` (dynamic shapes) #114559,
* [mlir][tensor] Restrict the verifier for tensor.pack/tensor.unpack #113108.

The end goal for this test is to incrementally increase its complexity and to work towards scalable tile sizes.