[mlir][tensor] Add e2e test for tensor.pack with dynamic tile sizes #115698

Conversation
Adds an end-to-end test for `tensor.pack` with dynamic inner tile sizes. While relatively simple (e.g., no vectorization), this example required a few non-trivial fixes in handling `tensor.pack`: * llvm#114315, llvm#114559, llvm#113108. The end goal for this test is to incrementally increase its complexity and to work towards scalable tile sizes.
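As an illustration of the semantics being exercised here (not part of the patch itself), the pack performed in this test can be modelled in NumPy: with `inner_dims_pos = [0, 1]` and `inner_tiles = [8, 1]`, each source dimension is padded up to a multiple of its tile size and split into an (outer, tile) pair, with the tile dims moved innermost. A minimal sketch, where the helper name `pack_2d` is made up for illustration:

```python
import numpy as np

def pack_2d(a, tiles, pad):
    """Model of tensor.pack with inner_dims_pos = [0, 1]: pad each dim up
    to a multiple of its tile size, split dim d into (outer_d, tile_d),
    and move the tile dims innermost."""
    t0, t1 = tiles
    # Pad up to the next multiple of the tile sizes, using `pad` as fill.
    a = np.pad(a, ((0, -a.shape[0] % t0), (0, -a.shape[1] % t1)),
               constant_values=pad)
    o0, o1 = a.shape[0] // t0, a.shape[1] // t1
    # (o0*t0, o1*t1) -> (o0, t0, o1, t1) -> (o0, o1, t0, t1)
    return a.reshape(o0, t0, o1, t1).transpose(0, 2, 1, 3)

# Same input as the test below: A[i, j] = 1 + i + 7*j, shape 7x16.
A = 1 + np.arange(7)[:, None] + 7 * np.arange(16)[None, :]
packed = pack_2d(A, (8, 1), 123)
print(packed.shape)                 # (1, 16, 8, 1)
print(packed[0, 0, :, 0].tolist())  # [1, 2, 3, 4, 5, 6, 7, 123]
```

This reproduces the shape and the padded first tile checked by the FileCheck lines in the test.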
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Changes: Adds an end-to-end test for `tensor.pack` with dynamic inner tile sizes. The end goal for this test is to incrementally increase its complexity. Full diff: https://github.com/llvm/llvm-project/pull/115698.diff (1 file affected):
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir
new file mode 100644
index 00000000000000..bec1b9a4e9d82f
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir
@@ -0,0 +1,97 @@
+// DEFINE: %{compile} = mlir-opt %s \
+// DEFINE: -transform-interpreter -test-transform-dialect-erase-schedule |\
+// DEFINE: mlir-opt --test-linalg-transform-patterns="test-generalize-tensor-pack"\
+// DEFINE: --test-transform-dialect-erase-schedule \
+// DEFINE: -one-shot-bufferize="bufferize-function-boundaries" \
+// DEFINE: -buffer-deallocation-pipeline="private-function-dynamic-ownership" \
+// DEFINE: -cse -canonicalize -test-lower-to-llvm -o %t
+// DEFINE: %{entry_point} = main
+// DEFINE: %{run} = mlir-cpu-runner %t -e %{entry_point} -entry-point-result=void \
+// DEFINE: -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils
+
+// RUN: rm -f %t && %{compile} && %{run} | FileCheck %s
+
+/// End-to-end test for tensor.pack where one of the inner tile sizes is
+/// dynamic.
+///
+/// Note, ATM this is a relatively simple example, with no vectorization and
+/// the dynamic tile size being a compile-time constant. The intention is to
+/// incrementally expand the config to something much more complex.
+
+func.func @main() {
+ // Allocate and initialise the inputs
+ %A_alloc = tensor.empty() : tensor<7x16xi32>
+
+ %A = arith.constant dense<[
+ [ 1, 8, 15, 22, 29, 36, 43, 50, 57, 64, 71, 78, 85, 92, 99 , 106],
+ [ 2, 9, 16, 23, 30, 37, 44, 51, 58, 65, 72, 79, 86, 93, 100, 107],
+ [ 3, 10, 17, 24, 31, 38, 45, 52, 59, 66, 73, 80, 87, 94, 101, 108],
+ [ 4, 11, 18, 25, 32, 39, 46, 53, 60, 67, 74, 81, 88, 95, 102, 109],
+ [ 5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96, 103, 110],
+ [ 6, 13, 20, 27, 34, 41, 48, 55, 62, 69, 76, 83, 90, 97, 104, 111],
+ [ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98, 105, 112]
+ ]> : tensor<7x16xi32>
+
+ func.call @pack(%A) : (tensor<7x16xi32>) -> ()
+
+ return
+}
+
+func.func private @pack(%A: tensor<7x16xi32>) {
+ %c1 = arith.constant 1 : index
+ %pad_val = arith.constant 123 : i32
+
+ // Dynamic tile size
+ %tile_size = arith.constant 8 : index
+ %A_pack_empty = tensor.empty(%c1, %tile_size) : tensor<?x16x?x1xi32>
+
+ %A_pack = tensor.pack %A
+ padding_value(%pad_val : i32)
+ inner_dims_pos = [0, 1]
+ inner_tiles = [%tile_size, 1]
+ into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>
+ %A_cast = tensor.cast %A_pack : tensor<?x16x?x1xi32> to tensor<*xi32>
+
+ // Print the results
+ // CHECK: Unranked Memref base@ = 0{{.*}} rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
+ // Tile 1: (8 x 1)
+ // CHECK-NEXT: 1
+ // CHECK-NEXT: 2
+ // CHECK-NEXT: 3
+ // CHECK-NEXT: 4
+ // CHECK-NEXT: 5
+ // CHECK-NEXT: 6
+ // CHECK-NEXT: 7
+ // Expect pad value after 7 elements
+ // CHECK-NEXT: 123
+ // Tile 2: (8 x 1)
+ // CHECK-NEXT: 8
+ // CHECK-NEXT: 9
+ // CHECK-NEXT: 10
+ // CHECK-NEXT: 11
+ // CHECK-NEXT: 12
+ // CHECK-NEXT: 13
+ // CHECK-NEXT: 14
+ // Expect pad value after further 7 elements
+ // CHECK-NEXT: 123
+ // Tile 3: (8 x 1)
+ // CHECK-NEXT: 15
+ // CHECK-NEXT: 16
+ // ...
+ call @printMemrefI32(%A_cast) : (tensor<*xi32>) -> ()
+
+ return
+}
+
+module @transforms attributes { transform.with_named_sequence } {
+ transform.named_sequence @__transform_main(%module: !transform.any_op {transform.readonly}) {
+ %pack = transform.structured.match ops{["tensor.pack"]} in %module : (!transform.any_op) -> !transform.any_op
+
+ %tiled_linalg_op_p, %loops:2 = transform.structured.tile_using_for %pack tile_sizes [1, 1]
+ : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
+
+ transform.yield
+ }
+}
+
+func.func private @printMemrefI32(%ptr : tensor<*xi32>)
%pad_val = arith.constant 123 : i32

// Dynamic tile size
%tile_size = arith.constant 8 : index

To be truly dynamic, could we pass tile_size as an arg to @pack?

That's the next step ;-)

The PRs that I am listing in the summary were specifically for the case when the tile size is an SSA value that's a known compile-time constant. See e.g. this: #114315

So yes, you are right, this is not truly dynamic. But it's enough of a complication to have revealed numerous issues.
into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>
%A_cast = tensor.cast %A_pack : tensor<?x16x?x1xi32> to tensor<*xi32>

// Print the results

Please ignore if this ask is not appealing - could you please add a commented section showing what the padded, tiled version looks like?

You really need to run this stuff and check yourself - it's so cool and illustrative! :) (as in, I encourage you to experiment, but I also acknowledge your comment as a valid request)

Here are the first two 8x1 tiles:
Unranked Memref base@ = 0x125f07e80 rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
[[[[1],
[2],
[3],
[4],
[5],
[6],
[7],
[123]],
[[8],
[9],
[10],
[11],
[12],
[13],
[14],
[123]],
That's very similar to the check-lines, but I avoided the brackets which are a bit annoying to match. Would you still find it helpful as a comment?
Thanks.
// Unranked Memref base@ = 0x__ rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
// [[ [ [1], [2], [3], [4], [5], [6], [7], [123] ],
// [ [8], [9], [10], [11], [12], [13], [14], [123] ],
// ... ... .....
// [ [106], [107], ... [123] ],
// ]]
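As a side check (mine, not part of the PR), the `sizes`/`strides` in the dump are consistent with a contiguous row-major buffer; the strides can be recomputed from the sizes:

```python
def row_major_strides(sizes):
    """Strides (in elements) of a contiguous row-major buffer:
    each stride is the product of all sizes to its right."""
    strides = [1] * len(sizes)
    for d in range(len(sizes) - 2, -1, -1):
        strides[d] = strides[d + 1] * sizes[d + 1]
    return strides

print(row_major_strides([1, 16, 8, 1]))  # [128, 8, 1, 1]
```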
/// Note, ATM this is a relatively simple example, with no vectorization and
/// the dynamic tile size being a compile-time constant. The intention is to
/// incrementally expand the config to something much more complex.

Very nicely written test. Learnt some new stuff here.

Thanks. LGTM.
This PR simply creates `populateGeneralizePatterns` to collect the following patterns:
* `GeneralizeOuterUnitDimsPackOpPattern`,
* `GeneralizeOuterUnitDimsUnPackOpPattern` (left as a TODO),

and wraps the newly created `populate*` method into a new Transform Dialect Op: `apply_patterns.linalg.generalize_pack_unpack`. This allows creating more involved e2e compilation pipelines for `tensor.pack` and `tensor.unpack` Ops. I will require this Op in a forthcoming PR that will build on top of llvm#115698.

No new tests are added. Instead, I've re-used an existing test: "generalize-tensor-pack.mlir".
This PR introduces `populateGeneralizePatterns`, which collects the following patterns:
* `GeneralizeOuterUnitDimsPackOpPattern`,
* `GeneralizeOuterUnitDimsUnPackOpPattern` (currently a TODO).

These patterns are wrapped in a new Transform Dialect Op: `apply_patterns.linalg.generalize_pack_unpack`. This Op facilitates creating more involved end-to-end compilation pipelines for `tensor.pack` and `tensor.unpack` operations. It will be required in an upcoming PR building on top of #115698.

No new tests are added in this PR. Instead, existing tests from "generalize-tensor-pack.mlir" are reused. To achieve this:
* I've updated the test to use `transform.apply_patterns.linalg.generalize_pack_unpack` instead of the flag `--test-linalg-transform-patterns="test-generalize-tensor-pack"`, avoiding artificial tests solely for the TD Op.
* The TD sequence is saved to a new file, "generalize_pack.mlir", and pre-loaded using the option `--transform-preload-library='transform-library-paths=%p/td/generalize_pack.mlir'`. This avoids duplicating the sequence for every "split" in the input file.
* Added "lit.local.cfg" to exclude the "test/Dialect/Linalg/td" directory from test discovery, ensuring "generalize_pack.mlir" is not treated as a test file.
Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes. While relatively simple (e.g., no vectorization), this example required a few fixes in handling `tensor.unpack` (and similar fixes for `tensor.pack` before that): llvm#119379, llvm#121393, llvm#121400. The end goal for this test is to incrementally increase its complexity and to work towards scalable tile sizes. Note, this PR complements llvm#115698, in which a similar test for `tensor.pack` was added.
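As an aside, the inverse operation can be modelled the same way (the helper name `unpack_2d` is hypothetical, for illustration only): interleave the (outer, tile) dimension pairs back together, then drop the padding.

```python
import numpy as np

def unpack_2d(packed, out_shape):
    """Model of tensor.unpack with inner_dims_pos = [0, 1]: move the inner
    tiles back into place, then slice off the padding."""
    o0, o1, t0, t1 = packed.shape
    full = packed.transpose(0, 2, 1, 3).reshape(o0 * t0, o1 * t1)
    return full[:out_shape[0], :out_shape[1]]

# Round-trip check: pack a 7x16 array into (1, 16, 8, 1) tiles by hand,
# then unpack it back to the original shape.
a = np.arange(7 * 16).reshape(7, 16)
padded = np.pad(a, ((0, 1), (0, 0)), constant_values=-1)    # pad rows 7 -> 8
packed = padded.reshape(1, 8, 16, 1).transpose(0, 2, 1, 3)  # (1, 16, 8, 1)
print(np.array_equal(unpack_2d(packed, (7, 16)), a))        # True
```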
Adds an end-to-end test for `tensor.pack` with dynamic inner tile sizes.

While relatively simple (e.g., no vectorization), this example required a few non-trivial fixes in handling `tensor.pack`:
* `GeneralizeOuterUnitDimsPackOpPattern` #114315,
* [mlir][tensor] Improve `FoldTensorCastProducerOp` (dynamic shapes) #114559,
* [mlir][tensor] Restrict the verifier for tensor.pack/tensor.unpack #113108.

The end goal for this test is to incrementally increase its complexity and to work towards scalable tile sizes.