
[mlir][tensor] Add e2e test for tensor.pack with dynamic tile sizes #115698


Merged
merged 1 commit into llvm:main from andrzej/tensor_pack_e2e on Nov 14, 2024

Conversation

banach-space (Contributor)

Adds an end-to-end test for `tensor.pack` with dynamic inner tile sizes.
While relatively simple (e.g., no vectorization), this example required
a few non-trivial fixes in handling `tensor.pack`:

* llvm#114315, llvm#114559, llvm#113108.

The end goal for this test is to incrementally increase its complexity
and to work towards scalable tile sizes.
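
For context, a "scalable" tile size is one expressed in terms of `vector.vscale` rather than a literal constant. A minimal sketch of that direction of travel (an illustration only, not part of this patch):

```mlir
// Hypothetical sketch: an inner tile size of 8 x vscale, where vscale is the
// runtime hardware scaling factor (e.g. for SVE). Not part of this PR.
%c8 = arith.constant 8 : index
%vscale = vector.vscale
%tile_size = arith.muli %c8, %vscale : index
```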
llvmbot (Member) commented Nov 11, 2024

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Changes

Adds an end-to-end test for tensor.pack with dynamic inner tile sizes.
While relatively simple (e.g., no vectorization), this example required
a few non-trivial fixes in handling tensor.pack:

  • #114315, #114559, #113108.

The end goal for this test is to incrementally increase its complexity
and to work towards scalable tile sizes.


Full diff: https://github.com/llvm/llvm-project/pull/115698.diff

1 file affected:

  • (added) mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir (+97)
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir
new file mode 100644
index 00000000000000..bec1b9a4e9d82f
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/pack-dynamic-inner-tile.mlir
@@ -0,0 +1,97 @@
+// DEFINE: %{compile} =  mlir-opt %s \
+// DEFINE: -transform-interpreter -test-transform-dialect-erase-schedule |\
+// DEFINE:  mlir-opt --test-linalg-transform-patterns="test-generalize-tensor-pack"\
+// DEFINE:    --test-transform-dialect-erase-schedule \
+// DEFINE:    -one-shot-bufferize="bufferize-function-boundaries" \
+// DEFINE:    -buffer-deallocation-pipeline="private-function-dynamic-ownership" \
+// DEFINE:    -cse -canonicalize -test-lower-to-llvm -o %t
+// DEFINE: %{entry_point} = main
+// DEFINE: %{run} = mlir-cpu-runner %t -e %{entry_point} -entry-point-result=void \
+// DEFINE:    -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils
+
+// RUN: rm -f %t && %{compile} && %{run} | FileCheck %s
+
+/// End-to-end test for tensor.pack where one of the inner tile sizes is
+/// dynamic.
+///
+/// Note, ATM this is a relatively simple example, with no vectorization and
+/// the dynamic tile size being a compile-time constant. The intention is to
+/// incrementally expand the config to something much more complex.
+
+func.func @main() {
+  // Allocate and initialise the inputs
+  %A_alloc = tensor.empty() : tensor<7x16xi32>
+
+  %A = arith.constant dense<[
+    [ 1,  8, 15, 22, 29, 36, 43, 50, 57, 64, 71, 78, 85, 92, 99 , 106],
+    [ 2,  9, 16, 23, 30, 37, 44, 51, 58, 65, 72, 79, 86, 93, 100, 107],
+    [ 3, 10, 17, 24, 31, 38, 45, 52, 59, 66, 73, 80, 87, 94, 101, 108],
+    [ 4, 11, 18, 25, 32, 39, 46, 53, 60, 67, 74, 81, 88, 95, 102, 109],
+    [ 5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96, 103, 110],
+    [ 6, 13, 20, 27, 34, 41, 48, 55, 62, 69, 76, 83, 90, 97, 104, 111],
+    [ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98, 105, 112]
+  ]> : tensor<7x16xi32>
+
+  func.call @pack(%A) : (tensor<7x16xi32>) -> ()
+
+  return
+}
+
+func.func private @pack(%A: tensor<7x16xi32>) {
+  %c1 = arith.constant 1 : index
+  %pad_val = arith.constant 123 : i32
+
+  // Dynamic tile size
+  %tile_size = arith.constant 8 : index
+  %A_pack_empty = tensor.empty(%c1, %tile_size) : tensor<?x16x?x1xi32>
+
+  %A_pack = tensor.pack %A
+    padding_value(%pad_val : i32)
+    inner_dims_pos = [0, 1]
+    inner_tiles = [%tile_size, 1]
+    into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>
+  %A_cast = tensor.cast %A_pack : tensor<?x16x?x1xi32> to tensor<*xi32>
+
+  // Print the results
+  // CHECK: Unranked Memref base@ = 0{{.*}} rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
+  // Tile 1: (8 x 1)
+  // CHECK-NEXT:  1
+  // CHECK-NEXT:  2
+  // CHECK-NEXT:  3
+  // CHECK-NEXT:  4
+  // CHECK-NEXT:  5
+  // CHECK-NEXT:  6
+  // CHECK-NEXT:  7
+  // Expect pad value after 7 elements
+  // CHECK-NEXT:  123
+  // Tile 2: (8 x 1)
+  // CHECK-NEXT:  8
+  // CHECK-NEXT:  9
+  // CHECK-NEXT:  10
+  // CHECK-NEXT:  11
+  // CHECK-NEXT:  12
+  // CHECK-NEXT:  13
+  // CHECK-NEXT:  14
+  // Expect pad value after further 7 elements
+  // CHECK-NEXT:  123
+  // Tile 3: (8 x 1)
+  // CHECK-NEXT:  15
+  // CHECK-NEXT:  16
+  // ...
+  call @printMemrefI32(%A_cast) : (tensor<*xi32>) -> ()
+
+  return
+}
+
+module @transforms attributes { transform.with_named_sequence } {
+  transform.named_sequence @__transform_main(%module: !transform.any_op {transform.readonly}) {
+    %pack = transform.structured.match ops{["tensor.pack"]} in %module : (!transform.any_op) -> !transform.any_op
+
+    %tiled_linalg_op_p, %loops:2 = transform.structured.tile_using_for %pack tile_sizes [1, 1]
+       : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
+
+    transform.yield
+  }
+}
+
+func.func private @printMemrefI32(%ptr : tensor<*xi32>)

%pad_val = arith.constant 123 : i32

// Dynamic tile size
%tile_size = arith.constant 8 : index
Contributor:
To be truly dynamic, could we pass `tile_size` as an arg to `@pack`?

banach-space (Contributor, Author):
That's the next step ;-)

The PRs that I am listing in the summary were specifically for the case when the tile size is an SSA value that's a known compile-time constant. See e.g. this: #114315

So yes, you are right, this is not truly dynamic. But it's enough of a complication to have revealed numerous issues.
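
For illustration only, a truly dynamic variant could take the tile size as a function argument, along the lines of the sketch below (the name `@pack_dyn` is made up; this is not code from the PR):

```mlir
// Hypothetical sketch: the tile size arrives as a function argument, so it is
// no longer a compile-time constant. Mirrors @pack from the test above.
func.func private @pack_dyn(%A: tensor<7x16xi32>, %tile_size: index) {
  %pad_val = arith.constant 123 : i32

  // Outer size of the tiled dim 0: ceil(7 / %tile_size).
  %c7 = arith.constant 7 : index
  %outer_dim = arith.ceildivui %c7, %tile_size : index
  %A_pack_empty = tensor.empty(%outer_dim, %tile_size) : tensor<?x16x?x1xi32>

  %A_pack = tensor.pack %A
    padding_value(%pad_val : i32)
    inner_dims_pos = [0, 1]
    inner_tiles = [%tile_size, 1]
    into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>

  %A_cast = tensor.cast %A_pack : tensor<?x16x?x1xi32> to tensor<*xi32>
  func.call @printMemrefI32(%A_cast) : (tensor<*xi32>) -> ()
  return
}

// Same print helper as declared in the test file.
func.func private @printMemrefI32(%ptr : tensor<*xi32>)
```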

into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>
%A_cast = tensor.cast %A_pack : tensor<?x16x?x1xi32> to tensor<*xi32>

// Print the results
Contributor:
Please ignore if this ask is not appealing - could you please add a commented section showing what the padded, tiled version looks like?

banach-space (Contributor, Author):

You really need to run this stuff and check it for yourself - it's so cool and illustrative! :) (as in, I encourage you to experiment, but I also acknowledge your comment as a valid request)

Here are the first two 8x1 tiles:

Unranked Memref base@ = 0x125f07e80 rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
[[[[1],
   [2],
   [3],
   [4],
   [5],
   [6],
   [7],
   [123]],
  [[8],
   [9],
   [10],
   [11],
   [12],
   [13],
   [14],
   [123]],

That's very similar to the check-lines, but I avoided the brackets which are a bit annoying to match. Would you still find it helpful as a comment?

Contributor:

Thanks.
// Unranked Memref base@ = 0x__ rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
// [[ [ [1], [2], [3], [4], [5], [6], [7], [123] ],
// [ [8], [9], [10], [11], [12], [13], [14], [123] ],
// ... ... .....
// [ [106], [107], ... [123] ],
// ]]

/// Note, ATM this is a relatively simple example, with no vectorization and
/// the dynamic tile size being a compile-time constant. The intention is to
/// incrementally expand the config to something much more complex.

Contributor:

Very nicely written test. Learnt some new stuff here.

javedabsar1 (Contributor) left a comment:

Thanks. LGTM.


banach-space merged commit 33a9c26 into llvm:main on Nov 14, 2024 (11 checks passed)
banach-space deleted the andrzej/tensor_pack_e2e branch on November 14, 2024 at 11:25
banach-space added a commit to banach-space/llvm-project that referenced this pull request Nov 15, 2024
This PR simply creates `populateGeneralizePatterns` to collect the
following patterns:
  * `GeneralizeOuterUnitDimsPackOpPattern`,
  * `GeneralizeOuterUnitDimsUnPackOpPattern` (left as a TODO),

and wraps the newly created `populate*` method into a new Transform
Dialect Op: `apply_patterns.linalg.generalize_pack_unpack`. This allows
creating more involved e2e compilation pipelines for `tensor.pack` and
`tensor.unpack` Ops. I will require this Op in a forthcoming
PR that will build on top of llvm#115698.

No new tests are added. Instead, I've re-used existing tests:
  * "generalize-tensor-pack.mlir".
banach-space added a commit that referenced this pull request Nov 18, 2024
This PR introduces populateGeneralizePatterns, which collects the
following patterns:

  * `GeneralizeOuterUnitDimsPackOpPattern`,
  * `GeneralizeOuterUnitDimsUnPackOpPattern` (currently a TODO).

These patterns are wrapped in a new Transform Dialect Op:
`apply_patterns.linalg.generalize_pack_unpack`. This Op facilitates
creating more involved end-to-end compilation pipelines for
`tensor.pack` and `tensor.unpack` operations. It will be required in an
upcoming PR building on top of #115698.

No new tests are added in this PR. Instead, existing tests from:

  * "generalize-tensor-pack.mlir"

are reused. To achieve this:

  * I've updated the test to use
    `transform.apply_patterns.linalg.generalize_pack_unpack` instead of
    the flag
    `--test-linalg-transform-patterns="test-generalize-tensor-pack"`,
    avoiding artificial tests solely for the TD Op.
  * The TD sequence is saved to a new file, "generalize_pack.mlir", and
    pre-loaded using the option:

`--transform-preload-library='transform-library-paths=%p/td/generalize_pack.mlir'`
    This avoids duplicating the sequence for every "split" in the input
    file.
  * Added "lit.local.cfg" to exclude the "test/Dialect/Linalg/td"
    directory from test discovery, ensuring "generalize_pack.mlir" is
    not treated as a test file.
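
For illustration, a preloadable TD library along those lines might look roughly like the sketch below. This is an assumption based on the description above, not the actual contents of "generalize_pack.mlir"; the module name and the use of `transform.structured.match` are made up:

```mlir
// Hypothetical sketch of a preloadable TD library that applies the
// pack/unpack generalization patterns to every function in the payload.
module @generalize_pack attributes { transform.with_named_sequence } {
  transform.named_sequence @__transform_main(%module: !transform.any_op {transform.readonly}) {
    // Narrow down to func.func ops, which are isolated from above as
    // required by transform.apply_patterns.
    %funcs = transform.structured.match ops{["func.func"]} in %module
      : (!transform.any_op) -> !transform.any_op
    transform.apply_patterns to %funcs {
      transform.apply_patterns.linalg.generalize_pack_unpack
    } : !transform.any_op
    transform.yield
  }
}
```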
banach-space added a commit that referenced this pull request Jan 9, 2025
…#121557)

Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes.
While relatively simple (e.g., no vectorization), this example required a few
fixes in handling `tensor.unpack` (and similar fixes for `tensor.pack` before
that):

* #119379, #121393, #121400.

The end goal for this test is to incrementally increase its complexity
and to work towards scalable tile sizes.

Note: this PR complements #115698, in which a similar test for `tensor.pack` was added.
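
For context, the inverse operation exercised by that follow-up test looks roughly like the sketch below (an illustration only; the function name and shapes are assumptions mirroring the pack test above):

```mlir
// Hypothetical sketch: unpack a dynamically tiled ?x16x?x1 tensor back into
// its original 7x16 shape (the inverse of @pack above).
func.func private @unpack(%A_pack: tensor<?x16x?x1xi32>, %tile_size: index) -> tensor<7x16xi32> {
  %A_unpack_empty = tensor.empty() : tensor<7x16xi32>
  %A_unpack = tensor.unpack %A_pack
    inner_dims_pos = [0, 1]
    inner_tiles = [%tile_size, 1]
    into %A_unpack_empty : tensor<?x16x?x1xi32> -> tensor<7x16xi32>
  return %A_unpack : tensor<7x16xi32>
}
```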