[mlir][tensor] Add e2e test for tensor.pack with dynamic tile sizes #115698

Merged: 1 commit, Nov 14, 2024
@@ -0,0 +1,97 @@
// DEFINE: %{compile} = mlir-opt %s \
// DEFINE: -transform-interpreter -test-transform-dialect-erase-schedule |\
// DEFINE: mlir-opt --test-linalg-transform-patterns="test-generalize-tensor-pack"\
// DEFINE: --test-transform-dialect-erase-schedule \
// DEFINE: -one-shot-bufferize="bufferize-function-boundaries" \
// DEFINE: -buffer-deallocation-pipeline="private-function-dynamic-ownership" \
// DEFINE: -cse -canonicalize -test-lower-to-llvm -o %t
// DEFINE: %{entry_point} = main
// DEFINE: %{run} = mlir-cpu-runner %t -e %{entry_point} -entry-point-result=void \
// DEFINE: -shared-libs=%mlir_runner_utils,%mlir_c_runner_utils

// RUN: rm -f %t && %{compile} && %{run} | FileCheck %s

/// End-to-end test for tensor.pack where one of the inner tile sizes is
/// dynamic.
///
/// Note: at the moment this is a relatively simple example - there is no
/// vectorization, and the dynamic tile size is a compile-time constant. The
/// intention is to incrementally expand the config to something much more
/// complex.

Contributor: Very nicely written test. Learnt some new stuff here.

func.func @main() {
  // Allocate and initialise the inputs
  %A_alloc = tensor.empty() : tensor<7x16xi32>

  %A = arith.constant dense<[
    [ 1,  8, 15, 22, 29, 36, 43, 50, 57, 64, 71, 78, 85, 92,  99, 106],
    [ 2,  9, 16, 23, 30, 37, 44, 51, 58, 65, 72, 79, 86, 93, 100, 107],
    [ 3, 10, 17, 24, 31, 38, 45, 52, 59, 66, 73, 80, 87, 94, 101, 108],
    [ 4, 11, 18, 25, 32, 39, 46, 53, 60, 67, 74, 81, 88, 95, 102, 109],
    [ 5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96, 103, 110],
    [ 6, 13, 20, 27, 34, 41, 48, 55, 62, 69, 76, 83, 90, 97, 104, 111],
    [ 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98, 105, 112]
  ]> : tensor<7x16xi32>

  func.call @pack(%A) : (tensor<7x16xi32>) -> ()

  return
}

func.func private @pack(%A: tensor<7x16xi32>) {
  %c1 = arith.constant 1 : index
  %pad_val = arith.constant 123 : i32

  // Dynamic tile size
  %tile_size = arith.constant 8 : index
Contributor: To be truly dynamic, could we pass %tile_size as an arg to @pack?

Author: That's the next step ;-)

The PRs that I am listing in the summary were specifically for the case when the tile size is an SSA value that's a known compile-time constant. See e.g. #114315.

So yes, you are right, this is not truly dynamic. But it's enough of a complication to have revealed numerous issues.
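For illustration, here is a minimal sketch of what that next step might look like, with the tile size arriving as a function argument. This is hypothetical code, not part of this PR; the name @pack_dyn and the ceildivsi-based outer-tile computation are assumptions:

// Hypothetical follow-up sketch: the tile size is a function argument, so it
// is no longer visible as a compile-time constant inside the function.
func.func private @pack_dyn(%A: tensor<7x16xi32>, %tile_size: index) {
  %pad_val = arith.constant 123 : i32
  %c7 = arith.constant 7 : index

  // Number of outer tiles along dim 0: ceil(7 / %tile_size). This can no
  // longer be hardcoded, since it depends on the runtime value of %tile_size.
  %outer = arith.ceildivsi %c7, %tile_size : index

  %dest = tensor.empty(%outer, %tile_size) : tensor<?x16x?x1xi32>
  %packed = tensor.pack %A
    padding_value(%pad_val : i32)
    inner_dims_pos = [0, 1]
    inner_tiles = [%tile_size, 1]
    into %dest : tensor<7x16xi32> -> tensor<?x16x?x1xi32>
  return
}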

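  // With %tile_size = 8, the packed shape works out as follows:
  //   dim 0: ceil(7 / 8)  = 1 outer tile, inner tile size 8 (the 8th row of
  //          each tile is filled with %pad_val),
  //   dim 1: ceil(16 / 1) = 16 outer tiles, inner tile size 1,
  // i.e. tensor<1x16x8x1xi32> at runtime (cf. sizes = [1, 16, 8, 1] in the
  // CHECK lines below).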
  %A_pack_empty = tensor.empty(%c1, %tile_size) : tensor<?x16x?x1xi32>

  %A_pack = tensor.pack %A
    padding_value(%pad_val : i32)
    inner_dims_pos = [0, 1]
    inner_tiles = [%tile_size, 1]
    into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>
  %A_cast = tensor.cast %A_pack : tensor<?x16x?x1xi32> to tensor<*xi32>

  // Print the results
Contributor: Please ignore if this ask is not appealing - could you please add a commented section showing what the padded, tiled version looks like?
Author: You really need to run this stuff and check for yourself - it's so cool and illustrative! :) (As in, I encourage you to experiment, but I also acknowledge your comment as a valid request.)

Here are the first two 8x1 tiles:

Unranked Memref base@ = 0x125f07e80 rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
[[[[1],
   [2],
   [3],
   [4],
   [5],
   [6],
   [7],
   [123]],
  [[8],
   [9],
   [10],
   [11],
   [12],
   [13],
   [14],
   [123]],

That's very similar to the check-lines, but I avoided the brackets, which are a bit annoying to match. Would you still find it helpful as a comment?

Contributor: Thanks.
// Unranked Memref base@ = 0x__ rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
// [[ [ [1], [2], [3], [4], [5], [6], [7], [123] ],
// [ [8], [9], [10], [11], [12], [13], [14], [123] ],
// ... ... .....
// [ [106], [107], ... [123] ],
// ]]

// CHECK: Unranked Memref base@ = 0{{.*}} rank = 4 offset = 0 sizes = [1, 16, 8, 1] strides = [128, 8, 1, 1] data =
// Tile 1: (8 x 1)
// CHECK-NEXT: 1
// CHECK-NEXT: 2
// CHECK-NEXT: 3
// CHECK-NEXT: 4
// CHECK-NEXT: 5
// CHECK-NEXT: 6
// CHECK-NEXT: 7
// Expect pad value after 7 elements
// CHECK-NEXT: 123
// Tile 2: (8 x 1)
// CHECK-NEXT: 8
// CHECK-NEXT: 9
// CHECK-NEXT: 10
// CHECK-NEXT: 11
// CHECK-NEXT: 12
// CHECK-NEXT: 13
// CHECK-NEXT: 14
// Expect pad value after a further 7 elements
// CHECK-NEXT: 123
// Tile 3: (8 x 1)
// CHECK-NEXT: 15
// CHECK-NEXT: 16
// ...
  call @printMemrefI32(%A_cast) : (tensor<*xi32>) -> ()

  return
}

module @transforms attributes { transform.with_named_sequence } {
  transform.named_sequence @__transform_main(%module: !transform.any_op {transform.readonly}) {
    %pack = transform.structured.match ops{["tensor.pack"]} in %module : (!transform.any_op) -> !transform.any_op

    %tiled_linalg_op_p, %loops:2 = transform.structured.tile_using_for %pack tile_sizes [1, 1]
      : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)

    transform.yield
  }
}
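For reference, the tiling above rewrites the monolithic tensor.pack into a loop nest over the outer dimensions, with a small pack on slices inside (which the test-generalize-tensor-pack pattern then lowers further). A rough, hand-written structural sketch follows - the SSA names, loop bounds, and slice operands are illustrative, not actual compiler output:

// scf.for %i = %c0 to %outer_dim0 step %c1 iter_args(%dest0 = %A_pack_empty) -> tensor<?x16x?x1xi32> {
//   scf.for %j = %c0 to %c16 step %c1 iter_args(%dest1 = %dest0) -> tensor<?x16x?x1xi32> {
//     %src_slice  = tensor.extract_slice %A[...]      // input strip feeding one tile
//     %dest_slice = tensor.extract_slice %dest1[...]  // one 1x1x%tile_size x 1 destination slice
//     %tile = tensor.pack %src_slice ... into %dest_slice ...
//     %upd  = tensor.insert_slice %tile into %dest1[...]
//     scf.yield %upd : tensor<?x16x?x1xi32>
//   }
//   scf.yield ...
// }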

func.func private @printMemrefI32(%ptr : tensor<*xi32>)