[mlir][linalg][nfc] Update "pack-dynamic-inner-tile.mlir" #117533

Merged: 1 commit, Nov 26, 2024
@@ -10,10 +10,6 @@

/// End-to-end test for tensor.pack where one of the inner tile sizes is
/// dynamic.
///
/// Note, ATM this is a relatively simple example, with no vectorization and
/// the dynamic tile size being a compile-time constant. The intention is to
/// incrementally expand the config to something much more complex.

func.func @main() {
// Allocate and initialise the inputs
@@ -89,26 +85,49 @@ module @transforms attributes { transform.with_named_sequence } {
%tiled_pack_op_p, %loops:2 = transform.structured.tile_using_for %pack tile_sizes [1, 1]
: (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)

// 2. Decompose the tiled Op into (trimmed for brevity):
// 2. Decompose the tiled pack Op into (trimmed for brevity):
//
// %padded = tensor.pad %slice_of_A (..) :
// tensor<?x?xi32> to tensor<8x1xi32>
// %inserted_slice = tensor.insert_slice %padded into %slice_of_A_pack (...) :
// tensor<8x1xi32> into tensor<1x1x?x1xi32>
//
// NOTE: no tile is transposed, hence no linalg.transpose
%func_1 = transform.get_parent_op %tiled_pack_op_p {isolated_from_above} : (!transform.any_op) -> !transform.any_op
transform.apply_patterns to %func_1 {
// (NOTE: no tile is transposed, hence no linalg.transpose)
Contributor:

Is it because the pack op is decomposed with rank-reduced slices + outer_dims_perm map is empty/identity? Otherwise, I'd expect a transpose op that transposes the inner dimension of the first dimension into inner tiles.

E.g., it should be tensor<?x?x16x1> after expanding the padded tensor, so I'd expect a transpose to bring it to tensor<?x16x?x1>.

  %A_pack = tensor.pack %A
    padding_value(%pad_val : i32)
    inner_dims_pos = [0, 1]
    inner_tiles = [%tile_size, 1]
    into %A_pack_empty : tensor<7x16xi32> -> tensor<?x16x?x1xi32>

Contributor (Author):

In this example, note that no dimensions are transposed:

  • inner_dims_pos is an identity.
  • There's no outer_dims_perm (so it's also an identity).

Referring to the original tensor.pack:

  • We start with tensor<7x16xi32> and tile:
    • Dimension 7 using %tile_size (which is %c8).
    • Dimension 16 using 1.
  • This results in ?x1 as the trailing/inner dimensions in the output tensor.
  • The remaining dimensions form ?x16 as the outer dimensions in the output tensor:
    • ? corresponds to the tiling along 7.
    • 16 comes from the calculation original_dim / tile_size = 16 / 1 = 16.

Does this make sense? Let me know if anything needs clarification - I want to ensure I'm explaining this correctly 😅.
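The shape arithmetic described above can be sketched in Python (a hypothetical helper for illustration, not part of the test or of MLIR itself):

```python
import math

def packed_shape(src_shape, inner_dims_pos, inner_tiles):
    """Result shape of a tensor.pack with identity outer_dims_perm:
    the outer dims (each tiled dim ceil-divided by its tile size)
    followed by the inner tiles."""
    tile_for = dict(zip(inner_dims_pos, inner_tiles))
    outer = [
        math.ceil(d / tile_for[i]) if i in tile_for else d
        for i, d in enumerate(src_shape)
    ]
    return outer + list(inner_tiles)

# The example above: tensor<7x16xi32> packed with inner_tiles = [8, 1]
# (the dynamic %tile_size is %c8 at runtime), inner_dims_pos = [0, 1].
print(packed_shape([7, 16], [0, 1], [8, 1]))  # [1, 16, 8, 1]
```

With the runtime tile size of 8, the `?x16x?x1` output shape resolves to `1x16x8x1`: `ceil(7 / 8) = 1` and `16 / 1 = 16` for the outer dims, then the inner tiles `8x1`.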

Contributor:

Yes, I understood this part. There are a few ways to decompose pack ops: we can either drop or keep unit dims during the decomposition. I think I got the answer in the above transform op. We first tile the outer dims with tile_size=1, so the outer dimensions all have size=1. Then we decompose the ops. The decomposition uses patterns that drop outer unit dims, so there are no transpose ops. I can connect all the pieces now, thanks!
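The observation that a transpose over unit outer dims is a no-op can be checked with a small sketch (illustrative only; this is not the actual decomposition pattern):

```python
import itertools

# After tiling the outer dims with tile_sizes [1, 1], each tile of the
# packed tensor has shape 1x1x8x1. The permutation a transpose would
# apply only swaps the two unit outer dims, which leaves every row-major
# offset unchanged -- so dropping the unit dims needs no linalg.transpose.
shape = (1, 1, 8, 1)
perm = (1, 0, 2, 3)  # swap the two unit outer dims

def offset(idx, shp):
    """Row-major linear offset of a multi-index."""
    off = 0
    for i, s in zip(idx, shp):
        off = off * s + i
    return off

same = all(
    offset(idx, shape) == offset(tuple(idx[p] for p in perm), shape)
    for idx in itertools.product(*(range(s) for s in shape))
)
print(same)  # True
```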

//
// This is followed by this decomposition of the pad Op:
//
// %c123_i32 = arith.constant 123 : i32
// %slice_of_A = tensor.extract_slice %A[%3, %arg3] [%4, %5] [1, 1] :
// tensor<7x16xi32> to tensor<?x?xi32>
// %empty = tensor.empty() : tensor<8x1xi32>
// %fill = linalg.fill ins(%c123_i32 : i32) outs(%empty :
// tensor<8x1xi32>) -> tensor<8x1xi32>
// %inserted_slice = tensor.insert_slice %slice_of_A into %fill[0, 0] [%4, %5] [1, 1] :
// tensor<?x?xi32> into tensor<8x1xi32>
//
%func_op = transform.get_parent_op %tiled_pack_op_p {isolated_from_above} : (!transform.any_op) -> !transform.op<"func.func">
transform.apply_patterns to %func_op {
transform.apply_patterns.linalg.decompose_pack_unpack
} : !transform.any_op
transform.apply_patterns.linalg.decompose_pad
} : !transform.op<"func.func">

// 3. Vectorize linalg.fill.
// Vector sizes match the inner tiles in the payload IR.
%fill = transform.structured.match ops{["linalg.fill"]} in %func_op : (!transform.op<"func.func">) -> !transform.any_op
transform.structured.vectorize %fill vector_sizes [8, 1] : !transform.any_op

transform.apply_patterns to %func_op {
transform.apply_patterns.tensor.fold_tensor_subset_ops
transform.apply_patterns.canonicalization
} : !transform.op<"func.func">

// 3. Bufferize before lowering to LLVM
%bufferize = transform.bufferization.one_shot_bufferize %module
{bufferize_function_boundaries=true} : (!transform.any_op) -> !transform.any_op

// 4. Canonicalize
%func_2 = transform.structured.match ops{["func.func"]} in %bufferize : (!transform.any_op) -> !transform.op<"func.func">
transform.apply_patterns to %func_2 {
%func_op_bufferized = transform.structured.match ops{["func.func"]} in %bufferize : (!transform.any_op) -> !transform.op<"func.func">
transform.apply_patterns to %func_op_bufferized {
transform.apply_patterns.canonicalization
} : !transform.op<"func.func">
