
[mlir][linalg] Simplify vectorization test output using -canonicalize -cse #138265


Description

@banach-space

The Linalg vectorization tests are currently quite complex and hard to navigate (see full list with links below). One area I’d like to improve is simplifying the expected test output by updating the mlir-opt invocation to include:

  • -canonicalize -cse.
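
In practice this means extending each test's RUN line. As a minimal sketch, assuming a test that already drives vectorization via the transform interpreter (the exact invocation varies per file):

  // RUN: mlir-opt %s -transform-interpreter -canonicalize -cse -split-input-file | FileCheck %s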

Why add -cse?

CSE alone is a huge win. It eliminates redundant constants like:

%c0 = arith.constant 0 : index
%c0_1 = arith.constant 0 : index
%c0_2 = arith.constant 0 : index

Without CSE, test updates often require matching multiple SSA values that all represent the same constant, which adds noise and maintenance overhead.

Why add -canonicalize?

Adding -canonicalize helps simplify tensor.dim, affine.apply, and other commonly duplicated constructs.
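
For example (a hedged sketch, assuming the upstream tensor.dim folding patterns), canonicalization rewrites a tensor.dim of a tensor.empty into the corresponding size operand, which CSE alone cannot do:

  %4 = tensor.empty(%0, %1) : tensor<?x?xf32>
  %dim = tensor.dim %4, %c0 : tensor<?x?xf32>  // folds to %0 after -canonicalize

This is what eliminates the remaining tensor.dim ops in the final output below.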

Current output from the vectorizer

  func.func @test_masked_vectorize_dynamic_pad(%arg0: tensor<?x?xf32>, %arg1: index, %arg2: index) -> tensor<?x?xf32> {
    %cst = arith.constant 4.243000e+01 : f32
    %c0 = arith.constant 0 : index
    %c0_0 = arith.constant 0 : index
    %dim = tensor.dim %arg0, %c0_0 : tensor<?x?xf32>
    %0 = affine.apply #map()[%arg1, %dim]
    %c1 = arith.constant 1 : index
    %dim_1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
    %1 = affine.apply #map()[%arg2, %dim_1]
    %c0_2 = arith.constant 0 : index
    %c0_3 = arith.constant 0 : index
    %dim_4 = tensor.dim %arg0, %c0_3 : tensor<?x?xf32>
    %c1_5 = arith.constant 1 : index
    %dim_6 = tensor.dim %arg0, %c1_5 : tensor<?x?xf32>
    %2 = vector.create_mask %dim_4, %dim_6 : vector<2x4xi1>
    %3 = vector.mask %2 { vector.transfer_read %arg0[%c0_2, %c0_2], %cst {in_bounds = [true, true]} : tensor<?x?xf32>, vector<2x4xf32> } : vector<2x4xi1> -> vector<2x4xf32>
    %4 = tensor.empty(%0, %1) : tensor<?x?xf32>
    %c0_7 = arith.constant 0 : index
    %c0_8 = arith.constant 0 : index
    %dim_9 = tensor.dim %4, %c0_8 : tensor<?x?xf32>
    %c1_10 = arith.constant 1 : index
    %dim_11 = tensor.dim %4, %c1_10 : tensor<?x?xf32>
    %5 = vector.create_mask %dim_9, %dim_11 : vector<2x4xi1>
    %6 = vector.mask %5 { vector.transfer_write %3, %4[%c0_7, %c0_7] {in_bounds = [true, true]} : vector<2x4xf32>, tensor<?x?xf32> } : vector<2x4xi1> -> tensor<?x?xf32>
    return %6 : tensor<?x?xf32>
  }

There is a lot of duplication of arith.constant and tensor.dim.

Output from the vectorizer after adding -cse:

  func.func @test_masked_vectorize_dynamic_pad(%arg0: tensor<?x?xf32>, %arg1: index, %arg2: index) -> tensor<?x?xf32> {
    %cst = arith.constant 4.243000e+01 : f32
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32>
    %0 = affine.apply #map()[%arg1, %dim]
    %c1 = arith.constant 1 : index
    %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
    %1 = affine.apply #map()[%arg2, %dim_0]
    %2 = vector.create_mask %dim, %dim_0 : vector<2x4xi1>
    %3 = vector.mask %2 { vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : tensor<?x?xf32>, vector<2x4xf32> } : vector<2x4xi1> -> vector<2x4xf32>
    %4 = tensor.empty(%0, %1) : tensor<?x?xf32>
    %dim_1 = tensor.dim %4, %c0 : tensor<?x?xf32>
    %dim_2 = tensor.dim %4, %c1 : tensor<?x?xf32>
    %5 = vector.create_mask %dim_1, %dim_2 : vector<2x4xi1>
    %6 = vector.mask %5 { vector.transfer_write %3, %4[%c0, %c0] {in_bounds = [true, true]} : vector<2x4xf32>, tensor<?x?xf32> } : vector<2x4xi1> -> tensor<?x?xf32>
    return %6 : tensor<?x?xf32>
  }

No duplication of arith.constant, but tensor.dim is still unnecessarily duplicated.

Output from the vectorizer after adding -canonicalize -cse:

  func.func @test_masked_vectorize_dynamic_pad(%arg0: tensor<?x?xf32>, %arg1: index, %arg2: index) -> tensor<?x?xf32> {
    %c1 = arith.constant 1 : index
    %cst = arith.constant 4.243000e+01 : f32
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32>
    %0 = affine.apply #map()[%arg1, %dim]
    %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
    %1 = affine.apply #map()[%arg2, %dim_0]
    %2 = vector.create_mask %dim, %dim_0 : vector<2x4xi1>
    %3 = vector.mask %2 { vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : tensor<?x?xf32>, vector<2x4xf32> } : vector<2x4xi1> -> vector<2x4xf32>
    %4 = tensor.empty(%0, %1) : tensor<?x?xf32>
    %5 = vector.create_mask %0, %1 : vector<2x4xi1>
    %6 = vector.mask %5 { vector.transfer_write %3, %4[%c0, %c0] {in_bounds = [true, true]} : vector<2x4xf32>, tensor<?x?xf32> } : vector<2x4xi1> -> tensor<?x?xf32>
    return %6 : tensor<?x?xf32>
  }

No duplication :)
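
As a side effect, the FileCheck expectations become much shorter. A hypothetical sketch of what the CHECK lines for the function above could look like (the actual lines in the patch may differ):

  // CHECK-LABEL: func.func @test_masked_vectorize_dynamic_pad
  // CHECK-DAG:     %[[C0:.*]] = arith.constant 0 : index
  // CHECK-DAG:     %[[C1:.*]] = arith.constant 1 : index
  // CHECK:         %[[DIM0:.*]] = tensor.dim %{{.*}}, %[[C0]]
  // CHECK:         %[[DIM1:.*]] = tensor.dim %{{.*}}, %[[C1]]
  // CHECK:         %[[MASK:.*]] = vector.create_mask %[[DIM0]], %[[DIM1]] : vector<2x4xi1>
  // CHECK:         vector.mask %[[MASK]] { vector.transfer_read
  // CHECK:         vector.mask %{{.*}} { vector.transfer_write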

Pros vs Cons

Pros:

  • Easier to focus on the semantic intent of vectorization output.
  • Reduces test maintenance (less duplication, fewer fragile SSA names).
  • Aligns with FileCheck best practices: test only what is strictly necessary.

Cons:

  • Tests will now depend on CSE and canonicalization, making them indirectly sensitive to unrelated changes.
  • Tests will no longer isolate vectorization alone; they will validate a short pipeline of transformations.

Next steps

While there are trade-offs, I believe this change will be beneficial overall.

My first patch is here:

Assuming there are no strong objections, I’d like to use this issue for discussion and long-term context.
CC @dcaballe @hanhanW - you've reviewed most of my patches in this area. Anyone else I should include?

Thanks!

List of test files
