[mlir][SME] Update E2E test to show potential optimisation (NFC) #107585

Merged: 3 commits merged into llvm:main on Sep 10, 2024

Conversation

nujaa
Contributor

@nujaa nujaa commented Sep 6, 2024

Adds loop hoisting to the ARM SME E2E tests so that the tile load can be hoisted out of the loop, which offers a significant speedup.

Discussed here: https://discourse.llvm.org/t/mlir-for-arm-sme-reducing-tile-data-transfers/80065/2

@nujaa nujaa requested a review from MacDue September 6, 2024 13:49
@nujaa nujaa marked this pull request as ready for review September 6, 2024 14:13
@llvmbot
Member

llvmbot commented Sep 6, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-linalg

Author: Hugo Trachino (nujaa)

Changes

Adds loop hoisting to the ARM SME E2E tests so that the tile load can be hoisted out of the loop, which offers a significant speedup.

Discussed here: https://discourse.llvm.org/t/mlir-for-arm-sme-reducing-tile-data-transfers/80065/2


Full diff: https://github.com/llvm/llvm-project/pull/107585.diff

2 Files Affected:

  • (modified) mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir (+8)
  • (modified) mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir (+8)
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir
index a57348a543c3cf..886211b65efa2d 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir
@@ -82,8 +82,16 @@ module attributes {transform.with_named_sequence} {
     transform.apply_patterns to %func {
       transform.apply_patterns.vector.lower_contraction lowering_strategy = "outerproduct"
       transform.apply_patterns.vector.lower_masks
+      transform.apply_patterns.canonicalization
     } : !transform.any_op
 
+    // Step 5: Hoist load of accumulator.
+    %func_h = transform.structured.hoist_redundant_vector_transfers %func
+        : (!transform.any_op) -> !transform.any_op
+    %all_loops = transform.structured.match interface{LoopLikeInterface} in %module
+      : (!transform.any_op) -> !transform.any_op
+    transform.apply_licm to %all_loops : !transform.any_op
+    transform.loop.hoist_loop_invariant_subsets %all_loops : !transform.any_op
     transform.yield
   }
 }
diff --git a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir
index 79c9fcac70604b..4b6b9a9c746499 100644
--- a/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir
+++ b/mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul.mlir
@@ -88,8 +88,16 @@ module attributes {transform.with_named_sequence} {
       transform.apply_patterns.vector.lower_contraction lowering_strategy = "outerproduct"
       transform.apply_patterns.vector.lower_masks
       transform.apply_patterns.vector.rank_reducing_subview_patterns
+      transform.apply_patterns.canonicalization
     } : !transform.any_op
 
+    // Step 6: Hoist load of accumulator.
+    %func_h = transform.structured.hoist_redundant_vector_transfers %func
+        : (!transform.any_op) -> !transform.any_op
+    %all_loops = transform.structured.match interface{LoopLikeInterface} in %bufferize
+      : (!transform.any_op) -> !transform.any_op
+    transform.apply_licm to %all_loops : !transform.any_op
+    transform.loop.hoist_loop_invariant_subsets %all_loops : !transform.any_op
     transform.yield
   }
 }

@llvmbot
Member

llvmbot commented Sep 6, 2024

@llvm/pr-subscribers-mlir-sme


Contributor

@banach-space banach-space left a comment


Thanks Hugo, LGTM!

Could you add a note that the additional step is not required for functional correctness and that instead it's an optimisation? This is obvious today, but our future selves might forget ;-) Thanks!

} : !transform.any_op

// Step 5: Hoist load of accumulator.
Member


Nit: It's both the load and store of the accumulator that's hoisted.
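
To illustrate why hoisting both the load and the store matters, here is a toy Python model, not MLIR and not the actual SME lowering. It assumes a matmul lowered to a loop of outer products over K, with the accumulator tile living in memory between iterations. All function names (`matmul_unhoisted`, `matmul_hoisted`, `outer_product_accumulate`) are hypothetical, and the "transfers" counter is just a stand-in for tile loads and stores.

```python
def outer_product_accumulate(tile, a_col, b_row):
    """Accumulate the outer product of a_col and b_row into tile, in place."""
    for i, a in enumerate(a_col):
        for j, b in enumerate(b_row):
            tile[i][j] += a * b


def matmul_unhoisted(a_cols, b_rows, acc_mem):
    """Load and store the accumulator tile on every K iteration."""
    transfers = 0
    for a_col, b_row in zip(a_cols, b_rows):
        tile = [row[:] for row in acc_mem]       # load accumulator tile
        transfers += 1
        outer_product_accumulate(tile, a_col, b_row)
        acc_mem[:] = [row[:] for row in tile]    # store accumulator tile
        transfers += 1
    return transfers


def matmul_hoisted(a_cols, b_rows, acc_mem):
    """Load the accumulator once before the loop and store it once after."""
    tile = [row[:] for row in acc_mem]           # hoisted load
    transfers = 1
    for a_col, b_row in zip(a_cols, b_rows):
        outer_product_accumulate(tile, a_col, b_row)
    acc_mem[:] = [row[:] for row in tile]        # hoisted store
    transfers += 1
    return transfers


if __name__ == "__main__":
    a_cols = [[1, 2], [3, 4], [5, 6]]   # K = 3 columns of A
    b_rows = [[1, 0], [0, 1], [1, 1]]   # K = 3 rows of B
    mem_a = [[0, 0], [0, 0]]
    mem_b = [[0, 0], [0, 0]]
    print(matmul_unhoisted(a_cols, b_rows, mem_a))  # 6 transfers
    print(matmul_hoisted(a_cols, b_rows, mem_b))    # 2 transfers
    print(mem_a == mem_b)                           # True
```

Both variants compute the same result; only the number of tile transfers changes (2 per K iteration versus 2 in total), which matches the point that the step is an optimisation rather than something needed for functional correctness.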

@MacDue
Member

MacDue commented Sep 9, 2024

Typo optionnal -> optional (also maybe say optimization rather than optional)

@nujaa nujaa merged commit 8aeb104 into llvm:main Sep 10, 2024
8 checks passed
4 participants