[MLIR][XeGPU] Update XeGPU doc #136155
Merged
@llvm/pr-subscribers-mlir-gpu @llvm/pr-subscribers-mlir

Author: Chao Chen (chencha3)

Changes: Update the docs for the XeGPU dialect and fix the garbled formatting.

Full diff: https://github.com/llvm/llvm-project/pull/136155.diff

2 Files Affected:
diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
index ab5fb4a4a7de9..f1bed70253ef3 100644
--- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
+++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
@@ -183,53 +183,54 @@ def XeGPU_LayoutAttr : XeGPUAttr<"Layout", "layout"> {
1-dimensional layout. The first dimension in the order list is the fastest-changing dimension. If it
is not present, the default value is [1, 0].
- ### Examples:
- 1. Subgroup level layout:
- ```mlir
- #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
- ```
- In this example, there are 16 work-items per subgroup, and is organized as
- [[0, 1, 2, .., 7],[8, 9, .., 15]]. The distribution unit is 1x1.
-
- 2. Subgroup level layout with order:
- ```mlir
- #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
- ```
- In this example, there are 16 work-items per subgroup, and is organized as
- [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
-
- 3. Subgroup level layout with inst_data
- ```mlir
- #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
- ```
- In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
- which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
- work-item is assigned four 2x2 blocks in a round-robin manner.
-
- 4. Workgroup level layout:
- ```mlir
- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
- ```
- In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
- arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
- is further distributed to 16 work items which is organized as [[0, 1, 2, .., 7],[8, 9, .., 15]].
-
- 5. Workgroup level layout with order:
- ```mlir
- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
- ```
- In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
- arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
- is further distributed to 16 work items which is organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
-
- 6. Workgroup level layout with inst_data:
- ```mlir
- #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
- ```
- This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
- each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
- Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
- unit may result in non-contiguous access.
+ Examples:
+
+ 1. Subgroup level layout:
+ ```mlir
+ #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
+ ```
+ In this example, there are 16 work-items per subgroup, organized as
+ [[0, 1, 2, .., 7],[8, 9, .., 15]]. The distribution unit is 1x1.
+
+ 2. Subgroup level layout with order:
+ ```mlir
+ #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+ ```
+ In this example, there are 16 work-items per subgroup, organized as
+ [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
+
+ 3. Subgroup level layout with inst_data:
+ ```mlir
+ #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
+ ```
+ In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
+ which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
+ work-item is assigned four 2x2 blocks in a round-robin manner.
+
+ 4. Workgroup level layout:
+ ```mlir
+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
+ ```
+ In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
+ arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
+ is further distributed to 16 work-items, organized as [[0, 1, 2, .., 7], [8, 9, .., 15]].
+
+ 5. Workgroup level layout with order:
+ ```mlir
+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+ ```
+ In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
+ arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
+ is further distributed to 16 work-items, organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
+
+ 6. Workgroup level layout with inst_data:
+ ```mlir
+ #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
+ ```
+ This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
+ each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
+ Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
+ unit may result in non-contiguous access.
}];
let parameters = (ins
diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
index 765f218f95d26..fb5a1e6f1db0c 100644
--- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
+++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
@@ -16,11 +16,20 @@ def XeGPU_Dialect : Dialect {
let cppNamespace = "::mlir::xegpu";
let summary = "The XeGPU dialect that models Intel GPU's ISA";
let description = [{
- The XeGPU dialect models Intel Xe ISA semantics but works at vector and
- TensorDesc data type. It provides 1:1 mappings to match Xe instructions
- like DPAS and 2D block load. The matrix size being processed at this level
- exactly matches the hardware instructions or the intrinsic supported by
- the lower-level GPU compiler.
+ The XeGPU dialect closely models a subset of the Xe GPU's ISA, providing an
+ abstraction to support high-performance GEMM code generation. It serves as a
+ bridge dialect in the MLIR gradual lowering process, working with MLIR memref
+ and vector types, and complements the Arith, Math, Vector, and MemRef dialects.
+ XeGPU operations are introduced for special Xe instructions not modeled by the
+ LLVM/SPIR-V dialect, such as DPAS and 2D block load and store.
+
+ It supports a tile-based programming model, decomposing the GEMM kernel into
+ large predefined tile sizes at the subgroup and workgroup levels. XeGPU allows
+ the high-level GEMM algorithm to be easily expressed. Underneath, it uses
+ target-specific recipes and hardware features to achieve optimal performance
+ on specific hardware. By decomposing GEMM at submatrix granularity and mapping it
+ to registers, it naturally supports optimizations like fusing with neighboring
+ operations.
}];
let dependentDialects = ["arith::ArithDialect"];
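The `order` semantics documented in the LayoutAttr examples above (lane IDs laid out row-major by default, column-major under `order = [0, 1]`) can be modeled with a short Python sketch. This is illustrative only; the function and its names are ours, not part of the XeGPU dialect.

```python
# Illustrative model of how XeGPU's `order` parameter arranges lane IDs
# within `lane_layout`. `order` lists dimensions from fastest- to
# slowest-changing, so consecutive lane IDs advance along order[0] first.

def lane_arrangement(lane_layout, order=(1, 0)):
    """Return a grid mapping each (row, col) position to a lane ID."""
    rows, cols = lane_layout
    grid = [[0] * cols for _ in range(rows)]
    lane = 0
    if tuple(order) == (1, 0):
        # Default: dim 1 (columns) is fastest-changing -> row-major IDs.
        for i in range(rows):
            for j in range(cols):
                grid[i][j] = lane
                lane += 1
    else:
        # order [0, 1]: dim 0 (rows) is fastest-changing -> column-major IDs.
        for j in range(cols):
            for i in range(rows):
                grid[i][j] = lane
                lane += 1
    return grid

# Default order [1, 0] -> [[0, 1, ..., 7], [8, 9, ..., 15]]
print(lane_arrangement((2, 8)))
# order [0, 1] -> [[0, 2, ..., 14], [1, 3, ..., 15]]
print(lane_arrangement((2, 8), order=(0, 1)))
```

Running this for `lane_layout = [2, 8]` reproduces the two arrangements given in examples 1 and 2.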
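The workgroup-level layouts in examples 4 and 5 can be modeled the same way: each subgroup owns one `sg_data`-sized tile, and `order` decides whether subgroup IDs run row-major or column-major across `sg_layout`. The sketch below is a simplified model (names are ours, not dialect API) that assumes `sg_layout * sg_data` exactly covers the workgroup tile, with no round-robin wrap-around.

```python
# Illustrative model: map an element offset within a workgroup tile to the
# subgroup ID that owns it, per the LayoutAttr examples.

def owning_subgroup(coord, sg_layout, sg_data, order=(1, 0)):
    """Return the subgroup ID owning element `coord` of the workgroup tile."""
    # Which sg_data-sized tile the element falls into.
    pos = (coord[0] // sg_data[0], coord[1] // sg_data[1])
    if tuple(order) == (1, 0):
        # Default: row-major subgroup IDs, e.g. [[0, 1, 2, 3], [4, 5, 6, 7]].
        return pos[0] * sg_layout[1] + pos[1]
    # order [0, 1]: column-major IDs, e.g. [[0, 2, 4, 6], [1, 3, 5, 7]].
    return pos[1] * sg_layout[0] + pos[0]

# sg_layout = [2, 4], sg_data = [16, 16] covers a 32x64 workgroup tile.
# Element (17, 33) lies in tile position (1, 2).
print(owning_subgroup((17, 33), (2, 4), (16, 16)))          # → 6
print(owning_subgroup((17, 33), (2, 4), (16, 16), (0, 1)))  # → 5
```

Both results agree with the arrangements stated in examples 4 and 5: position (1, 2) holds ID 6 in `[[0, 1, 2, 3], [4, 5, 6, 7]]` and ID 5 in `[[0, 2, 4, 6], [1, 3, 5, 7]]`.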
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request on May 6, 2025.