[MLIR][XeGPU] Update XeGPU doc #136155


Merged
merged 2 commits into main on Apr 17, 2025

Conversation

chencha3
Contributor

Update docs for the XeGPU dialect, and fix garbled formatting

@chencha3 chencha3 marked this pull request as ready for review April 17, 2025 16:31
@llvmbot
Member

llvmbot commented Apr 17, 2025

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Chao Chen (chencha3)

Changes

Update docs for the XeGPU dialect, and fix garbled formatting


Full diff: https://github.com/llvm/llvm-project/pull/136155.diff

2 Files Affected:

  • (modified) mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td (+48-47)
  • (modified) mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td (+14-5)
diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
index ab5fb4a4a7de9..f1bed70253ef3 100644
--- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
+++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td
@@ -183,53 +183,54 @@ def XeGPU_LayoutAttr : XeGPUAttr<"Layout", "layout"> {
       1-dimensional layout. The first dimension in the order list is the fastest-changing dimension. If it
       is not present, the default value is [1, 0].
 
-    ### Examples:
-      1. Subgroup level layout:
-      ```mlir
-      #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
-      ```
-      In this example, there are 16 work-items per subgroup, and is organized as
-      [[0, 1, 2, .., 7],[8, 9, .., 15]]. The distribution unit is 1x1.
-
-      2. Subgroup level layout with order:
-      ```mlir
-      #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
-      ```
-      In this example, there are 16 work-items per subgroup, and is organized as
-      [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
-
-      3. Subgroup level layout with inst_data
-      ```mlir
-      #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
-      ```
-      In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
-      which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
-      work-item is assigned four 2x2 blocks in a round-robin manner.
-
-      4. Workgroup level layout:
-      ```mlir
-      #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
-      ```
-      In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
-      arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
-      is further distributed to 16 work items which is organized as [[0, 1, 2, .., 7],[8, 9, .., 15]].
-
-      5. Workgroup level layout with order:
-      ```mlir
-      #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
-      ```
-      In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
-      arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
-      is further distributed to 16 work items which is organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
-
-      6. Workgroup level layout with inst_data:
-      ```mlir
-      #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
-      ```
-      This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
-      each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
-      Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
-      unit may result in non-contiguous access.
+    Examples:
+
+    1. Subgroup level layout:
+    ```mlir
+    #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
+    ```
+    In this example, there are 16 work-items per subgroup, organized as
+    [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. The distribution unit is 1x1.
+
+    2. Subgroup level layout with order:
+    ```mlir
+    #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+    ```
+    In this example, there are 16 work-items per subgroup, organized as
+    [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
+
+    3. Subgroup level layout with inst_data:
+    ```mlir
+    #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
+    ```
+    In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
+    which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
+    work-item is assigned four 2x2 blocks in a round-robin manner.
+
+    4. Workgroup level layout:
+    ```mlir
+    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
+    ```
+    In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
+    arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
+    is further distributed to 16 work-items organized as [[0, 1, 2, ..., 7], [8, 9, ..., 15]].
+
+    5. Workgroup level layout with order:
+    ```mlir
+    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+    ```
+    In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
+    arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
+    is further distributed to 16 work-items organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
+
+    6. Workgroup level layout with inst_data:
+    ```mlir
+    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
+    ```
+    This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
+    each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
+    Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
+    unit may result in non-contiguous access.
   }];
 
   let parameters = (ins
diff --git a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
index 765f218f95d26..fb5a1e6f1db0c 100644
--- a/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
+++ b/mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td
@@ -16,11 +16,20 @@ def XeGPU_Dialect : Dialect {
     let cppNamespace = "::mlir::xegpu";
     let summary = "The XeGPU dialect that models Intel GPU's ISA";
     let description = [{
-      The XeGPU dialect models Intel Xe ISA semantics but works at vector and
-      TensorDesc data type. It provides 1:1 mappings to match Xe instructions
-      like DPAS and 2D block load. The matrix size being processed at this level
-      exactly matches the hardware instructions or the intrinsic supported by
-      the lower-level GPU compiler.
+      The XeGPU dialect closely models a subset of the Xe GPU's ISA, providing an
+      abstraction to support high-performance GEMM code generation. It serves as a
+      bridge dialect in MLIR's gradual lowering process, operating on MLIR memref
+      and vector types, and complementing the Arith, Math, Vector, and MemRef dialects.
+      XeGPU operations are introduced for special Xe instructions not modeled by the
+      LLVM/SPIR-V dialect, such as DPAS and 2D block load and store.
+
+      It supports a tile-based programming model, decomposing the GEMM kernel into
+      large predefined tile sizes at the subgroup and workgroup levels. XeGPU allows
+      the high-level GEMM algorithm to be easily expressed. Underneath, it uses
+      target-specific recipes and hardware features to achieve optimal performance
+      on specific hardware. By decomposing GEMM at submatrix granularity and mapping it
+      to registers, it naturally supports optimizations like fusing with neighboring
+      operations.
     }];
 
     let dependentDialects = ["arith::ArithDialect"];

@chencha3 chencha3 merged commit 386cc00 into main Apr 17, 2025
16 checks passed
@chencha3 chencha3 deleted the users/chencha3/xegpu/refine_xegpu_doc branch April 17, 2025 17:02
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025