Commit 8f6856f

Author: git apple-llvm automerger (committed)
Merge commit '386cc00d8d0a' from llvm.org/main into next
2 parents: e441b88 + 386cc00

File tree: 2 files changed, +62 −52 lines


mlir/include/mlir/Dialect/XeGPU/IR/XeGPUAttrs.td

Lines changed: 48 additions & 47 deletions
@@ -183,53 +183,54 @@ def XeGPU_LayoutAttr : XeGPUAttr<"Layout", "layout"> {
     1-dimensional layout. The first dimension in the order list is the fastest-changing dimension. If it
     is not present, the default value is [1, 0].
 
-    ### Examples:
-    1. Subgroup level layout:
-    ```mlir
-    #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
-    ```
-    In this example, there are 16 work-items per subgroup, and is organized as
-    [[0, 1, 2, .., 7],[8, 9, .., 15]]. The distribution unit is 1x1.
-
-    2. Subgroup level layout with order:
-    ```mlir
-    #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
-    ```
-    In this example, there are 16 work-items per subgroup, and is organized as
-    [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
-
-    3. Subgroup level layout with inst_data
-    ```mlir
-    #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
-    ```
-    In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
-    which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
-    work-item is assigned four 2x2 blocks in a round-robin manner.
-
-    4. Workgroup level layout:
-    ```mlir
-    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
-    ```
-    In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
-    arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
-    is further distributed to 16 work items which is organized as [[0, 1, 2, .., 7],[8, 9, .., 15]].
-
-    5. Workgroup level layout with order:
-    ```mlir
-    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
-    ```
-    In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
-    arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
-    is further distributed to 16 work items which is organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
-
-    6. Workgroup level layout with inst_data:
-    ```mlir
-    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
-    ```
-    This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
-    each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
-    Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
-    unit may result in non-contiguous access.
+    Examples:
+
+    1. Subgroup level layout:
+    ```mlir
+    #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1]>
+    ```
+    In this example, there are 16 work-items per subgroup, and is organized as
+    [[0, 1, 2, .., 7],[8, 9, .., 15]]. The distribution unit is 1x1.
+
+    2. Subgroup level layout with order:
+    ```mlir
+    #xegpu.layout<lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+    ```
+    In this example, there are 16 work-items per subgroup, and is organized as
+    [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]]. The distribution unit is 1x1.
+
+    3. Subgroup level layout with inst_data
+    ```mlir
+    #xegpu.layout<inst_data = [8, 16], lane_layout = [2, 8], lane_data = [2, 2]>
+    ```
+    In this example, the original problem size is partitioned into smaller subproblems of dimensions [8, 16],
+    which are then distributed among 16 work-items arranged as [[0, 1, 2, ..., 7], [8, 9, ..., 15]]. Each
+    work-item is assigned four 2x2 blocks in a round-robin manner.
+
+    4. Workgroup level layout:
+    ```mlir
+    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1]>
+    ```
+    In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
+    arranged as [[0, 1, 2, 3], [4, 5, 6, 7]]. Each subgroup accesses a 16x16 block per instruction, which
+    is further distributed to 16 work items which is organized as [[0, 1, 2, .., 7],[8, 9, .., 15]].
+
+    5. Workgroup level layout with order:
+    ```mlir
+    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], lane_layout = [2, 8], lane_data = [1, 1], order = [0, 1]>
+    ```
+    In this example, the layout represents a workgroup distribution. A workgroup consists of 8 subgroups
+    arranged as [[0, 2, 4, 6], [1, 3, 5, 7]]. Each subgroup accesses a 16x16 block per instruction, which
+    is further distributed to 16 work items which is organized as [[0, 2, 4, ..., 14], [1, 3, 5, ..., 15]].
+
+    6. Workgroup level layout with inst_data:
+    ```mlir
+    #xegpu.layout<sg_layout = [2, 4], sg_data = [16, 16], inst_data = [8, 16], lane_layout = [2, 8], lane_data = [1, 1]>
+    ```
+    This example is similar to the previous ones, but the `inst_data` parameter divides `sg_data` into two instructions,
+    each processing an 8x16 block. These blocks are further distributed across 16 work-items with a distribution unit of 1x1.
+    Unlike the 2x2 distribution unit in example 3, which results in accessing contiguous 2x2 blocks, the 1x1 distribution
+    unit may result in non-contiguous access.
   }];
 
   let parameters = (ins
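The `order` semantics used throughout the examples above (the first dimension listed is the fastest-changing one, default `[1, 0]`) can be sketched as a small Python helper. `lane_arrangement` is a hypothetical illustration written for this note, not part of the XeGPU sources; it reproduces the lane grids quoted in examples 1 and 2:

```python
def lane_arrangement(lane_layout, order=None):
    """Map linear lane ids onto a 2D lane grid.

    lane_layout: [rows, cols] shape of the lane grid.
    order: dimensions listed fastest-changing first; the default [1, 0]
    means dimension 1 (columns) varies fastest, i.e. row-major ids.
    NOTE: hypothetical helper for illustration only.
    """
    if order is None:
        order = [1, 0]
    rows, cols = lane_layout

    # The fastest-changing dimension gets stride 1; each subsequent
    # dimension's stride is the product of the faster dimensions' sizes.
    strides = {}
    stride = 1
    for dim in order:
        strides[dim] = stride
        stride *= lane_layout[dim]

    return [
        [r * strides[0] + c * strides[1] for c in range(cols)]
        for r in range(rows)
    ]
```

With `lane_layout = [2, 8]` and the default order this yields `[[0, 1, ..., 7], [8, 9, ..., 15]]`, while `order = [0, 1]` yields `[[0, 2, ..., 14], [1, 3, ..., 15]]`, matching the subgroup examples; the same indexing rule applies to `sg_layout` in the workgroup examples.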

mlir/include/mlir/Dialect/XeGPU/IR/XeGPUDialect.td

Lines changed: 14 additions & 5 deletions
@@ -16,11 +16,20 @@ def XeGPU_Dialect : Dialect {
   let cppNamespace = "::mlir::xegpu";
   let summary = "The XeGPU dialect that models Intel GPU's ISA";
   let description = [{
-    The XeGPU dialect models Intel Xe ISA semantics but works at vector and
-    TensorDesc data type. It provides 1:1 mappings to match Xe instructions
-    like DPAS and 2D block load. The matrix size being processed at this level
-    exactly matches the hardware instructions or the intrinsic supported by
-    the lower-level GPU compiler.
+    The XeGPU dialect closely models a subset of the Xe GPU's ISA, providing an
+    abstraction to support high-performance GEMM code generation. It serves as a
+    bridge dialect in the MLIR gradual lowering process, working with MLIR memref
+    and vector types, and complements the Arith, Math, Vector, and Memref dialects.
+    XeGPU operations are introduced for special Xe instructions not modeled by the
+    LLVM/SPIR-V dialect, such as DPAS and 2D block load and store.
+
+    It supports a tile-based programming model, decomposing the GEMM kernel into
+    large predefined tile sizes at the subgroup and workgroup levels. XeGPU allows
+    the high-level GEMM algorithm to be easily expressed. Underneath, it uses
+    target-specific recipes and hardware features to achieve optimal performance
+    on specific hardware. By decomposing GEMM at submatrix granularity and mapping it
+    to registers, it naturally supports optimizations like fusing with neighboring
+    operations.
   }];
 
   let dependentDialects = ["arith::ArithDialect"];
