[mlir][nvgpu] Remove strict verifiers on `warpgroup.generate.descriptor` #69935

grypp · 2023-10-23T15:20:53Z

This PR relaxes some rules in the verifier that I found overly restrictive. It is certainly possible to work around these rules, for example by generating additional subviews, but that just bloats the IR.

The test #69913 needs this PR.

llvmbot · 2023-10-23T15:22:03Z

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-nvgpu

@llvm/pr-subscribers-mlir-gpu

Author: Guray Ozen (grypp)

Full diff: https://github.com/llvm/llvm-project/pull/69935.diff

1 Files Affected:

(modified) mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp (-6)

diff --git a/mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp b/mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
index f5b02fe1b515591..15eeba2839479d8 100644
--- a/mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
+++ b/mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
@@ -375,15 +375,9 @@ LogicalResult WarpgroupGenerateDescriptorOp::verify() {
   MemRefType memrefType = getTensor().getType();
   MemRefType tensorMapType = getTensorMap().getType().getTensor();
 
-  if (memrefType != tensorMapType)
-    return emitError() << "memref and tensor map type mismatch";
-
   if (!memrefType.hasStaticShape() || !tensorMapType.hasStaticShape())
     return emitError() << "supports only static shapes";
 
-  if (memrefType.getRank() != 2)
-    return emitError() << "supports only 2d memref is supported for now";
-
   if (getTensorMap().getType().getSwizzle() !=
       TensorMapSwizzleKind::SWIZZLE_128B) {
     return emitError() << "supports only "

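The effect of the two removed checks can be illustrated outside of MLIR. The following is only a minimal standalone sketch: `MockMemRefType` and `verifyDescriptor` are hypothetical stand-ins (not MLIR APIs), a type is modeled as a bare shape, the `strict` flag selects the pre-PR behavior, and the swizzle check that follows in the real verifier is left out.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for mlir::MemRefType: only a shape is modeled, and a
// negative extent marks a dynamic dimension.
struct MockMemRefType {
  std::vector<int64_t> shape;

  bool hasStaticShape() const {
    for (int64_t dim : shape)
      if (dim < 0)
        return false;
    return true;
  }
  int64_t getRank() const { return static_cast<int64_t>(shape.size()); }
  bool operator!=(const MockMemRefType &other) const {
    return shape != other.shape;
  }
};

// Returns the diagnostic text, or std::nullopt when verification passes.
// strict == true applies the checks as they were before this PR;
// strict == false skips the two checks the diff removes.
std::optional<std::string> verifyDescriptor(const MockMemRefType &memrefType,
                                            const MockMemRefType &tensorMapType,
                                            bool strict) {
  if (strict && memrefType != tensorMapType)
    return std::string("memref and tensor map type mismatch");

  // This check is kept by the PR.
  if (!memrefType.hasStaticShape() || !tensorMapType.hasStaticShape())
    return std::string("supports only static shapes");

  if (strict && memrefType.getRank() != 2)
    return std::string("supports only 2d memref is supported for now");

  return std::nullopt;
}
```

Under this model, a 64x64 memref paired with a 128x64 tensor map is rejected in strict mode and accepted in relaxed mode, while a dynamic shape is rejected in both. Whether such a mismatched pair is meaningful for the real op is exactly what the review comments below ask.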
nicolasvasilache · 2023-10-24T15:37:07Z

> The test #69913 needs this PR.

Can you fold this change into the PR that needs it?

qcolombet · 2023-10-24T15:52:49Z

mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp

@@ -375,15 +375,9 @@ LogicalResult WarpgroupGenerateDescriptorOp::verify() {
  MemRefType memrefType = getTensor().getType();
  MemRefType tensorMapType = getTensorMap().getType().getTensor();

-  if (memrefType != tensorMapType)
-    return emitError() << "memref and tensor map type mismatch";


What are the semantics of mismatched memref and tensor map types?

Could you add a test case demonstrating this case?

The issue is similar to what we discussed here.

qcolombet · 2023-10-24T15:59:46Z

mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp

  if (!memrefType.hasStaticShape() || !tensorMapType.hasStaticShape())
    return emitError() << "supports only static shapes";

-  if (memrefType.getRank() != 2)
-    return emitError() << "supports only 2d memref is supported for now";


Since this is supposed to feed into wgmma operations, why do we need to support more than 2D memrefs?

(Sorry for the dumb questions x).)

grypp · 2023-11-01T11:40:41Z

I created #70923, which is a better cleanup and improvement.

…#70028)

PR #69913 added a GEMM test (128x128x128 F32 += F16 * F16) with if-statement. This PR adds the same test using predicates in PTX. Predicate support is enabled using _BasicPtxBuilderInterface_ `(nvgpu.opcode ..., predicate = %pred)`.

The predicate condition is computed in `Step 2. [GPU] Elect fastest thread in CTA` inspired by cutlass. It is as follows:
```
lane_predicate = nvvm.elect.sync
warp_idx = __shfl_sync(0xffffffff, threadIdx.x / 32, 0)
warp_idx_in_warp_group = warp_idx % 4
predicate = (lane_predicate & warp_idx_in_warp_group)
```

Depends on #70027 #69934 #69935 #69584

grypp requested a review from qcolombet October 23, 2023 15:20

llvmbot added mlir:gpu mlir mlir:nvgpu labels Oct 23, 2023

grypp mentioned this pull request Oct 24, 2023

[mlir] Add sm_90a GEMM test 128x128x128 (F32 =F16*F16) with predicate #70028

Merged

qcolombet reviewed Oct 24, 2023

grypp closed this Nov 1, 2023
