AMDGPU: Fix buffer intrinsic store of bfloat #95377

arsenm · 2024-06-13T09:17:13Z

No description provided.

llvmbot · 2024-06-13T09:20:27Z

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/95377.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll (+32-5)

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 4946129c65a95..81098201e9c0f 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
                      {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16,
                       MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16,
                       MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16,
-                      MVT::f16, MVT::i16, MVT::i8, MVT::i128},
+                      MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128},
                      Custom);
 
   setOperationAction(ISD::STACKSAVE, MVT::Other, Custom);
@@ -9973,7 +9973,7 @@ SDValue SITargetLowering::handleByteShortBufferStores(SelectionDAG &DAG,
                                                       EVT VDataType, SDLoc DL,
                                                       SDValue Ops[],
                                                       MemSDNode *M) const {
-  if (VDataType == MVT::f16)
+  if (VDataType == MVT::f16 || VDataType == MVT::bf16)
     Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]);
 
   SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]);
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
index f7f3742a90633..82dd35ab4c240 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
@@ -5,11 +5,38 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 %s
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck --check-prefixes=GFX11 %s
 
-; FIXME
-; define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) {
-;   call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0)
-;   ret void
-; }
+define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) {
+; GFX7-LABEL: buffer_store_bf16:
+; GFX7:       ; %bb.0:
+; GFX7-NEXT:    v_mul_f32_e32 v0, 1.0, v0
+; GFX7-NEXT:    v_lshrrev_b32_e32 v0, 16, v0
+; GFX7-NEXT:    buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX7-NEXT:    s_endpgm
+;
+; GFX8-LABEL: buffer_store_bf16:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX8-NEXT:    s_endpgm
+;
+; GFX9-LABEL: buffer_store_bf16:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX9-NEXT:    s_endpgm
+;
+; GFX10-LABEL: buffer_store_bf16:
+; GFX10:       ; %bb.0:
+; GFX10-NEXT:    buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX10-NEXT:    s_endpgm
+;
+; GFX11-LABEL: buffer_store_bf16:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    buffer_store_b16 v0, v1, s[0:3], 0 offen
+; GFX11-NEXT:    s_nop 0
+; GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX11-NEXT:    s_endpgm
+  call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0)
+  ret void
+}
 
 define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x bfloat> %data, i32 %offset) {
 ; GFX7-LABEL: buffer_store_v2bf16:
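
For readers unfamiliar with the lowering, the change to handleByteShortBufferStores can be pictured with a small standalone model. This is only a sketch under stated assumptions, not LLVM code: the helper names (f32ToBF16Bits, extendForShortStore, storeShort) are hypothetical, the f32-to-bf16 step truncates without rounding, and the "any extend" is modelled as a zero extend even though ISD::ANY_EXTEND leaves the upper bits unspecified. It shows why treating bf16 like f16 is enough here: the 16-bit payload is reinterpreted as an integer, widened to 32 bits, and only the low half is written by the short store.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: obtain bf16 bits from an f32 by keeping the upper
// 16 bits (plain truncation, no rounding). This loosely mirrors the
// shift-right-by-16 visible in the GFX7 check lines.
inline uint16_t f32ToBF16Bits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return static_cast<uint16_t>(Bits >> 16);
}

// Models BITCAST (bf16 -> i16) followed by ANY_EXTEND (i16 -> i32).
// Assumption: the unspecified upper bits are filled with zeros.
inline uint32_t extendForShortStore(uint16_t BF16Bits) {
  return static_cast<uint32_t>(BF16Bits);
}

// Models a 16-bit buffer store: only the low 16 bits of the 32-bit
// register value reach memory.
inline void storeShort(uint8_t *Dst, uint32_t Reg) {
  uint16_t Low = static_cast<uint16_t>(Reg);
  std::memcpy(Dst, &Low, sizeof(Low));
}
```

Calling storeShort(Buf, extendForShortStore(f32ToBF16Bits(X))) writes exactly the two bytes of the bf16 value, whatever the upper half of the widened register holds.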

This was referenced Jun 13, 2024

AMDGPU: Fix buffer intrinsic handling for various 16-bit elements. #95376

Merged

AMDGPU: Cleanup selection patterns for buffer loads #95378

Merged

AMDGPU: Fix buffer load/store of pointers #95379

Merged

arsenm added the backend:AMDGPU label Jun 13, 2024 — with Graphite App

arsenm requested review from jayfoad, krzysz00, piotrAMD and Sisyph June 13, 2024 09:20

arsenm marked this pull request as ready for review June 13, 2024 09:20

jayfoad approved these changes Jun 13, 2024

Base automatically changed from users/arsenm/amdgpu-fix-buffer-16-bit-vectors to main June 13, 2024 10:33

AMDGPU: Fix buffer intrinsic store of bfloat

3ceb488

arsenm force-pushed the users/arsenm/amdgpu-fix-buffer-store-bfloat branch from 520d91d to 3ceb488 Compare June 13, 2024 11:01

arsenm merged commit 5e8cf0b into main Jun 13, 2024
4 of 6 checks passed

arsenm deleted the users/arsenm/amdgpu-fix-buffer-store-bfloat branch June 13, 2024 11:13

EthanLuisMcDonough pushed a commit to EthanLuisMcDonough/llvm-project that referenced this pull request Aug 13, 2024

AMDGPU: Fix buffer intrinsic store of bfloat (llvm#95377)

2469cd2
