Skip to content

AMDGPU: Fix buffer intrinsic store of bfloat #95377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 13, 2024

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Jun 13, 2024

No description provided.

Copy link
Contributor Author

arsenm commented Jun 13, 2024

@llvmbot
Copy link
Member

llvmbot commented Jun 13, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/95377.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll (+32-5)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 4946129c65a95..81098201e9c0f 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -874,7 +874,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM,
                      {MVT::Other, MVT::v2i16, MVT::v2f16, MVT::v2bf16,
                       MVT::v3i16, MVT::v3f16, MVT::v4f16, MVT::v4i16,
                       MVT::v4bf16, MVT::v8i16, MVT::v8f16, MVT::v8bf16,
-                      MVT::f16, MVT::i16, MVT::i8, MVT::i128},
+                      MVT::f16, MVT::i16, MVT::bf16, MVT::i8, MVT::i128},
                      Custom);
 
   setOperationAction(ISD::STACKSAVE, MVT::Other, Custom);
@@ -9973,7 +9973,7 @@ SDValue SITargetLowering::handleByteShortBufferStores(SelectionDAG &DAG,
                                                       EVT VDataType, SDLoc DL,
                                                       SDValue Ops[],
                                                       MemSDNode *M) const {
-  if (VDataType == MVT::f16)
+  if (VDataType == MVT::f16 || VDataType == MVT::bf16)
     Ops[1] = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Ops[1]);
 
   SDValue BufferStoreExt = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Ops[1]);
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
index f7f3742a90633..82dd35ab4c240 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.ptr.buffer.store.bf16.ll
@@ -5,11 +5,38 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX10 %s
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 < %s | FileCheck --check-prefixes=GFX11 %s
 
-; FIXME
-; define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) {
-;   call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0)
-;   ret void
-; }
+define amdgpu_ps void @buffer_store_bf16(ptr addrspace(8) inreg %rsrc, bfloat %data, i32 %offset) {
+; GFX7-LABEL: buffer_store_bf16:
+; GFX7:       ; %bb.0:
+; GFX7-NEXT:    v_mul_f32_e32 v0, 1.0, v0
+; GFX7-NEXT:    v_lshrrev_b32_e32 v0, 16, v0
+; GFX7-NEXT:    buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX7-NEXT:    s_endpgm
+;
+; GFX8-LABEL: buffer_store_bf16:
+; GFX8:       ; %bb.0:
+; GFX8-NEXT:    buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX8-NEXT:    s_endpgm
+;
+; GFX9-LABEL: buffer_store_bf16:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX9-NEXT:    s_endpgm
+;
+; GFX10-LABEL: buffer_store_bf16:
+; GFX10:       ; %bb.0:
+; GFX10-NEXT:    buffer_store_short v0, v1, s[0:3], 0 offen
+; GFX10-NEXT:    s_endpgm
+;
+; GFX11-LABEL: buffer_store_bf16:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    buffer_store_b16 v0, v1, s[0:3], 0 offen
+; GFX11-NEXT:    s_nop 0
+; GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX11-NEXT:    s_endpgm
+  call void @llvm.amdgcn.raw.ptr.buffer.store.bf16(bfloat %data, ptr addrspace(8) %rsrc, i32 %offset, i32 0, i32 0)
+  ret void
+}
 
 define amdgpu_ps void @buffer_store_v2bf16(ptr addrspace(8) inreg %rsrc, <2 x bfloat> %data, i32 %offset) {
 ; GFX7-LABEL: buffer_store_v2bf16:

@arsenm arsenm marked this pull request as ready for review June 13, 2024 09:20
Base automatically changed from users/arsenm/amdgpu-fix-buffer-16-bit-vectors to main June 13, 2024 10:33
@arsenm arsenm force-pushed the users/arsenm/amdgpu-fix-buffer-store-bfloat branch from 520d91d to 3ceb488 Compare June 13, 2024 11:01
@arsenm arsenm merged commit 5e8cf0b into main Jun 13, 2024
4 of 6 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu-fix-buffer-store-bfloat branch June 13, 2024 11:13
EthanLuisMcDonough pushed a commit to EthanLuisMcDonough/llvm-project that referenced this pull request Aug 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants