AMDGPU: Add subtarget features for minimum3/maximum3 instructions #116308

arsenm · 2024-11-15T01:34:51Z

gfx12 and gfx950 managed to produce 3 different permutations of this feature.
gfx12 supports f32 and f16, and gfx950 supports f32 and v2f16.

arsenm · 2024-11-15T01:35:06Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2024-11-15T01:36:31Z

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

gfx12 and gfx950 managed to produce 3 different permutations of this feature.
gfx12 supports f32 and f16, and gfx950 supports f32 and v2f16. This piece only
adds the f32/f16 features for gfx12, so it can probably go directly upstream.
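
To make the per-type split concrete, here is a minimal standalone sketch of how two independent feature bits could gate the four instructions. It is not LLVM code: `MockSubtarget`, `makeGfx12Like`, `makeF32OnlyTarget`, and `canSelect` are hypothetical names invented for illustration, and the v2f16 variant mentioned for gfx950 is deliberately not modelled because this change does not add it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a subtarget; NOT the real GCNSubtarget class.
// It only mirrors the two flags and accessors this change introduces.
struct MockSubtarget {
  bool HasMinimum3Maximum3F32 = false;
  bool HasMinimum3Maximum3F16 = false;

  bool hasMinimum3Maximum3F32() const { return HasMinimum3Maximum3F32; }
  bool hasMinimum3Maximum3F16() const { return HasMinimum3Maximum3F16; }
};

// Assumption: a gfx12-like target enables both features, as the
// FeatureGFX12 list in the diff does.
inline MockSubtarget makeGfx12Like() {
  MockSubtarget ST;
  ST.HasMinimum3Maximum3F32 = true;
  ST.HasMinimum3Maximum3F16 = true;
  return ST;
}

// Assumption: an illustrative target that has only the f32 forms.
inline MockSubtarget makeF32OnlyTarget() {
  MockSubtarget ST;
  ST.HasMinimum3Maximum3F32 = true;
  return ST;
}

// Hypothetical helper: answers whether a mnemonic is available, playing the
// role that SubtargetPredicate plays for the real instruction definitions.
inline bool canSelect(const MockSubtarget &ST, const std::string &Mnemonic) {
  if (Mnemonic == "v_minimum3_f32" || Mnemonic == "v_maximum3_f32")
    return ST.hasMinimum3Maximum3F32();
  if (Mnemonic == "v_minimum3_f16" || Mnemonic == "v_maximum3_f16")
    return ST.hasMinimum3Maximum3F16();
  return false; // Mnemonics outside this sketch are treated as unavailable.
}

// Small usage check for the sketch.
inline void exampleGating() {
  assert(canSelect(makeGfx12Like(), "v_minimum3_f16"));
  assert(!canSelect(makeF32OnlyTarget(), "v_minimum3_f16"));
}
```

The point of splitting the flags is that a generation check such as isGFX12Plus cannot describe a target that has the f32 forms but not the f16 forms, whereas two separate bits can.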

Full diff: https://github.com/llvm/llvm-project/pull/116308.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+22)
(modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+10-1)
(modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+2-2)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index d028c1f5ca7613..35dbf86b7c6f36 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -137,6 +137,18 @@ def FeatureFmaMixInsts : SubtargetFeature<"fma-mix-insts",
   "Has v_fma_mix_f32, v_fma_mixlo_f16, v_fma_mixhi_f16 instructions"
 >;
 
+def FeatureMinimum3Maximum3F32 : SubtargetFeature<"minimum3-maximum3-f32",
+  "HasMinimum3Maximum3F32",
+  "true",
+  "Has v_minimum3_f32 and v_maximum3_f32 instructions"
+>;
+
+def FeatureMinimum3Maximum3F16 : SubtargetFeature<"minimum3-maximum3-f16",
+  "HasMinimum3Maximum3F16",
+  "true",
+  "Has v_minimum3_f16 and v_maximum3_f16 instructions"
+>;
+
 def FeatureSupportsXNACK : SubtargetFeature<"xnack-support",
   "SupportsXNACK",
   "true",
@@ -1263,6 +1275,7 @@ def FeatureGFX12 : GCNSubtargetFeatureGeneration<"GFX12",
    FeatureUnalignedDSAccess, FeatureTrue16BitInsts,
    FeatureDefaultComponentBroadcast, FeatureMaxHardClauseLength32,
    FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts,
+   FeatureMinimum3Maximum3F32, FeatureMinimum3Maximum3F16,
    FeatureAgentScopeFineGrainedRemoteMemoryAtomics
   ]
 >;
@@ -2005,6 +2018,15 @@ def isGFX12Plus :
   Predicate<"Subtarget->getGeneration() >= AMDGPUSubtarget::GFX12">,
   AssemblerPredicate<(all_of FeatureGFX12Insts)>;
 
+def HasMinimum3Maximum3F32 :
+  Predicate<"Subtarget->hasMinimum3Maximum3F32()">,
+  AssemblerPredicate<(all_of FeatureMinimum3Maximum3F32)>;
+
+def HasMinimum3Maximum3F16 :
+  Predicate<"Subtarget->hasMinimum3Maximum3F16()">,
+  AssemblerPredicate<(all_of FeatureMinimum3Maximum3F16)>;
+
+
 def HasFlatAddressSpace : Predicate<"Subtarget->hasFlatAddressSpace()">,
   AssemblerPredicate<(all_of FeatureFlatAddressSpace)>;
 
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 1b06756a8a1016..2e7a06a15bd52a 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -242,7 +242,8 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasForceStoreSC0SC1 = false;
   bool HasRequiredExportPriority = false;
   bool HasVmemWriteVgprInOrder = false;
-
+  bool HasMinimum3Maximum3F32 = false;
+  bool HasMinimum3Maximum3F16 = false;
   bool RequiresCOV6 = false;
 
   // Dummy feature to use for assembler in tablegen.
@@ -1307,6 +1308,14 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   /// \returns true if the target has instructions with xf32 format support.
   bool hasXF32Insts() const { return HasXF32Insts; }
 
+  bool hasMinimum3Maximum3F32() const {
+    return HasMinimum3Maximum3F32;
+  }
+
+  bool hasMinimum3Maximum3F16() const {
+    return HasMinimum3Maximum3F16;
+  }
+
   /// \returns The maximum number of instructions that can be enclosed in an
   /// S_CLAUSE on the given subtarget, or 0 for targets that do not support that
   /// instruction.
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 34ecdb56e8689d..551e8b3a679202 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -226,7 +226,7 @@ let mayRaiseFPException = 0 in {
   defm V_MED3_F32 : VOP3Inst <"v_med3_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, AMDGPUfmed3>;
 } // End mayRaiseFPException = 0
 
-let SubtargetPredicate = isGFX12Plus, ReadsModeReg = 0 in {
+let SubtargetPredicate = HasMinimum3Maximum3F32, ReadsModeReg = 0 in {
   defm V_MINIMUM3_F32 : VOP3Inst <"v_minimum3_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, AMDGPUfminimum3>;
   defm V_MAXIMUM3_F32 : VOP3Inst <"v_maximum3_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, AMDGPUfmaximum3>;
 } // End SubtargetPredicate = isGFX12Plus, ReadsModeReg = 0
@@ -625,7 +625,7 @@ defm V_MAX3_F16 : VOP3Inst <"v_max3_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3
 defm V_MAX3_I16 : VOP3Inst <"v_max3_i16", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_OPSEL>, AMDGPUsmax3>;
 defm V_MAX3_U16 : VOP3Inst <"v_max3_u16", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_OPSEL>, AMDGPUumax3>;
 
-let SubtargetPredicate = isGFX12Plus, ReadsModeReg = 0 in {
+let SubtargetPredicate = HasMinimum3Maximum3F16, ReadsModeReg = 0 in {
   defm V_MINIMUM3_F16 : VOP3Inst <"v_minimum3_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, AMDGPUfminimum3>;
   defm V_MAXIMUM3_F16 : VOP3Inst <"v_maximum3_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, AMDGPUfmaximum3>;
 } // End SubtargetPredicate = isGFX12Plus, ReadsModeReg = 0
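
For readers unfamiliar with the operation behind these definitions, the sketch below shows a scalar reference for a three-operand minimum built from two-operand steps. It assumes the IEEE-754-2019 style `minimum` semantics that LLVM documents for `llvm.minimum` (NaN propagates, and -0.0 is ordered below +0.0); that the hardware instruction matches this exactly is an assumption here, not something this pull request states. The function names `minimumRef`, `minimum3Ref`, and `exampleMinimum3` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Two-operand minimum with NaN propagation and signed-zero ordering.
// Assumption: models llvm.minimum-style semantics, not std::fmin.
inline float minimumRef(float A, float B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<float>::quiet_NaN();
  if (A == B)                        // Covers the -0.0 versus +0.0 case.
    return std::signbit(A) ? A : B;  // Prefer the operand with the sign bit.
  return A < B ? A : B;
}

// Three-operand form expressed as nested two-operand steps.
inline float minimum3Ref(float A, float B, float C) {
  return minimumRef(minimumRef(A, B), C);
}

// Small usage check for the sketch.
inline void exampleMinimum3() {
  assert(minimum3Ref(3.0f, 1.0f, 2.0f) == 1.0f);
  assert(std::isnan(minimum3Ref(1.0f, std::nanf(""), 2.0f)));
}
```

The maximum case is symmetric: swap the comparison and prefer the operand without the sign bit when the two compare equal.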

github-actions · 2024-11-15T01:38:01Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff a6fc489bb7a2e9fb3a7f70cccc181e4ee70374bf cde7770eb305155c42e82432084b308f4248723a --extensions h -- llvm/lib/Target/AMDGPU/GCNSubtarget.h

View the diff from clang-format here.

diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 2e7a06a15b..d68177c281 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -1308,13 +1308,9 @@ public:
   /// \returns true if the target has instructions with xf32 format support.
   bool hasXF32Insts() const { return HasXF32Insts; }
 
-  bool hasMinimum3Maximum3F32() const {
-    return HasMinimum3Maximum3F32;
-  }
+  bool hasMinimum3Maximum3F32() const { return HasMinimum3Maximum3F32; }
 
-  bool hasMinimum3Maximum3F16() const {
-    return HasMinimum3Maximum3F16;
-  }
+  bool hasMinimum3Maximum3F16() const { return HasMinimum3Maximum3F16; }
 
   /// \returns The maximum number of instructions that can be enclosed in an
   /// S_CLAUSE on the given subtarget, or 0 for targets that do not support that

arsenm · 2024-11-18T18:34:52Z

Merge activity

Nov 18, 1:34 PM EST: A user started a stack merge that includes this pull request via Graphite.
Nov 18, 1:42 PM EST: Graphite rebased this pull request as part of a merge.
Nov 18, 1:44 PM EST: A user merged this pull request with Graphite.


This was referenced Nov 15, 2024

AMDGPU: Add gfx950 subtarget definitions #116307

Merged

AMDGPU: Increase the LDS size to support to 160 KB for gfx950 #116309

Merged

AMDGPU: Add v_prng_b32 instruction for gfx950 #116310

Merged

arsenm added the backend:AMDGPU label Nov 15, 2024 — with Graphite App

arsenm requested review from jayfoad, kosarev, rampitec, scchan, shiltian and Sisyph November 15, 2024 01:36

arsenm marked this pull request as ready for review November 15, 2024 01:37

arsenm force-pushed the users/arsenm/gfx950/add-minimum3-maximum3-features branch from b99a4f4 to 1eebc85 Compare November 15, 2024 01:43

This was referenced Nov 15, 2024

AMDGPU: Add V_CVT_F32_BF16 for gfx950 #116311

Merged

AMDGPU: Add first gfx950 mfma instructions #116312

Merged

shiltian approved these changes Nov 15, 2024

arsenm force-pushed the users/arsenm/gfx950/add-subtarget-definition branch from d6fb34c to 8bee1d6 Compare November 18, 2024 16:39

arsenm force-pushed the users/arsenm/gfx950/add-minimum3-maximum3-features branch from 1eebc85 to 5097aa7 Compare November 18, 2024 16:39

arsenm force-pushed the users/arsenm/gfx950/add-subtarget-definition branch 2 times, most recently from c6a6353 to fd4cc28 Compare November 18, 2024 18:39

Base automatically changed from users/arsenm/gfx950/add-subtarget-definition to main November 18, 2024 18:41

AMDGPU: Add subtarget features for minimum3/maximum3 instructions

cde7770

gfx12 and gfx950 managed to produce 3 different permutations of this feature. gfx12 supports f32 and f16, and gfx950 supports f32 and v2f16.

arsenm force-pushed the users/arsenm/gfx950/add-minimum3-maximum3-features branch from 5097aa7 to cde7770 Compare November 18, 2024 18:41

arsenm merged commit cab7328 into main Nov 18, 2024
4 of 6 checks passed

arsenm deleted the users/arsenm/gfx950/add-minimum3-maximum3-features branch November 18, 2024 18:44

This was referenced Nov 18, 2024

AMDGPU: Add V_CVT_PK_BF16_F32 for gfx950 #116678

Merged

AMDGPU: Define v_mfma_f32_32x32x16_bf16 for gfx950 #116679

Merged
