Skip to content

AMDGPU: Add a subtarget feature for fine-grained remote memory support #96442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Jun 23, 2024

Atomic access to fine-grained remote memory does not work on all
subtargets. Add a feature for targets where this is expected to work.

@llvmbot
Copy link
Member

llvmbot commented Jun 23, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Atomic access to fine-grained remote memory does not work on all
subtargets. Add a feature for targets where this is expected to work.


Full diff: https://github.com/llvm/llvm-project/pull/96442.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+12-2)
  • (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+8)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 5bc0fe8bba608..7ff861f5b144d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -788,6 +788,14 @@ def FeatureFlatAtomicFaddF32Inst
   "Has flat_atomic_add_f32 instruction"
 >;
 
+def FeatureAgentScopeFineGrainedRemoteMemoryAtomics
+  : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics",
+  "HasAgentScopeFineGrainedRemoteMemoryAtomics",
+  "true",
+  "Agent (device) scoped atomic operations not directly supported by "
+  "PCIe work for allocations in host or peer PCIe device memory"
+>;
+
 def FeatureDefaultComponentZero : SubtargetFeature<"default-component-zero",
   "HasDefaultComponentZero",
   "true",
@@ -1207,7 +1215,8 @@ def FeatureGFX12 : GCNSubtargetFeatureGeneration<"GFX12",
    FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess,
    FeatureTrue16BitInsts, FeatureDefaultComponentBroadcast,
    FeatureMaxHardClauseLength32,
-   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts
+   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts,
+   FeatureAgentScopeFineGrainedRemoteMemoryAtomics
   ]
 >;
 
@@ -1415,7 +1424,8 @@ def FeatureISAVersion9_4_Common : FeatureSet<
    FeatureBackOffBarrier,
    FeatureKernargPreload,
    FeatureAtomicFMinFMaxF64GlobalInsts,
-   FeatureAtomicFMinFMaxF64FlatInsts
+   FeatureAtomicFMinFMaxF64FlatInsts,
+   FeatureAgentScopeFineGrainedRemoteMemoryAtomics
    ]>;
 
 def FeatureISAVersion9_4_0 : FeatureSet<
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 966708db4f37c..c40efbdcf7f0b 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -174,6 +174,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasAtomicBufferPkAddBF16Inst = false;
   bool HasFlatAtomicFaddF32Inst = false;
   bool HasDefaultComponentZero = false;
+  bool HasAgentScopeFineGrainedRemoteMemoryAtomics = false;
   bool HasDefaultComponentBroadcast = false;
   /// The maximum number of instructions that may be placed within an S_CLAUSE,
   /// which is one greater than the maximum argument to S_CLAUSE. A value of 0
@@ -871,6 +872,13 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; }
 
+  /// \return true if atomic operations targeting fine-grained memory work
+  /// correctly at device scope, in allocations in host or peer PCIe device
+  /// memory.
+  bool supportsAgentScopeFineGrainedRemoteMemoryAtomics() const {
+    return HasAgentScopeFineGrainedRemoteMemoryAtomics;
+  }
+
   bool hasDefaultComponentZero() const { return HasDefaultComponentZero; }
 
   bool hasDefaultComponentBroadcast() const {

@arsenm arsenm marked this pull request as ready for review June 23, 2024 20:19
@AlexVlx
Copy link
Contributor

AlexVlx commented Jun 23, 2024

Hmm, I might need to reparse this but I'm not entirely sure how this is expected to work in general since PCI-E attachment is not always / entirely knowable at compile time. Have I missed some preamble discussion here?

@arsenm
Copy link
Contributor Author

arsenm commented Jun 23, 2024

Hmm, I might need to reparse this but I'm not entirely sure how this is expected to work in general since PCI-E attachment is not always / entirely knowable at compile time. Have I missed some preamble discussion here?

We do statically know for some of the targets (mostly gfx12 and gfx940) that it's supposed to work. This is the "scope downgrade" vs. "nop" cases in the atomic support table

@rampitec
Copy link
Collaborator

We do statically know for some of the targets (mostly gfx12 and gfx940) that it's supposed to work. This is the "scope downgrade" vs. "nop" cases in the atomic support table

Actually not, we do not know the bus. Moreover, we know this is opposite.

@arsenm
Copy link
Contributor Author

arsenm commented Jun 24, 2024

Actually not, we do not know the bus. Moreover, we know this is opposite.

On gfx940/gfx12, we don't need to know the bus to handle the agent scope case correctly. The instructions still function, just always at device scope. We do need to know the bus and/or location to handle the system scope correctly, which will come from the new metadata.

@yxsamliu
Copy link
Collaborator

If a sub target does not have this feature, does none of the atomic instructions work for fine-grained remote memory, including integer atomic add/xchg/cmpxchg?

@arsenm
Copy link
Contributor Author

arsenm commented Jun 24, 2024

No, the string description mentions this is for the non-PCIe supported cases

@yxsamliu
Copy link
Collaborator

need some tests

@arsenm
Copy link
Contributor Author

arsenm commented Jun 24, 2024

need some tests

This does nothing as it is. The real use patches have thousands of tests

@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 2da0565 to 1a441c0 Compare June 25, 2024 09:10
Base automatically changed from users/arsenm/amdgpu-atomicrmw-fadd-buffer-v2bf16 to main June 25, 2024 15:45
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 1a441c0 to 302a99a Compare June 25, 2024 22:32
Copy link
Contributor Author

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 53a120c to 8a87e14 Compare July 2, 2024 17:01
Copy link

github-actions bot commented Jul 2, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch 2 times, most recently from e0ae621 to 1a6ff86 Compare July 3, 2024 21:41
#include "llvm/Pass.h"
#include "llvm/Support/BranchProbability.h"

namespace llvm {

class MachineBranchProbabilityInfo {
class MachineBranchProbabilityInfo : public ImmutablePass {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the PR include all this unrelated stuff? Which part am I supposed to review? Normally I just look at the "Files changed" tab.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parts I actually committed. I don't know why "martinboehme and others added 14 commits 12 hours ago" happened, adding all this other stuff to my personal branch. This has happened several other times, but not consistently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't happening for the other patches further up the stack, just this one for some reason

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how it ended up like this but I've gotten rid of these commits

@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch 2 times, most recently from a060a2a to 76190f2 Compare July 4, 2024 09:42
@@ -788,6 +788,16 @@ def FeatureFlatAtomicFaddF32Inst
"Has flat_atomic_add_f32 instruction"
>;

def FeatureAgentScopeFineGrainedRemoteMemoryAtomics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer using "Device" (hardware terminology) instead of "Agent" (terminology from some compute API??) in the hardware feature name. Although I see "agent" is already pervasive in AMDGPUUsage so perhaps that ship has sailed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally went with device and changed it to agent to match the syncscope spelling. This was the HSA name

@arsenm
Copy link
Contributor Author

arsenm commented Jul 10, 2024

ping

Copy link
Contributor Author

arsenm commented Jul 10, 2024

Merge activity

  • Jul 10, 8:37 AM EDT: @arsenm started a stack merge that includes this pull request via Graphite.
  • Jul 10, 8:41 AM EDT: Graphite rebased this pull request as part of a merge.
  • Jul 10, 8:43 AM EDT: @arsenm merged this pull request with Graphite.

Atomic access to fine-grained remote memory does not work on all
subtargets. Add a feature for targets where this is expected to work.
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 76190f2 to 2b6a7bb Compare July 10, 2024 12:40
@arsenm arsenm merged commit 78dcd02 into main Jul 10, 2024
4 of 6 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch July 10, 2024 12:43
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
llvm#96442)

Atomic access to fine-grained remote memory does not work on all
subtargets. Add a feature for targets where this is expected to work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants