-
Notifications
You must be signed in to change notification settings - Fork 13.6k
AMDGPU: Add a subtarget feature for fine-grained remote memory support #96442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMDGPU: Add a subtarget feature for fine-grained remote memory support #96442
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) ChangesAtomic access to fine-grained remote memory does not work on all Full diff: https://github.com/llvm/llvm-project/pull/96442.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 5bc0fe8bba608..7ff861f5b144d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -788,6 +788,14 @@ def FeatureFlatAtomicFaddF32Inst
"Has flat_atomic_add_f32 instruction"
>;
+def FeatureAgentScopeFineGrainedRemoteMemoryAtomics
+ : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics",
+ "HasAgentScopeFineGrainedRemoteMemoryAtomics",
+ "true",
+ "Agent (device) scoped atomic operations not directly supported by "
+ "PCIe work for allocations in host or peer PCIe device memory"
+>;
+
def FeatureDefaultComponentZero : SubtargetFeature<"default-component-zero",
"HasDefaultComponentZero",
"true",
@@ -1207,7 +1215,8 @@ def FeatureGFX12 : GCNSubtargetFeatureGeneration<"GFX12",
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess,
FeatureTrue16BitInsts, FeatureDefaultComponentBroadcast,
FeatureMaxHardClauseLength32,
- FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts
+ FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts,
+ FeatureAgentScopeFineGrainedRemoteMemoryAtomics
]
>;
@@ -1415,7 +1424,8 @@ def FeatureISAVersion9_4_Common : FeatureSet<
FeatureBackOffBarrier,
FeatureKernargPreload,
FeatureAtomicFMinFMaxF64GlobalInsts,
- FeatureAtomicFMinFMaxF64FlatInsts
+ FeatureAtomicFMinFMaxF64FlatInsts,
+ FeatureAgentScopeFineGrainedRemoteMemoryAtomics
]>;
def FeatureISAVersion9_4_0 : FeatureSet<
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 966708db4f37c..c40efbdcf7f0b 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -174,6 +174,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
bool HasAtomicBufferPkAddBF16Inst = false;
bool HasFlatAtomicFaddF32Inst = false;
bool HasDefaultComponentZero = false;
+ bool HasAgentScopeFineGrainedRemoteMemoryAtomics = false;
bool HasDefaultComponentBroadcast = false;
/// The maximum number of instructions that may be placed within an S_CLAUSE,
/// which is one greater than the maximum argument to S_CLAUSE. A value of 0
@@ -871,6 +872,13 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; }
+ /// \return true if atomic operations targeting fine-grained memory work
+ /// correctly at device scope, in allocations in host or peer PCIe device
+ /// memory.
+ bool supportsAgentScopeFineGrainedRemoteMemoryAtomics() const {
+ return HasAgentScopeFineGrainedRemoteMemoryAtomics;
+ }
+
bool hasDefaultComponentZero() const { return HasDefaultComponentZero; }
bool hasDefaultComponentBroadcast() const {
|
Hmm, I might need to reparse this but I'm not entirely sure how this is expected to work in general since PCI-E attachment is not always / entirely knowable at compile time. Have I missed some preamble discussion here? |
We do statically know for some of the targets (mostly gfx12 and gfx940) that it's supposed to work. This is the "scope downgrade" vs. "nop" cases in the atomic support table |
Actually not, we do not know the bus. Moreover, we know this is opposite. |
On gfx940/gfx12, we don't need to know the bus to handle the agent scope case correctly. The instructions still function, just always at device scope. We do need to know the bus and/or location to handle the system scope correctly, which will come from the new metadata. |
If a sub target does not have this feature, does none of the atomic instructions work for fine-grained remote memory, including integer atomic add/xchg/cmpxchg? |
No, the string description mentions this is for the non-PCIe supported cases |
need some tests |
This does nothing as it is. The real use patches have thousands of tests |
2da0565
to
1a441c0
Compare
1a441c0
to
302a99a
Compare
438d5bb
to
53a120c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
53a120c
to
8a87e14
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
e0ae621
to
1a6ff86
Compare
#include "llvm/Pass.h" | ||
#include "llvm/Support/BranchProbability.h" | ||
|
||
namespace llvm { | ||
|
||
class MachineBranchProbabilityInfo { | ||
class MachineBranchProbabilityInfo : public ImmutablePass { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the PR include all this unrelated stuff? Which part am I supposed to review? Normally I just look at the "Files changed" tab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parts I actually committed. I don't know why "martinboehme and others added 14 commits 12 hours ago" happened, adding all this other stuff to my personal branch. This has happened several other times, but not consistently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't happening for the other patches further up the stack, just this one for some reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how it ended up like this but I've gotten rid of these commits
a060a2a
to
76190f2
Compare
@@ -788,6 +788,16 @@ def FeatureFlatAtomicFaddF32Inst | |||
"Has flat_atomic_add_f32 instruction" | |||
>; | |||
|
|||
def FeatureAgentScopeFineGrainedRemoteMemoryAtomics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer using "Device" (hardware terminology) instead of "Agent" (terminology from some compute API??) in the hardware feature name. Although I see "agent" is already pervasive in AMDGPUUsage so perhaps that ship has sailed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally went with device and changed it to agent to match the syncscope spelling. This was the HSA name
ping |
Atomic access to fine-grained remote memory does not work on all subtargets. Add a feature for targets where this is expected to work.
76190f2
to
2b6a7bb
Compare
llvm#96442) Atomic access to fine-grained remote memory does not work on all subtargets. Add a feature for targets where this is expected to work.
Atomic access to fine-grained remote memory does not work on all
subtargets. Add a feature for targets where this is expected to work.