[Offload][AMDGPU] Only allow memory pool access to valid agents #93969
Summary: Since the next-gen plugins were added, the logic has been that every single agent gets access to a memory pool we allocated. This is necessary for things like fine-grained memory and to facilitate d2d copies. However, there are cases where an agent cannot legally access a memory pool. We have a debug check for this, but it would always be triggered in these situations because both uses of the function simply passed every agent. This patch changes the behavior by only enabling memory pool access for agents that can access the memory pool.
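The filtering step described in the summary can be sketched in isolation. This is a minimal, standalone illustration and not the plugin code: `Agent`, `PoolAccess`, `MockPool`, and `accessibleAgents` are hypothetical stand-ins that only mimic the shape of the HSA query, so the example compiles without the HSA runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for an HSA agent handle (assumption, not the real type).
struct Agent {
  std::uint64_t Handle;
};

// Hypothetical stand-in for the access levels a pool can report for an agent.
enum class PoolAccess { NeverAllowed, DisallowedByDefault, AllowedByDefault };

// Mock memory pool: agents listed in Forbidden can never access it.
struct MockPool {
  std::vector<std::uint64_t> Forbidden;

  // Mimics a per-agent access query against the pool.
  PoolAccess queryAccess(Agent A) const {
    bool Blocked = std::find(Forbidden.begin(), Forbidden.end(), A.Handle) !=
                   Forbidden.end();
    return Blocked ? PoolAccess::NeverAllowed : PoolAccess::DisallowedByDefault;
  }

  // An agent is a valid target unless access is never allowed.
  bool canAccess(Agent A) const {
    return queryAccess(A) != PoolAccess::NeverAllowed;
  }
};

// Keep only the agents for which enabling access would be legal.
std::vector<Agent> accessibleAgents(const MockPool &Pool,
                                    const std::vector<Agent> &All) {
  std::vector<Agent> Result;
  std::copy_if(All.begin(), All.end(), std::back_inserter(Result),
               [&](Agent A) { return Pool.canAccess(A); });
  return Result;
}
```

Passing the filtered list, rather than every agent, to the access-enabling call is what keeps the debug check from firing for agents that can never reach the pool.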
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-offload
Author: Joseph Huber (jhuber6)
Full diff: https://github.com/llvm/llvm-project/pull/93969.diff
1 Files Affected:
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 2a9503333c199..c6dd954746e4a 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -307,6 +307,15 @@ struct AMDGPUMemoryPoolTy {
return Plugin::check(Status, "Error in hsa_amd_memory_pool_free: %s");
}
+ /// Returns if the \p Agent can access the memory pool.
+ bool canAccess(hsa_agent_t Agent) {
+ hsa_amd_memory_pool_access_t Access;
+ if (hsa_amd_agent_memory_pool_get_info(
+ Agent, MemoryPool, HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS, &Access))
+ return false;
+ return Access != HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED;
+ }
+
/// Allow the device to access a specific allocation.
Error enableAccess(void *Ptr, int64_t Size,
const llvm::SmallVector<hsa_agent_t> &Agents) const {
@@ -3407,10 +3416,14 @@ void *AMDGPUMemoryManagerTy::allocate(size_t Size, void *HstPtr,
}
assert(Ptr && "Invalid pointer");
- auto &KernelAgents = Plugin.getKernelAgents();
+ // Get a list of agents that can access this memory pool.
+ llvm::SmallVector<hsa_agent_t> Agents;
+ llvm::copy_if(
+ Plugin.getKernelAgents(), std::back_inserter(Agents),
+ [&](hsa_agent_t Agent) { return MemoryPool->canAccess(Agent); });
- // Allow all kernel agents to access the allocation.
- if (auto Err = MemoryPool->enableAccess(Ptr, Size, KernelAgents)) {
+ // Allow all valid kernel agents to access the allocation.
+ if (auto Err = MemoryPool->enableAccess(Ptr, Size, Agents)) {
REPORT("%s\n", toString(std::move(Err)).data());
return nullptr;
}
@@ -3450,13 +3463,17 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
}
if (Alloc) {
- auto &KernelAgents =
- static_cast<AMDGPUPluginTy &>(Plugin).getKernelAgents();
- // Inherently necessary for host or shared allocations
- // Also enabled for device memory to allow device to device memcpy
-
- // Enable all kernel agents to access the buffer.
- if (auto Err = MemoryPool->enableAccess(Alloc, Size, KernelAgents)) {
+ // Get a list of agents that can access this memory pool. Inherently
+ // necessary for host or shared allocations Also enabled for device memory
+ // to allow device to device memcpy
+ llvm::SmallVector<hsa_agent_t> Agents;
+ llvm::copy_if(static_cast<AMDGPUPluginTy &>(Plugin).getKernelAgents(),
+ std::back_inserter(Agents), [&](hsa_agent_t Agent) {
+ return MemoryPool->canAccess(Agent);
+ });
+
+ // Enable all valid kernel agents to access the buffer.
+ if (auto Err = MemoryPool->enableAccess(Alloc, Size, Agents)) {
REPORT("%s\n", toString(std::move(Err)).data());
return nullptr;
}
LG, but I think this should be reworked to minimize what we do per allocation.
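One possible reading of this comment, sketched under assumptions: compute the list of accessible agents once per pool and reuse it for every allocation, instead of re-querying each agent every time. The reviewer did not specify a design, so `CachingPool`, its members, and the assumption that the set of agents is fixed after the first query are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an HSA agent handle (assumption, not the real type).
struct Agent {
  std::uint64_t Handle;
};

// Hypothetical pool wrapper that caches the filtered agent list.
class CachingPool {
public:
  explicit CachingPool(std::vector<std::uint64_t> ForbiddenHandles)
      : Forbidden(std::move(ForbiddenHandles)) {}

  // Returns the agents that may access this pool. The per-agent query runs
  // only on the first call; this assumes the agent set never changes later.
  const std::vector<Agent> &accessibleAgents(const std::vector<Agent> &All) {
    if (!Cached) {
      std::vector<Agent> Result;
      std::copy_if(All.begin(), All.end(), std::back_inserter(Result),
                   [&](Agent A) { return canAccess(A); });
      Cached = std::move(Result);
    }
    return *Cached;
  }

  // Number of mock access queries issued so far.
  unsigned queryCount() const { return Queries; }

private:
  // Mock of the per-agent access query; counts how often it is called.
  bool canAccess(Agent A) {
    ++Queries;
    return std::find(Forbidden.begin(), Forbidden.end(), A.Handle) ==
           Forbidden.end();
  }

  std::vector<std::uint64_t> Forbidden;
  std::optional<std::vector<Agent>> Cached;
  unsigned Queries = 0;
};
```

With this shape, each allocation would pay only for the access-enabling call itself. Whether a per-pool cache is the right place for this in the plugin is an open design question.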