[Offload][AMDGPU] Only allow memory pool access to valid agents #93969

jhuber6 · 2024-05-31T14:32:38Z

Summary:
The logic since the next-gen plugins was added was that every single
agent would get access to a memory pool we allocated. This is necessary
for things like fine-grained memory and to faciliate d2d copied.
However, there are cases where an agent cannot legally access a memory
pool. We have a debug check for this, but it would always be triggered
in these situations because both uses of the function simply passed
every agent. This patch changes the behavior by only enabling memory
pool access for agents that can access the memory pool.

Summary: The logic since the next-gen plugins was added was that every single agent would get access to a memory pool we allocated. This is necessary for things like fine-grained memory and to faciliate d2d copied. However, there are cases where an agent cannot legally access a memory pool. We have a debug check for this, but it would always be triggered in these situations because both uses of the function simply passed every agent. This patch changes the behavior by only enabling memory pool access for agents that can access the memory pool.

llvmbot · 2024-05-31T14:33:10Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Summary:
The logic since the next-gen plugins was added was that every single
agent would get access to a memory pool we allocated. This is necessary
for things like fine-grained memory and to faciliate d2d copied.
However, there are cases where an agent cannot legally access a memory
pool. We have a debug check for this, but it would always be triggered
in these situations because both uses of the function simply passed
every agent. This patch changes the behavior by only enabling memory
pool access for agents that can access the memory pool.

Full diff: https://github.com/llvm/llvm-project/pull/93969.diff

1 Files Affected:

(modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+27-10)

diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 2a9503333c199..c6dd954746e4a 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -307,6 +307,15 @@ struct AMDGPUMemoryPoolTy {
     return Plugin::check(Status, "Error in hsa_amd_memory_pool_free: %s");
   }
 
+  /// Returns if the \p Agent can access the memory pool.
+  bool canAccess(hsa_agent_t Agent) {
+    hsa_amd_memory_pool_access_t Access;
+    if (hsa_amd_agent_memory_pool_get_info(
+            Agent, MemoryPool, HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS, &Access))
+      return false;
+    return Access != HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED;
+  }
+
   /// Allow the device to access a specific allocation.
   Error enableAccess(void *Ptr, int64_t Size,
                      const llvm::SmallVector<hsa_agent_t> &Agents) const {
@@ -3407,10 +3416,14 @@ void *AMDGPUMemoryManagerTy::allocate(size_t Size, void *HstPtr,
   }
   assert(Ptr && "Invalid pointer");
 
-  auto &KernelAgents = Plugin.getKernelAgents();
+  // Get a list of agents that can access this memory pool.
+  llvm::SmallVector<hsa_agent_t> Agents;
+  llvm::copy_if(
+      Plugin.getKernelAgents(), std::back_inserter(Agents),
+      [&](hsa_agent_t Agent) { return MemoryPool->canAccess(Agent); });
 
-  // Allow all kernel agents to access the allocation.
-  if (auto Err = MemoryPool->enableAccess(Ptr, Size, KernelAgents)) {
+  // Allow all valid kernel agents to access the allocation.
+  if (auto Err = MemoryPool->enableAccess(Ptr, Size, Agents)) {
     REPORT("%s\n", toString(std::move(Err)).data());
     return nullptr;
   }
@@ -3450,13 +3463,17 @@ void *AMDGPUDeviceTy::allocate(size_t Size, void *, TargetAllocTy Kind) {
   }
 
   if (Alloc) {
-    auto &KernelAgents =
-        static_cast<AMDGPUPluginTy &>(Plugin).getKernelAgents();
-    // Inherently necessary for host or shared allocations
-    // Also enabled for device memory to allow device to device memcpy
-
-    // Enable all kernel agents to access the buffer.
-    if (auto Err = MemoryPool->enableAccess(Alloc, Size, KernelAgents)) {
+    // Get a list of agents that can access this memory pool. Inherently
+    // necessary for host or shared allocations Also enabled for device memory
+    // to allow device to device memcpy
+    llvm::SmallVector<hsa_agent_t> Agents;
+    llvm::copy_if(static_cast<AMDGPUPluginTy &>(Plugin).getKernelAgents(),
+                  std::back_inserter(Agents), [&](hsa_agent_t Agent) {
+                    return MemoryPool->canAccess(Agent);
+                  });
+
+    // Enable all valid kernel agents to access the buffer.
+    if (auto Err = MemoryPool->enableAccess(Alloc, Size, Agents)) {
       REPORT("%s\n", toString(std::move(Err)).data());
       return nullptr;
     }

jdoerfert

LG, but I think this should be reworked to minimize what we do per allocation.

…#93969) Summary: The logic since the next-gen plugins was added was that every single agent would get access to a memory pool we allocated. This is necessary for things like fine-grained memory and to faciliate d2d copied. However, there are cases where an agent cannot legally access a memory pool. We have a debug check for this, but it would always be triggered in these situations because both uses of the function simply passed every agent. This patch changes the behavior by only enabling memory pool access for agents that can access the memory pool. Change-Id: I4761963f82a2c8ddcf152ba254f6d662c495dd4a

jhuber6 requested review from jdoerfert, JonChesterfield, jplehr, ronlieb, saiislam and shiltian May 31, 2024 14:32

llvmbot added backend:AMDGPU offload labels May 31, 2024

jdoerfert approved these changes May 31, 2024

View reviewed changes

jhuber6 merged commit e19565c into llvm:main May 31, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Offload][AMDGPU] Only allow memory pool access to valid agents #93969

[Offload][AMDGPU] Only allow memory pool access to valid agents #93969

Uh oh!

jhuber6 commented May 31, 2024

Uh oh!

llvmbot commented May 31, 2024 •

edited

Loading

Uh oh!

jdoerfert left a comment

Uh oh!

Uh oh!

Uh oh!

[Offload][AMDGPU] Only allow memory pool access to valid agents #93969

[Offload][AMDGPU] Only allow memory pool access to valid agents #93969

Uh oh!

Conversation

jhuber6 commented May 31, 2024

Uh oh!

llvmbot commented May 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jdoerfert left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

llvmbot commented May 31, 2024 •

edited

Loading