-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[libomptarget][nextgen-plugin] Always use a signal to trigger completion of H2D data transfers #83475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Gheorghe-Teodor Bercea (doru1004) ChangesAlways use a signal to trigger completion of H2D data transfers. Full diff: https://github.com/llvm/llvm-project/pull/83475.diff 1 Files Affected:
diff --git a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
index 81634ae1edc490..55794d8bbf2264 100644
--- a/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -1354,12 +1354,10 @@ struct AMDGPUStreamTy {
Signal->increaseUseCount();
}
- AMDGPUSignalTy *OutputSignal = OutputSignals[0];
-
std::lock_guard<std::mutex> Lock(Mutex);
// Consume stream slot and compute dependencies.
- auto [Curr, InputSignal] = consume(OutputSignal);
+ auto [Curr, InputSignal] = consume(OutputSignals[0]);
// Avoid defining the input dependency if already satisfied.
if (InputSignal && !InputSignal->load())
@@ -1383,22 +1381,17 @@ struct AMDGPUStreamTy {
if (auto Err = Plugin::check(Status,
"Error in hsa_amd_signal_async_handler: %s"))
return Err;
-
- // Let's use now the second output signal.
- OutputSignal = OutputSignals[1];
-
- // Consume another stream slot and compute dependencies.
- std::tie(Curr, InputSignal) = consume(OutputSignal);
} else {
// All preceding operations completed, copy the memory synchronously.
std::memcpy(Inter, Src, CopySize);
- // Return the second signal because it will not be used.
- OutputSignals[1]->decreaseUseCount();
- if (auto Err = SignalManager.returnResource(OutputSignals[1]))
- return Err;
+ // Signal the end of the operation.
+ Slots[Curr].Signal->signal();
}
+ // Consume another stream slot and compute dependencies.
+ std::tie(Curr, InputSignal) = consume(OutputSignals[1]);
+
// Setup the post action to release the intermediate pinned buffer.
if (auto Err = Slots[Curr].schedReleaseBuffer(Inter, MemoryManager))
return Err;
@@ -1409,10 +1402,10 @@ struct AMDGPUStreamTy {
hsa_signal_t InputSignalRaw = InputSignal->get();
return utils::asyncMemCopy(UseMultipleSdmaEngines, Dst, Agent, Inter,
Agent, CopySize, 1, &InputSignalRaw,
- OutputSignal->get());
+ OutputSignals[1]->get());
}
return utils::asyncMemCopy(UseMultipleSdmaEngines, Dst, Agent, Inter, Agent,
- CopySize, 0, nullptr, OutputSignal->get());
+ CopySize, 0, nullptr, OutputSignals[1]->get());
}
// AMDGPUDeviceTy is incomplete here, passing the underlying agent instead
|
Could you add some more context to your commit messages? Just from looking at it I can't tell why this is a necessary change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to work, let's hope it doesn't break anything.
} | ||
|
||
// Consume another stream slot and compute dependencies. | ||
std::tie(Curr, InputSignal) = consume(OutputSignals[1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for std::tie
, we're on C++17.
Always use a signal to trigger completion of H2D data transfers.
I refactored this to avoid relying solely on the built-in synchronous behavior of memcpy and reuse the signal-based approach used everywhere else in the asynchronous implementation. From a consistency perspective this code now performs the same steps as the
asyncActionCallback
which also does amemcpy()
+Slot.Signal->signal()
.