Skip to content

[libomptarget][nextgen-plugin] Always use a signal to trigger completion of H2D data transfers #83475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 8 additions & 15 deletions openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1354,12 +1354,10 @@ struct AMDGPUStreamTy {
Signal->increaseUseCount();
}

AMDGPUSignalTy *OutputSignal = OutputSignals[0];

std::lock_guard<std::mutex> Lock(Mutex);

// Consume stream slot and compute dependencies.
auto [Curr, InputSignal] = consume(OutputSignal);
auto [Curr, InputSignal] = consume(OutputSignals[0]);

// Avoid defining the input dependency if already satisfied.
if (InputSignal && !InputSignal->load())
Expand All @@ -1383,22 +1381,17 @@ struct AMDGPUStreamTy {
if (auto Err = Plugin::check(Status,
"Error in hsa_amd_signal_async_handler: %s"))
return Err;

// Let's use now the second output signal.
OutputSignal = OutputSignals[1];

// Consume another stream slot and compute dependencies.
std::tie(Curr, InputSignal) = consume(OutputSignal);
} else {
// All preceding operations completed, copy the memory synchronously.
std::memcpy(Inter, Src, CopySize);

// Return the second signal because it will not be used.
OutputSignals[1]->decreaseUseCount();
if (auto Err = SignalManager.returnResource(OutputSignals[1]))
return Err;
// Signal the end of the operation.
Slots[Curr].Signal->signal();
}

// Consume another stream slot and compute dependencies.
std::tie(Curr, InputSignal) = consume(OutputSignals[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for std::tie, we're on C++17.


// Setup the post action to release the intermediate pinned buffer.
if (auto Err = Slots[Curr].schedReleaseBuffer(Inter, MemoryManager))
return Err;
Expand All @@ -1409,10 +1402,10 @@ struct AMDGPUStreamTy {
hsa_signal_t InputSignalRaw = InputSignal->get();
return utils::asyncMemCopy(UseMultipleSdmaEngines, Dst, Agent, Inter,
Agent, CopySize, 1, &InputSignalRaw,
OutputSignal->get());
OutputSignals[1]->get());
}
return utils::asyncMemCopy(UseMultipleSdmaEngines, Dst, Agent, Inter, Agent,
CopySize, 0, nullptr, OutputSignal->get());
CopySize, 0, nullptr, OutputSignals[1]->get());
}

// AMDGPUDeviceTy is incomplete here, passing the underlying agent instead
Expand Down