-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[lldb] Make sure the process is stopped when computing the symbol context #134757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…text Make sure the process is stopped when computing the symbol context. Both Adrian and Felipe reported a handful of crashes in GetSymbolContext called from Statusline::Redraw on the default event thread. Given that we're handling a StackFrameSP, it's not clear to me how that could have gotten invalidated, but Jim points out that it doesn't make sense to compute the symbol context for the frame when the process isn't stopped.
@llvm/pr-subscribers-lldb Author: Jonas Devlieghere (JDevlieghere) ChangesMake sure the process is stopped when computing the symbol context. Both Adrian and Felipe reported a handful of crashes in GetSymbolContext called from Statusline::Redraw on the default event thread. Given that we're handling a StackFrameSP, it's not clear to me how that could have gotten invalidated, but Jim points out that it doesn't make sense to compute the symbol context for the frame when the process isn't stopped. Full diff: https://github.com/llvm/llvm-project/pull/134757.diff 1 Files Affected:
diff --git a/lldb/source/Core/Statusline.cpp b/lldb/source/Core/Statusline.cpp
index b7650503e16bc..a2ecebbefbfb1 100644
--- a/lldb/source/Core/Statusline.cpp
+++ b/lldb/source/Core/Statusline.cpp
@@ -12,6 +12,7 @@
#include "lldb/Host/StreamFile.h"
#include "lldb/Interpreter/CommandInterpreter.h"
#include "lldb/Symbol/SymbolContext.h"
+#include "lldb/Target/Process.h"
#include "lldb/Target/StackFrame.h"
#include "lldb/Utility/AnsiTerminal.h"
#include "lldb/Utility/StreamString.h"
@@ -126,9 +127,7 @@ void Statusline::Redraw(bool update) {
return;
}
- StreamString stream;
- ExecutionContext exe_ctx =
- m_debugger.GetCommandInterpreter().GetExecutionContext();
+ ExecutionContext exe_ctx = m_debugger.GetSelectedExecutionContext();
// For colors and progress events, the format entity needs access to the
// debugger, which requires a target in the execution context.
@@ -136,9 +135,15 @@ void Statusline::Redraw(bool update) {
exe_ctx.SetTargetPtr(&m_debugger.GetSelectedOrDummyTarget());
SymbolContext symbol_ctx;
- if (auto frame_sp = exe_ctx.GetFrameSP())
- symbol_ctx = frame_sp->GetSymbolContext(eSymbolContextEverything);
+ if (ProcessSP process_sp = exe_ctx.GetProcessSP()) {
+ Process::StopLocker stop_locker;
+ if (stop_locker.TryLock(&process_sp->GetRunLock())) {
+ if (auto frame_sp = exe_ctx.GetFrameSP())
+ symbol_ctx = frame_sp->GetSymbolContext(eSymbolContextEverything);
+ }
+ }
+ StreamString stream;
if (auto *format = m_debugger.GetStatuslineFormat())
FormatEntity::Format(*format, stream, &symbol_ctx, &exe_ctx, nullptr,
nullptr, false, false);
|
symbol_ctx = frame_sp->GetSymbolContext(eSymbolContextEverything); | ||
if (ProcessSP process_sp = exe_ctx.GetProcessSP()) { | ||
Process::StopLocker stop_locker; | ||
if (stop_locker.TryLock(&process_sp->GetRunLock())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we comment why we're doing this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too
…text (llvm#134757) Make sure the process is stopped when computing the symbol context. Both Adrian and Felipe reported a handful of crashes in GetSymbolContext called from Statusline::Redraw on the default event thread. Given that we're handling a StackFrameSP, it's not clear to me how that could have gotten invalidated, but Jim points out that it doesn't make sense to compute the symbol context for the frame when the process isn't stopped. (cherry picked from commit e84a804)
…-e84a80408523 [🍒 swift/release/6.2] [lldb] Make sure the process is stopped when computing the symbol context (llvm#134757)
When using LLDB, built with assertions enabled (cmake -DLLVM_ENABLE_ASSERTIONS=ON), this is appears to be causing some bad behavior, sometimes (there's a tiny race condition?). If I start up a program under LLDB and then give it a 'run' command (or set a breakpoint on 'main' and try to run), I get an error to the effect that the process is already running:
Sometimes, if I type the 'run' command very fast I don't get the error, I get normal execution. If I revert this change (locally) I no longer see this 'bad' behavior. |
Thanks for the report, @cmtice. I haven't seen this issue and I'm pretty baffled. I wonder if it has something to do with the implementation of GetRunLock:
I presume that somehow we end up in situation where one is locked and the other is not. I'll try to do some investigating. |
I can see this too, and can reproduce it pretty much 100% of the time. The It's also possible this doesn't reproduce on darwin because it goes through a different launching code path (basically it never launches, only attaches). I think we'll need to revert this for now, as it makes it more or less impossible to have a normal debug session (things actually work if you type "continue" after this error, but there's no indication that that is an option). |
…mbol context (llvm#134757)" This reverts commit e84a804 because on Linux there seems to be a race around GetRunLock.
I put up a revert in #135408. I'm going to see if I can reproduce this myself on Linux. I want to see what happens if the statusline uses the public API mutex, I don't think it needs the private one at all. If I can't reproduce I might ask one of you to try out a small patch. |
I assume something else changed in the meantime (maybe the extra locking added in #134759?) but now if I revert this change I always crash on
Edit: nvm, this is unrelated and caused by 2fd860c1f559c0b0be66cc000e38270a04d0a1a3 |
I traced the issue reported by Caroline and Pavel in llvm#134757 back to the call to ProcessRunLock::TrySetRunning. When that fails, we get a somewhat misleading error message: > process resume at entry point failed: Resume request failed - process still running. This is incorrect: the problem was not that the process was in a running state, but rather that the RunLock was being held by another thread (i.e. the Statusline). TrySetRunning would return false in both cases and the call site only accounted for the former. Besides the odd semantics, the current implementation is inherently race-y and I believe incorrect. If someone is holding the RunLock, the resume call should block, rather than give up, and with the lock held, switch the running state and report the old running state. This patch removes ProcessRunLock::TrySetRunning and updates all callers to use ProcessRunLock::SetRunning instead. To support that, ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for consistency) now report whether the process was stopped or running respectively. Previously, both methods returned true unconditionally. The old code has been around pretty much pretty much forever, there's nothing in the git history to indicate that this was done purposely to solve a particular issue. I've tested this on both Linux and macOS and confirmed that this solves the statusline issue. A big thank you to Jim for reviewing my proposed solution offline and trying to poke holes in it.
I traced the issue reported by Caroline and Pavel in #134757 back to the call to ProcessRunLock::TrySetRunning. When that fails, we get a somewhat misleading error message: > process resume at entry point failed: Resume request failed - process still running. This is incorrect: the problem was not that the process was in a running state, but rather that the RunLock was being held by another thread (i.e. the Statusline). TrySetRunning would return false in both cases and the call site only accounted for the former. Besides the odd semantics, the current implementation is inherently race-y and I believe incorrect. If someone is holding the RunLock, the resume call should block, rather than give up, and with the lock held, switch the running state and report the old running state. This patch removes ProcessRunLock::TrySetRunning and updates all callers to use ProcessRunLock::SetRunning instead. To support that, ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for consistency) now report whether the process was stopped or running respectively. Previously, both methods returned true unconditionally. The old code has been around pretty much pretty much forever, there's nothing in the git history to indicate that this was done purposely to solve a particular issue. I've tested this on both Linux and macOS and confirmed that this solves the statusline issue. A big thank you to Jim for reviewing my proposed solution offline and trying to poke holes in it.
…text (llvm#134757) Make sure the process is stopped when computing the symbol context. Both Adrian and Felipe reported a handful of crashes in GetSymbolContext called from Statusline::Redraw on the default event thread. Given that we're handling a StackFrameSP, it's not clear to me how that could have gotten invalidated, but Jim points out that it doesn't make sense to compute the symbol context for the frame when the process isn't stopped.
…mbol context (llvm#134757)" (llvm#135408) This reverts commit e84a804 because on Linux there seems to be a race around GetRunLock. See llvm#134757 for more context.
I traced the issue reported by Caroline and Pavel in llvm#134757 back to the call to ProcessRunLock::TrySetRunning. When that fails, we get a somewhat misleading error message: > process resume at entry point failed: Resume request failed - process still running. This is incorrect: the problem was not that the process was in a running state, but rather that the RunLock was being held by another thread (i.e. the Statusline). TrySetRunning would return false in both cases and the call site only accounted for the former. Besides the odd semantics, the current implementation is inherently race-y and I believe incorrect. If someone is holding the RunLock, the resume call should block, rather than give up, and with the lock held, switch the running state and report the old running state. This patch removes ProcessRunLock::TrySetRunning and updates all callers to use ProcessRunLock::SetRunning instead. To support that, ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for consistency) now report whether the process was stopped or running respectively. Previously, both methods returned true unconditionally. The old code has been around pretty much pretty much forever, there's nothing in the git history to indicate that this was done purposely to solve a particular issue. I've tested this on both Linux and macOS and confirmed that this solves the statusline issue. A big thank you to Jim for reviewing my proposed solution offline and trying to poke holes in it.
I traced the issue reported by Caroline and Pavel in llvm#134757 back to the call to ProcessRunLock::TrySetRunning. When that fails, we get a somewhat misleading error message: > process resume at entry point failed: Resume request failed - process still running. This is incorrect: the problem was not that the process was in a running state, but rather that the RunLock was being held by another thread (i.e. the Statusline). TrySetRunning would return false in both cases and the call site only accounted for the former. Besides the odd semantics, the current implementation is inherently race-y and I believe incorrect. If someone is holding the RunLock, the resume call should block, rather than give up, and with the lock held, switch the running state and report the old running state. This patch removes ProcessRunLock::TrySetRunning and updates all callers to use ProcessRunLock::SetRunning instead. To support that, ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for consistency) now report whether the process was stopped or running respectively. Previously, both methods returned true unconditionally. The old code has been around pretty much pretty much forever, there's nothing in the git history to indicate that this was done purposely to solve a particular issue. I've tested this on both Linux and macOS and confirmed that this solves the statusline issue. A big thank you to Jim for reviewing my proposed solution offline and trying to poke holes in it. (cherry picked from commit d792094)
Make sure the process is stopped when computing the symbol context. Both Adrian and Felipe reported a handful of crashes in GetSymbolContext called from Statusline::Redraw on the default event thread.
Given that we're handling a StackFrameSP, it's not clear to me how that could have gotten invalidated, but Jim points out that it doesn't make sense to compute the symbol context for the frame when the process isn't stopped.