Skip to content

[lldb] Make sure the process is stopped when computing the symbol context #134757

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 8, 2025

Conversation

JDevlieghere
Copy link
Member

Make sure the process is stopped when computing the symbol context. Both Adrian and Felipe reported a handful of crashes in GetSymbolContext called from Statusline::Redraw on the default event thread.

Given that we're handling a StackFrameSP, it's not clear to me how that could have gotten invalidated, but Jim points out that it doesn't make sense to compute the symbol context for the frame when the process isn't stopped.

…text

Make sure the process is stopped when computing the symbol context. Both
Adrian and Felipe reported a handful of crashes in GetSymbolContext
called from Statusline::Redraw on the default event thread.

Given that we're handling a StackFrameSP, it's not clear to me how that
could have gotten invalidated, but Jim points out that it doesn't make
sense to compute the symbol context for the frame when the process isn't
stopped.
@JDevlieghere JDevlieghere marked this pull request as ready for review April 8, 2025 00:14
@llvmbot llvmbot added the lldb label Apr 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2025

@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Changes

Make sure the process is stopped when computing the symbol context. Both Adrian and Felipe reported a handful of crashes in GetSymbolContext called from Statusline::Redraw on the default event thread.

Given that we're handling a StackFrameSP, it's not clear to me how that could have gotten invalidated, but Jim points out that it doesn't make sense to compute the symbol context for the frame when the process isn't stopped.


Full diff: https://github.com/llvm/llvm-project/pull/134757.diff

1 Files Affected:

  • (modified) lldb/source/Core/Statusline.cpp (+10-5)
diff --git a/lldb/source/Core/Statusline.cpp b/lldb/source/Core/Statusline.cpp
index b7650503e16bc..a2ecebbefbfb1 100644
--- a/lldb/source/Core/Statusline.cpp
+++ b/lldb/source/Core/Statusline.cpp
@@ -12,6 +12,7 @@
 #include "lldb/Host/StreamFile.h"
 #include "lldb/Interpreter/CommandInterpreter.h"
 #include "lldb/Symbol/SymbolContext.h"
+#include "lldb/Target/Process.h"
 #include "lldb/Target/StackFrame.h"
 #include "lldb/Utility/AnsiTerminal.h"
 #include "lldb/Utility/StreamString.h"
@@ -126,9 +127,7 @@ void Statusline::Redraw(bool update) {
     return;
   }
 
-  StreamString stream;
-  ExecutionContext exe_ctx =
-      m_debugger.GetCommandInterpreter().GetExecutionContext();
+  ExecutionContext exe_ctx = m_debugger.GetSelectedExecutionContext();
 
   // For colors and progress events, the format entity needs access to the
   // debugger, which requires a target in the execution context.
@@ -136,9 +135,15 @@ void Statusline::Redraw(bool update) {
     exe_ctx.SetTargetPtr(&m_debugger.GetSelectedOrDummyTarget());
 
   SymbolContext symbol_ctx;
-  if (auto frame_sp = exe_ctx.GetFrameSP())
-    symbol_ctx = frame_sp->GetSymbolContext(eSymbolContextEverything);
+  if (ProcessSP process_sp = exe_ctx.GetProcessSP()) {
+    Process::StopLocker stop_locker;
+    if (stop_locker.TryLock(&process_sp->GetRunLock())) {
+      if (auto frame_sp = exe_ctx.GetFrameSP())
+        symbol_ctx = frame_sp->GetSymbolContext(eSymbolContextEverything);
+    }
+  }
 
+  StreamString stream;
   if (auto *format = m_debugger.GetStatuslineFormat())
     FormatEntity::Format(*format, stream, &symbol_ctx, &exe_ctx, nullptr,
                          nullptr, false, false);

symbol_ctx = frame_sp->GetSymbolContext(eSymbolContextEverything);
if (ProcessSP process_sp = exe_ctx.GetProcessSP()) {
Process::StopLocker stop_locker;
if (stop_locker.TryLock(&process_sp->GetRunLock())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we comment why we're doing this here?

Copy link
Collaborator

@jimingham jimingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too

@JDevlieghere JDevlieghere merged commit e84a804 into llvm:main Apr 8, 2025
10 checks passed
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Apr 8, 2025
…text (llvm#134757)

Make sure the process is stopped when computing the symbol context. Both
Adrian and Felipe reported a handful of crashes in GetSymbolContext
called from Statusline::Redraw on the default event thread.

Given that we're handling a StackFrameSP, it's not clear to me how that
could have gotten invalidated, but Jim points out that it doesn't make
sense to compute the symbol context for the frame when the process isn't
stopped.

(cherry picked from commit e84a804)
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Apr 9, 2025
…-e84a80408523

[🍒 swift/release/6.2] [lldb] Make sure the process is stopped when computing the symbol context (llvm#134757)
@JDevlieghere JDevlieghere deleted the statusline-stop-locker branch April 10, 2025 18:47
@cmtice
Copy link
Contributor

cmtice commented Apr 10, 2025

When using LLDB, built with assertions enabled (cmake -DLLVM_ENABLE_ASSERTIONS=ON), this is appears to be causing some bad behavior, sometimes (there's a tiny race condition?). If I start up a program under LLDB and then give it a 'run' command (or set a breakpoint on 'main' and try to run), I get an error to the effect that the process is already running:

$ ./bin/lldb ~/c++-tests/fibonacci-bad
(lldb) target create "/usr/local/google/home/cmtice/c++-tests/fibonacci-bad"
Current executable set to '/usr/local/google/home/cmtice/c++-tests/fibonacci-bad' (x86_64).
(lldb) run
error: process resume at entry point failed: Resume request failed - process still running.
(lldb) quit
$ ./bin/lldb ~/c++-tests/fibonacci-bad
(lldb) target create "/usr/local/google/home/cmtice/c++-tests/fibonacci-bad"
Current executable set to '/usr/local/google/home/cmtice/c++-tests/fibonacci-bad' (x86_64).
(lldb) b main
Breakpoint 1: where = fibonacci-bad`main + 22 at fibonacci-bad.cc:45:12, address = 0x00000000000012a6
(lldb) run
error: process resume at entry point failed: Resume request failed - process still running.
(lldb) quit
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
$

Sometimes, if I type the 'run' command very fast I don't get the error, I get normal execution. If I revert this change (locally) I no longer see this 'bad' behavior.

@JDevlieghere
Copy link
Member Author

Thanks for the report, @cmtice. I haven't seen this issue and I'm pretty baffled. I wonder if it has something to do with the implementation of GetRunLock:

ProcessRunLock &Process::GetRunLock() {
  if (m_private_state_thread.EqualsThread(Host::GetCurrentThread()))
    return m_private_run_lock;
  else
    return m_public_run_lock;
}

I presume that somehow we end up in situation where one is locked and the other is not. I'll try to do some investigating.

@labath
Copy link
Collaborator

labath commented Apr 11, 2025

I can see this too, and can reproduce it pretty much 100% of the time. The GetRunLock hypothesis seems believable because the process launching code (Process::LaunchPrivate) is mucking around with the private state thread.

It's also possible this doesn't reproduce on darwin because it goes through a different launching code path (basically it never launches, only attaches).

I think we'll need to revert this for now, as it makes it more or less impossible to have a normal debug session (things actually work if you type "continue" after this error, but there's no indication that that is an option).

JDevlieghere added a commit to JDevlieghere/llvm-project that referenced this pull request Apr 11, 2025
…mbol context (llvm#134757)"

This reverts commit e84a804 because on
Linux there seems to be a race around GetRunLock.
@JDevlieghere
Copy link
Member Author

I put up a revert in #135408. I'm going to see if I can reproduce this myself on Linux. I want to see what happens if the statusline uses the public API mutex, I don't think it needs the private one at all. If I can't reproduce I might ask one of you to try out a small patch.

@JDevlieghere
Copy link
Member Author

JDevlieghere commented Apr 11, 2025

I assume something else changed in the meantime (maybe the extra locking added in #134759?) but now if I revert this change I always crash on run:

* thread #51, name = '<lldb.process.internal-state(pid=31609)>', stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffed77f38f8)
    frame #0: 0x000000011332b66c liblldb.21.0.0git.dylib`lldb_private::DynamicRegisterInfo::IsReconfigurable(this=0xfffffffed77f383f) at DynamicRegisterInfo.cpp:682:55 [opt]
    frame #1: 0x00000001136046a8 liblldb.21.0.0git.dylib`lldb_private::process_gdb_remote::ThreadGDBRemote::ThreadGDBRemote(this=0x0000000121704098, process=0x000000012880b400, tid=<unavailable>) at ThreadGDBRemote.cpp:48:40 [opt]
    frame #2: 0x00000001135ec940 liblldb.21.0.0git.dylib`lldb_private::process_gdb_remote::ProcessGDBRemote::DoUpdateThreadList(lldb_private::ThreadList&, lldb_private::ThreadList&) [inlined] void std::__1::allocator<lldb_private::process_gdb_remote::ThreadGDBRemote>::construct[abi:nn190102]<lldb_private::process_gdb_remote::ThreadGDBRemote, lldb_private::process_gdb_remote::ProcessGDBRemote&, unsigned long long&>(this=<unavailable>, __p=0x0000000121704098, __args=0x000000012880b400, __args=<unavailable>) at allocator.h:165:24 [opt]
    frame #3: 0x00000001135ec930 liblldb.21.0.0git.dylib`lldb_private::process_gdb_remote::ProcessGDBRemote::DoUpdateThreadList(lldb_private::ThreadList&, lldb_private::ThreadList&) [inlined] void std::__1::allocator_traits<std::__1::allocator<lldb_private::process_gdb_remote::ThreadGDBRemote>>::construct[abi:nn190102]<lldb_private::process_gdb_remote::ThreadGDBRemote, lldb_private::process_gdb_remote::ProcessGDBRemote&, unsigned long long&, 0>(__a=<unavailable>, __p=0x0000000121704098, __args=0x000000012880b400, __args=<unavailable>) at allocator_traits.h:319:9 [opt]
  * frame #4: 0x00000001135ec930 liblldb.21.0.0git.dylib`lldb_private::process_gdb_remote::ProcessGDBRemote::DoUpdateThreadList(lldb_private::ThreadList&, lldb_private::ThreadList&) [inlined] std::__1::__shared_ptr_emplace<lldb_private::process_gdb_remote::ThreadGDBRemote, std::__1::allocator<lldb_private::process_gdb_remote::ThreadGDBRemote>>::__shared_ptr_emplace[abi:nn190102]<lldb_private::process_gdb_remote::ProcessGDBRemote&, unsigned long long&, std::__1::allocator<lldb_private::process_gdb_remote::ThreadGDBRemote>, 0>(this=0x0000000121704080, __a=<unavailable>, __args=0x000000012880b400, __args=<unavailable>) at shared_ptr.h:266:5 [opt]
    frame #5: 0x00000001135ec918 liblldb.21.0.0git.dylib`lldb_private::process_gdb_remote::ProcessGDBRemote::DoUpdateThreadList(lldb_private::ThreadList&, lldb_private::ThreadList&) [inlined] std::__1::__shared_ptr_emplace<lldb_private::process_gdb_remote::ThreadGDBRemote, std::__1::allocator<lldb_private::process_gdb_remote::ThreadGDBRemote>>::__shared_ptr_emplace[abi:nn190102]<lldb_private::process_gdb_remote::ProcessGDBRemote&, unsigned long long&, std::__1::allocator<lldb_private::process_gdb_remote::ThreadGDBRemote>, 0>(this=0x0000000121704080, __a=<unavailable>, __args=0x000000012880b400, __args=<unavailable>) at shared_ptr.h:263:115 [opt]
    frame #6: 0x00000001135ec918 liblldb.21.0.0git.dylib`lldb_private::process_gdb_remote::ProcessGDBRemote::DoUpdateThreadList(lldb_private::ThreadList&, lldb_private::ThreadList&) [inlined] std::__1::shared_ptr<lldb_private::process_gdb_remote::ThreadGDBRemote> std::__1::allocate_shared[abi:nn190102]<lldb_private::process_gdb_remote::ThreadGDBRemote, std::__1::allocator<lldb_private::process_gdb_remote::ThreadGDBRemote>, lldb_private::process_gdb_remote::ProcessGDBRemote&, unsigned long long&, 0>(__a=<unavailable>, __args=0x000000012880b400, __args=<unavailable>) at shared_ptr.h:845:51 [opt]
    frame #7: 0x00000001135ec90c liblldb.21.0.0git.dylib`lldb_private::process_gdb_remote::ProcessGDBRemote::DoUpdateThreadList(lldb_private::ThreadList&, lldb_private::ThreadList&) [inlined] std::__1::shared_ptr<lldb_private::process_gdb_remote::ThreadGDBRemote> std::__1::make_shared[abi:nn190102]<lldb_private::process_gdb_remote::ThreadGDBRemote, lldb_private::process_gdb_remote::ProcessGDBRemote&, unsigned long long&, 0>(__args=0x000000012880b400, __args=<unavailable>) at shared_ptr.h:853:10 [opt]
    frame #8: 0x00000001135ec90c liblldb.21.0.0git.dylib`lldb_private::process_gdb_remote::ProcessGDBRemote::DoUpdateThreadList(this=0x000000012880b400, old_thread_list=<unavailable>, new_thread_list=0x000000017d1cac28) at ProcessGDBRemote.cpp:1589:21 [opt]
    frame #9: 0x0000000113349880 liblldb.21.0.0git.dylib`lldb_private::Process::UpdateThreadListIfNeeded() [inlined] lldb_private::Process::UpdateThreadList(this=0x000000012880b400, old_thread_list=<unavailable>, new_thread_list=0x000000017d1cac28) at Process.cpp:1124:10 [opt]
    frame #10: 0x0000000113349874 liblldb.21.0.0git.dylib`lldb_private::Process::UpdateThreadListIfNeeded(this=0x000000012880b400) at Process.cpp:1145:11 [opt]
    frame #11: 0x00000001133b4410 liblldb.21.0.0git.dylib`lldb_private::ThreadList::SetShouldReportStop(this=0x000000012880b750, vote=eVoteNo) at ThreadList.cpp:413:13 [opt]
    frame #12: 0x0000000113350190 liblldb.21.0.0git.dylib`lldb_private::Process::AttachCompletionHandler::PerformAction(this=0x0000600000404330, event_sp=<unavailable>) at Process.cpp:2907:32 [opt]
    frame #13: 0x000000011334e858 liblldb.21.0.0git.dylib`lldb_private::Process::HandlePrivateEvent(this=0x000000012880b400, event_sp=std::__1::shared_ptr<lldb_private::Event>::element_type @ 0x00006000038ac900 strong=2 weak=2) at Process.cpp:3941:33 [opt]

Edit: nvm, this is unrelated and caused by 2fd860c1f559c0b0be66cc000e38270a04d0a1a3

JDevlieghere added a commit that referenced this pull request Apr 11, 2025
…mbol context (#134757)" (#135408)

This reverts commit e84a804 because on
Linux there seems to be a race around GetRunLock. See #134757 for more
context.
JDevlieghere added a commit to JDevlieghere/llvm-project that referenced this pull request Apr 11, 2025
I traced the issue reported by Caroline and Pavel in llvm#134757 back to the
call to ProcessRunLock::TrySetRunning. When that fails, we get a
somewhat misleading error message:

> process resume at entry point failed: Resume request failed - process still running.

This is incorrect: the problem was not that the process was in a
running state, but rather that the RunLock was being held by another
thread (i.e. the Statusline). TrySetRunning would return false in both
cases and the call site only accounted for the former.

Besides the odd semantics, the current implementation is inherently
race-y and I believe incorrect. If someone is holding the RunLock, the
resume call should block, rather than give up, and with the lock held,
switch the running state and report the old running state.

This patch removes ProcessRunLock::TrySetRunning and updates all callers
to use ProcessRunLock::SetRunning instead. To support that,
ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for
consistency) now report  whether the process was stopped or running
respectively. Previously, both methods returned true unconditionally.

The old code has been around pretty much pretty much forever, there's
nothing in the git history to indicate that this was done purposely to
solve a particular issue. I've tested this on both Linux and macOS and
confirmed that this solves the statusline issue.

A big thank you to Jim for reviewing my proposed solution offline and
trying to poke holes in it.
JDevlieghere added a commit that referenced this pull request Apr 14, 2025
I traced the issue reported by Caroline and Pavel in #134757 back to the
call to ProcessRunLock::TrySetRunning. When that fails, we get a
somewhat misleading error message:

> process resume at entry point failed: Resume request failed - process
still running.

This is incorrect: the problem was not that the process was in a running
state, but rather that the RunLock was being held by another thread
(i.e. the Statusline). TrySetRunning would return false in both cases
and the call site only accounted for the former.

Besides the odd semantics, the current implementation is inherently
race-y and I believe incorrect. If someone is holding the RunLock, the
resume call should block, rather than give up, and with the lock held,
switch the running state and report the old running state.

This patch removes ProcessRunLock::TrySetRunning and updates all callers
to use ProcessRunLock::SetRunning instead. To support that,
ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for
consistency) now report whether the process was stopped or running
respectively. Previously, both methods returned true unconditionally.

The old code has been around pretty much pretty much forever, there's
nothing in the git history to indicate that this was done purposely to
solve a particular issue. I've tested this on both Linux and macOS and
confirmed that this solves the statusline issue.

A big thank you to Jim for reviewing my proposed solution offline and
trying to poke holes in it.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…text (llvm#134757)

Make sure the process is stopped when computing the symbol context. Both
Adrian and Felipe reported a handful of crashes in GetSymbolContext
called from Statusline::Redraw on the default event thread.

Given that we're handling a StackFrameSP, it's not clear to me how that
could have gotten invalidated, but Jim points out that it doesn't make
sense to compute the symbol context for the frame when the process isn't
stopped.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…mbol context (llvm#134757)" (llvm#135408)

This reverts commit e84a804 because on
Linux there seems to be a race around GetRunLock. See llvm#134757 for more
context.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
I traced the issue reported by Caroline and Pavel in llvm#134757 back to the
call to ProcessRunLock::TrySetRunning. When that fails, we get a
somewhat misleading error message:

> process resume at entry point failed: Resume request failed - process
still running.

This is incorrect: the problem was not that the process was in a running
state, but rather that the RunLock was being held by another thread
(i.e. the Statusline). TrySetRunning would return false in both cases
and the call site only accounted for the former.

Besides the odd semantics, the current implementation is inherently
race-y and I believe incorrect. If someone is holding the RunLock, the
resume call should block, rather than give up, and with the lock held,
switch the running state and report the old running state.

This patch removes ProcessRunLock::TrySetRunning and updates all callers
to use ProcessRunLock::SetRunning instead. To support that,
ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for
consistency) now report whether the process was stopped or running
respectively. Previously, both methods returned true unconditionally.

The old code has been around pretty much pretty much forever, there's
nothing in the git history to indicate that this was done purposely to
solve a particular issue. I've tested this on both Linux and macOS and
confirmed that this solves the statusline issue.

A big thank you to Jim for reviewing my proposed solution offline and
trying to poke holes in it.
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Apr 22, 2025
I traced the issue reported by Caroline and Pavel in llvm#134757 back to the
call to ProcessRunLock::TrySetRunning. When that fails, we get a
somewhat misleading error message:

> process resume at entry point failed: Resume request failed - process
still running.

This is incorrect: the problem was not that the process was in a running
state, but rather that the RunLock was being held by another thread
(i.e. the Statusline). TrySetRunning would return false in both cases
and the call site only accounted for the former.

Besides the odd semantics, the current implementation is inherently
race-y and I believe incorrect. If someone is holding the RunLock, the
resume call should block, rather than give up, and with the lock held,
switch the running state and report the old running state.

This patch removes ProcessRunLock::TrySetRunning and updates all callers
to use ProcessRunLock::SetRunning instead. To support that,
ProcessRunLock::SetRunning (and ProcessRunLock::SetStopped, for
consistency) now report whether the process was stopped or running
respectively. Previously, both methods returned true unconditionally.

The old code has been around pretty much pretty much forever, there's
nothing in the git history to indicate that this was done purposely to
solve a particular issue. I've tested this on both Linux and macOS and
confirmed that this solves the statusline issue.

A big thank you to Jim for reviewing my proposed solution offline and
trying to poke holes in it.

(cherry picked from commit d792094)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants