
TSan: async signals are never being delivered when the target thread is blocked waiting for a mutex lock #83561

Closed
@canova

Description


Here's my test case: Compiler explorer

It looks like TSan delays delivering asynchronous signals until the target thread enters a blocking function. In Firefox, we have a sampling profiler that sends a SIGPROF signal at every sampling interval, and it uses a semaphore to wait until the SIGPROF handler has finished its work. The test case above reproduces our situation.

There are two threads; let's call them "thread 1" and "thread 2". "Thread 2" sends a SIGPROF signal to "thread 1" to collect some information from it. At the end of the SIGPROF handler, we call sem_post to signal that the work is done, and "thread 2" waits in sem_wait until that semaphore is posted.

It's visualized like this:

| Step | Thread 1 | Thread 2 |
|---|---|---|
| 1 | | `lock_guard` on `profiler_mutex` |
| 2 | `pthread_mutex_lock` for `profiler_mutex` | `pthread_kill(*thread_1, SIGPROF)` |
| 3 | `pthread_mutex_lock` for `profiler_mutex` (still blocked) | `sem_wait(&message)`, waiting for "thread 1" |
| 4 | deadlock, as the signal never arrives | deadlock |

Without TSan, thread 1 receives the SIGPROF signal while it is blocked in pthread_mutex_lock, runs the handler, and posts the semaphore. Thread 2 then returns from sem_wait and unlocks the mutex, and thread 1 continues by acquiring it, so the program runs without deadlocking. In TSan builds, however, this deadlock happens frequently.

Also, as you can see in the Compiler explorer example, it works without a problem when TSan is not enabled, but hangs and times out when it's enabled.

While investigating, I came across this GitHub comment, which recommends changing the pthread_mutex_lock interceptor to call BLOCK_REAL instead of REAL; that change fixes this test case.
Would you be open to accepting this change? I'm happy to send a PR if so.

The patch in that PR is fairly outdated, and these interceptors are no longer in the same file. An updated patch would look something like this:

```diff
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
index a9f6673ac44e..0359bc3581a6 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
@@ -1340,7 +1340,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_destroy, void *m) {
 TSAN_INTERCEPTOR(int, pthread_mutex_lock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_mutex_lock, m);
   MutexPreLock(thr, pc, (uptr)m);
-  int res = REAL(pthread_mutex_lock)(m);
+  int res = BLOCK_REAL(pthread_mutex_lock)(m);
   if (res == errno_EOWNERDEAD)
     MutexRepair(thr, pc, (uptr)m);
   if (res == 0 || res == errno_EOWNERDEAD)
@@ -1403,7 +1403,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_clocklock, void *m,
 TSAN_INTERCEPTOR(int, __pthread_mutex_lock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(__pthread_mutex_lock, m);
   MutexPreLock(thr, pc, (uptr)m);
-  int res = REAL(__pthread_mutex_lock)(m);
+  int res = BLOCK_REAL(__pthread_mutex_lock)(m);
   if (res == errno_EOWNERDEAD)
     MutexRepair(thr, pc, (uptr)m);
   if (res == 0 || res == errno_EOWNERDEAD)
```

I also have a second test case in which "thread 1" does nothing but run an empty infinite while loop: Compiler explorer
This one is even harder to fix: thread 1 never calls a blocking function, so the SIGPROF signal is never delivered. I think the two cases can be fixed separately, though.
