Description
Here's my test case: Compiler explorer
It looks like TSan delays delivering signals until the thread reaches a blocking function. In Firefox, we have a sampling profiler that sends a `SIGPROF` signal at every sampling interval, and we use semaphores to wait until the `SIGPROF` handler has finished its work. The test case above reproduces our situation.
There are two threads, let's call them "thread 1" and "thread 2". "Thread 2" sends a `SIGPROF` signal to "thread 1" to get some information. At the end of the `SIGPROF` handler, we call `sem_post` to notify that the work is done, and "thread 2" waits on that semaphore.
It's visualized like this:
| | Thread 1 | Thread 2 |
|---|---|---|
| 1 | | `lock_guard` on `profiler_mutex` |
| 2 | `pthread_mutex_lock` for `profiler_mutex` | `pthread_kill(*thread_1, SIGPROF)` |
| 3 | `pthread_mutex_lock` for `profiler_mutex` | `sem_wait(&message)` from "thread 1" |
| 4 | (deadlock as signal never arrives) | (deadlock) |
Without TSan, thread 1 receives the `SIGPROF` signal, does some work, and posts the semaphore. Thread 2 then unlocks the mutex at the end, and thread 1 continues by acquiring it, so execution completes without a deadlock. In TSan builds, however, this deadlock happens frequently.
Also, as you can see in the Compiler explorer example, it works without a problem when TSan is not enabled, but hangs and times out when it's enabled.
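For reference, here is a minimal sketch of the scenario described above. It is not the exact Compiler Explorer source: the names `sigprof_handler`, `sampled_thread`, and `run_profiler_handshake` are made up for illustration, and only `profiler_mutex` and `message` follow the table. Built without TSan, the function should return; built with `-fsanitize=thread`, it may hang as described.

```cpp
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>

#include <cerrno>
#include <mutex>
#include <thread>

namespace {

std::mutex profiler_mutex;  // held by thread 2 while it samples thread 1
sem_t message;              // posted by the SIGPROF handler when it is done

// Runs on thread 1 when SIGPROF is delivered. sem_post is async-signal-safe.
void sigprof_handler(int) { sem_post(&message); }

// Thread 1: blocks in pthread_mutex_lock (via std::mutex) while thread 2
// holds profiler_mutex.
void sampled_thread() { std::lock_guard<std::mutex> lock(profiler_mutex); }

}  // namespace

// Thread 2 is the caller. Returns true once the handshake has completed.
bool run_profiler_handshake() {
  sem_init(&message, 0, 0);

  struct sigaction sa = {};
  sa.sa_handler = sigprof_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  sigaction(SIGPROF, &sa, nullptr);

  std::thread thread_1;
  {
    // Step 1: take the lock first, so thread 1 has to block on it.
    std::lock_guard<std::mutex> lock(profiler_mutex);
    thread_1 = std::thread(sampled_thread);
    // Step 2: ask thread 1 to run the handler.
    pthread_kill(thread_1.native_handle(), SIGPROF);
    // Step 3: wait until the handler on thread 1 posts the semaphore.
    while (sem_wait(&message) == -1 && errno == EINTR) {
    }
  }  // The mutex is released here, so thread 1 can acquire it and exit.

  thread_1.join();
  sem_destroy(&message);
  return true;
}
```

Calling `run_profiler_handshake()` from `main` is enough to try it; the interesting comparison is the same binary built with and without `-fsanitize=thread`.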
While investigating, I came across this GitHub comment that recommends changing the `pthread_mutex_lock` interceptor to use `BLOCK_REAL` instead of `REAL`, which fixes this test case.
Would you be open to accepting this change? I'm happy to send a PR if so.
The patch in the PR is fairly outdated, and the interceptors are no longer in the same file. An updated patch would look something like this:
```diff
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
index a9f6673ac44e..0359bc3581a6 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
@@ -1340,7 +1340,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_destroy, void *m) {
 TSAN_INTERCEPTOR(int, pthread_mutex_lock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(pthread_mutex_lock, m);
   MutexPreLock(thr, pc, (uptr)m);
-  int res = REAL(pthread_mutex_lock)(m);
+  int res = BLOCK_REAL(pthread_mutex_lock)(m);
   if (res == errno_EOWNERDEAD)
     MutexRepair(thr, pc, (uptr)m);
   if (res == 0 || res == errno_EOWNERDEAD)
@@ -1403,7 +1403,7 @@ TSAN_INTERCEPTOR(int, pthread_mutex_clocklock, void *m,
 TSAN_INTERCEPTOR(int, __pthread_mutex_lock, void *m) {
   SCOPED_TSAN_INTERCEPTOR(__pthread_mutex_lock, m);
   MutexPreLock(thr, pc, (uptr)m);
-  int res = REAL(__pthread_mutex_lock)(m);
+  int res = BLOCK_REAL(__pthread_mutex_lock)(m);
   if (res == errno_EOWNERDEAD)
     MutexRepair(thr, pc, (uptr)m);
   if (res == 0 || res == errno_EOWNERDEAD)
```
I also have another test case, where "thread 1" doesn't call anything (it just runs an empty infinite while loop), here: Compiler explorer
This one is even harder to fix, as there is no blocking function in thread 1, so `SIGPROF` never arrives. I think the two cases can be fixed separately, though.
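A rough sketch of this second scenario, again with made-up names (`on_sigprof`, `spinning_thread`, `run_busy_loop_case`) rather than the exact Compiler Explorer source. One deliberate difference from the description: the loop in thread 1 exits once the handler has run, so that the sketch terminates when the signal is delivered. The flag is a plain `volatile sig_atomic_t`, so the loop body makes no function calls.

```cpp
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>

#include <cerrno>
#include <thread>

namespace {

sem_t sample_done;                      // posted by the SIGPROF handler
volatile sig_atomic_t handler_ran = 0;  // set by the handler, read by thread 1

// Runs on thread 1 when SIGPROF is delivered.
void on_sigprof(int) {
  handler_ran = 1;
  sem_post(&sample_done);
}

// Thread 1: spins without calling any function, blocking or otherwise.
// (In the scenario described above the loop is infinite; here it stops
// after the handler has run so the sketch can finish.)
void spinning_thread() {
  while (!handler_ran) {
  }
}

}  // namespace

// Thread 2 is the caller. Returns true once the handler has run on thread 1.
bool run_busy_loop_case() {
  handler_ran = 0;
  sem_init(&sample_done, 0, 0);

  struct sigaction sa = {};
  sa.sa_handler = on_sigprof;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  sigaction(SIGPROF, &sa, nullptr);

  std::thread thread_1(spinning_thread);
  pthread_kill(thread_1.native_handle(), SIGPROF);

  // Wait for the handler; retry if sem_wait is interrupted.
  while (sem_wait(&sample_done) == -1 && errno == EINTR) {
  }

  thread_1.join();
  sem_destroy(&sample_done);
  return true;
}
```

If the signal is never dispatched to thread 1, both the `sem_wait` in the caller and the loop in thread 1 wait forever, which matches the hang described above.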