
TSan: async signals are never being delivered when the target thread is blocked waiting for a FUTEX_WAIT syscall #83844

@canova


This issue is very similar to #83561, but the underlying problem involves different API calls, so I think it makes sense to create a separate issue for tracking.

Here are my test cases:

This was initially found in Firefox in the Rust codebase. That's why I wanted to create a test case for both Rust and C++.

It looks like TSan delays dispatching signals until it finds a blocking function. In Firefox, we have a sampling profiler that sends SIGPROF signals every interval, and a semaphore-based mechanism that waits for the SIGPROF handler to finish some work. The example test cases above describe our situation.

This time it hangs during the FUTEX_WAIT syscall because TSan doesn't know that this is a blocking call. Again there are two threads involved: the "main thread" and "thread 1". The main thread sets up the profiler signal handler, locks the futex, and creates "thread 1". "Thread 1" starts to wait for the futex with Mutex.lock() and syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE...). Then we send the signal from the main thread using pthread_kill. Normally, without TSan, SigprofHandler gets executed and wakes the futex with syscall(SYS_futex, uaddr, FUTEX_WAKE_PRIVATE...). After this, "thread 1" gets unblocked and exits. But with TSan, SigprofHandler never gets executed because TSan never considers "thread 1" to be executing a blocking call.
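The scenario described above can be sketched roughly as follows. This is a minimal, Linux-only illustration and not the original test case: the names `g_futex_word`, `Thread1`, and `RunFutexSignalScenario` are hypothetical, and a raw futex word stands in for the Mutex. Built normally, it is expected to finish; built with `-fsanitize=thread`, it would be expected to hang as described in this issue.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <climits>
#include <linux/futex.h>
#include <pthread.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

// Futex word (hypothetical name): 1 = locked, 0 = unlocked.
static std::atomic<int> g_futex_word{1};
static std::atomic<bool> g_handler_ran{false};
// Assumption: std::atomic<int> has the same layout as int, so its address
// can be handed to the futex syscall.
static_assert(sizeof(std::atomic<int>) == sizeof(int), "unexpected layout");

static long Futex(std::atomic<int>* uaddr, int op, int val) {
  return syscall(SYS_futex, reinterpret_cast<int*>(uaddr), op, val,
                 nullptr, nullptr, 0);
}

// Runs on thread 1 when SIGPROF is delivered: unlocks and wakes the waiter.
static void SigprofHandler(int) {
  int saved_errno = errno;  // the syscall below may clobber errno
  g_handler_ran.store(true);
  g_futex_word.store(0);
  Futex(&g_futex_word, FUTEX_WAKE_PRIVATE, INT_MAX);
  errno = saved_errno;
}

// Thread 1: blocks in FUTEX_WAIT while the word is still "locked".
static void* Thread1(void*) {
  while (g_futex_word.load() != 0) {
    // Returns early (EAGAIN) if the word is no longer 1, or (EINTR) if
    // interrupted by a signal; the loop re-checks the word either way.
    Futex(&g_futex_word, FUTEX_WAIT_PRIVATE, 1);
  }
  return nullptr;
}

// Main-thread side. Returns true if the handler ran and thread 1 exited.
bool RunFutexSignalScenario() {
  g_futex_word.store(1);  // "lock" the futex
  g_handler_ran.store(false);

  struct sigaction sa {};
  sa.sa_handler = SigprofHandler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  if (sigaction(SIGPROF, &sa, nullptr) != 0) return false;

  pthread_t thread1;
  if (pthread_create(&thread1, nullptr, Thread1, nullptr) != 0) return false;

  // Give thread 1 time to block in FUTEX_WAIT. This only makes the blocked
  // case likely; the loop in Thread1 is correct without it.
  usleep(100 * 1000);

  if (pthread_kill(thread1, SIGPROF) != 0) return false;

  int rc = pthread_join(thread1, nullptr);
  assert(rc == 0);
  (void)rc;
  return g_handler_ran.load();
}
```

Calling `RunFutexSignalScenario()` from `main` and checking that it returns `true` should be enough to compare behavior with and without TSan.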

Normal execution:

| Step | Main thread | Thread 1 |
| --- | --- | --- |
| 1 | Mutex.lock() (with syscall FUTEX_WAIT) | |
| 2 | | Mutex.lock() (with syscall FUTEX_WAIT) |
| 3 | pthread_kill(...SIGPROF) | |
| 4 | | mutex.unlock() inside SigprofHandler |
| 5 | | mutex.lock() inside Thread1 |
| 6 | | Exits |
| 7 | pthread_join | |
| 8 | (done) | |

TSan execution:

| Step | Main thread | Thread 1 |
| --- | --- | --- |
| 1 | Mutex.lock() (with syscall FUTEX_WAIT) | |
| 2 | | Mutex.lock() (with syscall FUTEX_WAIT) |
| 3 | pthread_kill(...SIGPROF) | |
| 4 | | (SigprofHandler never gets executed because TSan thinks we are not in a blocking call) |
| 5 | (deadlock) | (deadlock since futex never unlocks) |

I believe this can be solved similarly to #83561, but I'm not sure how yet, as this is a syscall. I assume there should be a way to intercept syscalls?
