Description
While working on a refactor of std::sync::Once
in #65719 I stumbled across multiple soundness issues. I now believe the current algorithm that queues waiting threads in a linked list with nodes an the stack of each waiting thread is not worth the complexity.
Thread may be parked forever
Edit: this is not a valid concern, see #65796 (comment).
Code:
while !node.signaled.load(Ordering::SeqCst) {
// HERE
thread::park();
}
The thread managing the waiter queue (thread 1) sets node.signaled
and unparkes a thread.
The thread that wants to wait (thread 2) checks node.signaled
before parking itself.
But at HERE thread 1 may set node.signaled
and unpark thread 2 (which is not parked yet). Afterwards thread 2 will park itself, and never receive an unpark again.
This can be solved by using park_timeout
. It does seems suboptimal to me though.
Aliasing of a mutable reference
let me = &mut node as *mut Waiter as usize;
/* ... */
// the mutable reference to me is send to another thread
let old = self.state.compare_and_swap(state, me | RUNNING, Ordering::SeqCst);
/* ... */
// We are at the same time checking `node` ourselves
while !node.signaled.load(Ordering::SeqCst) { /* ... */ }
This can be solved by using shared references and interior mutability.
Use of a potentially dangling shared reference (#55005)
(*queue).signaled.store(true, Ordering::SeqCst);
The waiting thread can free queue
after signaled
is set after a spurious wakeup. At this point store
can in theory still hold a dangling reference.
There is not much the implementation of OnceCell
can do here, but I suppose if will be solved in the same way as #55005.
This reason std::sync::Once
does not go with the obvious implementation that uses a mutex, is because Mutex::new
is not const. This is the explanation in the comments:
// So to implement this, one might first reach for a `Mutex`, but those cannot
// be put into a `static`. It also gets a lot harder with poisoning to figure
// out when the mutex needs to be deallocated because it's not after the closure
// finishes, but after the first successful closure finishes.
//
// All in all, this is instead implemented with atomics and lock-free
// operations! Whee! Each `Once` has one word of atomic state, and this state is
// CAS'd on to determine what to do.
The talk about atomics and lock-free make it sound like this implementation might be more optimal then using a mutex. But the park/unpark machinery currently relies on a mutex+condvar per thread. So instead of using one mutex, it is using as many as there are waiting threads.
Alternative implementation
While also a bit tricky, it is still possible to just use a mutex: encode a reference to an Arc<Mutex>
in Once.state
. The reference will be created when the Once
starts running, and not be dropped until the Once
is dropped.
I'd like to include the fixes in #65719, and make unother PR with an alternative implementation using a mutex. Also a thing to remember is that if #65719 ever lands, the standard library can just use the Once
implementation from parking_lot
.