std: use futex-based locks on Fuchsia #98707
Merged
Changes from all commits (5 commits)
f7ae92c std: use futex-based locks on Fuchsia (joboet)
0d91b08 std: fix issue with perma-locked mutexes on Fuchsia (joboet)
f357926 std: panic instead of deadlocking in mutex implementation on Fuchsia (joboet)
c72a77e owner is not micro (correct typo) (joboet)
8ba02f1 remove unused import (joboet)
@@ -0,0 +1,165 @@
//! A priority inheriting mutex for Fuchsia.
//!
//! This is a port of the [mutex in Fuchsia's libsync]. Contrary to the original,
//! it does not abort the process when reentrant locking is detected, but panics.
//!
//! Priority inheritance is achieved by storing the owning thread's handle in an
//! atomic variable. Fuchsia's futex operations support setting an owner thread
//! for a futex, which can boost that thread's priority while the futex is waited
//! upon.
//!
//! libsync is licenced under the following BSD-style licence:
//!
//! Copyright 2016 The Fuchsia Authors.
//!
//! Redistribution and use in source and binary forms, with or without
//! modification, are permitted provided that the following conditions are
//! met:
//!
//! * Redistributions of source code must retain the above copyright
//!   notice, this list of conditions and the following disclaimer.
//! * Redistributions in binary form must reproduce the above
//!   copyright notice, this list of conditions and the following
//!   disclaimer in the documentation and/or other materials provided
//!   with the distribution.
//!
//! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
//! "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
//! LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
//! A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
//! OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
//! SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
//! LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
//! DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
//! THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
//! (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
//! OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//!
//! [mutex in Fuchsia's libsync]: https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/system/ulib/sync/mutex.c

use crate::sync::atomic::{
    AtomicU32,
    Ordering::{Acquire, Relaxed, Release},
};
use crate::sys::futex::zircon::{
    zx_futex_wait, zx_futex_wake_single_owner, zx_handle_t, zx_thread_self, ZX_ERR_BAD_HANDLE,
    ZX_ERR_BAD_STATE, ZX_ERR_INVALID_ARGS, ZX_ERR_TIMED_OUT, ZX_ERR_WRONG_TYPE, ZX_OK,
    ZX_TIME_INFINITE,
};

// The lowest two bits of a `zx_handle_t` are always set, so the lowest bit is used to mark the
// mutex as contested by clearing it.
const CONTESTED_BIT: u32 = 1;
// This can never be a valid `zx_handle_t`.
const UNLOCKED: u32 = 0;

pub type MovableMutex = Mutex;

pub struct Mutex {
    futex: AtomicU32,
}

#[inline]
fn to_state(owner: zx_handle_t) -> u32 {
    owner
}

#[inline]
fn to_owner(state: u32) -> zx_handle_t {
    state | CONTESTED_BIT
}

#[inline]
fn is_contested(state: u32) -> bool {
    state & CONTESTED_BIT == 0
}

#[inline]
fn mark_contested(state: u32) -> u32 {
    state & !CONTESTED_BIT
}

impl Mutex {
    #[inline]
    pub const fn new() -> Mutex {
        Mutex { futex: AtomicU32::new(UNLOCKED) }
    }

    #[inline]
    pub unsafe fn init(&mut self) {}

    #[inline]
    pub unsafe fn try_lock(&self) -> bool {
        let thread_self = zx_thread_self();
        self.futex.compare_exchange(UNLOCKED, to_state(thread_self), Acquire, Relaxed).is_ok()
    }

    #[inline]
    pub unsafe fn lock(&self) {
        let thread_self = zx_thread_self();
        if let Err(state) =
            self.futex.compare_exchange(UNLOCKED, to_state(thread_self), Acquire, Relaxed)
        {
            self.lock_contested(state, thread_self);
        }
    }

    #[cold]
    fn lock_contested(&self, mut state: u32, thread_self: zx_handle_t) {
        let owned_state = mark_contested(to_state(thread_self));
        loop {
            // Mark the mutex as contested if it is not already.
            let contested = mark_contested(state);
            if is_contested(state)
                || self.futex.compare_exchange(state, contested, Relaxed, Relaxed).is_ok()
            {
                // The mutex has been marked as contested, wait for the state to change.
                unsafe {
                    match zx_futex_wait(
                        &self.futex,
                        AtomicU32::new(contested),
                        to_owner(state),
                        ZX_TIME_INFINITE,
                    ) {
                        ZX_OK | ZX_ERR_BAD_STATE | ZX_ERR_TIMED_OUT => (),
                        // Note that if a thread handle is reused after its associated thread
                        // exits without unlocking the mutex, an arbitrary thread's priority
                        // could be boosted by the wait, but there is currently no way to
                        // prevent that.
                        ZX_ERR_INVALID_ARGS | ZX_ERR_BAD_HANDLE | ZX_ERR_WRONG_TYPE => {
                            panic!(
                                "either the current thread is trying to lock a mutex it has
                                already locked, or the previous owner did not unlock the mutex
                                before exiting"
                            )
                        }
                        error => panic!("unexpected error in zx_futex_wait: {error}"),
                    }
                }
            }

            // The state has changed or a wakeup occurred, try to lock the mutex.
            match self.futex.compare_exchange(UNLOCKED, owned_state, Acquire, Relaxed) {
                Ok(_) => return,
                Err(updated) => state = updated,
            }
        }
    }

    #[inline]
    pub unsafe fn unlock(&self) {
        if is_contested(self.futex.swap(UNLOCKED, Release)) {
            // The woken thread will mark the mutex as contested again,
            // and return here, waking until there are no waiters left,
            // in which case this is a noop.
            self.wake();
        }
    }

    #[cold]
    fn wake(&self) {
        unsafe {
            zx_futex_wake_single_owner(&self.futex);
        }
    }
}
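To make the state encoding above concrete, here is a small standalone sketch (not part of the diff; the handle value is made up, and the only property relied on is that valid Zircon handles have their low bits set) of the round trip performed by `to_state`, `mark_contested` and `to_owner`:

```rust
// Standalone illustration of the contested-bit encoding; the handle value is hypothetical.
const CONTESTED_BIT: u32 = 1;
const UNLOCKED: u32 = 0;

fn main() {
    let owner: u32 = 0x0000_00ab; // hypothetical zx_handle_t value, low bit set
    let locked = owner; // to_state: an uncontested lock stores the raw handle
    assert_ne!(locked, UNLOCKED);
    assert_eq!(locked & CONTESTED_BIT, 1); // low bit set => not contested

    let contested = locked & !CONTESTED_BIT; // mark_contested: clear the low bit
    assert_eq!(contested & CONTESTED_BIT, 0); // low bit cleared => contested

    let recovered = contested | CONTESTED_BIT; // to_owner: restore the handle
    assert_eq!(recovered, owner); // the owner handle survives the round trip
}
```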
@@ -0,0 +1,58 @@
use super::Mutex;
use crate::sync::atomic::{AtomicU32, Ordering::Relaxed};
use crate::sys::futex::{futex_wait, futex_wake, futex_wake_all};
use crate::time::Duration;

pub type MovableCondvar = Condvar;

pub struct Condvar {
    // The value of this atomic is simply incremented on every notification.
    // This is used by `.wait()` to not miss any notifications after
    // unlocking the mutex and before waiting for notifications.
    futex: AtomicU32,
}

impl Condvar {
    #[inline]
    pub const fn new() -> Self {
        Self { futex: AtomicU32::new(0) }
    }

    // All the memory orderings here are `Relaxed`,
    // because synchronization is done by unlocking and locking the mutex.

    pub unsafe fn notify_one(&self) {
        self.futex.fetch_add(1, Relaxed);
        futex_wake(&self.futex);
    }

    pub unsafe fn notify_all(&self) {
        self.futex.fetch_add(1, Relaxed);
        futex_wake_all(&self.futex);
    }

    pub unsafe fn wait(&self, mutex: &Mutex) {
        self.wait_optional_timeout(mutex, None);
    }

    pub unsafe fn wait_timeout(&self, mutex: &Mutex, timeout: Duration) -> bool {
        self.wait_optional_timeout(mutex, Some(timeout))
    }

    unsafe fn wait_optional_timeout(&self, mutex: &Mutex, timeout: Option<Duration>) -> bool {
        // Examine the notification counter _before_ we unlock the mutex.
        let futex_value = self.futex.load(Relaxed);

        // Unlock the mutex before going to sleep.
        mutex.unlock();

        // Wait, but only if there hasn't been any
        // notification since we unlocked the mutex.
        let r = futex_wait(&self.futex, futex_value, timeout);

        // Lock the mutex again.
        mutex.lock();

        r
    }
}
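For context, these sys-level primitives are what the public `std::sync::Mutex` and `std::sync::Condvar` are built on. A typical user-level wait loop (ordinary std usage, not part of this diff) looks like this; the predicate is rechecked in a loop because a wakeup can race with other threads:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair2 = Arc::clone(&pair);

    thread::spawn(move || {
        let (lock, cvar) = &*pair2;
        *lock.lock().unwrap() = true; // update the condition while holding the lock
        cvar.notify_one(); // wake one waiter
    });

    let (lock, cvar) = &*pair;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        // wait() unlocks the mutex, sleeps on the condvar and relocks before returning
        ready = cvar.wait(ready).unwrap();
    }
}
```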
So, it should be noted that the current libsync condvar implementation is not in great shape, and needs to be revisited/simplified. It probably is not the best example to be following.
This said: beware the thundering herd.
If there are 100 threads waiting for the condition to be signaled, and they are all woken up at once, one of them is going to grab the mutex, and all of the others are going to immediately block behind whichever thread made it into the mutex first. This type of thrashing is not going to be particularly efficient.
To solve this problem, we have a tool similar to what other OSes have: `zx_futex_requeue` (and `zx_futex_requeue_single_owner`). Requeue takes two futexes (the "wake" futex and the "requeue" futex) and two counts (the wake_count and the requeue_count). It will logically wake up to wake_count threads from the wake futex, and then (if there are still waiters remaining) move up to requeue_count waiters from wake -> requeue.
When applied to a condvar, the notify_all operation becomes a requeue(condvar_futex, 1, mutex_futex, 0xFFFFFFFF). Basically, wake up only one thread, and place all of the remaining threads into the mutex futex wait queue (IOW - just proactively assume that those threads are now blocking on the lock).
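As a rough sketch of that mapping (the `zx_futex_requeue` binding below is declared only for illustration; it is not part of the module in this diff, and the exact Rust-side parameter types are assumptions based on the syscall documentation):

```rust
use std::sync::atomic::AtomicU32;

type ZxStatus = i32;
type ZxHandle = u32;
const ZX_HANDLE_INVALID: ZxHandle = 0;

extern "C" {
    // Assumed binding for the zx_futex_requeue syscall.
    fn zx_futex_requeue(
        value_ptr: *const AtomicU32,   // the "wake" futex (the condvar's counter)
        wake_count: u32,               // wake at most this many waiters
        current_value: i32,            // expected value of the wake futex
        requeue_ptr: *const AtomicU32, // the "requeue" futex (the mutex state word)
        requeue_count: u32,            // move at most this many of the remaining waiters
        new_requeue_owner: ZxHandle,   // owner to assign to the requeue futex
    ) -> ZxStatus;
}

// notify_all: wake one waiter, and move everyone else onto the mutex futex so they
// queue on the lock instead of all racing for it at once.
unsafe fn notify_all_requeue(condvar_futex: &AtomicU32, mutex_futex: &AtomicU32, expected: i32) {
    let _ = zx_futex_requeue(condvar_futex, 1, expected, mutex_futex, u32::MAX, ZX_HANDLE_INVALID);
}
```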
If the mutex here implements PI, then requeue_single_owner can be used instead. It is supposed to wake a single thread from the wait futex, then move the specified number of threads from wait -> requeue, and finally assign ownership of the requeue target to the woken thread.
Note, the docs on this are either wrong, or the implementation is wrong (https://fuchsia.dev/fuchsia-src/reference/syscalls/futex_requeue_single_owner?hl=en). It claims that the wake futex is the futex whose ownership gets assigned, when it should be the requeue futex (as this is the futex representing the lock, not the notification state). I'm going to file a bug about this and look into it.
In addition to avoiding the thundering herd in general, using requeue allows the scheduler to make better choices. The scheduler can choose to wake the "most important" thread from the futex's blocking queue first, and requeue the rest. If all of the threads are simply woken and assigned to different CPUs, the "most important" thread might end up losing the mutex race and will end up blocking again when it really should be running.
Also note that to make this work, the notification futex and the lock futex must logically be fused together. The API should not allow users to make the mistake of failing to acquire the mutex after being notified. Something like
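the following hypothetical shape (the names and stubs here are illustrative assumptions, not an interface from libsync or this PR), where the condvar owns the only lock it can be used with:

```rust
use std::sync::atomic::AtomicU32;

// The condvar and its lock live in one object, so `wait` cannot be called with the
// wrong mutex and `notify_all` always knows which futex to requeue waiters onto.
pub struct CondvarWithLock {
    notify_futex: AtomicU32, // notification counter, as in the condvar above
    lock_futex: AtomicU32,   // mutex state word, as in the mutex above
}

impl CondvarWithLock {
    /// Called with this object's own lock held: unlocks `lock_futex`, sleeps on
    /// `notify_futex`, and reacquires `lock_futex` before returning.
    pub fn wait_locked(&self) {
        unimplemented!("sketch only")
    }

    /// Wakes one waiter and requeues the rest onto `lock_futex`
    /// (e.g. with zx_futex_requeue, as sketched above).
    pub fn notify_all_locked(&self) {
        unimplemented!("sketch only")
    }
}
```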
The implementation here does not have a mutex specifically associated with the condvar, which means that users could accidentally pass different mutexes to the wait operation, and which prevents the use of requeue (since it is unclear which lock needs to be dropped during a notify operation).
OK, I've looked into this a bit more. The re-queue operations as defined really do not seem to be all that helpful if the goal is to implement a condvar whose associated lock implements PI. I'm going to need to take some time to sort this out, and may need to go through the full RFC process in order to make a change to the API which fixes the issue. In the meantime, I'll offer a few reasonable paths forward for this code.
Option 1: Do nothing.
Leave the existing thundering herd behavior in place, and for now, simply assume that having a large number of waiters will be uncommon. I still think that it would be a good idea to add a mutex to the condvar object itself, and demand that users hold this specific lock when waiting. It would also be good to leave a comment in https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=104478 saying that we should come back and fix up the Rust implementation once the underlying syscall definitions have been fixed.
Option 2: Just use requeue, and ignore requeue_single_owner.
If we can make the mutex used with the condvar a property of the condvar object itself, it allows us to address the issue in a less than perfect way, but come back later on and make it better. The idea here is we would:
Now, notify_all_and_release can become
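roughly the following (a hypothetical sketch; the concrete calls, types, and error handling below are assumptions based on the surrounding description rather than the original snippet): requeue every condvar waiter onto the lock futex without waking anyone, then perform the ordinary PI unlock, which wakes a single thread and assigns it ownership of the lock futex.

```rust
use std::sync::atomic::{AtomicU32, Ordering::{Relaxed, Release}};

type ZxHandle = u32;
const ZX_HANDLE_INVALID: ZxHandle = 0;
const UNLOCKED: u32 = 0;

extern "C" {
    // Assumed bindings; only zx_futex_wake_single_owner appears in this diff's imports.
    fn zx_futex_requeue(
        value_ptr: *const AtomicU32,
        wake_count: u32,
        current_value: i32,
        requeue_ptr: *const AtomicU32,
        requeue_count: u32,
        new_requeue_owner: ZxHandle,
    ) -> i32;
    fn zx_futex_wake_single_owner(value_ptr: *const AtomicU32) -> i32;
}

/// Notify all waiters and release the (held) lock, avoiding the thundering herd.
unsafe fn notify_all_and_release(notify_futex: &AtomicU32, lock_futex: &AtomicU32) {
    // Bump the notification counter so concurrent waiters do not miss the wakeup.
    let expected = notify_futex.fetch_add(1, Relaxed).wrapping_add(1) as i32;
    // Syscall 1: wake nobody, but move every condvar waiter onto the lock futex.
    // (A real implementation would handle the error returned if the counter changed.)
    let _ = zx_futex_requeue(notify_futex, 0, expected, lock_futex, u32::MAX, ZX_HANDLE_INVALID);
    // Release the lock (the real code would also handle the uncontested fast path)...
    lock_futex.store(UNLOCKED, Release);
    // Syscall 2: wake exactly one waiter on the lock futex and make it the owner (PI).
    let _ = zx_futex_wake_single_owner(lock_futex);
}
```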
This will preserve the goal of avoiding the herd, and also implement PI in the lock. The downside is that it will cost two syscalls instead of one. Once the futex requeue API is improved, this can be dropped back down to just a single call.
Sorry about this ☹️
The `Condvar` implementation is already used on Linux, WASM and some BSDs. This PR just moves it to a different file so it can be shared by Fuchsia. For these systems, the library team decided not to requeue, but if you feel it is important, I can specialize the implementation.
I think it would be more important if you frequently have large numbers of waiters waiting on the condvar. If most of your users tend to only have one waiter on the condition, then it is certainly fine as it is. If N tends to be low, but not one, then it becomes more likely that there is some lock thrash. While this may be an issue, it may not be a super serious one.
TL;DR - This was just a suggestion. You know your users and their patterns better than I do, so feel free to continue doing it as you are. If you ever encounter a situation where the herd becomes a serious issue for one of your users, you can always come back and change course.
Yeah, it's hard to optimize for every possible use case at once. My assumption is that it's quite uncommon to notify many waiters at once in programs optimized for performance, since it's a bit of an anti-pattern regardless of requeuing. A requeuing implementation just means that the threads will more efficiently wait in line, but their work still ends up being serialized, which arguably defeats the point of parallelization.
I looked a bit through use cases of notify_all() on crates.io to validate my assumptions, but am happy to consider examples that support an argument in favor of requeueing.