Description
A few weeks ago, on January 12th, the @rust-lang/libs team discussed a plan to improve std::sync::Mutex
, std::sync::Condvar
, and std::sync::RwLock
.
On most platforms, these structures are currently wrappers around their pthread
equivalent, such as pthread_mutex_t
. These types are not movable, however, forcing us to wrap them in a Box
, resulting in an allocation and indirection for our lock types. This also gets in the way of a const
constructor for these types, which makes static
locks more complicated than necessary.
@Amanieu presented his library, parking_lot
, which implements the 'parking lot' structure from WebKit. Rather than letting the operating system handle any waiting queues, this manages the queues in user space in a global hash table indexed by the address of the lock. This allows for flexible 1-byte mutexes with almost no restrictions.
There has been an attempt at merging parking_lot
into std, as a replacement for the implementation of std's locking primitives. However, many different blockers came up and that PR did not get merged. (See also my RustConf talk for some of this history.) Many of these blockers have been resolved since.
One of the problems with replacing std's lock implementations by parking_lot
is that parking_lot
allocates memory for its global hash table. A Rust program can define its own custom allocator, and such a custom allocator will likely use the standard library's locks, creating a cyclic dependency problem where you can't allocate memory without locking, but you can't lock without first allocating the hash table.
A possible solution is to use a static
table with a fixed number of entries, which does not scale with the number of threads. This could work well in many situations, but can become problematic in highly parallel programs and systems. Another possible solution is to skip the global allocator and directly allocate this hash table with a native system API such as mmap
on Unix. (See this thread for some discussion.)
In our meeting, we discussed what's natively available on various operating systems/platforms:
-
On Windows, we have already switched to Windows SRW locks, which do not require allocation as they can be moved. They are small (32-bit), efficient, and
const
constructible. Just for Windows, there does not seem much advantage to switching toparking_lot
, although there might be some (small) performance difference. -
On both Linux and some BSDs there's a native
futex()
syscall, which provides functionality similar (but more primitive) than a parking lot. Nearly equivalent functionality is available in Wasm. While not trivial, it's possible to implement our synchronization primitives directly based on such a futex. This makes all the primitives small (32-bit), efficient, andconst
constructible. -
On macOS and some other Unixes there doesn't seem to be anything that's similar futex syscall. (
parking_lot
usespthread
for its thread parking operations on those platforms.) -
Some smaller platforms that are (partially) supported by the standard library have their own non-trivial implementation of the standard locks, which are hard to maintain and have not gotten much validation.
After some discussion, the consensus was to providing the locks as 'thinnest possible wrapper' around the native lock APIs as long as they are still small, efficient, and const
constructible. This means SRW locks on Windows, and futex-based locks on Linux, some BSDs, and Wasm. And for other platforms without suitable native APIs, a parking_lot
-based implementation using one of the possible workarounds for the allocation problem.
This means that on platforms like Linux and Windows, the operating system will be responsible for managing the waiting queues of the locks, such that any kernel improvements and features like debugging facilities in this area are directly available for Rust programs.
It also means that porting the standard library to a new platform will be easier, and maintainance of the std implementation for the less common platforms will become easier. These platforms will now only have to implement a thread parker and can rely on a performant and correct implementation of the standard locks on top of that.
Update: We've decided to not directly use parking lot's one-byte (two bit) locks, but instead use the equivalent of its internal WordLock
or Windows' SRW locks, which are one pointer in size and require no global state in the program. That solves the allocation problem.
Update 2: To maintain priority inheritance of the native locks (e.g. on macOS), we've kept using pthread locks on non-futex unix platforms, at least for now. Using #97647, we've made it possible for the locks to have const fn new
.
Primary goals:
- Efficient non-allocating locks
- Windows
- Linux
-
macOS- We can't avoid pthread if we want to keep features of the OS' native locks, like priority iniheritance.
- {Free, Open}BSD
- Turn
std::sync::{Mutex, Condvar, RwLock}::new
intoconst
functions: Make {Mutex, Condvar, RwLock}::new() const. #97791 - Resolve some bugs around pthreads
Possible later goals:
- Add a
Condvar::wait_rwlock
to makeCondvar
usable withRwLock
s? - Allow
Send
ingMutexGuard
s to other threads?
To do:
- Relax
Condvar
requirements to allow for unboxed mutexes. Relax promises about condition variable. #76932 - Use unboxed SRW locks on Windows.
- Make Microsoft promise that SRW locks are indeed movable. Clarify destruction of SRW locks and condition variables. MicrosoftDocs/sdk-api#447
- Small cleanups in Windows Mutex. #76645
- Refactor
sys_common::Mutex
to have a separateMovableMutex
type to allow unboxing on some platforms. Split sys_common::Mutex in StaticMutex and MovableMutex. #77147 - Remove the
Box
es inMutex
andCondvar
on Windows and Wasm. Unbox mutexes and condvars on some platforms #77380 - Remove the
Box
fromRwLock
on Windows and Wasm. Multiple improvements to RwLocks #84687 - (Start using
WaitOnAddress
andNtWaitForKeyedEvent
in the standard library.) Add fast WaitOnAddress-based thread parker for Windows. #77618- This unblocks usage of these APIs in
parking_lot
if we want to use that on Windows too. But if we just use SRW locks, this was not necessary to unblock the lock improvements.
- This unblocks usage of these APIs in
- Use futex-based locks on Linux.
- Start using the futex syscall to get some experience with it in the standard library. Use futex-based thread::park/unpark on Linux. #76919
- Implement the
futex()
syscall in Miri to allow Miri to run standard library tests. Implement futex_wait and futex_wake. miri#1568- Also implement the bitset futex operations in Miri. Add support for FUTEX_{WAIT,WAKE}_BITSET miri#2054
- Implement
Mutex
andCondvar
usingfutex()
.- Design these.
- Experimentation: https://github.com/m-ou-se/futex-lock-experiment/
- Investigate other implementations (glibc, musl, Windows, etc.)
- musl: Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment)
- glibc
- Boost: Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment)
- Windows' SRW locks: Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment)
- Wine's SRW locks: Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment)
- Apple Darwin libpthread: Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment)
- FreeBSD's libpthread: Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment)
- Make some design decisions: Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment)
- Implementation: Replace Linux Mutex and Condvar with futex based ones. #95035
- Design these.
- Implement
ReentrantMutex
usingfutex()
: Replace ReentrantMutex by a futex-based one on Linux. #95727 - Implement
RwLock
usingfutex()
: Replace RwLock by a futex based one on Linux #95801
- Use the same
ReentrantMutex
on all platforms: Use a single ReentrantMutex implementation on all platforms. #96042 - Use futex-based locks on *BSD.
- Add NetBSD's FUTEX_* constants to the libc crate. Add NetBSD's FUTEX_* constants. libc#2762
- Add OpenBSD's futex() to the libc crate. Add OpenBSD's futex.h. libc#2761
- Add FreeBSD umtx functions and constants to the libc crate. Add umtx_op to FreeBSD. libc#2770
- Add DragonFlyBSD's umtx_* functions to the libc crate. Add DragonFly umtx_{sleep, wakeup}. libc#2763
- Switch *BSD over to the same futex locks as on Linux. Use futex-based locks and thread parker on {Free, Open, DragonFly}BSD. #96510
- Use futex based locks on all other platforms that support a futex-like API.
- Use futex based locks on Emscripten. Use futex locks on emscripten. #96205
- Use
atomic.wait
/atomic.notify
based locks on Wasm. Use sys::unix::locks::futex* on wasm+atomics. #96206 - Use zx_futex_wait based locks on Fuchsia.
- (Tier 2 platform.)
- Use syscall::call::futex based locks on Redox.
- (Tier 3 platform.)
- Use futexes on Hermit. Use futex-based locks and thread parker on Hermit #101475
- Lazily allocate on platforms where we still use pthread: Lazily allocate and initialize pthread locks. #97647
- macOS
- NetBSD, other unix
- Make the
new
functionsconst
. Make {Mutex, Condvar, RwLock}::new() const. #97791 - [future / nice to have] Use 'WordLock' (aka SRW lock, aka 'user space linked-list queue lock') on other platforms.
- Remove locks from
Instant::now()
, to avoid a cyclic dependency. - Stop using
std::sync::{Mutex, Condvar}
in std::sys's thread parker, to avoid a cyclic dependency. std: directly use pthread in UNIX parker implementation #96393 - Implement this type of locks, and replace the lock implementations of non-futex platforms by it. (Replace pthread
RwLock
with custom implementation #110211)
- Remove locks from
- Mark this issue as fixed: Destroying locked Mutex in libstd triggers miri in safe code #85434
-
Write some blog post about all this.(Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment)) - Celebrate. 🎉