Skip to content

Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740

Closed
@m-ou-se

Description

@m-ou-se

A few weeks ago, on January 12th, the @rust-lang/libs team discussed a plan to improve std::sync::Mutex, std::sync::Condvar, and std::sync::RwLock.

On most platforms, these structures are currently wrappers around their pthread equivalent, such as pthread_mutex_t. These types are not movable, however, forcing us to wrap them in a Box, resulting in an allocation and indirection for our lock types. This also gets in the way of a const constructor for these types, which makes static locks more complicated than necessary.

@Amanieu presented his library, parking_lot, which implements the 'parking lot' structure from WebKit. Rather than letting the operating system handle any waiting queues, this manages the queues in user space in a global hash table indexed by the address of the lock. This allows for flexible 1-byte mutexes with almost no restrictions.

There has been an attempt at merging parking_lot into std, as a replacement for the implementation of std's locking primitives. However, many different blockers came up and that PR did not get merged. (See also my RustConf talk for some of this history.) Many of these blockers have been resolved since.

One of the problems with replacing std's lock implementations by parking_lot is that parking_lot allocates memory for its global hash table. A Rust program can define its own custom allocator, and such a custom allocator will likely use the standard library's locks, creating a cyclic dependency problem where you can't allocate memory without locking, but you can't lock without first allocating the hash table.

A possible solution is to use a static table with a fixed number of entries, which does not scale with the number of threads. This could work well in many situations, but can become problematic in highly parallel programs and systems. Another possible solution is to skip the global allocator and directly allocate this hash table with a native system API such as mmap on Unix. (See this thread for some discussion.)

In our meeting, we discussed what's natively available on various operating systems/platforms:

  • On Windows, we have already switched to Windows SRW locks, which do not require allocation as they can be moved. They are small (32-bit), efficient, and const constructible. Just for Windows, there does not seem much advantage to switching to parking_lot, although there might be some (small) performance difference.

  • On both Linux and some BSDs there's a native futex() syscall, which provides functionality similar (but more primitive) than a parking lot. Nearly equivalent functionality is available in Wasm. While not trivial, it's possible to implement our synchronization primitives directly based on such a futex. This makes all the primitives small (32-bit), efficient, and const constructible.

  • On macOS and some other Unixes there doesn't seem to be anything that's similar futex syscall. (parking_lot uses pthread for its thread parking operations on those platforms.)

  • Some smaller platforms that are (partially) supported by the standard library have their own non-trivial implementation of the standard locks, which are hard to maintain and have not gotten much validation.

After some discussion, the consensus was to providing the locks as 'thinnest possible wrapper' around the native lock APIs as long as they are still small, efficient, and const constructible. This means SRW locks on Windows, and futex-based locks on Linux, some BSDs, and Wasm. And for other platforms without suitable native APIs, a parking_lot-based implementation using one of the possible workarounds for the allocation problem.

This means that on platforms like Linux and Windows, the operating system will be responsible for managing the waiting queues of the locks, such that any kernel improvements and features like debugging facilities in this area are directly available for Rust programs.

It also means that porting the standard library to a new platform will be easier, and maintainance of the std implementation for the less common platforms will become easier. These platforms will now only have to implement a thread parker and can rely on a performant and correct implementation of the standard locks on top of that.

Update: We've decided to not directly use parking lot's one-byte (two bit) locks, but instead use the equivalent of its internal WordLock or Windows' SRW locks, which are one pointer in size and require no global state in the program. That solves the allocation problem.

Update 2: To maintain priority inheritance of the native locks (e.g. on macOS), we've kept using pthread locks on non-futex unix platforms, at least for now. Using #97647, we've made it possible for the locks to have const fn new.


Primary goals:

Possible later goals:

  • Add a Condvar::wait_rwlock to make Condvar usable with RwLocks?
  • Allow Sending MutexGuards to other threads?

To do:

Metadata

Metadata

Labels

A-atomicArea: Atomics, barriers, and sync primitivesA-concurrencyArea: ConcurrencyC-tracking-issueCategory: An issue tracking the progress of sth. like the implementation of an RFCT-libsRelevant to the library team, which will review and decide on the PR/issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions