Add lightweight locking C API #108724

Closed
@colesbury

Feature or enhancement

Implementing PEP 703 will require adding fine-grained locks and other synchronization mechanisms. For good performance, it's important that these locks be "lightweight" in the sense that they don't take up much space and don't require memory allocations to create. It's also important that these locks are fast in the common uncontended case, perform reasonably under contention, and avoid thread starvation.

Platform-provided mutexes like pthread_mutex_t are large (40 bytes on x86-64 Linux) and our current cross-platform wrappers ([1], [2], [3]) require additional memory allocations.

I'm proposing a lightweight mutex (PyMutex) along with internal-only APIs used for building an efficient PyMutex as well as other synchronization primitives. The design is based on WebKit's WTF::Lock and WTF::ParkingLot, which are described in detail in the Locking in WebKit blog post. (The design has also been ported to Rust in the parking_lot crate.)

Public API

The public API (in Include/cpython) would provide a PyMutex that occupies one byte and can be zero-initialized:

typedef struct PyMutex { uint8_t state; } PyMutex;
void PyMutex_Lock(PyMutex *m);
void PyMutex_Unlock(PyMutex *m);

I'm proposing making PyMutex public because it's useful in extensions written in C (as opposed to C++), such as NumPy, where it can be a pain to wrap cross-platform synchronization primitives.
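To make the "one byte, zero-initialized" property concrete, here is a minimal sketch of a lock with the same shape. It is not the proposed PyMutex: the `toy_mutex` names are hypothetical, it assumes C11 atomics and POSIX `sched_yield`, and it simply yields under contention instead of parking the thread as the real design would.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for illustration only; NOT CPython's PyMutex.
   state: 0 = unlocked, 1 = locked. An all-zero object is a valid,
   unlocked mutex, so no init call or allocation is needed. */
typedef struct { _Atomic uint8_t state; } toy_mutex;

/* Try once to take the lock; returns 1 on success, 0 if it is held. */
static int toy_mutex_trylock(toy_mutex *m)
{
    uint8_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Uncontended case: a single compare-and-swap.
   Contended case: this sketch just yields and retries. */
static void toy_mutex_lock(toy_mutex *m)
{
    while (!toy_mutex_trylock(m)) {
        sched_yield();
    }
}

static void toy_mutex_unlock(toy_mutex *m)
{
    uint8_t prev = atomic_exchange_explicit(&m->state, 0,
                                            memory_order_release);
    assert(prev == 1);  /* unlocking an unlocked mutex is a bug */
    (void)prev;
}
```

A yield loop like this does not address fairness or starvation, which is part of why the proposal builds PyMutex on a parking primitive instead.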

Internal APIs

The internal-only API (in Include/internal) would provide APIs for building PyMutex and other synchronization primitives. The main addition is a compare-and-wait primitive, like Linux's futex or Windows' WaitOnAddress.

int _PyParkingLot_Park(const void *address, const void *expected, size_t address_size,
                       _PyTime_t timeout_ns, void *arg, int detach);

The API closely matches WaitOnAddress but with two additions: arg is an optional, arbitrary pointer passed to the waking thread, and detach indicates whether to release the GIL (or detach in --disable-gil builds) while waiting. The additional arg pointer allows the locks to be only one byte (instead of at least pointer-sized), since it allows passing additional (stack-allocated) data between the waiting and the waking thread.
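The compare-and-wait idea can be sketched with a condition variable. This is only an illustration under several assumptions: the `toy_park` and `toy_unpark_all` names and the return codes are hypothetical, it uses POSIX threads, it handles only one-byte values, and it omits the timeout, arg, and detach parameters. It also uses a single global wait queue, so a wake-up releases every waiter regardless of address.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch. */
enum { TOY_PARK_OK = 0, TOY_PARK_AGAIN = -1 };

static pthread_mutex_t toy_lot_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t toy_lot_cond = PTHREAD_COND_INITIALIZER;
static unsigned long toy_lot_epoch;  /* bumped on every wake-up */

/* Sleep only if *address still equals expected. The comparison and the
   decision to sleep happen under toy_lot_lock, so a wake-up issued after
   the value changed cannot slip in between them and be lost. */
static int toy_park(_Atomic uint8_t *address, uint8_t expected)
{
    int result = TOY_PARK_AGAIN;
    assert(address != NULL);
    pthread_mutex_lock(&toy_lot_lock);
    if (atomic_load(address) == expected) {
        unsigned long seen = toy_lot_epoch;
        while (toy_lot_epoch == seen) {
            pthread_cond_wait(&toy_lot_cond, &toy_lot_lock);
        }
        result = TOY_PARK_OK;
    }
    pthread_mutex_unlock(&toy_lot_lock);
    return result;
}

/* Wake every parked thread. Callers change the value first, then call this. */
static void toy_unpark_all(void)
{
    pthread_mutex_lock(&toy_lot_lock);
    toy_lot_epoch++;
    pthread_cond_broadcast(&toy_lot_cond);
    pthread_mutex_unlock(&toy_lot_lock);
}
```

Because a woken thread learns nothing about the value at `address`, callers of a primitive like this re-check their condition in a loop after returning.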

The wakeup API looks like:

// wake up all threads waiting on `address`
void _PyParkingLot_UnparkAll(const void *address);

// or wake up a single thread
_PyParkingLot_Unpark(address, unpark, {
    // code here is executed after the thread to be woken up is identified but before we wake it up
    void *arg = unpark->arg;
    int more_waiters = unpark->more_waiters;
    ...
});

_PyParkingLot_Unpark is currently a macro that takes a code block. For PyMutex we need to update the mutex bits after we identify the thread but before we actually wake it up.
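The "identify, run caller code, then wake" ordering can be sketched with a variadic macro over a toy waiter queue. Everything here is hypothetical (the `toy_` types, the `TOY_UNPARK` macro, and the choice to run the block with a NULL arg when nobody is waiting); it is single-threaded and only models the ordering, not real thread wake-ups.

```c
#include <assert.h>
#include <stddef.h>

/* One parked thread in this toy model. */
struct toy_waiter {
    struct toy_waiter *next;
    void *arg;   /* pointer the waiter supplied when it parked */
    int woken;   /* set to 1 once the waiter is released */
};

struct toy_queue { struct toy_waiter *head; };

/* What the caller's code block gets to see. */
struct toy_unpark { void *arg; int more_waiters; };

/* Step 1: dequeue the first waiter and fill in the info struct.
   Step 2: run the caller's block (passed as the variadic arguments).
   Step 3: only then mark the waiter as woken. */
#define TOY_UNPARK(queue, unpark, ...)                              \
    do {                                                            \
        struct toy_waiter *toy_w_ = (queue)->head;                  \
        struct toy_unpark toy_info_ = {NULL, 0};                    \
        struct toy_unpark *unpark = &toy_info_;                     \
        if (toy_w_ != NULL) {                                       \
            (queue)->head = toy_w_->next;                           \
            toy_info_.arg = toy_w_->arg;                            \
            toy_info_.more_waiters = ((queue)->head != NULL);       \
        }                                                           \
        __VA_ARGS__                                                 \
        if (toy_w_ != NULL) {                                       \
            toy_w_->woken = 1;                                      \
        }                                                           \
    } while (0)

/* Usage: observe that the block runs before the waiter is woken. */
static void toy_unpark_demo(void)
{
    struct toy_waiter b = {NULL, "b", 0};
    struct toy_waiter a = {&b, "a", 0};
    struct toy_queue q = {&a};
    int woken_in_block = -1;

    TOY_UNPARK(&q, unpark, {
        assert(unpark->arg == a.arg);
        assert(unpark->more_waiters == 1);
        woken_in_block = a.woken;   /* still 0 at this point */
    });

    assert(woken_in_block == 0);
    assert(a.woken == 1);
    assert(q.head == &b);
}
```

In a mutex built this way, the block is where the lock byte would be recomputed, for example keeping a "has waiters" bit set only when more_waiters is nonzero.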

cc @ericsnowcurrently

Labels

3.13 (bugs and security fixes), topic-C-API, type-feature (A feature request or enhancement)
