Feature or enhancement
The `object_lookup_special` microbenchmark in `Tools/ftscalingbench/ftscalingbench.py` currently doesn't scale well and is indicative of a broader free-threading (FT) performance issue that we should fix. The benchmark just calls `round()` from multiple threads concurrently (see `cpython/Tools/ftscalingbench/ftscalingbench.py`, lines 62 to 66 at commit `56d0f9a`).
The issue is that `round()` calls `_PyObject_LookupSpecial(number, &_Py_ID(__round__))`, which increments the reference count of the returned function (i.e., of `float.__round__`). The underlying function supports deferred reference counting, but `_PyObject_LookupSpecial` and `_PyType_LookupRef` do not take advantage of it.
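As a rough sketch of the direction (hedged; `lookup_special_stackref` and `PyStackRef_FromPyObjectBorrow` are hypothetical names, not the actual CPython API), the lookup could hand back a `_PyStackRef` and skip the contended increment when the found function supports deferred reference counting:

```c
// Hedged sketch (not the actual CPython API): a type lookup that returns
// a _PyStackRef instead of a new strong reference. Descriptor binding,
// which _PyObject_LookupSpecial also performs, is omitted for brevity.
static void
lookup_special_stackref(PyObject *self, PyObject *name, _PyStackRef *out)
{
    PyObject *attr = _PyType_LookupRef(Py_TYPE(self), name);  // strong ref
    if (attr == NULL) {
        *out = PyStackRef_NULL;
        return;
    }
#ifdef Py_GIL_DISABLED
    if (_PyObject_HasDeferredRefcount(attr)) {
        // Objects with deferred refcounting are only freed by the GC,
        // and the GC can see this reference through the thread state's
        // stackref list (see below), so a non-owning stackref suffices.
        *out = PyStackRef_FromPyObjectBorrow(attr);  // hypothetical helper
        Py_DECREF(attr);
        return;
    }
#endif
    *out = PyStackRef_FromPyObjectSteal(attr);  // keep the strong reference
}
```

A real implementation would avoid the increment inside the lookup itself rather than undoing it afterwards; the sketch only illustrates the intended interface.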
For the FT build, we also need some extra support in order to safely use `_PyStackRef` in `builtin_round_impl`, because it's important that all `_PyStackRef`s are visible to the GC. To support this, we can add a singly linked list of active `_PyStackRef`s to `_PyThreadStateImpl`.
The struct `_PyCStackRef` combines this linked-list pointer with a `_PyStackRef`. In the GIL-enabled build, there's no linked list and it's essentially the same as `_PyStackRef`.
```c
// A stackref that can be stored in a regular C local variable and be visible
// to the GC in the free threading build.
// Used in combination with _PyThreadState_PushCStackRef().
typedef struct _PyCStackRef {
    _PyStackRef ref;
#ifdef Py_GIL_DISABLED
    struct _PyCStackRef *next;
#endif
} _PyCStackRef;
```
```c
struct _PyThreadStateImpl {
    ...
    // Linked list (stack) of active _PyCStackRefs
    struct _PyCStackRef *c_stack_refs;
    ...
};
```
```c
static inline void _PyThreadState_PushCStackRef(PyThreadState *tstate, _PyCStackRef *ref) { ... }
static inline void _PyThreadState_PopCStackRef(PyThreadState *tstate, _PyCStackRef *ref) { ... }
```
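A plausible shape for these helpers, assuming the design above (a sketch, not necessarily the final implementation): push links the `_PyCStackRef` into the thread state's list in the FT build and initializes the stackref; pop unlinks it and releases the stackref. Pops must happen in LIFO order, matching the stack discipline of C local variables.

```c
static inline void
_PyThreadState_PushCStackRef(PyThreadState *tstate, _PyCStackRef *ref)
{
#ifdef Py_GIL_DISABLED
    _PyThreadStateImpl *tstate_impl = (_PyThreadStateImpl *)tstate;
    ref->next = tstate_impl->c_stack_refs;
    tstate_impl->c_stack_refs = ref;
#endif
    ref->ref = PyStackRef_NULL;
}

static inline void
_PyThreadState_PopCStackRef(PyThreadState *tstate, _PyCStackRef *ref)
{
#ifdef Py_GIL_DISABLED
    _PyThreadStateImpl *tstate_impl = (_PyThreadStateImpl *)tstate;
    assert(tstate_impl->c_stack_refs == ref);  // enforce LIFO order
    tstate_impl->c_stack_refs = ref->next;
#endif
    PyStackRef_XCLOSE(ref->ref);
}
```

With that in place, `builtin_round_impl` could hold the looked-up `__round__` in a `_PyCStackRef` for the duration of the call, roughly like this (hedged; `_PyObject_LookupSpecialStackRef` is a hypothetical stackref-returning variant of `_PyObject_LookupSpecial`):

```c
static PyObject *
builtin_round_impl(PyObject *module, PyObject *number, PyObject *ndigits)
{
    PyThreadState *tstate = _PyThreadState_GET();
    _PyCStackRef cref;
    _PyThreadState_PushCStackRef(tstate, &cref);  // GC can now see cref.ref

    // Hypothetical stackref-returning lookup; avoids the refcount
    // increment when __round__ supports deferred reference counting.
    cref.ref = _PyObject_LookupSpecialStackRef(number, &_Py_ID(__round__));
    PyObject *result = NULL;
    if (PyStackRef_IsNull(cref.ref)) {
        if (!PyErr_Occurred()) {
            PyErr_Format(PyExc_TypeError,
                         "type %.100s doesn't define __round__ method",
                         Py_TYPE(number)->tp_name);
        }
    }
    else if (ndigits == NULL) {
        result = PyObject_CallNoArgs(PyStackRef_AsPyObjectBorrow(cref.ref));
    }
    else {
        result = PyObject_CallOneArg(PyStackRef_AsPyObjectBorrow(cref.ref),
                                     ndigits);
    }

    _PyThreadState_PopCStackRef(tstate, &cref);  // unlink and release
    return result;
}
```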