gh-116909: fix data race with versions in typeobject #134651

duaneg · 2025-05-25T12:02:36Z

The global `_PyRuntime.types.next_version_tag` is read and written without synchronization or atomics. As a result, two different type objects could be given the same version, the counter could be incremented past its maximum limit, and the usual non-atomic memory access issues apply.

Fix this by using atomics so that the version is read and updated in a race-free manner, while also ensuring it is never incremented past the expected upper bound (maximum + 1).

Note there is a theoretical change in behaviour at the second use site, for static builtin types, if their versions exceed the maximum while assertions are disabled. Previously the counter would have kept incrementing past the maximum and each type would have used the ever-increasing value; now they will all use the maximum. It might be better to replace the `assert` with an `abort()`: I assume we would crash shortly after continuing in that state, both before and after this change.
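To make the intended counter behaviour concrete, here is a minimal Python model of it. It is only a sketch: the real code is C and uses atomics, whereas this model uses a `threading.Lock` as a stand-in for an atomic read-and-update. The names `VersionCounter`, `allocate`, `peek` and `run_demo`, the limit `MAX_VERSION = 1000`, and the choice to return 0 once the counter is exhausted are all assumptions made for illustration and are not taken from CPython.

```python
import threading

MAX_VERSION = 1000  # hypothetical limit, chosen only for this sketch


class VersionCounter:
    """Toy model of a saturating global version counter.

    The lock stands in for the atomic operations used in the C code.
    """

    def __init__(self):
        self._next = 1  # next version to hand out
        self._lock = threading.Lock()

    def allocate(self):
        """Return a unique version in 1..MAX_VERSION, or 0 when exhausted."""
        with self._lock:
            if self._next > MAX_VERSION:
                # Saturated: do not increment any further (assumed behaviour).
                return 0
            version = self._next
            self._next += 1  # can reach MAX_VERSION + 1, never beyond
            return version

    def peek(self):
        """Return the current value of the counter without changing it."""
        with self._lock:
            return self._next


def run_demo(num_threads=8, per_thread=200):
    """Hammer one counter from several threads and collect the results."""
    counter = VersionCounter()
    results = [[] for _ in range(num_threads)]

    def worker(slot):
        for _ in range(per_thread):
            results[slot].append(counter.allocate())

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Keep only successful allocations (0 means "no version available").
    versions = [v for chunk in results for v in chunk if v != 0]
    return counter, versions


if __name__ == "__main__":
    counter, versions = run_demo()
    print(len(versions), len(set(versions)), counter.peek())
```

With 8 threads making 200 attempts each, exactly `MAX_VERSION` calls should receive a distinct version, the remaining calls should get 0, and the counter should stop at `MAX_VERSION + 1`.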

Issue: TSan: data race with PyTypeObject version tag #116909


ZeroIntensity

Thanks for doing this. I'm not sure we're able to un-skip the subinterpreter test yet, because there are still races when doing non-atomic stores of the version on static types. I've looked into fixing those myself, and it's significantly more complex.


duaneg · 2025-05-26T04:55:00Z

> Thanks for doing this

My pleasure!

> I'm not sure we're able to un-skip the subinterpreter test yet, because there are still races when doing non-atomic stores of the version on static types. I've looked into fixing those myself, and it's significantly more complex.

Hmm, yes. I had run it with thread sanitizer enabled, both with and without the GIL, and it didn't trigger any data race reports. However, adding a new test that specifically exercises static type initialization does show that a bunch of problems remain, not only with the version but with various other accesses (most often `tp_flags`).

I suppose those problems may also be hit with the existing test, depending on timing. I'll drop re-enabling that test for now. I might come back and try and fix some of these issues separately, if I have time and no-one else gets there first.

For reference, the test I used:

    @requires_subinterpreters
    @threading_helper.requires_working_threading()
    def test_static_type_initialization(self):

        # This specifically tests the thread safety of static type
        # initialization and is intended to be run with the thread sanitizer enabled.
        def init(barrier):
            interpid = _interpreters.create()
            barrier.wait()
            try:
                _interpreters.run_string(interpid, "import datetime")
            finally:
                _interpreters.destroy(interpid)

        N = 3
        ITERATIONS = 10
        barrier = threading.Barrier(N)
        for _ in range(ITERATIONS):
            threads = [threading.Thread(target=init, args=(barrier,))
                       for _ in range(N)]
            with threading_helper.start_threads(threads):
                pass

I'm sure using subinterpreters is not strictly required, but they seemed like an easy way to reliably run static type initialisation. Note that if assertions are enabled this test immediately fails:

    Objects/typeobject.c:208: managed_static_type_state_init: Assertion `!managed_static_type_index_is_set(self)' failed.

It definitely looks like this is all very racy.
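For anyone wanting to experiment with the thread-release pattern from the test above without subinterpreters or `test.support`, a stand-alone sketch might look like the following. `run_simultaneously` and its parameters are hypothetical names, and a plain Python callable will of course not exercise static type initialisation; this only shows how the barrier lines the threads up before each round.

```python
import threading


def run_simultaneously(func, num_threads=3, iterations=10):
    """Call func() from num_threads threads that a barrier releases together.

    Returns (calls, errors): the return values of func and any exceptions.
    """
    barrier = threading.Barrier(num_threads)
    calls = []
    errors = []

    def worker():
        try:
            # Every thread blocks here until all num_threads have arrived,
            # which maximises the chance that func() runs concurrently.
            barrier.wait(timeout=10)
            calls.append(func())
        except Exception as exc:  # includes BrokenBarrierError on timeout
            errors.append(exc)

    for _ in range(iterations):
        threads = [threading.Thread(target=worker)
                   for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # The barrier resets itself once all waiters are released,
        # so it can be reused for the next round.

    return calls, errors


if __name__ == "__main__":
    calls, errors = run_simultaneously(lambda: threading.get_ident())
    print(len(calls), len(errors))
```

The timeout on `barrier.wait` is a design choice of this sketch: if one thread never reaches the barrier, the others fail with an error instead of hanging forever.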

ZeroIntensity

Mostly LGTM


duaneg requested a review from markshannon as a code owner May 25, 2025 12:02

bedevere-app bot added the awaiting review label May 25, 2025

bedevere-app bot mentioned this pull request May 25, 2025

TSan: data race with PyTypeObject version tag #116909

Open

ZeroIntensity reviewed May 25, 2025


ZeroIntensity added the topic-subinterpreters label May 25, 2025

duaneg added 3 commits May 26, 2025 16:46

Continue to skip test_isolated_subinterpreter under TSAN

b3a63c8

This change may fix one issue, but others remain

Correct assertion that should always fail if reached

33adfc3

Don't allocate a global type version tag unless required

d9e1c23

ZeroIntensity reviewed May 26, 2025


Misc/NEWS.d/next/Core_and_Builtins/2025-05-25-23-55-43.gh-issue-116909.FGbNKx.rst

Update Misc/NEWS.d/next/Core_and_Builtins/2025-05-25-23-55-43.gh-issu…

1eb121d

…e-116909.FGbNKx.rst Co-authored-by: Peter Bierma <[email protected]>
