ENV is not thread safe with glibc #34726

Open
@c42f

For some reason I was reading our ENV code, and I noticed we don't have a lock around our calls to getenv and setenv on Linux, or around iterating the environment via the environ variable.

This makes ENV unsafe for use with multithreaded Julia, as getenv and setenv are not mutually thread-safe in glibc. (See https://github.com/bminor/glibc/blob/master/stdlib/getenv.c and https://github.com/bminor/glibc/blob/master/stdlib/setenv.c. Unsurprisingly, getenv is safe by itself, but mysteriously setenv is protected by a lock which getenv ignores! So concurrent setenv calls are safe with respect to each other, but no other thread in the same process can safely call getenv while a setenv may be running.)

We can easily add some more locks on our side as mitigation, but unfortunately we can't really fix setenv without fixing glibc: a random C library we link against may decide to call getenv at any point. This definitely happens in practice and I've experienced it personally: OpenMathLib/OpenBLAS#716 (comment). There's a nice discussion on the Rust issue tracker including this gem: rust-lang/rust#24741 (comment).

Some options:

  1. Mitigation: Put our own lock around getenv/setenv/environ access, and just hope that no C library we link against calls these functions itself. Add a big warning to the docs. This is the current Rust approach, though it's fragile.
  2. Avoidance: Ban using libc setenv entirely; create a shadow environment which is a copy of the system environment. This is the C# approach, but with Julia's tradition of calling into C libraries I doubt that will work out (it creates surprises even in C#; see https://yizhang82.dev/set-environment-variable).
  3. Fix glibc??: Every portable language runtime has this exact problem, so people have tried fixing this in the past. In older times they were met with surprising hostility. The seemingly current bug is marked suspended (https://sourceware.org/bugzilla/show_bug.cgi?id=15607#c2), whatever that means.

For now the only easy / possible thing to do is clearly option (1): add some locking and a big warning. If glibc could be fixed, that could become the long-term plan. More recent bugs suggest the glibc maintainers might accept a patch.
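To make option (1) concrete, here is a minimal sketch in C of what "our own lock" could look like. The names `env_lock`, `locked_getenv` and `locked_setenv` are made up for illustration and are not Julia's actual implementation. The point is that the value is copied out while the lock is still held, since the pointer returned by getenv may be invalidated by a later setenv.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical process-wide lock guarding every environment access
 * made through these wrappers. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy the value of `name` into `buf` (truncating if needed) while the
 * lock is held. Returns the full length of the value, or -1 if unset. */
static int locked_getenv(const char *name, char *buf, size_t buflen)
{
    assert(name != NULL && buf != NULL && buflen > 0);
    int len = -1;
    pthread_mutex_lock(&env_lock);
    const char *value = getenv(name);
    if (value != NULL) {
        len = (int)strlen(value);
        /* Copy before unlocking: `value` points into the environment. */
        snprintf(buf, buflen, "%s", value);
    }
    pthread_mutex_unlock(&env_lock);
    return len;
}

/* Set `name` to `value`, overwriting any existing value, under the lock.
 * Returns 0 on success and -1 on failure, like setenv. */
static int locked_setenv(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

This only protects callers that go through the wrappers. A C library that calls getenv directly never takes `env_lock`, which is exactly why this is a mitigation and not a fix.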

As a side note, I feel we should consider removing withenv in the future because the shape of that API misrepresents reality. On the surface withenv appears to offer dynamic scoping, but ENV is global state. So withenv can never work reliably for concurrent code. That is, unless we took option (2) and avoided the C environment completely.
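To illustrate why a withenv-shaped API can't give dynamic scoping, here is a rough C analogue. `with_env`, `observer` and `spawn_observer` are hypothetical names invented for this sketch. The temporary value is visible to an unrelated thread for as long as the body runs, because the environment is process-wide. The example keeps the threads strictly ordered (create, then join) so that it is deterministic and free of the getenv/setenv race itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define OBSERVED_MAX 64  /* size of the buffer the observer writes into */

/* Hypothetical analogue of withenv: set name=value, run body(arg), then
 * restore the previous state. Returns 0 on success, -1 on failure. */
static int with_env(const char *name, const char *value,
                    void (*body)(void *), void *arg)
{
    assert(name != NULL && value != NULL && body != NULL);
    const char *old = getenv(name);
    /* Copy the old value: setenv may invalidate the pointer from getenv. */
    char *saved = old ? strdup(old) : NULL;
    if (old != NULL && saved == NULL)
        return -1;
    if (setenv(name, value, 1) != 0) {
        free(saved);
        return -1;
    }
    body(arg);
    int rc = saved ? setenv(name, saved, 1) : unsetenv(name);
    free(saved);
    return rc;
}

/* Runs on a different thread and records what it sees in DEMO_VAR.
 * `out` must point to a buffer of at least OBSERVED_MAX bytes. */
static void *observer(void *out)
{
    const char *seen = getenv("DEMO_VAR");
    snprintf((char *)out, OBSERVED_MAX, "%s", seen ? seen : "(unset)");
    return NULL;
}

/* Body for with_env: start the observer thread and wait for it, so no
 * setenv can run while the observer is inside getenv. */
static void spawn_observer(void *out)
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, observer, out) == 0)
        pthread_join(thread, NULL);
}
```

Calling `with_env("DEMO_VAR", "temporary", spawn_observer, buf)` with a `char buf[OBSERVED_MAX]` leaves "temporary" in `buf`: the other thread saw the "scoped" value. In real concurrent code that thread would be doing unrelated work, and two overlapping `with_env` calls could also restore each other's values in the wrong order.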
