Description
For some reason I was reading our ENV
code, and I noticed we don't have a lock around our calls to getenv
and setenv
on linux, or for iterating the environment via the environ
variable.
This makes ENV
unsafe for use with multithreaded julia, as getenv
and setenv
are not mutually threadsafe in glibc. (See https://github.com/bminor/glibc/blob/master/stdlib/getenv.c and https://github.com/bminor/glibc/blob/master/stdlib/setenv.c. Unsurprisingly getenv
is safe by itself, but mysteriously setenv
is protected by a lock which getenv
ignores! So you can mutually setenv
safely, but nobody can presume to getenv
elsewhere in the same multithreaded code.)
We can easily add some more locks on our side as mitigation, but unfortunately we can't really fix setenv
without fixing glibc; a random C library we link against may decide to call getenv
at any point. This definitely happens in practice and I've experienced it personally: OpenMathLib/OpenBLAS#716 (comment). There's a nice discussion on the rust issue tracker including this gem: rust-lang/rust#24741 (comment).
Some options:
- Mitigation: Put our own lock around
getenv/setenv/environ
access, and just hope that no C library we link against calls these functions itself. Add a big warning to the docs. This is the current rust approach though it's fragile. - Avoidance: Ban using libc
setenv
entirely; create a shadow environment which is a copy of the system environment. This is the C# approach but with Julia's tradition of calling into C libraries I doubt that will work out (it creates surprises even in C#; see https://yizhang82.dev/set-environment-variable) - Fix glibc??: Every portable language runtime has this exact problem, so people have tried fixing this in the past. In older times they were met with surprising hostility. The seemingly-current bug is marked suspended https://sourceware.org/bugzilla/show_bug.cgi?id=15607#c2 whatever that means.
For now the only easy / possible thing to do is to clearly option (1): add some locking and a big warning. If glibc could be fixed it could morph into a long term plan. More recent bugs suggest the glibc maintainers may possibly accept a patch.
As a side note, I feel we should consider removing withenv
in the future because the shape of that API misrepresents reality. On the surface withenv
appears to offer dynamic scoping, but ENV
is global state. So withenv
can never work reliably for concurrent code. That is, unless we took option (2) and avoid the C environment completely.