Description
I am trying to make Rust's stdlib prevent unwinding, or allocating, in the child after a fork on Unix (including in Command). That is #81858. (Allocation after fork of a multithreaded program is UB in several libcs.)
I added a new test case: https://github.com/rust-lang/rust/blob/8220f2f2127b9aec972163ded97be7d8cff6b9a8/src/test/ui/process/process-panic-after-fork.rs https://github.com/rust-lang/rust/blob/6369637a192bbd0a2fbf8084345ddb7c099aa460/src/test/ui/process/process-panic-after-fork.rs Unfortunately, this test fails, but only on Android: #81858 (comment)
I have few good theories as to why. I wrote some speculations: #81858 (comment)
I think this probably needs attention from an Android expert to try to repro and fix this issue. I suspect it's a problem with the library rather than the tests. The worst case is that it might be a general UB bug in Android Rust programs using libc::fork or Command.
I'm filing this issue here to try to ask for help again, since writing in #81858 doesn't seem like a particularly good way of getting the attention of Android folks.
If we can't get a resolution then, reluctantly, I guess I will disable that test on Android so that my MR can go through. The current situation is quite a hazard (see e.g. #79740 "panic! in Command child forked from non-main thread results in exit status 0").
Technical discussion
I will try to explain what the test does and what the symptoms seem to mean:
The test file has a custom global allocator, whose purpose is to spot allocations in the child after fork. That global allocator has an atomic variable which is supposed to contain either zero (initially, meaning it's not engaged yet) or the process's pid. Whenever an allocator method is called, we read the atomic and, if it is not zero, we check it against process::id(). If it doesn't match, we call libc::raise(libc::SIGUSR1) (originally libc::SIGTRAP).
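A minimal sketch of such a pid-checking allocator follows. It is not the code from the test file: the names PidChecking, is_foreign, engage and check are made up for illustration, raise is declared directly instead of going through the libc crate so that the block is self-contained, and the SIGUSR1 value of 10 is an assumption that holds for Linux on the common architectures.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::os::raw::c_int;
use std::process;
use std::sync::atomic::{AtomicU32, Ordering};

// Declared by hand here; the real test uses the libc crate instead.
unsafe extern "C" {
    fn raise(sig: c_int) -> c_int;
}

// Assumption: SIGUSR1 is 10 on Linux for the common architectures.
const SIGUSR1: c_int = 10;

// Hypothetical allocator type: wraps the system allocator and remembers
// the pid of the process that engaged it (0 means "not engaged yet").
struct PidChecking {
    engaged_pid: AtomicU32,
}

// True if the allocator is engaged and we are no longer in the process
// that engaged it, i.e. we are in a forked child.
fn is_foreign(recorded: u32, current: u32) -> bool {
    recorded != 0 && recorded != current
}

impl PidChecking {
    fn engage(&self) {
        self.engaged_pid.store(process::id(), Ordering::SeqCst);
    }

    fn check(&self) {
        let recorded = self.engaged_pid.load(Ordering::SeqCst);
        if is_foreign(recorded, process::id()) {
            // Allocation in the child: make it visible in the wait status.
            unsafe { raise(SIGUSR1) };
        }
    }
}

unsafe impl GlobalAlloc for PidChecking {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.check();
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.check();
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOCATOR: PidChecking = PidChecking {
    engaged_pid: AtomicU32::new(0),
};

fn main() {
    ALLOCATOR.engage();
    // Allocating in the engaging process itself is fine.
    let v = vec![1u8, 2, 3];
    println!("allocated {} bytes in pid {}", v.len(), process::id());
}
```

Because the check compares against process::id() on every call, any allocation or deallocation in a forked child raises the signal, which is what makes the symptom below attributable to the allocator.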
The test enters main, and engages the stunt allocator, recording the program's pid. Each call to expect_aborted (which is called from run and therefore from one) produces output from dbg!(status). We see only one of these, so this must be the first test, one(&|| panic!()).
The test uses libc::fork to fork. In the child, it calls panic::always_abort() (my new function to disable panic unwinding). It then panics (using the provided closure). This ought to result in the program dying with SIGABRT (or maybe SIGILL or SIGTRAP).
The parent collects the child's exit status. For the first test case, we run expect_aborted. This extracts the signal number from the status and checks that it is as expected. On other systems this works.
In the failing run on Android, this check fails: the assertion on signal fails. That means the child did die of a signal, but the signal number wasn't the one expected. The previous debug print shows that the raw wait status (confusingly described by Rust stdlib as an "exit status") is 10 (it was 5 when the allocator still raised SIGTRAP). Usually, a bare number like that in a wait status is a signal number, and indeed that seems to be the case here since status.signal() is Some(...). On Linux (and most Unices), 5 is SIGTRAP; on Linux on the common architectures, 10 is SIGUSR1.
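As a quick illustration of how such raw wait statuses decode, here is a small sketch using std's ExitStatusExt. The describe function is a hypothetical helper, not part of the test.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

// Hypothetical helper: turn a raw wait status into a readable description.
fn describe(raw: i32) -> String {
    let status = ExitStatus::from_raw(raw);
    match (status.signal(), status.code()) {
        (Some(sig), _) => format!("killed by signal {}", sig),
        (None, Some(code)) => format!("exited with code {}", code),
        (None, None) => String::from("neither exited nor killed by a signal"),
    }
}

fn main() {
    // 5 and 10 are the values seen in the failing runs; 6 is SIGABRT,
    // 0 is a clean exit, and 256 encodes exit code 1.
    for &raw in &[5, 10, 6, 0, 256] {
        println!("raw wait status {:>3}: {}", raw, describe(raw));
    }
}
```

A small value such as 5 or 10 has the "exited" bits clear, so it reads as a termination signal rather than an exit code; an exit code lives in the next byte up, which is why 256 means exit code 1.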
I.e., it seems that the child tried to allocate memory, despite my efforts to make sure that panicking does not involve allocation. Weirdly, a more-portable test case which uses Command and does not insist on specific signal numbers passes.
It's definitely my stunt allocator which is tripping here, because when I changed it to use SIGUSR1 instead of SIGTRAP, the signal number in the failing test case changed too.