Description
fn main() {
    unsafe { *(0 as *mut u32) = 0 };
}
playpen: application terminated abnormally with signal 4 (Illegal instruction)
The reason for this is that, by default, UNIX blocks a signal while its handler is running. However, this particular signal handler tries to re-raise the signal when it's done processing. The re-raise has no immediate effect since the handler is still running, so execution continues with the next line, intrinsics::abort(), which induces SIGILL. (The code tries to reset the signal disposition to avoid re-entering the same handler, but that's not enough; the signal also needs to be removed from the mask of blocked signals.)
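The masking behavior can be reproduced in isolation. The following is a minimal sketch, not the handler in the standard library: it uses SIGINT instead of SIGSEGV so that nothing actually crashes, declares the two libc functions by hand instead of using bindings, and assumes a UNIX-like platform where signal() has BSD semantics (the signal is blocked while its handler runs) and SIGINT is signal 2. The names on_sigint and reraise_inside_handler are made up for the example.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hand-written bindings so the sketch needs no external crate.
unsafe extern "C" {
    fn signal(signum: c_int, handler: extern "C" fn(c_int)) -> usize;
    fn raise(signum: c_int) -> c_int;
}

// Assumption: SIGINT is 2 (true on Linux, Darwin and the BSDs).
const SIGINT: c_int = 2;

// Number of times the handler has been entered.
static CALLS: AtomicUsize = AtomicUsize::new(0);
// Value of CALLS observed right after raise() returned inside the handler.
static CALLS_SEEN_AFTER_RERAISE: AtomicUsize = AtomicUsize::new(0);

extern "C" fn on_sigint(sig: c_int) {
    let previous = CALLS.fetch_add(1, Ordering::SeqCst);
    if previous == 0 {
        // First invocation: try to re-raise from inside the handler.
        // The signal is blocked right now, so it only becomes pending
        // and raise() returns without the handler being re-entered.
        unsafe { raise(sig) };
        CALLS_SEEN_AFTER_RERAISE.store(CALLS.load(Ordering::SeqCst), Ordering::SeqCst);
    }
}

/// Returns (total handler invocations, invocations that had started
/// by the time the re-raise inside the handler returned).
fn reraise_inside_handler() -> (usize, usize) {
    CALLS.store(0, Ordering::SeqCst);
    CALLS_SEEN_AFTER_RERAISE.store(0, Ordering::SeqCst);
    unsafe {
        signal(SIGINT, on_sigint);
        // Delivered immediately: the handler runs before raise() returns.
        // The pending re-raised signal is delivered once the first
        // handler invocation returns and the mask is restored.
        raise(SIGINT);
    }
    (
        CALLS.load(Ordering::SeqCst),
        CALLS_SEEN_AFTER_RERAISE.load(Ordering::SeqCst),
    )
}

fn main() {
    let (calls, seen) = reraise_inside_handler();
    println!(
        "handler ran {} time(s); {} had started when the inner raise() returned",
        calls, seen
    );
}
```

Under those assumptions, the inner raise() returns while only one handler invocation has started, and the second invocation happens only after the first one returns. In the real handler the statement after the re-raise is the abort, which is why it is reached.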
There are a couple of possible solutions to this. The straightforward one is to cause the signal not to be masked. I have a patch for this that depends on my signal-FFI-bindings refactoring in #25784; I can submit it as a PR once that lands.
Another one, which I prefer, is to just let the handler return instead of re-raising. There was discussion about this previously, where it was noted that glibc's manual says this is undefined for "program error signals" like SIGSEGV and SIGBUS. POSIX agrees with that. However, most platforms in practice define this behavior, and allow you to return from these handlers, so that you can do things like userspace page-fault handling.
For example,
- Google's Breakpad crash-handling library returns from SIGSEGV on Linux and on Solaris. (On Darwin they use Mach exceptions.)
- The Oracle JVM relies on being able to return from a SIGSEGV handler: they use it as a trick for stop-the-world pauses with low overhead. (Periodically each thread will do a single read from a special page, which is ~one instruction. If the GC wants to stop the world, it'll unmap the page to cause each thread to fault, and once it's ready to continue the world, it'll remap the page and make all threads return from their signal handlers.) So any UNIX platform where the JVM works should support this.
- GNU libsigsegv is a library for doing things like userspace paging, and it's predicated on the assumption that returning from SIGSEGV is possible. The PORTING file has a wide list of supported platforms, including Linux, Darwin, FreeBSD, OpenBSD, NetBSD, Solaris, and MinGW.
So I think that it's merely the case that returning from a program error handler is unspecified in POSIX, but just about all actual OSes we care about support it.
Returning from the handler has the advantage that the exact signal is re-delivered to kill the program, with the right siginfo (indicating it died because of a memory error, not because someone manually sent SIGSEGV), so the last frame in a coredump is right, dmesg prints a line, etc. We can keep the re-raise for unknown platforms, but for platforms where we know that returning from the handler works, that seems both simpler and better. (Breakpad takes this approach for the same reasons.)
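To illustrate what "returning from a SIGSEGV handler" looks like in the userspace page-fault case mentioned earlier, here is a rough sketch. It is not how Breakpad, the JVM, or libsigsegv implement it. It assumes 64-bit Linux (SIGSEGV is 11, MAP_ANONYMOUS is 0x20, off_t is 64 bits), declares the libc functions by hand, and calls mprotect from the handler, which POSIX does not list as async-signal-safe but which is what this kind of trick relies on in practice. The names on_segv and fault_and_recover are made up for the example.

```rust
use std::os::raw::{c_int, c_void};
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hand-written bindings; signatures assume 64-bit Linux.
unsafe extern "C" {
    fn signal(signum: c_int, handler: extern "C" fn(c_int)) -> usize;
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int,
            fd: c_int, offset: i64) -> *mut c_void;
    fn mprotect(addr: *mut c_void, len: usize, prot: c_int) -> c_int;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
    fn _exit(status: c_int) -> !;
}

// Assumed Linux values (x86_64 / aarch64).
const SIGSEGV: c_int = 11;
const PROT_NONE: c_int = 0;
const PROT_READ: c_int = 1;
const PROT_WRITE: c_int = 2;
const MAP_PRIVATE: c_int = 0x02;
const MAP_ANONYMOUS: c_int = 0x20;
const LEN: usize = 4096;

static PAGE: AtomicUsize = AtomicUsize::new(0);
static FAULTS: AtomicUsize = AtomicUsize::new(0);

extern "C" fn on_segv(_sig: c_int) {
    let faults = FAULTS.fetch_add(1, Ordering::SeqCst) + 1;
    let page = PAGE.load(Ordering::SeqCst) as *mut c_void;
    // Without siginfo we cannot check the faulting address, so give up
    // on any second fault instead of looping forever.
    if faults > 1 || unsafe { mprotect(page, LEN, PROT_READ | PROT_WRITE) } != 0 {
        unsafe { _exit(1) };
    }
    // Plain return: the kernel restarts the faulting instruction,
    // which now succeeds because the page is accessible.
}

/// Reads from an inaccessible page and lets the handler fix it up.
/// Returns (byte read, number of faults handled).
/// Note: this replaces whatever SIGSEGV handler was installed before.
fn fault_and_recover() -> (u8, usize) {
    unsafe {
        let page = mmap(ptr::null_mut(), LEN, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert!(page as usize != usize::MAX, "mmap failed");
        PAGE.store(page as usize, Ordering::SeqCst);
        FAULTS.store(0, Ordering::SeqCst);
        signal(SIGSEGV, on_segv);
        // Faults once; after the handler returns, the read is retried.
        let value = ptr::read_volatile(page as *const u8);
        munmap(page, LEN);
        (value, FAULTS.load(Ordering::SeqCst))
    }
}

fn main() {
    let (value, faults) = fault_and_recover();
    println!("read {} after {} fault(s)", value, faults);
}
```

The case in this issue is the opposite one: the handler does not repair anything, so after it resets the disposition and returns, the retried instruction faults again and the default action kills the process with the original siginfo.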
The final option is just to remove this code. What it does, as far as I see, is to print an error message and die if the segfault was on the guard page, and just die otherwise. I don't think there's a compelling reason to print a message for stack overflow, especially as caught by SIGSEGV (it makes more sense if it's caught by stack probes or morestack). In any other systems language, overflowing the stack just gets you killed with SIGSEGV, and installing a special handler doesn't seem to match the runtime-removal philosophy. It made sense in the librustrt world, but I'd argue it's not useful now. But if other people are finding the handler / the error message useful, that's fine.
Cc @Zoxc for advice, as the original author of the segfault handling code.