Rust nightly 2018-08-17 or later causes random segmentation faults or panics #53529

Closed
@yorickpeterse

Description

Inko (https://gitlab.com/inko-lang/inko) is a programming language that I am working on, and its VM is written in Rust. Up until Rust nightly 2018-08-17, everything worked fine. Starting with the nightly from the 17th, I'm observing various crashes and changes in program behaviour. For example:

  • On Windows it will either fail with a memory allocation error, or an error in the runtime test library (more on this in a moment).
  • On Linux it will segfault. Note that the funny segfault output is because the command is started with Ruby, and Ruby installs its own segmentation fault handler.
  • Locally it will usually fail with the same runtime error as observed on Windows above, but sometimes it will segfault. Sometimes it will panic because certain operations are performed on NULL pointers where this is not expected.

The last nightly that did not suffer from these problems was Rust 2018-08-16. Stable Rust also works fine. When the segmentation faults happen, they are usually in different places. For example, for one segmentation fault the backtrace is as follows:

#0  0x00007ffff7e12763 in _int_malloc () from /usr/lib/libc.so.6
#1  0x00007ffff7e13ada in malloc () from /usr/lib/libc.so.6
#2  0x0000555555568e6b in alloc::alloc::alloc (layout=...) at /checkout/src/liballoc/alloc.rs:78
#3  <libinko::chunk::Chunk<T>>::new (capacity=3) at src/chunk.rs:29
#4  libinko::register::Register::new (amount=3) at src/register.rs:23
#5  libinko::execution_context::ExecutionContext::from_block (block=0x7fffdc0971e0, return_register=Some = {...}) at src/execution_context.rs:60
#6  libinko::vm::machine::Machine::run (self=<optimized out>, process=<optimized out>) at src/vm/machine.rs:2350
#7  0x0000555555568b7f in libinko::vm::machine::Machine::run_with_error_handling (self=0x55555567d8c0, process=0x7ffff75fcbd0) at src/vm/machine.rs:351
#8  0x00005555555c88a4 in libinko::vm::machine::Machine::start_primary_threads::{{closure}} (process=...) at src/vm/machine.rs:260
#9  <libinko::pool::PoolInner<T>>::process (self=<optimized out>, index=0, closure=0x7ffff75fcc60) at src/pool.rs:186
#10 0x00005555555b739d in <libinko::pool::Pool<T>>::run::{{closure}} () at src/pool.rs:126
#11 std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:136
#12 0x00005555555cb0dc in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:409
#13 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>) at /checkout/src/libstd/panic.rs:313
#14 std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:310
#15 0x0000555555618a3a in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#16 0x00005555555ba39b in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:289
#17 std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:392
#18 std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:408
#19 <F as alloc::boxed::FnBox<A>>::call_box (self=0x55555567db50, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:642
#20 0x00005555556090db in _$LT$alloc..boxed..Box$LT$$LP$dyn$u20$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$RP$$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h904fcd0dbdc71d4f () at /checkout/src/liballoc/boxed.rs:652
#21 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#22 0x00005555555f83b6 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#23 0x00007ffff7f73a9d in start_thread () from /usr/lib/libpthread.so.0
#24 0x00007ffff7e89a43 in clone () from /usr/lib/libc.so.6

While for another segfault the backtrace is instead:

#0  libinko::runtime_panic::display_panic (process=0x7f, message="ObjectValue::as_block() called on a non block object") at src/runtime_panic.rs:11
#1  0x0000555555568baa in libinko::vm::machine::Machine::panic (self=0x55555567d8c0, process=0x7ffff71f8bd0, message="") at src/vm/machine.rs:3750
#2  libinko::vm::machine::Machine::run_with_error_handling (self=0x55555567d8c0, process=0x7ffff71f8bd0) at src/vm/machine.rs:352
#3  0x00005555555c88a4 in libinko::vm::machine::Machine::start_primary_threads::{{closure}} (process=...) at src/vm/machine.rs:260
#4  <libinko::pool::PoolInner<T>>::process (self=<optimized out>, index=4, closure=0x7ffff71f8c60) at src/pool.rs:186
#5  0x00005555555b739d in <libinko::pool::Pool<T>>::run::{{closure}} () at src/pool.rs:126
#6  std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:136
#7  0x00005555555cb0dc in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:409
#8  <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>) at /checkout/src/libstd/panic.rs:313
#9  std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:310
#10 0x0000555555618a3a in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#11 0x00005555555ba39b in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:289
#12 std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:392
#13 std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:408
#14 <F as alloc::boxed::FnBox<A>>::call_box (self=0x55555567e5d0, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:642
#15 0x00005555556090db in _$LT$alloc..boxed..Box$LT$$LP$dyn$u20$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$RP$$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h904fcd0dbdc71d4f () at /checkout/src/liballoc/boxed.rs:652
#16 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#17 0x00005555555f83b6 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#18 0x00007ffff7f73a9d in start_thread () from /usr/lib/libpthread.so.0
#19 0x00007ffff7e89a43 in clone () from /usr/lib/libc.so.6

And a third segfault:

#0  0x0000555555568d84 in libinko::vm::machine::Machine::run (self=<optimized out>, process=<optimized out>) at src/vm/machine.rs:388
#1  0x0000555555568b7f in libinko::vm::machine::Machine::run_with_error_handling (self=0x55555567d8c0, process=0x7ffff6cf3bd0) at src/vm/machine.rs:351
#2  0x00005555555c88a4 in libinko::vm::machine::Machine::start_primary_threads::{{closure}} (process=...) at src/vm/machine.rs:260
#3  <libinko::pool::PoolInner<T>>::process (self=<optimized out>, index=10, closure=0x7ffff6cf3c60) at src/pool.rs:186
#4  0x00005555555b739d in <libinko::pool::Pool<T>>::run::{{closure}} () at src/pool.rs:126
#5  std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /checkout/src/libstd/sys_common/backtrace.rs:136
#6  0x00005555555cb0dc in std::thread::Builder::spawn::{{closure}}::{{closure}} () at /checkout/src/libstd/thread/mod.rs:409
#7  <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=<optimized out>) at /checkout/src/libstd/panic.rs:313
#8  std::panicking::try::do_call (data=<optimized out>) at /checkout/src/libstd/panicking.rs:310
#9  0x0000555555618a3a in __rust_maybe_catch_panic () at libpanic_unwind/lib.rs:102
#10 0x00005555555ba39b in std::panicking::try (f=...) at /checkout/src/libstd/panicking.rs:289
#11 std::panic::catch_unwind (f=...) at /checkout/src/libstd/panic.rs:392
#12 std::thread::Builder::spawn::{{closure}} () at /checkout/src/libstd/thread/mod.rs:408
#13 <F as alloc::boxed::FnBox<A>>::call_box (self=0x55555567f590, args=<optimized out>) at /checkout/src/liballoc/boxed.rs:642
#14 0x00005555556090db in _$LT$alloc..boxed..Box$LT$$LP$dyn$u20$alloc..boxed..FnBox$LT$A$C$$u20$Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$RP$$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h904fcd0dbdc71d4f () at /checkout/src/liballoc/boxed.rs:652
#15 std::sys_common::thread::start_thread () at libstd/sys_common/thread.rs:24
#16 0x00005555555f83b6 in std::sys::unix::thread::Thread::new::thread_start () at libstd/sys/unix/thread.rs:90
#17 0x00007ffff7f73a9d in start_thread () from /usr/lib/libpthread.so.0
#18 0x00007ffff7e89a43 in clone () from /usr/lib/libc.so.6

In the case of the last segfault, it seems that certain local variables are NULL pointers, which should be impossible. Debugging this in GDB proves quite difficult, as a variety of variables are reported as <optimized out> even when debugging symbols are included. For example, for the last backtrace the output of "info locals" is:

(gdb) info locals
instruction = 0x0
index = 1
code = <optimized out>
context = 0x7fffb4000c30
reductions = 984

The VM test suite passes, even when running cargo test --release. I'm wondering if perhaps code is optimised in the wrong way, and this is somehow not triggered in the test suite (certainly possible, code coverage is not 100%).

Reproducing this is a bit weird. If we leave the code as-is, the segmentation faults rarely occur; instead, the VM panics with the following:

Stack trace (the most recent call comes last):
  0: "/home/yorickpeterse/Projects/inko/inko/runtime/src/std/process.inko", line 324, in "<block>"
  1: "/home/yorickpeterse/Projects/inko/inko/runtime/src/std/test/runner.inko", line 281, in "<lambda>"
  2: "/home/yorickpeterse/Projects/inko/inko/runtime/src/std/test/runner.inko", line 220, in "run"
Process 1 panicked: ObjectValue::as_block() called on a non block object

However, if we apply the following patch, things start to segfault very quickly:

diff --git a/runtime/src/std/test/runner.inko b/runtime/src/std/test/runner.inko
index 8175e2e..45fa998 100644
--- a/runtime/src/std/test/runner.inko
+++ b/runtime/src/std/test/runner.inko
@@ -217,6 +217,8 @@ object Runner {
   def run {
     let command = @receiver.receive

+    _INKOC.stdout_write(command.inspect + "\n")
+
     command.run(@state)

     @state.terminate?.if_true {

To reproduce:

  1. git clone https://gitlab.com/inko-lang/inko.git
  2. cd inko
  3. make -C vm profile
  4. curl https://gist.githubusercontent.com/YorickPeterse/2be478ab617ad02e9e2495130e8f32f0/raw/38ca8bcab963d5b9fc4d192e126f546bff0f6aa9/crash.patch | patch -p1 -N
  5. env RUBYLIB=./compiler/lib ./compiler/bin/inko-test -d runtime --vm vm/target/release/ivm

Note that the last command requires Ruby 2.3 or newer. This will run the test suite of the standard library, which is where the crashes happen fairly frequently (probably because it exercises much more code than the VM's own test suite).
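
Because the failures are intermittent, it can help to run the last command many times and count how each run ended. The following is a minimal sketch of such a harness using only the Rust standard library. The names `Outcome`, `classify`, `tally` and `run_repeatedly` are made up for this example. How the Ruby wrapper reports a crashing VM through its own exit status is an assumption here, so the categories may need adjusting.

```rust
use std::collections::BTreeMap;
use std::process::{Command, ExitStatus};

/// How a single run of the command ended.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Outcome {
    /// Exit code 0.
    Success,
    /// Non-zero exit code (for example, after a VM panic).
    Failure(i32),
    /// No exit code; on Unix this means the process was killed by a signal.
    Killed,
}

/// Maps an exit status to one of the outcomes above.
pub fn classify(status: ExitStatus) -> Outcome {
    match status.code() {
        Some(0) => Outcome::Success,
        Some(code) => Outcome::Failure(code),
        None => Outcome::Killed,
    }
}

/// Counts how often each outcome occurred.
pub fn tally<I: IntoIterator<Item = Outcome>>(outcomes: I) -> BTreeMap<Outcome, usize> {
    let mut counts = BTreeMap::new();
    for outcome in outcomes {
        *counts.entry(outcome).or_insert(0) += 1;
    }
    counts
}

/// Runs `program` with `args` the given number of times and tallies the results.
/// The child inherits the parent's environment and standard streams.
pub fn run_repeatedly(
    program: &str,
    args: &[&str],
    runs: usize,
) -> std::io::Result<BTreeMap<Outcome, usize>> {
    let mut outcomes = Vec::with_capacity(runs);
    for _ in 0..runs {
        let status = Command::new(program).args(args).status()?;
        outcomes.push(classify(status));
    }
    Ok(tally(outcomes))
}
```

For the command in step 5, this could be called as `run_repeatedly("./compiler/bin/inko-test", &["-d", "runtime", "--vm", "vm/target/release/ivm"], 20)` from the repository root, with `RUBYLIB=./compiler/lib` already set in the environment so the child process inherits it. Comparing the tallies between nightly 2018-08-16 and 2018-08-17 would give a rough measure of how often each kind of failure shows up.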

Metadata

Assignees

No one assigned

    Labels

    I-unsound: Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness
    regression-from-stable-to-nightly: Performance or correctness regression from stable to nightly.
