Description
In the course of responding to someone's question, I was experimenting with large (1-4 GB) arrays on the stack, using threads with large stacks created using `std::thread::Builder::stack_size`.
Code
Here's the baseline test program I used:
fn func() {
    const CAP: usize = std::u32::MAX as usize;
    let mut x: [u8; CAP] = [0; CAP];
    x[2] = 123;
    println!("{}", x[2]);
}

fn main() {
    std::thread::Builder::new()
        .stack_size(5 * 1024 * 1024 * 1024)
        .spawn(func)
        .unwrap()
        .join()
        .unwrap();
}
This creates a thread with a 5GB stack, allocates a 4GB array on it, sets one value in that array, reads that value back, and prints it. It should always print `123`.
Call this the 4GB variation. I also tested 2GB and 1GB variations, by changing both references to `CAP` to `CAP>>1` and `CAP>>2` respectively. (I could have changed the constant itself rather than dividing it, but I'm preserving exactly the code I used in case it affects the result.)
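The three array sizes can be checked with a few lines of arithmetic. The sketch below only computes the sizes and allocates nothing. The constant names other than `CAP` are illustrative labels and do not appear in the test program. It assumes a 64-bit target.

```rust
// Array sizes for the three variations, derived from the same CAP value
// as the test program. CAP_2GB and CAP_1GB are illustrative names only.
const CAP: usize = u32::MAX as usize;
const CAP_2GB: usize = CAP >> 1;
const CAP_1GB: usize = CAP >> 2;

fn main() {
    // Print each size in hex so it can be compared with the immediates
    // that show up in the disassembly.
    println!("4GB variation: {:#x} bytes", CAP);
    println!("2GB variation: {:#x} bytes", CAP_2GB);
    println!("1GB variation: {:#x} bytes", CAP_1GB);
}
```

These are the same values that appear as the `memset` length in `%edx` in the diffs (`0x3fffffff`, `0x7fffffff`, `0xffffffff`).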
Rust version info
I tested two versions of Rust: stable 1.50 (`rustc 1.50.0 (cb75ad5db 2021-02-10)`) and nightly (`rustc 1.52.0-nightly (4a8b6f708 2021-03-11)`).
stable `rustc --version --verbose`
rustc 1.50.0 (cb75ad5db 2021-02-10)
binary: rustc
commit-hash: cb75ad5db02783e8b0222fee363c5f63f7e2cf5b
commit-date: 2021-02-10
host: x86_64-unknown-linux-gnu
release: 1.50.0
nightly `rustc --version --verbose`
rustc 1.52.0-nightly (4a8b6f708 2021-03-11)
binary: rustc
commit-hash: 4a8b6f708c38342a6c74aa00cf4323774c7381a6
commit-date: 2021-03-11
host: x86_64-unknown-linux-gnu
release: 1.52.0-nightly
LLVM version: 12.0.0
Results
I tested each of these three variations (1GB, 2GB, 4GB) on stable-debug (`cargo run`), stable-release (`cargo run --release`), and nightly-release (`cargo +nightly run --release`), and got three different results.
On stable-release, all three variations worked as expected, printing `123`.
On stable-debug, the 1GB and 2GB variations work as expected, but the 4GB variation prints a different, seemingly random value on every run (not 123). This seems likely to be a soundness issue, so I'm labeling this I-unsound.
On nightly-release, the 1GB variation works as expected, but the 2GB and 4GB variations both run for an unexpectedly long time (many seconds) and then get killed by the Linux OOM killer. This is a regression from stable to nightly, so I'm labeling this accordingly.
I don't know if these come from one or multiple underlying issues. I'm reporting all the details here, but this may need to be split into multiple issues once analyzed further.
I dumped the generated code of all nine builds using `objdump -d`, and I'm attaching those. The diffs between working and non-working versions seem unusual and potentially relevant.
(I recommend filtering the objdumps through `sed 's/anon[0-9a-f.]*llvm[0-9a-f.]*/anon-ELIDED-llvm-ELIDED/'` to reduce spurious diff noise; that doesn't eliminate differences in code addresses, but it does eliminate differences in anonymous LLVM symbol names.)
objdump-d-stable-debug-1gb.txt
objdump-d-stable-debug-2gb.txt
objdump-d-stable-debug-4gb.txt
objdump-d-stable-release-1gb.txt
objdump-d-stable-release-2gb.txt
objdump-d-stable-release-4gb.txt
objdump-d-nightly-release-1gb.txt
objdump-d-nightly-release-2gb.txt
objdump-d-nightly-release-4gb.txt
The diff from nightly-release-1gb to nightly-release-2gb shows what look like signs of incorrect sign extension of large values. Note how `sub $0x40000000,%r11` has become `sub $0xffffffff80000000,%r11`. Also note in the stack cleanup at the end that `add $0x40000040,%rsp` has become a two-step `add $0x7fffffff,%rsp` then `add $0x41,%rsp`; it looks like something in the cleanup broke the large offset into two steps (which isn't necessarily a problem but seems notable).
highlights from diff of nightly-release-1gb to nightly-release-2gb
@@ -2468,7 +2468,7 @@
0000000000008380 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h12c12c27c7a85101E>:
8380: 53 push %rbx
8381: 49 89 e3 mov %rsp,%r11
- 8384: 49 81 eb 00 00 00 40 sub $0x40000000,%r11
+ 8384: 49 81 eb 00 00 00 80 sub $0xffffffff80000000,%r11
838b: 48 81 ec 00 10 00 00 sub $0x1000,%rsp
8392: 48 c7 04 24 00 00 00 movq $0x0,(%rsp)
8399: 00
@@ -2476,7 +2476,7 @@
839d: 75 ec jne 838b <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h12c12c27c7a85101E+0xb>
839f: 48 83 ec 40 sub $0x40,%rsp
83a3: 48 8d 5c 24 40 lea 0x40(%rsp),%rbx
- 83a8: ba ff ff ff 3f mov $0x3fffffff,%edx
+ 83a8: ba ff ff ff 7f mov $0x7fffffff,%edx
83ad: 48 89 df mov %rbx,%rdi
83b0: 31 f6 xor %esi,%esi
83b2: ff 15 a0 07 04 00 callq *0x407a0(%rip) # 48b58 <memset@GLIBC_2.2.5>
@@ -2497,12 +2497,11 @@
83ff: 00 00
8401: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi
8406: ff 15 7c 0a 04 00 callq *0x40a7c(%rip) # 48e88 <_GLOBAL_OFFSET_TABLE_+0x5b0>
- 840c: 48 81 c4 40 00 00 40 add $0x40000040,%rsp
- 8413: 5b pop %rbx
- 8414: c3 retq
- 8415: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
- 841c: 00 00 00
- 841f: 90 nop
+ 840c: 48 81 c4 ff ff ff 7f add $0x7fffffff,%rsp
+ 8413: 48 83 c4 41 add $0x41,%rsp
+ 8417: 5b pop %rbx
+ 8418: c3 retq
+ 8419: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000008420 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h3c355c4098361013E>:
8420: 48 83 ec 08 sub $0x8,%rsp
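The `sub $0xffffffff80000000,%r11` in the diff above is what a 32-bit immediate of `0x80000000` looks like once it is sign-extended to 64 bits, which is how x86-64 widens an imm32 operand for a 64-bit `sub`. The following is a minimal sketch of that arithmetic in plain Rust integers. The function name and the sample stack pointer value are made up for illustration and are not taken from the compiler or the dumps.

```rust
/// Models how a 32-bit immediate is widened to 64 bits for a 64-bit
/// arithmetic instruction: it is sign-extended, not zero-extended.
/// (Illustrative helper, not part of any real API.)
fn sign_extend_imm32(imm: u32) -> u64 {
    imm as i32 as i64 as u64
}

fn main() {
    // 1GB variation: bit 31 is clear, so the value is unchanged.
    assert_eq!(sign_extend_imm32(0x4000_0000), 0x4000_0000);

    // 2GB variation: bit 31 is set, so the upper 32 bits become all ones.
    assert_eq!(sign_extend_imm32(0x8000_0000), 0xffff_ffff_8000_0000);

    // Subtracting the sign-extended value moves the pointer UP by 2 GiB
    // instead of down. The starting value here is arbitrary.
    let r11: u64 = 0x7000_0000_0000;
    let after = r11.wrapping_sub(sign_extend_imm32(0x8000_0000));
    assert_eq!(after, r11 + 0x8000_0000);

    println!("{:#x} -> {:#x}", r11, after);
}
```

If this model matches what the generated code does, the stack probe loop would be given a limit on the wrong side of the stack pointer.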
The diff from nightly-release-2gb to nightly-release-4gb shows further signs of incorrect handling of large values; note how `sub $0xffffffff80000000,%r11` has now become `sub $0x0,%r11`. Also note in the cleanup that the two-step add has now become a three-step add (`add $0x7fffffff,%rsp` twice then `add $0x42,%rsp`). I tested, and if I build code that uses a much larger 200GB array, LLVM does generate a `movabs` into a register and then adds that register. So it looks like the stack cleanup code is correct with large stack arrays, but there's something wrong with the stack setup code and stack offset code.
highlights from diff of nightly-release-2gb to nightly-release-4gb
@@ -2468,7 +2468,7 @@
0000000000008380 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h12c12c27c7a85101E>:
8380: 53 push %rbx
8381: 49 89 e3 mov %rsp,%r11
- 8384: 49 81 eb 00 00 00 80 sub $0xffffffff80000000,%r11
+ 8384: 49 81 eb 00 00 00 00 sub $0x0,%r11
838b: 48 81 ec 00 10 00 00 sub $0x1000,%rsp
8392: 48 c7 04 24 00 00 00 movq $0x0,(%rsp)
8399: 00
@@ -2476,7 +2476,7 @@
839d: 75 ec jne 838b <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h12c12c27c7a85101E+0xb>
839f: 48 83 ec 40 sub $0x40,%rsp
83a3: 48 8d 5c 24 40 lea 0x40(%rsp),%rbx
- 83a8: ba ff ff ff 7f mov $0x7fffffff,%edx
+ 83a8: ba ff ff ff ff mov $0xffffffff,%edx
83ad: 48 89 df mov %rbx,%rdi
83b0: 31 f6 xor %esi,%esi
83b2: ff 15 a0 07 04 00 callq *0x407a0(%rip) # 48b58 <memset@GLIBC_2.2.5>
@@ -2498,10 +2498,10 @@
8401: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi
8406: ff 15 7c 0a 04 00 callq *0x40a7c(%rip) # 48e88 <_GLOBAL_OFFSET_TABLE_+0x5b0>
840c: 48 81 c4 ff ff ff 7f add $0x7fffffff,%rsp
- 8413: 48 83 c4 41 add $0x41,%rsp
- 8417: 5b pop %rbx
- 8418: c3 retq
- 8419: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
+ 8413: 48 81 c4 ff ff ff 7f add $0x7fffffff,%rsp
+ 841a: 48 83 c4 42 add $0x42,%rsp
+ 841e: 5b pop %rbx
+ 841f: c3 retq
0000000000008420 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h3c355c4098361013E>:
8420: 48 83 ec 08 sub $0x8,%rsp
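The numbers in the two nightly diffs are consistent with a simple model: the setup code keeps only the low 32 bits of the frame size, while the cleanup code splits the full size into steps that each fit in a positive 32-bit immediate. The sketch below reproduces the observed immediates under that assumption. Both helper functions are hypothetical and are not a description of LLVM's actual implementation.

```rust
/// Keeps only the low 32 bits of a value, as an imm32 field would.
/// (Hypothetical helper for illustration.)
fn truncate_to_imm32(value: u64) -> u32 {
    value as u32
}

/// Splits a stack adjustment into steps of at most i32::MAX each,
/// mirroring the multi-step `add ...,%rsp` sequences in the dumps.
/// (Hypothetical helper for illustration.)
fn split_adjustment(mut total: u64) -> Vec<u32> {
    const MAX_STEP: u64 = i32::MAX as u64;
    let mut steps = Vec::new();
    while total > MAX_STEP {
        steps.push(MAX_STEP as u32);
        total -= MAX_STEP;
    }
    steps.push(total as u32);
    steps
}

fn main() {
    // Setup: a 4 GiB probe size truncated to 32 bits is zero.
    assert_eq!(truncate_to_imm32(0x1_0000_0000), 0x0);

    // Cleanup: the full frame sizes split into the add sequences seen
    // in the 1GB, 2GB, and 4GB nightly-release dumps respectively.
    assert_eq!(split_adjustment(0x4000_0040), vec![0x4000_0040]);
    assert_eq!(split_adjustment(0x8000_0040), vec![0x7fff_ffff, 0x41]);
    assert_eq!(split_adjustment(0x1_0000_0040), vec![0x7fff_ffff, 0x7fff_ffff, 0x42]);

    println!("all immediates match the dumps");
}
```

In each case the cleanup steps sum to the full frame size, which fits the observation that the cleanup path handles large values correctly while the setup path does not.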
The diff from stable-debug-2gb to stable-debug-4gb is much noisier due to code addresses, but filtering out code-address-related differences, I want to highlight a portion of the diff that again looks like issues with incorrect handling of large values. Note the changes from large negative stack offsets like `-0x7fffff98(%rsp)` and `-0x7fffff90(%rsp)` to small positive stack offsets like `0x68(%rsp)` and `0x70(%rsp)`. The latter look like they should be larger negative values, but they wrapped. I'm wondering if something has treated these stack offsets as 32-bit values.
highlights from diff of stable-debug-2gb to stable-debug-4gb
--- stable-debug-2gb-edited.txt 2021-03-12 12:26:03.557845940 -0800
+++ stable-debug-4gb-edited.txt 2021-03-12 12:26:05.849811731 -0800
@@ -1,52 +1,46 @@
0000000000007cd0 <_ZN3foo4func17hd776abd5c5604338E>:
- offs: b8 78 00 00 80 mov $0x80000078,%eax
- offs: e8 2b 62 03 00 callq 3df05 <__rust_probestack>
+ offs: 48 b8 78 00 00 00 01 movabs $0x100000078,%rax
+ offs: 00 00 00
+ offs: e8 16 62 03 00 callq 3def5 <__rust_probestack>
offs: 48 29 c4 sub %rax,%rsp
- offs: 48 8d 35 5c 58 03 00 lea 0x3585c(%rip),%rsi # 3d540 <_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hcc1876ba0cca6062E>
+ offs: 48 8d 35 47 58 03 00 lea 0x35847(%rip),%rsi # 3d530 <_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hcc1876ba0cca6062E>
offs: 48 8d 44 24 29 lea 0x29(%rsp),%rax
offs: 31 c9 xor %ecx,%ecx
offs: 48 89 c7 mov %rax,%rdi
offs: 48 89 74 24 20 mov %rsi,0x20(%rsp)
offs: 89 ce mov %ecx,%esi
- offs: ba ff ff ff 7f mov $0x7fffffff,%edx
+ offs: ba ff ff ff ff mov $0xffffffff,%edx
offs: 48 89 44 24 18 mov %rax,0x18(%rsp)
- offs: e8 5c e3 ff ff callq 6060 <memset@plt>
+ offs: e8 57 e3 ff ff callq 6060 <memset@plt>
offs: c6 44 24 2b 7b movb $0x7b,0x2b(%rsp)
offs: 48 8b 44 24 18 mov 0x18(%rsp),%rax
offs: 48 05 02 00 00 00 add $0x2,%rax
- offs: 48 89 84 24 68 00 00 mov %rax,-0x7fffff98(%rsp)
- offs: 80
- offs: 48 8b 84 24 68 00 00 mov -0x7fffff98(%rsp),%rax
- offs: 80
- offs: 48 89 84 24 70 00 00 mov %rax,-0x7fffff90(%rsp)
- offs: 80
+ offs: 48 89 44 24 68 mov %rax,0x68(%rsp)
+ offs: 48 8b 44 24 68 mov 0x68(%rsp),%rax
+ offs: 48 89 44 24 70 mov %rax,0x70(%rsp)
offs: 48 89 c7 mov %rax,%rdi
offs: 48 8b 74 24 20 mov 0x20(%rsp),%rsi
- offs: e8 37 ff ff ff callq 7c70 <_ZN4core3fmt10ArgumentV13new17h9ce77c97586d9adaE>
+ offs: e8 3b ff ff ff callq 7c70 <_ZN4core3fmt10ArgumentV13new17h9ce77c97586d9adaE>
offs: 48 89 44 24 10 mov %rax,0x10(%rsp)
offs: 48 89 54 24 08 mov %rdx,0x8(%rsp)
- offs: 48 8d 05 26 76 04 00 lea 0x47626(%rip),%rax # 4f370 <__do_global_dtors_aux_fini_array_entry+0x20>
+ offs: 48 8d 05 2a 76 04 00 lea 0x4762a(%rip),%rax # 4f370 <__do_global_dtors_aux_fini_array_entry+0x20>
offs: 48 8b 4c 24 10 mov 0x10(%rsp),%rcx
- offs: 48 89 8c 24 58 00 00 mov %rcx,-0x7fffffa8(%rsp)
- offs: 80
+ offs: 48 89 4c 24 58 mov %rcx,0x58(%rsp)
offs: 48 8b 54 24 08 mov 0x8(%rsp),%rdx
- offs: 48 89 94 24 60 00 00 mov %rdx,-0x7fffffa0(%rsp)
- offs: 80
- offs: 48 8d b4 24 58 00 00 lea -0x7fffffa8(%rsp),%rsi
- offs: 80
- offs: 48 8d bc 24 28 00 00 lea -0x7fffffd8(%rsp),%rdi
- offs: 80
+ offs: 48 89 54 24 60 mov %rdx,0x60(%rsp)
+ offs: 48 8d 74 24 58 lea 0x58(%rsp),%rsi
+ offs: 48 8d 7c 24 28 lea 0x28(%rsp),%rdi
offs: 48 89 34 24 mov %rsi,(%rsp)
offs: 48 89 c6 mov %rax,%rsi
offs: ba 02 00 00 00 mov $0x2,%edx
offs: 48 8b 0c 24 mov (%rsp),%rcx
offs: 41 b8 01 00 00 00 mov $0x1,%r8d
- offs: e8 01 0e 00 00 callq 8b90 <_ZN4core3fmt9Arguments6new_v117h797f0a7bd4fbfb7aE>
- offs: 48 8d bc 24 28 00 00 lea -0x7fffffd8(%rsp),%rdi
- offs: 80
- offs: ff 15 bb 9f 04 00 callq *0x49fbb(%rip) # 51d58 <_GLOBAL_OFFSET_TABLE_+0x430>
- offs: 48 b8 78 00 00 80 00 movabs $0x80000078,%rax
+ offs: e8 01 0e 00 00 callq 8b80 <_ZN4core3fmt9Arguments6new_v117h797f0a7bd4fbfb7aE>
+ offs: 48 8d 7c 24 28 lea 0x28(%rsp),%rdi
+ offs: ff 15 ce 9f 04 00 callq *0x49fce(%rip) # 51d58 <_GLOBAL_OFFSET_TABLE_+0x430>
+ offs: 48 b8 78 00 00 00 01 movabs $0x100000078,%rax
offs: 00 00 00
offs: 48 01 c4 add %rax,%rsp
offs: c3 retq
- offs: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
+ offs: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
+ offs: 00
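The offsets in the stable-debug diff above fit the hypothesis that stack offsets are being squeezed into a signed 32-bit displacement. The sketch below shows the arithmetic. The helper name is made up, and the "intended" 4GB offsets are an assumption: they are the 2GB offsets shifted by the `0x80000000` growth in frame size (`0x80000078` to `0x100000078`), not values read from the compiler.

```rust
/// Interprets the low 32 bits of an offset as a signed 32-bit
/// displacement, the way a disp32 field in an x86-64 memory operand
/// is decoded. (Hypothetical helper for illustration.)
fn as_disp32(offset: u64) -> i64 {
    offset as u32 as i32 as i64
}

fn main() {
    // 2GB build: the bytes `68 00 00 80` encode 0x80000068, which
    // decodes as a negative displacement.
    assert_eq!(as_disp32(0x8000_0068), -0x7fff_ff98);
    assert_eq!(as_disp32(0x8000_0028), -0x7fff_ffd8);

    // 4GB build (assumed intended offsets): the upper bit is lost
    // and only a small positive displacement remains.
    assert_eq!(as_disp32(0x1_0000_0068), 0x68);
    assert_eq!(as_disp32(0x1_0000_0028), 0x28);

    println!("displacements match the dumps");
}
```

Note that in the 4GB dump the wrapped `0x28(%rsp)` is passed as a destination just below the `0x2b(%rsp)` slot that holds `x[2]`. Whether that overlap is what produces the varying output is not established here.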