Regressions with large (2-4GB) stack arrays on large stacks #83060

Open
@joshtriplett

Description

In the course of responding to someone's question, I was experimenting with large (1-4 GB) arrays on the stack, using threads with large stacks created using std::thread::Builder::stack_size.

Code

Here's the baseline test program I used:

fn func() {
    const CAP: usize = std::u32::MAX as usize;
    let mut x: [u8; CAP] = [0; CAP];
    x[2] = 123;
    println!("{}", x[2]);
}

fn main() {
    std::thread::Builder::new()
        .stack_size(5 * 1024 * 1024 * 1024)
        .spawn(func)
        .unwrap()
        .join()
        .unwrap();
}

This creates a thread with a 5GB stack, allocates a 4GB array on it, sets one value in that array, reads that value back, and prints it. It should always print 123.

Call this the 4GB variation. I also tested 2GB and 1GB variations, by changing both references to CAP to CAP>>1 and CAP>>2 respectively. (I could have changed the constant itself rather than dividing it, but I'm preserving exactly the code I used in case it affects the result.)
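
To make the three variations concrete, here is a small sketch of the array lengths they produce. The helper name `variation_len` is hypothetical and not part of the test program; the values follow from `u32::MAX` and the shifts described above.

```rust
/// Array length for one variation, computed the same way as in the test
/// program: `CAP >> shift` with `CAP = u32::MAX`.
/// (`variation_len` is a hypothetical helper used only for illustration.)
fn variation_len(shift: u32) -> usize {
    const CAP: usize = std::u32::MAX as usize;
    CAP >> shift
}

fn main() {
    // shift 0 = 4GB variation, shift 1 = 2GB, shift 2 = 1GB
    for (name, shift) in [("4GB", 0u32), ("2GB", 1), ("1GB", 2)].iter() {
        let len = variation_len(*shift);
        println!("{}: {} bytes ({:#x})", name, len, len);
    }
}
```

The resulting lengths (0xffffffff, 0x7fffffff, 0x3fffffff) are the same constants that show up as the memset length in the disassembly further down.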

Rust version info

I tested two versions of Rust: stable 1.50 (rustc 1.50.0 (cb75ad5db 2021-02-10)) and nightly (rustc 1.52.0-nightly (4a8b6f708 2021-03-11)).

stable `rustc --version --verbose`
rustc 1.50.0 (cb75ad5db 2021-02-10)
binary: rustc
commit-hash: cb75ad5db02783e8b0222fee363c5f63f7e2cf5b
commit-date: 2021-02-10
host: x86_64-unknown-linux-gnu
release: 1.50.0

nightly `rustc --version --verbose`
rustc 1.52.0-nightly (4a8b6f708 2021-03-11)
binary: rustc
commit-hash: 4a8b6f708c38342a6c74aa00cf4323774c7381a6
commit-date: 2021-03-11
host: x86_64-unknown-linux-gnu
release: 1.52.0-nightly
LLVM version: 12.0.0

Results

I tested each of these three variations (1GB, 2GB, 4GB) on stable-debug (cargo run), stable-release (cargo run --release), and nightly-release (cargo +nightly run --release), and got three different results.

On stable-release, all three variations worked as expected, printing 123.

On stable-debug, the 1GB and 2GB variations work as expected, but the 4GB variation prints a different, seemingly random value on every run (not 123). This seems likely to be a soundness issue, so I'm labeling this I-unsound.

On nightly-release, the 1GB variation works as expected, but the 2GB and 4GB variations both run for an unexpectedly long time (many seconds) and are then killed by the Linux OOM killer. This is a regression from stable to nightly, so I'm labeling this accordingly.

I don't know if these come from one or multiple underlying issues. I'm reporting all the details here, but this may need to be split into multiple issues once analyzed further.

I dumped the generated code of all nine builds using objdump -d, and I'm attaching those. The diffs between working and non-working versions seem unusual and potentially relevant.

(I recommend filtering the objdumps through sed 's/anon[0-9a-f.]*llvm[0-9a-f.]*/anon-ELIDED-llvm-ELIDED/' to reduce spurious diff noise; that doesn't eliminate differences in code addresses, but it does eliminate differences in anonymous LLVM symbol names.)

objdump-d-stable-debug-1gb.txt
objdump-d-stable-debug-2gb.txt
objdump-d-stable-debug-4gb.txt
objdump-d-stable-release-1gb.txt
objdump-d-stable-release-2gb.txt
objdump-d-stable-release-4gb.txt
objdump-d-nightly-release-1gb.txt
objdump-d-nightly-release-2gb.txt
objdump-d-nightly-release-4gb.txt

The diff from nightly-release-1gb to nightly-release-2gb shows what look like signs of incorrect sign extension of large values. Note how sub $0x40000000,%r11 has become sub $0xffffffff80000000,%r11. Also note that in the stack cleanup at the end, add $0x40000040,%rsp has become a two-step add $0x7fffffff,%rsp followed by add $0x41,%rsp. It looks like something in the cleanup split the large offset into two steps, which isn't necessarily a problem but seems notable.

highlights from diff of nightly-release-1gb to nightly-release-2gb
@@ -2468,7 +2468,7 @@
 0000000000008380 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h12c12c27c7a85101E>:
     8380:      53                      push   %rbx
     8381:      49 89 e3                mov    %rsp,%r11
-    8384:      49 81 eb 00 00 00 40    sub    $0x40000000,%r11
+    8384:      49 81 eb 00 00 00 80    sub    $0xffffffff80000000,%r11
     838b:      48 81 ec 00 10 00 00    sub    $0x1000,%rsp
     8392:      48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
     8399:      00 
@@ -2476,7 +2476,7 @@
     839d:      75 ec                   jne    838b <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h12c12c27c7a85101E+0xb>
     839f:      48 83 ec 40             sub    $0x40,%rsp
     83a3:      48 8d 5c 24 40          lea    0x40(%rsp),%rbx
-    83a8:      ba ff ff ff 3f          mov    $0x3fffffff,%edx
+    83a8:      ba ff ff ff 7f          mov    $0x7fffffff,%edx
     83ad:      48 89 df                mov    %rbx,%rdi
     83b0:      31 f6                   xor    %esi,%esi
     83b2:      ff 15 a0 07 04 00       callq  *0x407a0(%rip)        # 48b58 <memset@GLIBC_2.2.5>
@@ -2497,12 +2497,11 @@
     83ff:      00 00 
     8401:      48 8d 7c 24 10          lea    0x10(%rsp),%rdi
     8406:      ff 15 7c 0a 04 00       callq  *0x40a7c(%rip)        # 48e88 <_GLOBAL_OFFSET_TABLE_+0x5b0>
-    840c:      48 81 c4 40 00 00 40    add    $0x40000040,%rsp
-    8413:      5b                      pop    %rbx
-    8414:      c3                      retq   
-    8415:      66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
-    841c:      00 00 00 
-    841f:      90                      nop
+    840c:      48 81 c4 ff ff ff 7f    add    $0x7fffffff,%rsp
+    8413:      48 83 c4 41             add    $0x41,%rsp
+    8417:      5b                      pop    %rbx
+    8418:      c3                      retq   
+    8419:      0f 1f 80 00 00 00 00    nopl   0x0(%rax)
 
 0000000000008420 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h3c355c4098361013E>:
     8420:      48 83 ec 08             sub    $0x8,%rsp
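
The immediates in this diff can be checked by hand, since x86-64 `sub $imm32, %r64` sign-extends its 32-bit immediate to 64 bits. The following is a minimal sketch of that arithmetic, not a statement about where in the compiler the problem lies. It assumes the intended value for the 2GB variation is 0x80000000 (inferred by doubling the 1GB value 0x40000000, not read from the dumps), and `sign_extend_imm32` is a hypothetical helper.

```rust
/// Sign-extend a 32-bit immediate to 64 bits, the way the CPU interprets
/// the immediate of `sub $imm32, %r64`.
/// (`sign_extend_imm32` is a hypothetical helper used only for illustration.)
fn sign_extend_imm32(imm: u32) -> u64 {
    imm as i32 as i64 as u64
}

fn main() {
    // 1GB variation: top bit clear, so the value is unchanged.
    println!("{:#x}", sign_extend_imm32(0x4000_0000));
    // 2GB variation (assumed intended value 0x80000000): top bit set,
    // so the value becomes a large negative number when sign-extended.
    println!("{:#x}", sign_extend_imm32(0x8000_0000));
    // The two-step cleanup still adds up to a full 2GB-sized frame.
    let cleanup: u64 = 0x7fff_ffff + 0x41;
    println!("{:#x}", cleanup);
}
```

The second value matches the `sub $0xffffffff80000000,%r11` shown by objdump, while the two cleanup adds sum to 0x80000040, which is the 1GB cleanup value 0x40000040 plus 0x40000000.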

The diff from nightly-release-2gb to nightly-release-4gb shows further signs of incorrect handling of large values; note how sub $0xffffffff80000000,%r11 has now become sub $0x0,%r11. Also note in the cleanup that the two-step add has now become a three-step add (add $0x7fffffff,%rsp twice then add $0x42,%rsp). I tested, and if I build code that uses a much larger 200GB array, LLVM does generate a movabs into a register and then adds that register. So it looks like the stack cleanup code is correct with large stack arrays, but there's something wrong with the stack setup code and stack offset code.

highlights from diff of nightly-release-2gb to nightly-release-4gb
@@ -2468,7 +2468,7 @@
 0000000000008380 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h12c12c27c7a85101E>:
     8380:	53                   	push   %rbx
     8381:	49 89 e3             	mov    %rsp,%r11
-    8384:	49 81 eb 00 00 00 80 	sub    $0xffffffff80000000,%r11
+    8384:	49 81 eb 00 00 00 00 	sub    $0x0,%r11
     838b:	48 81 ec 00 10 00 00 	sub    $0x1000,%rsp
     8392:	48 c7 04 24 00 00 00 	movq   $0x0,(%rsp)
     8399:	00 
@@ -2476,7 +2476,7 @@
     839d:	75 ec                	jne    838b <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h12c12c27c7a85101E+0xb>
     839f:	48 83 ec 40          	sub    $0x40,%rsp
     83a3:	48 8d 5c 24 40       	lea    0x40(%rsp),%rbx
-    83a8:	ba ff ff ff 7f       	mov    $0x7fffffff,%edx
+    83a8:	ba ff ff ff ff       	mov    $0xffffffff,%edx
     83ad:	48 89 df             	mov    %rbx,%rdi
     83b0:	31 f6                	xor    %esi,%esi
     83b2:	ff 15 a0 07 04 00    	callq  *0x407a0(%rip)        # 48b58 <memset@GLIBC_2.2.5>
@@ -2498,10 +2498,10 @@
     8401:	48 8d 7c 24 10       	lea    0x10(%rsp),%rdi
     8406:	ff 15 7c 0a 04 00    	callq  *0x40a7c(%rip)        # 48e88 <_GLOBAL_OFFSET_TABLE_+0x5b0>
     840c:	48 81 c4 ff ff ff 7f 	add    $0x7fffffff,%rsp
-    8413:	48 83 c4 41          	add    $0x41,%rsp
-    8417:	5b                   	pop    %rbx
-    8418:	c3                   	retq   
-    8419:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
+    8413:	48 81 c4 ff ff ff 7f 	add    $0x7fffffff,%rsp
+    841a:	48 83 c4 42          	add    $0x42,%rsp
+    841e:	5b                   	pop    %rbx
+    841f:	c3                   	retq   
 
 0000000000008420 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h3c355c4098361013E>:
     8420:	48 83 ec 08          	sub    $0x8,%rsp
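
The 4GB numbers fit a simple truncation to 32 bits. This sketch assumes the intended setup value for the 4GB variation is 0x100000000 (again inferred by doubling, not read from the dumps), and `truncate_to_imm32` is a hypothetical helper; it only illustrates the arithmetic.

```rust
/// Keep only the low 32 bits of a value, as would happen if a 64-bit
/// size were stored into a 32-bit immediate field.
/// (`truncate_to_imm32` is a hypothetical helper used only for illustration.)
fn truncate_to_imm32(value: u64) -> u32 {
    value as u32
}

fn main() {
    // Assumed intended setup values for the 1GB, 2GB, and 4GB variations.
    for size in [0x4000_0000u64, 0x8000_0000, 0x1_0000_0000].iter() {
        let imm = truncate_to_imm32(*size);
        println!("intended {:#x} -> imm32 {:#010x}", size, imm);
    }
    // The three-step cleanup, by contrast, adds up to the full frame size.
    let cleanup: u64 = 0x7fff_ffff + 0x7fff_ffff + 0x42;
    println!("cleanup total {:#x}", cleanup);
}
```

Under that assumption the 4GB immediate truncates to 0, matching `sub $0x0,%r11`, while the cleanup adds sum to 0x100000040, which does not fit in 32 bits and is handled in several steps.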

The diff from stable-debug-2gb to stable-debug-4gb is much noisier due to code addresses, but filtering out code-address-related differences, I want to highlight a portion of the diff that again looks like issues with incorrect handling of large values. Note the changes from large negative stack offsets like -0x7fffff98(%rsp) and -0x7fffff90(%rsp) to small positive stack offsets like 0x68(%rsp) and 0x70(%rsp). The latter look like they should be larger negative values, but they wrapped. I'm wondering if something has treated these stack offsets as 32-bit values.

highlights from diff of stable-debug-2gb to stable-debug-4gb
--- stable-debug-2gb-edited.txt	2021-03-12 12:26:03.557845940 -0800
+++ stable-debug-4gb-edited.txt	2021-03-12 12:26:05.849811731 -0800
@@ -1,52 +1,46 @@
 0000000000007cd0 <_ZN3foo4func17hd776abd5c5604338E>:
-    offs:	b8 78 00 00 80       	mov    $0x80000078,%eax
-    offs:	e8 2b 62 03 00       	callq  3df05 <__rust_probestack>
+    offs:	48 b8 78 00 00 00 01 	movabs $0x100000078,%rax
+    offs:	00 00 00 
+    offs:	e8 16 62 03 00       	callq  3def5 <__rust_probestack>
     offs:	48 29 c4             	sub    %rax,%rsp
-    offs:	48 8d 35 5c 58 03 00 	lea    0x3585c(%rip),%rsi        # 3d540 <_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hcc1876ba0cca6062E>
+    offs:	48 8d 35 47 58 03 00 	lea    0x35847(%rip),%rsi        # 3d530 <_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hcc1876ba0cca6062E>
     offs:	48 8d 44 24 29       	lea    0x29(%rsp),%rax
     offs:	31 c9                	xor    %ecx,%ecx
     offs:	48 89 c7             	mov    %rax,%rdi
     offs:	48 89 74 24 20       	mov    %rsi,0x20(%rsp)
     offs:	89 ce                	mov    %ecx,%esi
-    offs:	ba ff ff ff 7f       	mov    $0x7fffffff,%edx
+    offs:	ba ff ff ff ff       	mov    $0xffffffff,%edx
     offs:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
-    offs:	e8 5c e3 ff ff       	callq  6060 <memset@plt>
+    offs:	e8 57 e3 ff ff       	callq  6060 <memset@plt>
     offs:	c6 44 24 2b 7b       	movb   $0x7b,0x2b(%rsp)
     offs:	48 8b 44 24 18       	mov    0x18(%rsp),%rax
     offs:	48 05 02 00 00 00    	add    $0x2,%rax
-    offs:	48 89 84 24 68 00 00 	mov    %rax,-0x7fffff98(%rsp)
-    offs:	80 
-    offs:	48 8b 84 24 68 00 00 	mov    -0x7fffff98(%rsp),%rax
-    offs:	80 
-    offs:	48 89 84 24 70 00 00 	mov    %rax,-0x7fffff90(%rsp)
-    offs:	80 
+    offs:	48 89 44 24 68       	mov    %rax,0x68(%rsp)
+    offs:	48 8b 44 24 68       	mov    0x68(%rsp),%rax
+    offs:	48 89 44 24 70       	mov    %rax,0x70(%rsp)
     offs:	48 89 c7             	mov    %rax,%rdi
     offs:	48 8b 74 24 20       	mov    0x20(%rsp),%rsi
-    offs:	e8 37 ff ff ff       	callq  7c70 <_ZN4core3fmt10ArgumentV13new17h9ce77c97586d9adaE>
+    offs:	e8 3b ff ff ff       	callq  7c70 <_ZN4core3fmt10ArgumentV13new17h9ce77c97586d9adaE>
     offs:	48 89 44 24 10       	mov    %rax,0x10(%rsp)
     offs:	48 89 54 24 08       	mov    %rdx,0x8(%rsp)
-    offs:	48 8d 05 26 76 04 00 	lea    0x47626(%rip),%rax        # 4f370 <__do_global_dtors_aux_fini_array_entry+0x20>
+    offs:	48 8d 05 2a 76 04 00 	lea    0x4762a(%rip),%rax        # 4f370 <__do_global_dtors_aux_fini_array_entry+0x20>
     offs:	48 8b 4c 24 10       	mov    0x10(%rsp),%rcx
-    offs:	48 89 8c 24 58 00 00 	mov    %rcx,-0x7fffffa8(%rsp)
-    offs:	80 
+    offs:	48 89 4c 24 58       	mov    %rcx,0x58(%rsp)
     offs:	48 8b 54 24 08       	mov    0x8(%rsp),%rdx
-    offs:	48 89 94 24 60 00 00 	mov    %rdx,-0x7fffffa0(%rsp)
-    offs:	80 
-    offs:	48 8d b4 24 58 00 00 	lea    -0x7fffffa8(%rsp),%rsi
-    offs:	80 
-    offs:	48 8d bc 24 28 00 00 	lea    -0x7fffffd8(%rsp),%rdi
-    offs:	80 
+    offs:	48 89 54 24 60       	mov    %rdx,0x60(%rsp)
+    offs:	48 8d 74 24 58       	lea    0x58(%rsp),%rsi
+    offs:	48 8d 7c 24 28       	lea    0x28(%rsp),%rdi
     offs:	48 89 34 24          	mov    %rsi,(%rsp)
     offs:	48 89 c6             	mov    %rax,%rsi
     offs:	ba 02 00 00 00       	mov    $0x2,%edx
     offs:	48 8b 0c 24          	mov    (%rsp),%rcx
     offs:	41 b8 01 00 00 00    	mov    $0x1,%r8d
-    offs:	e8 01 0e 00 00       	callq  8b90 <_ZN4core3fmt9Arguments6new_v117h797f0a7bd4fbfb7aE>
-    offs:	48 8d bc 24 28 00 00 	lea    -0x7fffffd8(%rsp),%rdi
-    offs:	80 
-    offs:	ff 15 bb 9f 04 00    	callq  *0x49fbb(%rip)        # 51d58 <_GLOBAL_OFFSET_TABLE_+0x430>
-    offs:	48 b8 78 00 00 80 00 	movabs $0x80000078,%rax
+    offs:	e8 01 0e 00 00       	callq  8b80 <_ZN4core3fmt9Arguments6new_v117h797f0a7bd4fbfb7aE>
+    offs:	48 8d 7c 24 28       	lea    0x28(%rsp),%rdi
+    offs:	ff 15 ce 9f 04 00    	callq  *0x49fce(%rip)        # 51d58 <_GLOBAL_OFFSET_TABLE_+0x430>
+    offs:	48 b8 78 00 00 00 01 	movabs $0x100000078,%rax
     offs:	00 00 00 
     offs:	48 01 c4             	add    %rax,%rsp
     offs:	c3                   	retq   
-    offs:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
+    offs:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
+    offs:	00 
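
One possible reading of these offsets, offered as an assumption rather than a diagnosis: if each slot sits at the same distance below the top of the frame in both builds (frame sizes 0x80000078 and 0x100000078, as loaded before the call to `__rust_probestack`), the intended offsets would be 0x80000068 and 0x100000068. Those intended values are inferred here and do not appear in the dumps. The sketch below shows what happens when such offsets are squeezed into a signed 32-bit displacement; `as_disp32` and `signed_hex` are hypothetical helpers.

```rust
/// Truncate a stack offset to the signed 32-bit displacement that an
/// x86-64 memory operand can encode.
/// (`as_disp32` is a hypothetical helper used only for illustration.)
fn as_disp32(offset: u64) -> i32 {
    offset as u32 as i32
}

/// Format a displacement the way objdump prints it, e.g. `-0x7fffff98`.
fn signed_hex(disp: i32) -> String {
    if disp < 0 {
        format!("-{:#x}", (disp as i64).abs())
    } else {
        format!("{:#x}", disp)
    }
}

fn main() {
    // Assumed intended offsets for the 2GB and 4GB debug builds.
    for offset in [0x8000_0068u64, 0x8000_0070, 0x1_0000_0068, 0x1_0000_0070].iter() {
        println!("{:#x} -> {}(%rsp)", offset, signed_hex(as_disp32(*offset)));
    }
}
```

With those assumed inputs, the 2GB offsets come out as -0x7fffff98 and -0x7fffff90 and the 4GB offsets as 0x68 and 0x70, the same displacements seen in the diff. That would be consistent with the stack offsets being treated as 32-bit values.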

Metadata

Labels

A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
A-array: Area: `[T; N]`
C-bug: Category: This is a bug.
E-help-wanted: Call for participation: Help is requested to fix this issue.
E-medium: Call for participation: Medium difficulty. Experience needed to fix: Intermediate.
E-mentor: Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.
I-unsound: Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness
P-high: High priority
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
WG-diagnostics: Working group: Diagnostics
WG-llvm: Working group: LLVM backend code generation