Skip to content

Not inlining array/slice access and iteration at opt-level=z causes knock-on effects on size #123345

Open
@cbiffle

Description

@cbiffle

Summary

Hi! We've been seeing some surprising codegen decisions in Hubris programs for a while, and I wanted to surface it upstream in case there's anything to be done.

We build primarily at opt-level="z", though many of these behaviors seem to happen at opt-level="s" too. (z code is very slightly smaller.)

There are a handful of routines in core that rustc often decides not to inline which have significant knock-on effects on LLVM's ability to reason about panics, integer ranges for bounds checks, and the like. The problem primarily arises with:

  • <Enumerate<core::slice::iter::Iter<T>> as Iterator>>::next - not inlining this appears to hamper the ability of other parts of the compiler to notice that the enumerated indices it's producing are in a predictable range
  • <Index<usize> for [T]>::index or <Index<Range<usize>> for [T]>::index - the non-inlined version of this does an unconditional bounds check, when in many cases there is information available at the callsite to eliminate it were it inlined
  • More rarely, but often enough, <core::slice::Iter<T> as Iterator>::next

All of these routines, when inlined, tend to produce either one or zero machine instructions -- one in the case of index which is typically an indexed addressing mode load, zero in the case of Enumerate::next where the information is often already available in the calling frame for other reasons, and generally one in the case of slice::Iter::next.

A concrete example from 1.75.0-ish

Here is a concrete example from one of our current programs.

The originating Rust code is (simplified):

        let mut argsbuf = [0; 8];  // length actually a constant but I've inlined it
        // ... things happen ...
        let arglen = an_inlined_operation();
        // ... things happen ...
        &argsbuf[..arglen]

Here's the ARMv7E-M code as produced by objdump from the output of nightly-2023-12-21 (chosen to be approximately equivalent to 1.75.0).

 80071ee:       2601            movs    r6, #1  ; included for reasons I'll explain in a bit
    ; unrelated things happen...
core::array::<impl core::ops::index::Index<I> for [T; N]>::index:
/rustc/5ac4c8a63ee305742071ac6dd11817f7c24adce2/library/core/src/array/mod.rs:348
 8007202:       4640            mov     r0, r8
 8007204:       2108            movs    r1, #8
 8007206:       4632            mov     r2, r6
 8007208:       f000 fb5b       bl      80078c2 <core::slice::index::<impl core::ops::index::Index<I> for [T]>::index>

r6 here holds the length being produced by the inlined operation; along all code paths the length it produces is either 0, 1, or 8, all literal constants. I've included the simplest one here, which sets it to literal 1.

Given that we are indexing a fixed-length 8-element array normally rustc would be very good about eliding the bounds check, since 0, 1, and 8 are all valid options here.

Instead we have the argument setup and a call to the non-inlined index function, which contains a bounds check and conditional panic:

080078c2 <core::slice::index::<impl core::ops::index::Index<I> for [T]>::index>:
 80078c2:       428a            cmp     r2, r1
 80078c4:       bf9c            itt     ls
 80078c6:       4611            movls   r1, r2
 80078c8:       4770            bxls    lr
 80078ca:       b580            push    {r7, lr}
 80078cc:       466f            mov     r7, sp
 80078ce:       4610            mov     r0, r2
 80078d0:       f7ff ff50       bl      8007774 <core::slice::index::slice_end_index_len_fail>

Impact

In many of our programs, this has a pretty significant impact. In at least one case, opt-level=3 winds up being smaller because of more aggressive inlining causing checks to be elided. Because panic sites are relatively expensive -- we format and record panic messages, so they pull in formatting machinery that might not otherwise be needed -- in programs that otherwise contain no panics, a shift in the inlining decisions around something like index can cause a program's size to grow by 4 kiB pretty easily.

For context, that's a 50-100% size increase on most of our programs.

A possibly clumsy solution

So, for the specific cases I described above -- indexing arrays with usizes or ranges of usizes, and core::slice::Iter -- does it ever make sense to not inline these routines? I'm wondering if folks would be open to me going through and carefully applying some inline(always) attributes in this neighborhood. (This might not be sufficient for the Enumerate<T> case but could definitely be applied to the array/slice Index impls.)

If so, let me know what tests you'd like me to perform on the result. I would obviously measure the impact on various Hubris-based firmware, but perhaps there are other contexts where this could be a bad thing that you'd want me to check.

My LLVM experience is limited and rusty, so there might be a cleverer thing that could be done in the compiler, but I wouldn't know what to suggest.

rustc version

The examples above were generated by:

rustc 1.76.0-nightly (5ac4c8a63 2023-12-20)
binary: rustc
commit-hash: 5ac4c8a63ee305742071ac6dd11817f7c24adce2
commit-date: 2023-12-20
host: x86_64-unknown-linux-gnu
release: 1.76.0-nightly
LLVM version: 17.0.6

but I've been observing this behavior consistently since at least 2021, and just wasn't sure if an upstream bug report made sense.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-codegenArea: Code generationC-bugCategory: This is a bug.C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchI-heavyIssue: Problems and improvements with respect to binary size of generated code.S-has-mcveStatus: A Minimal Complete and Verifiable Example has been found for this issueT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions