Not inlining array/slice access and iteration at opt-level=z causes knock-on effects on size

# Summary

Hi! We've been seeing some surprising codegen decisions in Hubris programs for a while, and I wanted to surface it upstream in case there's anything to be done.

We build primarily at `opt-level="z"`, though many of these behaviors seem to happen at `opt-level="s"` too. (`z` code is very slightly smaller.)

There are a handful of routines in `core` that rustc often decides not to inline which have significant knock-on effects on LLVM's ability to reason about panics, integer ranges for bounds checks, and the like. The problem primarily arises with:

- `<Enumerate<core::slice::iter::Iter<T>> as Iterator>>::next` - not inlining this appears to hamper the ability of other parts of the compiler to notice that the enumerated indices it's producing are in a predictable range
- `<Index<usize> for [T]>::index` or `<Index<Range<usize>> for [T]>::index` - the non-inlined version of this does an unconditional bounds check, when in many cases there is information available at the callsite to eliminate it were it inlined
- More rarely, but often enough, `<core::slice::Iter<T> as Iterator>::next`

All of these routines, when inlined, tend to produce either one or zero machine instructions -- one in the case of `index` which is typically an indexed addressing mode load, zero in the case of `Enumerate::next` where the information is often already available in the calling frame for other reasons, and generally one in the case of `slice::Iter::next`.

# A concrete example from 1.75.0-ish

Here is a concrete example from one of our current programs.

The originating Rust code is (simplified):

```rust
        let mut argsbuf = [0; 8];  // length actually a constant but I've inlined it
        // ... things happen ...
        let arglen = an_inlined_operation();
        // ... things happen ...
        &argsbuf[..arglen]
```

Here's the ARMv7E-M code as produced by objdump from the output of `nightly-2023-12-21` (chosen to be approximately equivalent to 1.75.0).

```
 80071ee:       2601            movs    r6, #1  ; included for reasons I'll explain in a bit
    ; unrelated things happen...
core::array::<impl core::ops::index::Index<I> for [T; N]>::index:
/rustc/5ac4c8a63ee305742071ac6dd11817f7c24adce2/library/core/src/array/mod.rs:348
 8007202:       4640            mov     r0, r8
 8007204:       2108            movs    r1, #8
 8007206:       4632            mov     r2, r6
 8007208:       f000 fb5b       bl      80078c2 <core::slice::index::<impl core::ops::index::Index<I> for [T]>::index>
```

`r6` here holds the length being produced by the inlined operation; along all code paths the length it produces is either 0, 1, or 8, all literal constants. I've included the simplest one here, which sets it to literal 1.

Given that we are indexing a fixed-length 8-element array normally rustc would be very good about eliding the bounds check, since 0, 1, and 8 are all valid options here.

Instead we have the argument setup and a call to the non-inlined index function, which contains a bounds check and conditional panic:

```
080078c2 <core::slice::index::<impl core::ops::index::Index<I> for [T]>::index>:
 80078c2:       428a            cmp     r2, r1
 80078c4:       bf9c            itt     ls
 80078c6:       4611            movls   r1, r2
 80078c8:       4770            bxls    lr
 80078ca:       b580            push    {r7, lr}
 80078cc:       466f            mov     r7, sp
 80078ce:       4610            mov     r0, r2
 80078d0:       f7ff ff50       bl      8007774 <core::slice::index::slice_end_index_len_fail>
```

# Impact

In many of our programs, this has a pretty significant impact. In at least one case, `opt-level=3` winds up being smaller because of more aggressive inlining causing checks to be elided. Because panic sites are relatively expensive -- we format and record panic messages, so they pull in formatting machinery that might not otherwise be needed -- in programs that otherwise contain no panics, a shift in the inlining decisions around something like `index` can cause a program's size to grow by 4 kiB pretty easily.

For context, that's a 50-100% size increase on most of our programs.

# A possibly clumsy solution

So, for the specific cases I described above -- indexing arrays with usizes or ranges of usizes, and `core::slice::Iter` -- does it _ever_ make sense to not inline these routines? I'm wondering if folks would be open to me going through and carefully applying some `inline(always)` attributes in this neighborhood. (This might not be sufficient for the `Enumerate<T>` case but could definitely be applied to the array/slice Index impls.)

If so, let me know what tests you'd like me to perform on the result. I would obviously measure the impact on various Hubris-based firmware, but perhaps there are other contexts where this could be a bad thing that you'd want me to check.

My LLVM experience is limited and rusty, so there might be a cleverer thing that could be done in the compiler, but I wouldn't know what to suggest.

# rustc version

The examples above were generated by:

```
rustc 1.76.0-nightly (5ac4c8a63 2023-12-20)
binary: rustc
commit-hash: 5ac4c8a63ee305742071ac6dd11817f7c24adce2
commit-date: 2023-12-20
host: x86_64-unknown-linux-gnu
release: 1.76.0-nightly
LLVM version: 17.0.6
```

but I've been observing this behavior consistently since at least 2021, and just wasn't sure if an upstream bug report made sense.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Not inlining array/slice access and iteration at opt-level=z causes knock-on effects on size #123345

Summary

A concrete example from 1.75.0-ish

Impact

A possibly clumsy solution

rustc version

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Not inlining array/slice access and iteration at opt-level=z causes knock-on effects on size #123345

Description

Summary

A concrete example from 1.75.0-ish

Impact

A possibly clumsy solution

rustc version

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions