Description
All enum discriminants are statically known, but the optimizer appears not to be able to introspect their max values when they are cast to primitives and used as indices, probably because at the LLVM level, they only ever are primitives. LLVM is never told their maximum value. This means that when using enum discriminants to index arrays/vectors whose lengths are also statically known, bounds checks are emitted even when one would reasonably expect them to be elided.
Minimal example (Playground):
#![feature(test)]
extern crate test;
use test::black_box;
#[repr(usize)]
enum Index {
Zero = 0,
One = 1,
}
#[inline(never)]
fn idx_cast(vals: &[i32; 2], idx: Index) {
black_box(vals[idx as usize]);
}
fn main() {
let vals = black_box([0, 1]);
idx_cast(&vals, black_box(Index::Zero));
}
If you look at the emitted assembly (in release mode), you can see that idx_cast
does a bounds-check.
If you add ::std::intrinsics::assume(idx as usize < 2)
, the bounds-check is elided and LLVM appears to use the discriminant as the index directly: https://is.gd/DdX7dd
This also works even if enum Index
is not #[repr(usize)]
; LLVM simply zero-extends the discriminant: https://is.gd/2JrJty
So as a possible optimization at codegen, whenever an enum is cast to a primitive, we could (or maybe even should) emit @llvm.assume( [discriminant value] <= [maximum discriminant value] )
.
Converting via explicit match
also allows elision of the bounds check without reintroducing a branch, but has differing behavior between enums with 2 variants and enums of more variants:
- 2 variant enums get a
use 1 if discriminant is 1, otherwise 0
treatment: https://is.gd/VS7JZK - Enums with > 2 variants directly use the discriminant (or zero-extend it if it's a different width) like the
assume
solution: https://is.gd/d8dEtY
I assume LLVM does the former in the different-width case because it uses fewer cycles than the latter, since they are otherwise functionally equivalent. However, when the enum is #[repr(usize)]
, the 2-variant version is using 3 extra, unnecessary instructions. But that seems like an LLVM problem more than a codegen problem.