Skip to content

slice::rotate_left/right is not simplified for compile-time-constant small rotates #89714

Open
@saethlin

Description

@saethlin

The current (2021-10-08 nightly) and recent implementations of slice::rotate_right produce 15x more instructions than a manual special-case implementation for a small rotate by a compile-time constant offset. This results in a 5-50% performance difference, depending on the length of the slice in question.

godbolt demonstration: https://godbolt.org/z/a4YdqP1rn

pub fn shift_after_fast(vec: &mut [usize], i: usize) {
    if i > vec.len() || (vec.len() - i) < 2 {
        return;
    }
    unsafe {
        let tmp = core::ptr::read(vec.as_ptr().add(vec.len() - 1));
        core::ptr::copy(
            vec.as_ptr().add(i),
            vec.as_mut_ptr().add(i + 1),
            vec.len() - i - 1,
        );
        core::ptr::write(vec.as_mut_ptr().add(i), tmp);
    }
}

pub fn shift_after(vec: &mut [usize], i: usize) {
    vec[i..].rotate_right(1);
}

I think the current implementation doesn't optimize well because it is a loop { surrounding three rotate algorithms with their own predicates, the first of which is almost never known at compile time:

if (left + right < 24) || (mem::size_of::<T>() > mem::size_of::<[usize; 4]>()) {

and it is the second algorithm that we want to run in this case:

} else if cmp::min(left, right) <= mem::size_of::<BufType>() / mem::size_of::<T>() {

Simply pasting a copy of the second algorithm above the loop removes the performance difference and most but not all of the code size difference.

cc @scottmcm I've finally gotten around to reporting this after months

rustc 1.57.0-nightly (54bb4fec6 2021-10-08)
binary: rustc
commit-hash: 54bb4fec68cb592e23077896baea072919721573
commit-date: 2021-10-08
host: x86_64-unknown-linux-gnu
release: 1.57.0-nightly
LLVM version: 13.0.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchI-slowIssue: Problems and improvements with respect to performance of generated code.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions