
StepBy<_, Range<_>> optimises poorly #31155

Closed
@huonw

Description

There's a lot going on inside StepBy<_, Range<_>>'s Iterator implementation, and LLVM does a reasonable job of cutting things down, but it doesn't get all the way (the definition is inlined below so there's context if it changes in future, and to allow easy experimentation):

#![crate_type = "lib"]

use std::mem;

struct R {
    start: i32,
    end: i32,
    step_by: i32,
}

impl Iterator for R {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        let rev = self.step_by < 0;
        if (rev && self.start > self.end) ||
           (!rev && self.start < self.end)
        {
            match self.start.checked_add(self.step_by) { // inlined & stable version of <i32 as std::iter::Step>::step
                Some(mut n) => {
                    mem::swap(&mut self.start, &mut n);
                    Some(n)
                },
                None => {
                    let mut n = self.end.clone();
                    mem::swap(&mut self.start, &mut n);
                    Some(n)
                }
            }
        } else {
            None
        }
    }
}


pub fn iterator() -> usize {
    R { start: 0, end: 100000, step_by: 2 }.count()
}

pub fn while_() -> usize {
    let mut count = 0;
    let mut i = 0;
    while i < 100000 {
        count += 1;
        i += 2;
    }
    count
}
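
(Not in the original report: a quick sanity check, assuming the definitions above, that both functions really do compute the same value.)

#[test]
fn iterator_matches_while() {
    // Both count the steps 0, 2, 4, ..., 99998, i.e. 50000 of them.
    assert_eq!(iterator(), 50000);
    assert_eq!(while_(), 50000);
}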

Optimised asm (it'd be great for the first to be like the second):

_ZN8iterator20h37c73f52cfd206d4LbaE:
    .cfi_startproc
    xorl    %eax, %eax
    movl    $100000, %ecx
    xorl    %edx, %edx
    .align  16, 0x90
.LBB1_1:
    addl    $2, %edx
    cmovol  %ecx, %edx
    incq    %rax
    cmpl    $100000, %edx
    jl  .LBB1_1
    retq
_ZN6while_20h2b6d290fcc3d36d0UbaE:
    .cfi_startproc
    movl    $50000, %eax
    retq
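
(Added for clarity, not part of the original report: the constant in while_'s asm is just the loop's trip count, which LLVM folds at compile time. For a positive step it is ceil((end - start) / step), e.g. ceil(100000 / 2) = 50000, hence the movl $50000, %eax above.)

// Hypothetical helper, not in the issue: the closed form LLVM recovers for
// `while_` but fails to recover for `iterator`.
fn trip_count(start: i32, end: i32, step: i32) -> usize {
    debug_assert!(step > 0 && start <= end);
    ((end - start + step - 1) / step) as usize
}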

https://play.rust-lang.org/?gist=a926869a4cf59d6683c4

#24660 previously had a somewhat similar problem, although this one is compounded by checked_add being implemented in terms of LLVM's overflow intrinsics, which the LLVM performance tips explicitly recommend against:

Avoid using arithmetic intrinsics unless you are required by your source language specification to emit a particular code sequence. The optimizer is quite good at reasoning about general control flow and arithmetic, it is not anywhere near as strong at reasoning about the various intrinsics. If profitable for code generation purposes, the optimizer will likely form the intrinsics itself late in the optimization pipeline. It is very rarely profitable to emit these directly in the language frontend. This item explicitly includes the use of the overflow intrinsics.
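
To illustrate what avoiding the intrinsic could look like, here's a rough sketch (not a proposed patch), assuming a strictly positive step and a non-negative start, that asks the "would the add overflow?" question with an ordinary comparison, so in optimised builds the IR contains plain arithmetic and branches rather than llvm.sadd.with.overflow:

// Sketch only: forward iteration with step_by > 0 and start >= 0, so the
// subtraction below cannot itself overflow. On the step that would overflow
// i32, saturate at `end`, matching the None arm of the original next().
fn next_without_overflow_intrinsic(r: &mut R) -> Option<i32> {
    if r.start < r.end {
        let n = r.start;
        r.start = if r.step_by <= i32::max_value() - n {
            n + r.step_by
        } else {
            r.end
        };
        Some(n)
    } else {
        None
    }
}

Whether this formulation actually lets LLVM fold the whole count() loop would need checking on the playground, but it's the kind of rewrite the advice above is pointing at.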

Labels

A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-enhancement: Category: An issue proposing an enhancement or a PR with one.
I-slow: Issue: Problems and improvements with respect to performance of generated code.
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.
