
StepBy<_, Range<_>> optimises poorly #31155

Closed
@huonw

Description

There's a lot going on inside StepBy<_, Range<_>>'s Iterator implementation, and LLVM does a reasonable job of cutting things down, but it doesn't get all the way (the definition is inlined below so there's context if it changes in future, and to allow easy experimentation):

#![crate_type = "lib"]

use std::mem;

struct R {
    start: i32,
    end: i32,
    step_by: i32,
}

impl Iterator for R {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        let rev = self.step_by < 0;
        if (rev && self.start > self.end) ||
           (!rev && self.start < self.end)
        {
            match self.start.checked_add(self.step_by) { // inlined & stable version of <i32 as std::iter::Step>::step
                Some(mut n) => {
                    mem::swap(&mut self.start, &mut n);
                    Some(n)
                },
                None => {
                    let mut n = self.end.clone();
                    mem::swap(&mut self.start, &mut n);
                    Some(n)
                }
            }
        } else {
            None
        }
    }
}


pub fn iterator() -> usize {
    R { start: 0, end: 100000, step_by: 2 }.count()
}

pub fn while_() -> usize {
    let mut count = 0;
    let mut i = 0;
    while i < 100000 {
        count += 1;
        i += 2;
    }
    count
}
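
(Not in the original report: a quick sanity check, assuming the definitions above, that both functions really do compute the same value.)

#[test]
fn iterator_matches_while() {
    // Both count the steps 0, 2, 4, ..., 99998, i.e. 50000 of them.
    assert_eq!(iterator(), 50000);
    assert_eq!(while_(), 50000);
}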

Optimised asm (it'd be great for the first to be like the second):

_ZN8iterator20h37c73f52cfd206d4LbaE:
    .cfi_startproc
    xorl    %eax, %eax
    movl    $100000, %ecx
    xorl    %edx, %edx
    .align  16, 0x90
.LBB1_1:
    addl    $2, %edx
    cmovol  %ecx, %edx
    incq    %rax
    cmpl    $100000, %edx
    jl  .LBB1_1
    retq
_ZN6while_20h2b6d290fcc3d36d0UbaE:
    .cfi_startproc
    movl    $50000, %eax
    retq
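
(Added for clarity, not part of the original report: the constant in while_'s asm is just the loop's trip count, which LLVM folds at compile time. For a positive step it is ceil((end - start) / step), e.g. ceil(100000 / 2) = 50000, hence the movl $50000, %eax above.)

// Hypothetical helper, not in the issue: the closed form LLVM recovers for
// `while_` but fails to recover for `iterator`.
fn trip_count(start: i32, end: i32, step: i32) -> usize {
    debug_assert!(step > 0 && start <= end);
    ((end - start + step - 1) / step) as usize
}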

https://play.rust-lang.org/?gist=a926869a4cf59d6683c4

#24660 previously had a somewhat similar problem, although this one is compounded by checked_add being implemented in terms of LLVM's overflow intrinsics, which the LLVM performance tips explicitly recommend against:

Avoid using arithmetic intrinsics unless you are required by your source language specification to emit a particular code sequence. The optimizer is quite good at reasoning about general control flow and arithmetic, it is not anywhere near as strong at reasoning about the various intrinsics. If profitable for code generation purposes, the optimizer will likely form the intrinsics itself late in the optimization pipeline. It is very rarely profitable to emit these directly in the language frontend. This item explicitly includes the use of the overflow intrinsics.
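
To illustrate what avoiding the intrinsic could look like, here's a rough sketch (not a proposed patch), assuming a strictly positive step and a non-negative start, that asks the "would the add overflow?" question with an ordinary comparison, so in optimised builds the IR contains plain arithmetic and branches rather than llvm.sadd.with.overflow:

// Sketch only: forward iteration with step_by > 0 and start >= 0, so the
// subtraction below cannot itself overflow. On the step that would overflow
// i32, saturate at `end`, matching the None arm of the original next().
fn next_without_overflow_intrinsic(r: &mut R) -> Option<i32> {
    if r.start < r.end {
        let n = r.start;
        r.start = if r.step_by <= i32::max_value() - n {
            n + r.step_by
        } else {
            r.end
        };
        Some(n)
    } else {
        None
    }
}

Whether this formulation actually lets LLVM fold the whole count() loop would need checking on the playground, but it's the kind of rewrite the advice above is pointing at.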

Labels

A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-enhancement: Category: An issue proposing an enhancement or a PR with one.
I-slow: Issue: Problems and improvements with respect to performance of generated code.
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.
