Description
There's a lot going on inside StepBy<_, Range<_>>
's Iterator
implementation and LLVM does a reasonable job of cutting things down, but doesn't get all the way (definition inlined for context if it changes in future, and to allow easy experimentation):
#![crate_type = "lib"]
use std::mem;
struct R {
start: i32,
end: i32,
step_by: i32,
}
impl Iterator for R
{
type Item = i32;
fn next(&mut self) -> Option<i32> {
let rev = self.step_by < 0;
if (rev && self.start > self.end) ||
(!rev && self.start < self.end)
{
match self.start.checked_add(self.step_by) { // inlined & stable version of <i32 as std::iter::Step>::step
Some(mut n) => {
mem::swap(&mut self.start, &mut n);
Some(n)
},
None => {
let mut n = self.end.clone();
mem::swap(&mut self.start, &mut n);
Some(n)
}
}
} else {
None
}
}
}
pub fn iterator() -> usize {
R { start: 0, end: 100000, step_by: 2 }.count()
}
pub fn while_() -> usize {
let mut count = 0;
let mut i = 0;
while i < 100000 {
count += 1;
i += 2;
}
count
}
Optimised asm (it'd be great for the first to be like the second):
_ZN8iterator20h37c73f52cfd206d4LbaE:
.cfi_startproc
xorl %eax, %eax
movl $100000, %ecx
xorl %edx, %edx
.align 16, 0x90
.LBB1_1:
addl $2, %edx
cmovol %ecx, %edx
incq %rax
cmpl $100000, %edx
jl .LBB1_1
retq
_ZN6while_20h2b6d290fcc3d36d0UbaE:
.cfi_startproc
movl $50000, %eax
retq
https://play.rust-lang.org/?gist=a926869a4cf59d6683c4
#24660 previously had a somewhat similar problem, although this one is compounded by using checked_add
implemented in terms of LLVM's overflow intrinsics, which the LLVM performance tips explicitly recommend against:
Avoid using arithmetic intrinsics unless you are required by your source language specification to emit a particular code sequence. The optimizer is quite good at reasoning about general control flow and arithmetic, it is not anywhere near as strong at reasoning about the various intrinsics. If profitable for code generation purposes, the optimizer will likely form the intrinsics itself late in the optimization pipeline. It is very rarely profitable to emit these directly in the language frontend. This item explicitly includes the use of the overflow intrinsics.