Description
The compiler struggles to optimize the current default implementation of `position` on `Iterator`. Simplifying the implementation has increased efficiency in various scenarios on my machine:
Old `position()`:

```rust
#[inline]
fn check<T>(
    mut predicate: impl FnMut(T) -> bool,
) -> impl FnMut(usize, T) -> ControlFlow<usize, usize> {
    #[rustc_inherit_overflow_checks]
    move |i, x| {
        if predicate(x) { ControlFlow::Break(i) } else { ControlFlow::Continue(i + 1) }
    }
}
self.try_fold(0, check(predicate)).break_value()
```
Simplified version (several alternative implementations produce similar results):

```rust
let mut position = 0;
while let Some(x) = self.next() {
    if predicate(x) {
        return Some(position);
    }
    position += 1;
}
None
```
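As a quick sanity check (my own example, not part of this change), the simplified loop can be wrapped in a hypothetical free function `simple_position` and compared against the std `position`; both return the same index and, since both short-circuit, leave the iterator in the same state:

```rust
// Standalone sketch of the simplified loop (illustrative helper only; inside
// core the increment would additionally inherit overflow checks).
fn simple_position<I, P>(iter: &mut I, mut predicate: P) -> Option<usize>
where
    I: Iterator,
    P: FnMut(I::Item) -> bool,
{
    let mut position = 0;
    while let Some(x) = iter.next() {
        if predicate(x) {
            return Some(position);
        }
        position += 1;
    }
    None
}

fn main() {
    let data = [10, 20, 30, 40];
    let mut a = data.iter();
    let mut b = data.iter();
    // Same result as the default `position`...
    assert_eq!(simple_position(&mut a, |&x| x == 30), b.position(|&x| x == 30));
    // ...and both stop right after the match, so the iterators agree afterwards.
    assert_eq!(a.next(), b.next());
}
```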
Bench results:

```
position              fastest  │ slowest  │ median   │ mean     │ samples │ iters
├─ arcstr_new         103.4 µs │ 134.8 µs │ 105.4 µs │ 106.4 µs │ 118     │ 944
├─ arcstr_old         103.8 µs │ 143.7 µs │ 105.6 µs │ 107.1 µs │ 117     │ 936
├─ big_range_new      33.33 ns │ 25.79 µs │ 34.95 ns │ 36.06 ns │ 642049  │ 642049
├─ big_range_old      805 µs   │ 893.4 µs │ 828.3 µs │ 830 µs   │ 121     │ 121
├─ small_range_new    547.3 ns │ 60.17 µs │ 622 ns   │ 659 ns   │ 128056  │ 128056
├─ small_range_old    3.174 µs │ 54.47 µs │ 3.195 µs │ 3.298 µs │ 28676   │ 28676
├─ u32_new            109.7 µs │ 125.7 µs │ 111.6 µs │ 112.2 µs │ 112     │ 896
├─ u32_old            245 µs   │ 261.4 µs │ 247.6 µs │ 248.5 µs │ 100     │ 800
├─ u64_new            110.5 µs │ 147.9 µs │ 112.8 µs │ 114.6 µs │ 109     │ 872
├─ u64_old            255 µs   │ 378.5 µs │ 258.4 µs │ 264.1 µs │ 100     │ 800
├─ u8_new             10.86 µs │ 34.03 µs │ 11.49 µs │ 11.71 µs │ 1064    │ 8512
╰─ u8_old             17.71 µs │ 28.99 µs │ 18.13 µs │ 18.18 µs │ 686     │ 5488
```
bench.rs:

```rust
use std::sync::Arc;

use once_cell::sync::Lazy;

fn main() {
    divan::main();
}

static VECSTRING: Lazy<Vec<Arc<str>>> =
    Lazy::new(|| Vec::from_iter((0..1_000).map(|x| x.to_string().into())));
static VEC64: Lazy<Vec<u64>> = Lazy::new(|| Vec::from_iter(0..10_000));
static VEC32: Lazy<Vec<u32>> = Lazy::new(|| Vec::from_iter(0..10_000));
static VEC8: Lazy<Vec<u8>> = Lazy::new(|| Vec::from_iter(0..=255));

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u32_old() {
    for x in (0..10_000).step_by(100) {
        assert_eq!(
            divan::black_box(&VEC32)
                .iter()
                .copied()
                .position(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u32_new() {
    for x in (0..10_000).step_by(100) {
        assert_eq!(
            divan::black_box(&VEC32).iter().copied().locate(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u64_old() {
    for x in (0..10_000).step_by(100) {
        assert_eq!(
            divan::black_box(&VEC64)
                .iter()
                .copied()
                .position(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u64_new() {
    for x in (0..10_000).step_by(100) {
        assert_eq!(
            divan::black_box(&VEC64).iter().copied().locate(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u8_old() {
    for x in 0..=255 {
        assert_eq!(
            divan::black_box(&VEC8).iter().copied().position(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u8_new() {
    for x in 0..=255 {
        assert_eq!(
            divan::black_box(&VEC8).iter().copied().locate(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn arcstr_old() {
    for x in (0..1_000).step_by(10) {
        let xstr = x.to_string();
        assert_eq!(
            divan::black_box(&VECSTRING)
                .iter()
                .clone()
                .position(|s| **s == *xstr.as_str()),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn arcstr_new() {
    for x in (0..1_000).step_by(10) {
        let xstr = x.to_string();
        assert_eq!(
            divan::black_box(&VECSTRING)
                .iter()
                .clone()
                .locate(|s| **s == *xstr.as_str()),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 1)]
fn big_range_old() {
    for x in (0..1_000_000).step_by(100_000) {
        assert_eq!(
            divan::black_box(0..1_000_000).position(|s| s == x),
            Some(x as usize)
        )
    }
}

#[divan::bench(min_time = 0.1, sample_size = 1)]
fn big_range_new() {
    for x in (0..1_000_000).step_by(100_000) {
        assert_eq!(
            divan::black_box(0..1_000_000).locate(|s| s == x),
            Some(x as usize)
        )
    }
}

#[divan::bench(min_time = 0.1, sample_size = 1)]
fn small_range_old() {
    for x in 0..100 {
        assert_eq!(
            divan::black_box(0..100).position(|s| s == x),
            Some(x as usize)
        )
    }
}

#[divan::bench(min_time = 0.1, sample_size = 1)]
fn small_range_new() {
    for x in 0..100 {
        assert_eq!(
            divan::black_box(0..100).locate(|s| s == x),
            Some(x as usize)
        )
    }
}

trait Locate {
    type Item;

    fn locate<P>(&mut self, predicate: P) -> Option<usize>
    where
        P: FnMut(Self::Item) -> bool;
}

impl<T, U> Locate for U
where
    U: Iterator<Item = T>,
{
    type Item = T;

    #[inline]
    fn locate<P>(&mut self, mut predicate: P) -> Option<usize>
    where
        P: FnMut(Self::Item) -> bool,
    {
        let mut position = 0;
        while let Some(x) = self.next() {
            if predicate(x) {
                return Some(position);
            }
            position += 1;
        }
        None
    }
}
```
Benchmarks are named after the iterator type and were run using Divan. Ranges benefit massively (big ones by multiple orders of magnitude), while for other types the speedups are usually between 0% and 100%. A specialized implementation for ranges that calculates the position directly would of course be even faster, but I don't think `position()` is called on ranges often enough to justify it.
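Such a range specialization could look something like the following sketch (my own illustration with a hypothetical `range_position` helper, not something this change proposes): since element `i` of a range is `start + i`, the index can be recovered from the matched value itself, with no separate counter.

```rust
use std::ops::Range;

// Hypothetical sketch: for a `Range<u64>`, the position of the first match can
// be computed from the matched value, because element i equals start + i.
fn range_position<P>(range: Range<u64>, mut predicate: P) -> Option<usize>
where
    P: FnMut(u64) -> bool,
{
    let start = range.start;
    for x in range {
        if predicate(x) {
            return Some((x - start) as usize);
        }
    }
    None
}

fn main() {
    // Agrees with the std `position`: 42 is the 38th element of 5..100.
    assert_eq!(range_position(5..100, |x| x == 42), Some(37));
    assert_eq!((5u64..100).position(|x| x == 42), Some(37));
}
```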
The implementation is similar to the one for `slice::iter::Iter`, which was introduced in #72166 to reduce compile times, so perhaps this change would help in that regard as well. However, the compiler occasionally appears to forget how to optimize the function after unrelated code is edited. This also happens with the current implementation, so I don't know how reproducible the speedups are (FWIW, the range benchmarks have never failed to improve). I did not attempt to benchmark compilation speed, but according to Godbolt, memory usage is down from 11.8 MB to 8.3 MB, which looks promising.

I was motivated by a thread on Reddit, where some context and speculation can be found.