Skip to content

Iterator Default Implementation for position() is slow #119551

Closed
@marthadev

Description

@marthadev

The compiler struggles to optimize the current default implementation for position on Iterators.
Simplifying the implementation has increased the efficiency in various scenarios OMM:

Old position():

    #[inline]
    fn check<T>(
        mut predicate: impl FnMut(T) -> bool,
    ) -> impl FnMut(usize, T) -> ControlFlow<usize, usize> {
        #[rustc_inherit_overflow_checks]
        move |i, x| {
            if predicate(x) { ControlFlow::Break(i) } else { ControlFlow::Continue(i + 1) }
        }
    }

    self.try_fold(0, check(predicate)).break_value()

Simplified version (There are several alternative implementations producing similar results as well):

    let mut position = 0;
    while let Some(x) = self.next() {
        if predicate(x) {
            return Some(position);
        }
        position += 1;
    }
    None

Bench results:

position            fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ arcstr_new       103.4 µs      │ 134.8 µs      │ 105.4 µs      │ 106.4 µs      │ 118     │ 944
├─ arcstr_old       103.8 µs      │ 143.7 µs      │ 105.6 µs      │ 107.1 µs      │ 117     │ 936
├─ big_range_new    33.33 ns      │ 25.79 µs      │ 34.95 ns      │ 36.06 ns      │ 642049  │ 642049
├─ big_range_old    805 µs        │ 893.4 µs      │ 828.3 µs      │ 830 µs        │ 121     │ 121
├─ small_range_new  547.3 ns      │ 60.17 µs      │ 622 ns        │ 659 ns        │ 128056  │ 128056
├─ small_range_old  3.174 µs      │ 54.47 µs      │ 3.195 µs      │ 3.298 µs      │ 28676   │ 28676
├─ u32_new          109.7 µs      │ 125.7 µs      │ 111.6 µs      │ 112.2 µs      │ 112     │ 896
├─ u32_old          245 µs        │ 261.4 µs      │ 247.6 µs      │ 248.5 µs      │ 100     │ 800
├─ u64_new          110.5 µs      │ 147.9 µs      │ 112.8 µs      │ 114.6 µs      │ 109     │ 872
├─ u64_old          255 µs        │ 378.5 µs      │ 258.4 µs      │ 264.1 µs      │ 100     │ 800
├─ u8_new           10.86 µs      │ 34.03 µs      │ 11.49 µs      │ 11.71 µs      │ 1064    │ 8512
╰─ u8_old           17.71 µs      │ 28.99 µs      │ 18.13 µs      │ 18.18 µs      │ 686     │ 5488
bench.rs
use std::sync::Arc;
use once_cell::sync::Lazy;

fn main() {
    divan::main();
}

static VECSTRING: Lazy<Vec<Arc<str>>> =
    Lazy::new(|| Vec::from_iter((0..1_000).map(|x| x.to_string().into())));
static VEC64: Lazy<Vec<u64>> = Lazy::new(|| Vec::from_iter(0..10_000));
static VEC32: Lazy<Vec<u32>> = Lazy::new(|| Vec::from_iter(0..10_000));
static VEC8: Lazy<Vec<u8>> = Lazy::new(|| Vec::from_iter(0..=255));

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u32_old() {
    for x in (0..10_000).step_by(100) {
        assert_eq!(
            divan::black_box(&VEC32)
                .iter()
                .copied()
                .position(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u32_new() {
    for x in (0..10_000).step_by(100) {
        assert_eq!(
            divan::black_box(&VEC32).iter().copied().locate(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u64_old() {
    for x in (0..10_000).step_by(100) {
        assert_eq!(
            divan::black_box(&VEC64)
                .iter()
                .copied()
                .position(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u64_new() {
    for x in (0..10_000).step_by(100) {
        assert_eq!(
            divan::black_box(&VEC64).iter().copied().locate(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u8_old() {
    for x in 0..=255 {
        assert_eq!(
            divan::black_box(&VEC8).iter().copied().position(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn u8_new() {
    for x in 0..=255 {
        assert_eq!(
            divan::black_box(&VEC8).iter().copied().locate(|s| s == x),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn arcstr_old() {
    for x in (0..1_000).step_by(10) {
        let xstr = x.to_string();
        assert_eq!(
            divan::black_box(&VECSTRING)
                .iter()
                .clone()
                .position(|s| **s == *xstr.as_str()),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 8)]
fn arcstr_new() {
    for x in (0..1_000).step_by(10) {
        let xstr = x.to_string();
        assert_eq!(
            divan::black_box(&VECSTRING)
                .iter()
                .clone()
                .locate(|s| **s == *xstr.as_str()),
            Some(x as usize)
        );
    }
}

#[divan::bench(min_time = 0.1, sample_size = 1)]
fn big_range_old() {
    for x in (0..1_000_000).step_by(100_000) {
        assert_eq!(
            divan::black_box(0..1_000_000).position(|s| s == x),
            Some(x as usize)
        )
    }
}

#[divan::bench(min_time = 0.1, sample_size = 1)]
fn big_range_new() {
    for x in (0..1_000_000).step_by(100_000) {
        assert_eq!(
            divan::black_box(0..1_000_000).locate(|s| s == x),
            Some(x as usize)
        )
    }
}

#[divan::bench(min_time = 0.1, sample_size = 1)]
fn small_range_old() {
    for x in 0..100 {
        assert_eq!(
            divan::black_box(0..100).position(|s| s == x),
            Some(x as usize)
        )
    }
}

#[divan::bench(min_time = 0.1, sample_size = 1)]
fn small_range_new() {
    for x in 0..100 {
        assert_eq!(
            divan::black_box(0..100).locate(|s| s == x),
            Some(x as usize)
        )
    }
}

trait Locate {
    type Item;
    fn locate<P>(&mut self, predicate: P) -> Option<usize>
    where
        P: FnMut(Self::Item) -> bool;
}

impl<T, U> Locate for U
where
    U: Iterator<Item = T>,
{
    type Item = T;

    #[inline]
    fn locate<P>(&mut self, mut predicate: P) -> Option<usize>
    where
        P: FnMut(Self::Item) -> bool,
    {
        let mut position = 0;
        while let Some(x) = self.next() {
            if predicate(x) {
                return Some(position);
            }
            position += 1;
        }
        None
    }
}

Benchmarks are named after the iterator and were run using Divan.
Ranges benefit massively (big ones by multiple orders of magnitude), while for other types, speedups are usually between 0% and 100%.

A specific implementation for range calculating the position would of course be even faster, but I don't think position() is called that often on ranges.

The implementation is similar to the slice::iter::Iter one, which was introduced to reduce compile times #72166, so perhaps this would help in that regard as well.

However, the compiler appears to occasionally forget how to optimize the function after editing unrelated code. This also happens with the current implementation, so I don't know how reproducible the speedups are (FWIW Ranges have never failed to improve, though).

I did not attempt to benchmark compilation speed, but according to godbolt the memory usage is down from 11.8 MB to 8.3 MB which looks promising.

I was motivated by a thread on reddit, where some context and speculation can be found.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-iteratorsArea: IteratorsI-slowIssue: Problems and improvements with respect to performance of generated code.T-libsRelevant to the library team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions