Skip to content

array::zip in combination with array::map optimises very poorly #103555

Closed
@AuroransSolis

Description

@AuroransSolis

Today for no reason in particular I was curious about the behaviour of arrays when zipped together and then mapped into a single array, roughly:

fn zip_with<T, U, V, F: FnMut(T, U) -> V, const N: usize>(a: [T; N], b: [U; N], mut f: F) -> [V; N] {
    a.zip(b).map(|(a, b)| f(a, b))
}

I wrote some tests for this and disassembled them in Godbolt to see what was going on behind the scenes and wow was a whole lot going on that seemed like it didn't need to be. So I wrote something kinda like what's in the stdlib for mapping and zipping arrays, just without the iterators, to see if I could do any better. Turns out I absolutely can:

struct ZipWithGuard<T, U, const N: usize> {
    a: *mut [T; N],
    b: *mut [U; N],
    moved: usize,
}

impl<T, U, const N: usize> Drop for ZipWithGuard<T, U, N> {
    fn drop(&mut self) {
        for i in self.moved + 1..N {
            unsafe {
                drop(self.a.cast::<T>().add(i).read());
                drop(self.b.cast::<U>().add(i).read());
            }
        }
    }
}

pub fn zip_with_guarded<T, U, V, F: FnMut(T, U) -> V, const N: usize>(
    a: [T; N],
    b: [U; N],
    mut f: F,
) -> [V; N] {
    let aref = &a as *const _ as *mut [ManuallyDrop<T>; N];
    forget(a);
    let bref = &b as *const _ as *mut [ManuallyDrop<U>; N];
    forget(b);
    let mut guard = ZipWithGuard {
        a: aref,
        b: bref,
        moved: 0,
    };
    let mut out: MaybeUninit<[V; N]> = unsafe { MaybeUninit::uninit().assume_init() };
    let out_ptr = out.as_mut_ptr().cast::<MaybeUninit<V>>();
    while guard.moved < N {
        let i = guard.moved;
        unsafe {
            *out_ptr.add(i) = MaybeUninit::new(f(
                ptr::read(guard.a.cast::<T>().add(i)),
                ptr::read(guard.b.cast::<U>().add(i)),
            ));
        }
        guard.moved += 1;
    }
    unsafe { out.assume_init() }
}

Maybe the design isn't as optimal (or as safe) as it could be, but it performs perfectly well in benchmarks (read: as fast as an unguarded version and anywhere between 3-25* times faster than the zip_with definition shown at the top of this issue. Here's a rough table of execution times averaged over the primitive ops (+, -, *, &, |, ^):
image

Not really liking how things are looking for .zip().map(). So, question then is: is there something in Rust that needs to change to let these optimisations happen, or do I need to look at putting this in a library somewhere (or maybe stdlib wink wink nudge nudge?)?

*The 25x comes from the [u8; 8] benchmarks. zip_with benches at 11.230ns, and zip_with_guarded benches at 452.593ps, which seems suspicious but I can't seem to find anything wrong with the result.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-arrayArea: `[T; N]`A-codegenArea: Code generationA-const-genericsArea: const generics (parameters and arguments)C-bugCategory: This is a bug.I-slowIssue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.T-libsRelevant to the library team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions