
Suboptimal codegen for potential [T; N]::zip() #79754

Closed
@cynecx

Code taken from #79451.

#![feature(min_const_generics, array_value_iter)]

use std::array::IntoIter;
use std::mem::MaybeUninit;

pub fn zip<T, U, const N: usize>(lhs: [T; N], rhs: [U; N]) -> [(T, U); N] {
    let mut dst = MaybeUninit::<[(T, U); N]>::uninit();
    let ptr = dst.as_mut_ptr() as *mut (T, U);
    for (idx, (lhs, rhs)) in IntoIter::new(lhs).zip(IntoIter::new(rhs)).enumerate() {
        unsafe { ptr.add(idx).write((lhs, rhs)) }
    }
    unsafe { dst.assume_init() }
}

pub fn zip_8xu64(lhs: [u64; 8], rhs: [u64; 8]) -> [(u64, u64); 8] {
    zip(lhs, rhs)
}
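The snippet above targets a late-2020 nightly (`min_const_generics`, `array_value_iter`). For anyone reproducing this on a current toolchain, here is a minimal sketch of the same function, assuming stable Rust 1.53 or later, where arrays implement `IntoIterator` by value. It is not the code behind the Godbolt link, and its codegen may differ from what is reported here.

```rust
use std::mem::MaybeUninit;

pub fn zip<T, U, const N: usize>(lhs: [T; N], rhs: [U; N]) -> [(T, U); N] {
    let mut dst = MaybeUninit::<[(T, U); N]>::uninit();
    let ptr = dst.as_mut_ptr() as *mut (T, U);
    // Fully qualified call so `lhs` is consumed by value in every edition.
    let pairs = IntoIterator::into_iter(lhs).zip(rhs);
    for (idx, pair) in pairs.enumerate() {
        // SAFETY: both arrays have exactly N elements, so idx < N and
        // each slot of `dst` is written exactly once.
        unsafe { ptr.add(idx).write(pair) }
    }
    // SAFETY: the loop above initialized all N elements.
    unsafe { dst.assume_init() }
}

pub fn zip_8xu64(lhs: [u64; 8], rhs: [u64; 8]) -> [(u64, u64); 8] {
    zip(lhs, rhs)
}
```

The structure is unchanged from the original: write each pair into an uninitialized output array, then `assume_init`. Only the way the by-value iterators are obtained differs.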

Godbolt (llvm-ir / asm): https://godbolt.org/z/Yq7W98

It seems that LLVM is unable to eliminate the memcpys, which results in suboptimal code.

There are also dead stores that have not been eliminated:

store i64 8, i64* %_7.sroa.0.sroa.0.i.sroa.5.0..sroa_idx33, align 8
store i64 8, i64* %_7.sroa.0.sroa.5.0._7.sroa.0.0..sroa_cast.sroa_idx106.i, align 8
store i64 8, i64* %_7.sroa.0.sroa.0.i.sroa.4.0..sroa_idx31, align 8
store i64 8, i64* %_7.sroa.0.sroa.4.0._7.sroa.0.0..sroa_cast.sroa_idx104.i, align 8

A not-quite-equivalent C++ example produces "optimal" code, with no memcpys or dead stores: https://godbolt.org/z/sdfa13
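For a comparison point within Rust itself, a copy-based, index-driven formulation can be sketched as below. This is only an illustration: the C++ source behind the link is not reproduced here, and reading "not quite equivalent" as "copies instead of moves" is an assumption. `zip_copy` and `zip_copy_8xu64` are hypothetical names, and the sketch assumes Rust 1.63 or later for `std::array::from_fn`. No claim is made about the code it compiles to.

```rust
/// Hypothetical helper, not from the issue: builds the output by indexing
/// into borrowed inputs and copying each element (hence the `Copy` bounds).
pub fn zip_copy<T: Copy, U: Copy, const N: usize>(lhs: &[T; N], rhs: &[U; N]) -> [(T, U); N] {
    // `from_fn` calls the closure once for each index in 0..N.
    std::array::from_fn(|i| (lhs[i], rhs[i]))
}

pub fn zip_copy_8xu64(lhs: &[u64; 8], rhs: &[u64; 8]) -> [(u64, u64); 8] {
    zip_copy(lhs, rhs)
}
```

Unlike the `zip` in this issue, this variant never moves out of its inputs, so it involves no by-value array iterator and no `unsafe`, at the cost of only working for `Copy` element types.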

EDIT:

On second thought, I'd assume that LLVM's GVN pass should have eliminated the memcpys, but it seems that this isn't supported?


Labels

C-enhancement: Category: An issue proposing an enhancement or a PR with one.
I-slow: Issue: Problems and improvements with respect to performance of generated code.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
