Inefficient codegen collecting slice iterator into array

While working on [`Itertools::collect_array`](https://github.com/rust-itertools/itertools/pull/560) for `itertools` I wanted to compare the efficiency of

```rust
pub fn try_into(sl: &[u32]) -> Option<[u32; N]> {
    Some(*<&[u32; N]>::try_from(sl.get(0..N)?).unwrap())
}

pub fn collect_array(sl: &[u32]) -> Option<[u32; N]> {
    sl.iter().copied().collect_array()
}
```

You can see my comparison here: https://rust.godbolt.org/z/qPW3K8aTx.

I was rather shocked, both look good for `N = 4`, but for `N = 16` we see the following nice implementation for `try_into`:

```asm
try_into:
        mov     rax, rdi
        xor     ecx, ecx
        cmp     rdx, 15
        jbe     .LBB0_2
        vmovups ymm0, ymmword ptr [rsi + 32]
        vmovups ymmword ptr [rax + 36], ymm0
        vmovups ymm0, ymmword ptr [rsi]
        vmovups ymmword ptr [rax + 4], ymm0
        mov     ecx, 1
.LBB0_2:
        mov     dword ptr [rax], ecx
        vzeroupper
        ret
```

But the following abomination for `collect_array`:

```asm
collect_array:
        mov     rax, rdi
        xor     ecx, ecx
        test    rdx, rdx
        je      .LBB1_14
        cmp     rdx, 1
        je      .LBB1_14
        cmp     rdx, 2
        je      .LBB1_14
        cmp     rdx, 3
        je      .LBB1_14
        cmp     rdx, 4
        je      .LBB1_14
        cmp     rdx, 5
        je      .LBB1_14
        cmp     rdx, 6
        je      .LBB1_14
        cmp     rdx, 7
        je      .LBB1_14
        cmp     rdx, 8
        je      .LBB1_14
        cmp     rdx, 9
        je      .LBB1_14
        cmp     rdx, 10
        je      .LBB1_14
        cmp     rdx, 11
        je      .LBB1_14
        and     rdx, -4
        cmp     rdx, 12
        je      .LBB1_14
        push    rbp
        push    r15
        push    r14
        push    r12
        push    rbx
        mov     ecx, dword ptr [rsi]
        mov     edx, dword ptr [rsi + 4]
        mov     edi, dword ptr [rsi + 8]
        mov     r8d, dword ptr [rsi + 12]
        mov     r9d, dword ptr [rsi + 16]
        mov     r10d, dword ptr [rsi + 20]
        mov     r11d, dword ptr [rsi + 24]
        mov     ebx, dword ptr [rsi + 28]
        mov     ebp, dword ptr [rsi + 32]
        mov     r14d, dword ptr [rsi + 36]
        mov     r15d, dword ptr [rsi + 40]
        mov     r12d, dword ptr [rsi + 44]
        mov     dword ptr [rax + 4], ecx
        mov     dword ptr [rax + 8], edx
        mov     dword ptr [rax + 12], edi
        mov     dword ptr [rax + 16], r8d
        mov     dword ptr [rax + 20], r9d
        mov     dword ptr [rax + 24], r10d
        mov     dword ptr [rax + 28], r11d
        mov     dword ptr [rax + 32], ebx
        mov     dword ptr [rax + 36], ebp
        mov     dword ptr [rax + 40], r14d
        mov     dword ptr [rax + 44], r15d
        mov     dword ptr [rax + 48], r12d
        vmovups xmm0, xmmword ptr [rsi + 48]
        vmovups xmmword ptr [rax + 52], xmm0
        mov     ecx, 1
        pop     rbx
        pop     r12
        pop     r14
        pop     r15
        pop     rbp
.LBB1_14:
        mov     dword ptr [rax], ecx
        ret
```

While it not folding the consecutive address `mov`s into efficient SIMD `mov`s is disappointing, I would argue that there's probably a bug somewhere since it compares `rdx` **THIRTEEN TIMES IN A ROW** to ultimately just check if it is `< 16`.

This doesn't appear to be a recent regression, the same happens in 1.60 through nightly, and it happens on both x86 as well as ARM.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Inefficient codegen collecting slice iterator into array #126000

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Inefficient codegen collecting slice iterator into array #126000

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions