Description
Consider this simple function:
const SIZE: usize = 4096;

fn array_of_twos() -> [u64; SIZE] {
    [2; SIZE]
}
Because 2u64 does not have the same byte repeated throughout, the compiler can't lower this initialization to a memset call and instead generates a vectorized store loop.
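For contrast, here is a sketch (not part of the original report) of the case the compiler can already turn into memset: a fill value whose bytes are all identical, reusing SIZE from above.

// Every byte of 0u64 is 0x00, so this typically lowers to a single
// memset(ptr, 0, SIZE * 8) call rather than a store loop.
fn array_of_zeros() -> [u64; SIZE] {
    [0; SIZE]
}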
However, from my testing, using the rep stosq instruction is over twice as fast for large arrays (more than a few hundred elements). Here is a faster version of the same function:
use std::arch::asm;
use std::mem::MaybeUninit;

fn array_of_twos_faster() -> [u64; SIZE] {
    let mut arr = MaybeUninit::uninit();
    unsafe {
        asm!(
            "mov rax, 2",      // value to store
            "mov rcx, {}",     // element count
            "mov rdi, {}",     // destination pointer
            "rep stosq",       // store rax, rcx times, advancing rdi
            const SIZE,
            in(reg) arr.as_mut_ptr(),
            // Plain `out` (not `lateout`) keeps the allocator from handing
            // rax or rcx to the pointer operand, which the template clobbers
            // before reading it.
            out("rax") _, out("rdi") _, out("rcx") _,
            options(nostack, preserves_flags)
        );
        arr.assume_init()
    }
}
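A hypothetical sanity check (not from the issue) confirming the two versions agree:

fn main() {
    // Both functions must produce the same 4096-element array of 2s.
    assert_eq!(array_of_twos(), array_of_twos_faster());
    println!("outputs match");
}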
Benchmarking both with Criterion:
normal      time:   [1.5435 µs 1.5465 µs 1.5501 µs]
            change: [-3.1683% -2.2863% -1.4243%] (p = 0.00 < 0.05)
            Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

rep stosq   time:   [633.94 ns 636.36 ns 639.77 ns]
            change: [-2.2975% -1.8986% -1.4693%] (p = 0.00 < 0.05)
            Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
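For reference, a minimal Criterion harness along these lines (bench names and setup are my assumption, not copied from the report) would look like:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_fills(c: &mut Criterion) {
    // black_box keeps the optimizer from discarding the returned arrays.
    c.bench_function("normal", |b| b.iter(|| black_box(array_of_twos())));
    c.bench_function("rep stosq", |b| b.iter(|| black_box(array_of_twos_faster())));
}

criterion_group!(benches, bench_fills);
criterion_main!(benches);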
Compare both of them on Godbolt.
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Category: An issue highlighting optimization opportunities or PRs implementing such.
Issue: Problems and improvements with respect to binary size of generated code.
Relevant to the compiler team, which will review and decide on the PR/issue.