
Creating an array can be made 2x faster #139875

Open
@abgros

Description

Consider this simple function:

const SIZE: usize = 4096;

fn array_of_twos() -> [u64; SIZE] {
    [2; SIZE]
}

Because 2u64 does not have the same value in every byte (its bytes are 02 00 00 00 00 00 00 00 on x86-64), the compiler can't lower the initialization to a memset call and instead emits a vectorized loop.
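The memset restriction can be sketched as a small check. memset writes one repeated byte, so a fill value only qualifies if all eight of its bytes are identical. The helper `splat_byte` below is hypothetical and illustrative, not LLVM's actual implementation. For 2u64 it returns `None`, while values such as 0 or u64::MAX qualify.

```rust
/// Hypothetical helper: returns `Some(b)` if every byte of `x` equals `b`,
/// i.e. filling memory with `x` is the same as a byte-wise memset with `b`.
/// Returns `None` otherwise (as for 2u64, whose bytes are not all equal).
fn splat_byte(x: u64) -> Option<u8> {
    // Native byte order; the uniformity test gives the same answer either way.
    let bytes = x.to_ne_bytes();
    let first = bytes[0];
    if bytes.iter().all(|&b| b == first) {
        Some(first)
    } else {
        None
    }
}
```

With this check, `[0; SIZE]` or `[u64::MAX; SIZE]` could become a memset of 0x00 or 0xFF bytes, but `[2; SIZE]` cannot.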

However, from my testing, using the rep stosq instruction is over twice as fast for large arrays (more than a few hundred elements). Here is a faster version of the same function:

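use std::arch::asm;
use std::mem::MaybeUninit;

#[cfg(target_arch = "x86_64")]
fn array_of_twos_faster() -> [u64; SIZE] {
    let mut arr = MaybeUninit::<[u64; SIZE]>::uninit();
    unsafe {
        // rep stosq stores RAX at [RDI] RCX times, advancing RDI by 8 bytes
        // each iteration (Rust guarantees DF is clear on entry to asm!).
        // Pinning every operand to its fixed register keeps the allocator
        // from placing the pointer in RAX or RCX, which a generic
        // `in(reg)` operand combined with `lateout` clobbers would allow.
        asm!(
            "rep stosq",
            inout("rcx") SIZE => _,
            inout("rdi") arr.as_mut_ptr() => _,
            in("rax") 2u64,
            options(nostack, preserves_flags)
        );
        // SAFETY: rep stosq has written all SIZE elements.
        arr.assume_init()
    }
}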
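The behavior of `rep stosq` in the function above can be modeled in portable Rust to show exactly what the instruction does. This is a sketch, not real codegen: `rep_stosq_model` and `array_of_twos_model` are hypothetical names, and RDI is represented as an element index into a `u64` slice rather than a byte address (assuming DF = 0, so it moves forward).

```rust
const SIZE: usize = 4096;

/// Hypothetical software model of `rep stosq`: while RCX != 0, store RAX at
/// [RDI], advance RDI by one u64 (8 bytes), and decrement RCX.
/// Returns the final (RDI, RCX), which the asm! block above discards.
fn rep_stosq_model(mem: &mut [u64], mut rdi: usize, rax: u64, mut rcx: usize) -> (usize, usize) {
    while rcx != 0 {
        mem[rdi] = rax; // store one quadword
        rdi += 1;       // next element, i.e. 8 bytes further
        rcx -= 1;
    }
    (rdi, rcx)
}

/// Hypothetical portable counterpart of `array_of_twos_faster`.
fn array_of_twos_model() -> [u64; SIZE] {
    let mut arr = [0u64; SIZE];
    rep_stosq_model(&mut arr, 0, 2, SIZE);
    arr
}
```

The model makes the equivalence explicit: one fixed value repeated SIZE times, which is what `[2; SIZE]` requests.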

Benchmarking both with Criterion:

normal                  time:   [1.5435 µs 1.5465 µs 1.5501 µs]
                        change: [-3.1683% -2.2863% -1.4243%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

rep stosq               time:   [633.94 ns 636.36 ns 639.77 ns]
                        change: [-2.2975% -1.8986% -1.4693%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
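The speedup can be checked from the point estimates (the middle value of each `time:` interval). The sketch below assumes the benchmark used SIZE = 4096 as in the snippet above, i.e. 32768 bytes per call. The function names are hypothetical.

```rust
/// Point estimates from the Criterion output above, in nanoseconds.
const NORMAL_NS: f64 = 1_546.5;   // 1.5465 µs
const REP_STOSQ_NS: f64 = 636.36; // 636.36 ns

/// Assumed bytes written per call: 4096 elements of 8 bytes each.
const BYTES: f64 = (4096 * 8) as f64;

/// Ratio of baseline time to candidate time.
fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

/// Fill throughput in bytes per nanosecond.
fn bytes_per_ns(ns: f64) -> f64 {
    BYTES / ns
}
```

This gives a ratio of about 2.43, and roughly 21.2 versus 51.5 bytes/ns for the vectorized loop and `rep stosq` respectively.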

Compare both of them on Godbolt.

Metadata

    Labels

    A-LLVM (Area): Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
    C-optimization (Category): An issue highlighting optimization opportunities or PRs implementing such.
    I-heavy (Issue): Problems and improvements with respect to binary size of generated code.
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
