
Suboptimal codegen for snippet with Armv7 target #98157

Open
@ghost

Description

The code generated for this function seems quite suboptimal:

pub const fn f(n: u8) -> [u8; 4] {
    match n % 4 {
        0 => [0x0, 0x1, 0x2, 0x3],
        1 => [0x4, 0x5, 0x6, 0x7],
        2 => [0x8, 0x9, 0xA, 0xB],
        3 => [0xC, 0xD, 0xE, 0xF],
        _ => unsafe { std::hint::unreachable_unchecked() }
    }
}

From my observations, on every target I tried, the function as written above compiles to a switch table plus a memory access.

For x86-64, if the inner arrays are moved into constants, the switch table disappears and the lookup is replaced with arithmetic.
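For reference, the "moved into constants" variant might look roughly like the sketch below. The constant names (`A0` to `A3`) and the function name `f_consts` are hypothetical and chosen only for illustration; the exact code behind the Compiler Explorer link may differ.

```rust
// Hypothetical names: the issue text does not show this variant inline.
const A0: [u8; 4] = [0x0, 0x1, 0x2, 0x3];
const A1: [u8; 4] = [0x4, 0x5, 0x6, 0x7];
const A2: [u8; 4] = [0x8, 0x9, 0xA, 0xB];
const A3: [u8; 4] = [0xC, 0xD, 0xE, 0xF];

pub const fn f_consts(n: u8) -> [u8; 4] {
    match n % 4 {
        0 => A0,
        1 => A1,
        2 => A2,
        3 => A3,
        // SAFETY: `n % 4` is always in 0..=3, so this arm is never reached.
        _ => unsafe { std::hint::unreachable_unchecked() },
    }
}
```

This returns the same values as `f`; only the way the arrays are spelled in the source changes.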

Side-by-side comparison of the x86-64 and armv7-linux-androideabi codegen:
https://godbolt.org/z/ehxabaq38

Here, I was able to manually rewrite the expression into the equivalent of what LLVM emits above:
https://godbolt.org/z/qhfaqEcsf
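As an illustration of what such an arithmetic rewrite can look like, here is one possible formulation. It is a sketch under the assumption that the four result bytes are packed little-endian into a `u32`; the name `f_arith` is hypothetical, and this is not necessarily the rewrite shown in the link above.

```rust
pub const fn f_arith(n: u8) -> [u8; 4] {
    // Row k of the table is [4k, 4k + 1, 4k + 2, 4k + 3].
    // Packed little-endian, row 0 is 0x0302_0100, and each further row
    // adds 4 to every byte, i.e. adds 0x0404_0404 to the packed word.
    let k = (n % 4) as u32;
    // The maximum value is 0x0F0E_0D0C, so no byte carries into the next
    // and the addition cannot overflow.
    let packed = 0x0302_0100u32 + k * 0x0404_0404;
    packed.to_le_bytes()
}
```

For every `u8` input this should agree with the `match`-based `f`, without any table lookup in the source.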

Nothing else I tried made the compiler emit that codegen.

I don't know whether this applies to other output targets.

@rustbot label A-LLVM I-slow

Labels: A-LLVM, A-codegen, C-optimization, I-slow, T-compiler
