Suboptimal code generated for alignment checks and similar via number.trailing_zeros() >= bit_count #107554

Closed
@schreter

Description

Typically, one would write val & 7 == 0 to check whether val is 8-byte aligned. However, Clippy complains and suggests writing it as val.trailing_zeros() >= 3 instead. Whether that is really more readable is debatable, but the real problem is that the generated code is significantly worse.

For example, let's take this code:

pub fn check_align_trailing_zeros(val: usize) -> bool {
    val.trailing_zeros() >= 3
}

pub fn check_align_mask(val: usize) -> bool {
    val & 7 == 0
}

I expected to see the same optimal code generated for both functions. However, for the trailing_zeros() version the compiler really does compute the trailing-zero count and then emits an additional compare, instead of a single test instruction.

Code generated on x64:

example::check_align_trailing_zeros:
        test    rdi, rdi
        je      .LBB0_1
        bsf     rax, rdi
        cmp     eax, 3
        setae   al
        ret
.LBB0_1:
        mov     eax, 64
        cmp     eax, 3
        setae   al
        ret

example::check_align_mask:
        test    dil, 7
        sete    al
        ret

Code generated on ARM:

example::check_align_trailing_zeros:
        rbit    x8, x0
        clz     x8, x8
        cmp     w8, #2
        cset    w0, hi
        ret

example::check_align_mask:
        tst     x0, #0x7
        cset    w0, eq
        ret

This happens with the newest Rust 1.67, with older versions, and on nightly.

Comparisons of trailing_zeros()/trailing_ones() and leading_zeros()/leading_ones() against a constant n using > or >= can be mapped to a mask check. Build a mask of n+1 ones (for >) or n ones (for >=) at the tail of the word (for trailing_*) or at its head (for leading_*), AND it with the value, and compare the result against 0 for *_zeros or against the mask itself for *_ones. For *_zeros the comparison against 0 is implicit, since the TEST operation already sets the ZERO/EQ flag in the CPU flags, so the check boils down to a single instruction. For *_ones it boils down to two instructions.
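The mapping described above can be sketched in plain Rust as follows. This is only an illustration of the equivalence, not what the compiler or LLVM actually implements. The helper names (low_mask, high_mask, trailing_zeros_ge and so on) are hypothetical, and the sketch assumes 0 <= n <= usize::BITS. A strict comparison > n corresponds to calling the same helper with n + 1.

```rust
/// Mask with the low `n` bits set. Hypothetical helper; assumes n <= usize::BITS.
fn low_mask(n: u32) -> usize {
    if n >= usize::BITS {
        usize::MAX
    } else {
        (1usize << n) - 1
    }
}

/// Mask with the high `n` bits set. Hypothetical helper; assumes n <= usize::BITS.
fn high_mask(n: u32) -> usize {
    // The complement of the low (BITS - n) bits is the high n bits.
    !low_mask(usize::BITS.saturating_sub(n))
}

/// Same result as `val.trailing_zeros() >= n` for n <= usize::BITS.
pub fn trailing_zeros_ge(val: usize, n: u32) -> bool {
    val & low_mask(n) == 0
}

/// Same result as `val.trailing_ones() >= n` for n <= usize::BITS.
pub fn trailing_ones_ge(val: usize, n: u32) -> bool {
    let mask = low_mask(n);
    val & mask == mask
}

/// Same result as `val.leading_zeros() >= n` for n <= usize::BITS.
pub fn leading_zeros_ge(val: usize, n: u32) -> bool {
    val & high_mask(n) == 0
}

/// Same result as `val.leading_ones() >= n` for n <= usize::BITS.
pub fn leading_ones_ge(val: usize, n: u32) -> bool {
    let mask = high_mask(n);
    val & mask == mask
}
```

With n = 3, trailing_zeros_ge(val, 3) reduces to val & 7 == 0, which is exactly the check_align_mask form from the example above.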

Metadata


    Labels

    A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
    C-bug (Category: This is a bug.)
    E-needs-test (Call for participation: An issue has been fixed and does not reproduce, but no test has been added.)
    I-slow (Issue: Problems and improvements with respect to performance of generated code.)
    T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)