Description
Typically, one would write val & 7 == 0 to check whether val is aligned to 8 bytes. However, Clippy complains and suggests writing it as val.trailing_zeros() >= 3 instead. Although it is debatable whether this is really more readable, the real problem is that the generated code is significantly worse.
For example, let's take this code:
pub fn check_align_trailing_zeros(val: usize) -> bool {
val.trailing_zeros() >= 3
}
pub fn check_align_mask(val: usize) -> bool {
val & 7 == 0
}
I expected both functions to compile to the same optimal code. However, the compiler emits separate instructions to compute trailing_zeros() plus an additional compare, instead of a single test instruction.
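For reference, the two forms are semantically equivalent, so the difference is purely a code generation issue. A minimal sketch that checks this is shown here; forms_agree is a hypothetical helper added for illustration and is not part of the original report.

```rust
/// Clippy's suggested form: at least 3 trailing zero bits.
pub fn check_align_trailing_zeros(val: usize) -> bool {
    val.trailing_zeros() >= 3
}

/// Mask form: the low 3 bits are all zero.
pub fn check_align_mask(val: usize) -> bool {
    val & 7 == 0
}

/// Hypothetical helper (not from the report): returns true if both forms
/// agree for every value in `0..limit` and for a few edge values.
pub fn forms_agree(limit: usize) -> bool {
    let edge = [usize::MAX, usize::MAX - 7, 1usize << (usize::BITS - 1)];
    (0..limit)
        .chain(edge.iter().copied())
        .all(|v| check_align_trailing_zeros(v) == check_align_mask(v))
}
```

Note that val == 0 is covered as well: trailing_zeros() returns the bit width for zero, so both forms report zero as aligned.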
Code generated on x64:
example::check_align_trailing_zeros:
test rdi, rdi
je .LBB0_1
bsf rax, rdi
cmp eax, 3
setae al
ret
.LBB0_1:
mov eax, 64
cmp eax, 3
setae al
ret
example::check_align_mask:
test dil, 7
sete al
ret
Code generated on ARM:
example::check_align_trailing_zeros:
rbit x8, x0
clz x8, x8
cmp w8, #2
cset w0, hi
ret
example::check_align_mask:
tst x0, #0x7
cset w0, eq
ret
This happens with Rust 1.67 (the newest release at the time of writing), with older versions, and on nightly.
Comparisons of trailing_zeros() / trailing_ones() and leading_zeros() / leading_ones() against n with the > / >= operators can be lowered to a mask check. The mask has n+1 (for >) or n (for >=) ones at the tail (for trailing_*) or head (for leading_*) of the word. For *_zeros, the masked value is compared against 0; the TEST instruction does this implicitly and sets the ZERO/EQ flag, so it boils down to a single instruction. For *_ones, the masked value is compared against the mask itself, which boils down to two instructions.
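The mapping described above can be sketched in plain Rust as follows. This is only an illustration of the intended equivalence for u64, not the actual compiler transformation; the helper names (low_mask, high_mask, trailing_zeros_ge, and so on) are made up for this example. Values of n above the bit width need separate handling, because no mask can express them.

```rust
/// Mask with the lowest `n` bits set. Assumes `n <= u64::BITS`.
fn low_mask(n: u32) -> u64 {
    if n == u64::BITS { u64::MAX } else { (1u64 << n) - 1 }
}

/// Mask with the highest `n` bits set. Assumes `n <= u64::BITS`.
fn high_mask(n: u32) -> u64 {
    if n == 0 { 0 } else { u64::MAX << (u64::BITS - n) }
}

/// Equivalent to `val.trailing_zeros() >= n`: mask of `n` ones, compare to 0.
pub fn trailing_zeros_ge(val: u64, n: u32) -> bool {
    n <= u64::BITS && (val & low_mask(n)) == 0
}

/// Equivalent to `val.trailing_zeros() > n`: mask of `n + 1` ones.
pub fn trailing_zeros_gt(val: u64, n: u32) -> bool {
    n < u64::BITS && (val & low_mask(n + 1)) == 0
}

/// Equivalent to `val.trailing_ones() >= n`: compare against the mask itself.
pub fn trailing_ones_ge(val: u64, n: u32) -> bool {
    n <= u64::BITS && (val & low_mask(n)) == low_mask(n)
}

/// Equivalent to `val.leading_zeros() >= n`: mask at the head of the word.
pub fn leading_zeros_ge(val: u64, n: u32) -> bool {
    n <= u64::BITS && (val & high_mask(n)) == 0
}

/// Equivalent to `val.leading_ones() >= n`.
pub fn leading_ones_ge(val: u64, n: u32) -> bool {
    n <= u64::BITS && (val & high_mask(n)) == high_mask(n)
}
```

With a constant n, each mask folds to an immediate, so the *_zeros checks reduce to a single test against a constant and the *_ones checks to an AND followed by a compare.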