I tried this code, which contains three functions that check whether all the bits of a u64 are ones:
#[no_mangle]
fn ne_bytes(input: u64) -> bool {
    // Convert to native-endian bytes and check that every byte is 0xFF.
    let bytes = input.to_ne_bytes();
    bytes.iter().all(|x| *x == !0)
}

#[no_mangle]
fn black_box_ne_bytes(input: u64) -> bool {
    // Same check, but the byte array is passed through black_box first.
    let bytes = input.to_ne_bytes();
    let bytes = std::hint::black_box(bytes);
    bytes.iter().all(|x| *x == !0)
}

#[no_mangle]
fn direct(input: u64) -> bool {
    // Compare the whole u64 against all ones directly.
    input == !0
}
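For comparison (not part of the original report), here is a sketch of two alternative formulations of the same check; whether either of them actually folds down to the single cmp/sete that direct() gets is an assumption that would need to be verified on godbolt:

// Sketch only: hypothetical variants for side-by-side codegen comparison.
#[no_mangle]
fn from_ne_bytes_roundtrip(input: u64) -> bool {
    // Round-trip through the byte array and back, then compare as a u64.
    let bytes = input.to_ne_bytes();
    u64::from_ne_bytes(bytes) == !0
}

#[no_mangle]
fn array_compare(input: u64) -> bool {
    // Compare the byte array against an all-ones array in one go.
    input.to_ne_bytes() == [0xFF_u8; 8]
}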
I expected to see this happen: ne_bytes() should be optimized to the same code as direct(), while black_box_ne_bytes() should be optimized slightly worse.

Instead, this happened: I got the following assembly, where ne_bytes() is somehow optimized worse than black_box_ne_bytes():
ne_bytes:
        mov     rax, rdi
        not     rax
        shl     rax, 8
        sete    cl
        shr     rdi, 56
        cmp     edi, 255
        setae   al
        and     al, cl
        ret

black_box_ne_bytes:
        mov     qword ptr [rsp - 8], rdi
        lea     rax, [rsp - 8]
        cmp     qword ptr [rsp - 8], -1
        sete    al
        ret

direct:
        cmp     rdi, -1
        sete    al
        ret
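As a sanity check (my addition, not part of the original report), a minimal test confirming that the three functions agree on boundary inputs, so the difference above is purely one of codegen; this assumes it is compiled in the same file as the functions above:

// Hypothetical test: all three variants must return the same result.
#[test]
fn variants_agree() {
    for input in [0u64, 1, u64::MAX - 1, u64::MAX] {
        let expected = input == u64::MAX;
        assert_eq!(ne_bytes(input), expected);
        assert_eq!(black_box_ne_bytes(input), expected);
        assert_eq!(direct(input), expected);
    }
}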
Meta
Reproducible on godbolt with stable rustc 1.82.0 (f6e511eec 2024-10-15) and nightly rustc 1.85.0-nightly (7db7489f9 2024-11-25).
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Category: An issue highlighting optimization opportunities or PRs implementing such.
Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
Call for participation: An issue has been fixed and does not reproduce, but no test has been added.
Relevant to the compiler team, which will review and decide on the PR/issue.