Description
I tried this code:
#![feature(stdsimd)]
use std::arch::arm::*;

pub fn copysign(from: float32x4_t, to: float32x4_t) -> float32x4_t {
    unsafe {
        let mask = vdupq_n_u32(0x8000_0000);
        vbslq_f32(mask, from, to)
    }
}
Using this Cargo configuration:
[build]
target = "armv7-unknown-linux-gnueabihf"
[target.armv7-unknown-linux-gnueabihf]
linker = "arm-linux-gnueabihf-gcc"
rustflags = ["-Ctarget-feature=+neon"]
I expected to see this happen: the generated code includes a vbsl, vbit, or vbif instruction, like Clang's output for an equivalent C function:
copysign(__simd128_float32_t, __simd128_float32_t):
vmov.i32 q8, #0x80000000
vbif q0, q1, q8
bx lr
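The expected result depends on bitwise-select semantics: each result bit comes from the first input where the mask bit is 1 and from the second input where it is 0. The portable scalar model below is a sketch of that behavior, not the stdarch implementation. The helper names bitwise_select and copysign_model are hypothetical. With the mask 0x8000_0000 in every lane, the model yields the sign bit of from combined with the magnitude of to, which matches what f32::copysign computes.

```rust
/// Hypothetical scalar model of a bitwise select on four f32 lanes, the
/// operation VBSL/VBIT/VBIF perform: each result bit is taken from `a` where
/// the corresponding mask bit is 1 and from `b` where it is 0.
pub fn bitwise_select(mask: [u32; 4], a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        // Mix the raw IEEE 754 bit patterns of the two inputs.
        let bits = (mask[i] & a[i].to_bits()) | (!mask[i] & b[i].to_bits());
        out[i] = f32::from_bits(bits);
    }
    out
}

/// Hypothetical model of the reported `copysign`: the sign bit comes from
/// `from` and every other bit (the magnitude) comes from `to`.
pub fn copysign_model(from: [f32; 4], to: [f32; 4]) -> [f32; 4] {
    bitwise_select([0x8000_0000; 4], from, to)
}
```

For example, copysign_model([-1.0, 1.0, -0.5, 3.0], [2.0, -2.0, 4.0, -8.0]) gives [-2.0, 2.0, -4.0, 8.0] under this model.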
Instead, this happened: the function is optimized to simply return its second argument, to, unchanged:
00000000 <neon_test::copysign>:
0: f4620acf vld1.64 {d16-d17}, [r2]
4: f4400acf vst1.64 {d16-d17}, [r0]
8: e12fff1e bx lr
We discussed this issue on Zulip, and it appears that all NEON vbsl*_* intrinsics are implemented using simd_select, which performs lane-wise selection instead of bitwise selection. The issue affects both aarch64 and armv7 targets.
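To see why lane-wise selection cannot stand in for vbsl here, consider that a lane select copies each result lane whole from one input or the other and never mixes bits within a lane. The scalar sketch below illustrates this. It is a model of the semantics only, not of how simd_select interprets its mask, and lane_select and reachable_by_lane_select are hypothetical helpers. Whenever from and to differ in sign, the copysign result has the sign of one input and the magnitude of the other, so no per-lane choice can produce it. Taking to in every lane reproduces the observed output above.

```rust
/// Hypothetical scalar model of a lane-wise select: every lane of the result
/// is copied whole from either `a` or `b`; bits within a lane are never mixed.
pub fn lane_select(take_a: [bool; 4], a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = if take_a[i] { a[i] } else { b[i] };
    }
    out
}

/// Returns true if some lane-wise selection of `a` and `b` equals `target`.
/// Lanes are chosen independently, so checking each lane separately suffices.
/// Bit patterns are compared so that -0.0 and 0.0 are treated as distinct.
pub fn reachable_by_lane_select(a: [f32; 4], b: [f32; 4], target: [f32; 4]) -> bool {
    (0..4).all(|i| {
        target[i].to_bits() == a[i].to_bits() || target[i].to_bits() == b[i].to_bits()
    })
}
```

For instance, with from = [-1.0, 1.0, -0.5, 3.0] and to = [2.0, -2.0, 4.0, -8.0], the desired result [-2.0, 2.0, -4.0, 8.0] is not reachable by any lane-wise selection.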
Meta
rustc --version --verbose:
rustc 1.56.0-nightly (1f0a591b3 2021-07-30)
binary: rustc
commit-hash: 1f0a591b3a5963a0ab11a35dc525ad9d46f612e4
commit-date: 2021-07-30
host: x86_64-pc-windows-msvc
release: 1.56.0-nightly
LLVM version: 12.0.1
Backtrace
PS D:\development\neon-test> $env:RUST_BACKTRACE="1"
PS D:\development\neon-test> cargo build --release
Compiling neon-test v0.1.0 (D:\development\neon-test)
Finished release [optimized] target(s) in 0.65s