
Miscompilation of vbsl*_* NEON intrinsics #1191

Closed
@l0calh05t

Description


I tried this code:

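```rust
#![feature(stdsimd)]

use std::arch::arm::*;

/// Returns a vector with the sign bits of `from` and all other bits of `to`.
pub fn copysign(from: float32x4_t, to: float32x4_t) -> float32x4_t {
    unsafe {
        // Only the sign bit of each 32-bit lane is set.
        let mask = vdupq_n_u32(0x8000_0000);
        // Bitwise select: bits from `from` where `mask` is 1, else from `to`.
        vbslq_f32(mask, from, to)
    }
}
```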

using this Cargo configuration:

```toml
[build]
target = "armv7-unknown-linux-gnueabihf"

[target.armv7-unknown-linux-gnueabihf]
linker = "arm-linux-gnueabihf-gcc"
rustflags = ["-Ctarget-feature=+neon"]
```

I expected to see this happen: the generated code contains a `vbsl`, `vbit`, or `vbif` instruction, as in Clang's output for an equivalent C function:

```asm
copysign(__simd128_float32_t, __simd128_float32_t):
        vmov.i32        q8, #0x80000000
        vbif    q0, q1, q8
        bx      lr
```
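For reference, the three NEON bitwise-select instructions all compute the same bitwise select and differ only in which register holds the mask and which register is overwritten. The sketch below is a scalar model of one 32-bit lane, written from the Arm descriptions of VBSL, VBIT and VBIF; the helper functions are hypothetical illustrations, not the `core::arch` intrinsics. It assumes `from` arrives in `q0` and `to` in `q1`, as under the hard-float calling convention.

```rust
// Scalar model of one 32-bit lane of the NEON bitwise-select family.
// These helpers are illustrative only; they are not the real intrinsics.

/// VBSL d, n, m: the destination `d` holds the mask.
fn vbsl(d: u32, n: u32, m: u32) -> u32 {
    (d & n) | (!d & m)
}

/// VBIT d, n, m: insert bits of `n` into `d` where the mask `m` is 1.
fn vbit(d: u32, n: u32, m: u32) -> u32 {
    (d & !m) | (n & m)
}

/// VBIF d, n, m: insert bits of `n` into `d` where the mask `m` is 0.
fn vbif(d: u32, n: u32, m: u32) -> u32 {
    (d & m) | (n & !m)
}

fn main() {
    let mask = 0x8000_0000u32; // sign bit only, like vdupq_n_u32(0x8000_0000)
    let from = (-1.5f32).to_bits(); // supplies the sign
    let to = 2.0f32.to_bits(); // supplies exponent and mantissa

    // Clang's `vbif q0, q1, q8` with q0 = from, q1 = to, q8 = mask.
    let clang = vbif(from, to, mask);
    // The same selection expressed with VBSL and VBIT.
    let via_bsl = vbsl(mask, from, to);
    let via_bit = vbit(to, from, mask);

    assert_eq!(clang, via_bsl);
    assert_eq!(clang, via_bit);
    println!("{}", f32::from_bits(clang)); // prints -2
}
```

The compiler is free to pick whichever of the three forms avoids extra register moves, which is why Clang emits `vbif` here even though the source uses a `vbsl`-style intrinsic.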

Instead, this happened: the function is optimized to just return `to`:

```text
00000000 <neon_test::copysign>:
   0:   f4620acf        vld1.64 {d16-d17}, [r2]
   4:   f4400acf        vst1.64 {d16-d17}, [r0]
   8:   e12fff1e        bx      lr
```

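We discussed this issue on Zulip, and it appears that all NEON `vbsl*_*` intrinsics are implemented using `simd_select`, which selects whole lanes based on the mask instead of selecting individual bits. A sign-bit-only mask like the one above therefore does not behave as a bitwise select. The issue affects both aarch64 and armv7 targets.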
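To illustrate why lane selection breaks this function, the sketch below compares a bitwise select with a hypothetical lane-wise select on four 32-bit lanes. The lane model reduces each mask lane to its lowest bit; that is an assumption chosen because it reproduces the observed output (the lowest bit of `0x8000_0000` is 0, so every lane comes from `to`), not a description of rustc's actual `simd_select` lowering.

```rust
// Sketch contrasting bitwise and lane-wise selection on 4 x u32 lanes
// (f32 bit patterns). `lane_select` is a hypothetical model, not rustc code.

/// Every bit is chosen independently: mask bit 1 -> `a`, 0 -> `b`.
fn bitwise_select(mask: [u32; 4], a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| (mask[i] & a[i]) | (!mask[i] & b[i]))
}

/// Each whole lane is chosen at once, using only the mask lane's lowest bit
/// (an assumed model that matches the disassembly in this issue).
fn lane_select(mask: [u32; 4], a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| if mask[i] & 1 == 1 { a[i] } else { b[i] })
}

fn main() {
    let mask = [0x8000_0000u32; 4];
    let from = [(-1.0f32).to_bits(); 4];
    let to = [3.0f32.to_bits(); 4];

    // Bitwise: sign of `from`, magnitude of `to`, which is what copysign needs.
    let bitwise = bitwise_select(mask, from, to);
    assert_eq!(f32::from_bits(bitwise[0]), -3.0);

    // Lane model: the low bit of 0x8000_0000 is 0, so every lane is `to`,
    // just like the generated code that copies `to` into the result.
    let lane = lane_select(mask, from, to);
    assert_eq!(lane, to);
}
```

With all-ones mask lanes (`0xFFFF_FFFF`) or all-zeros lanes the two functions agree, which is why a lane-wise select is only a valid implementation of `vbsl*` when every mask lane is all zeros or all ones.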

Meta

`rustc --version --verbose`:

```text
rustc 1.56.0-nightly (1f0a591b3 2021-07-30)
binary: rustc
commit-hash: 1f0a591b3a5963a0ab11a35dc525ad9d46f612e4
commit-date: 2021-07-30
host: x86_64-pc-windows-msvc
release: 1.56.0-nightly
LLVM version: 12.0.1
```
Backtrace

```text
PS D:\development\neon-test> $env:RUST_BACKTRACE="1"
PS D:\development\neon-test> cargo build --release
   Compiling neon-test v0.1.0 (D:\development\neon-test)
    Finished release [optimized] target(s) in 0.65s
```
