Description
I tried this code:
#![feature(portable_simd)]
use std::simd::prelude::*;

static TABLE: [u8; 32] = *b"abcdefghijklmnopqrstuvwxyz012345";

pub fn x(input_indices: [u8; 4]) -> [u8; 4] {
    let mut indices = [0u8; 32];
    indices[0..4].copy_from_slice(&input_indices);
    let indices = u8x32::from_array(indices);
    let table = u8x32::from_array(TABLE);
    let buf = table.swizzle_dyn(indices);
    buf.to_array()[0..4].try_into().unwrap()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        assert_eq!(&x([2, 14, 14, 11]), b"cool");
    }
}
I expected to see this happen: The test should pass.
Instead, this happened: The test fails when run with RUSTFLAGS="-C target-cpu=icelake-client" cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu. But it passes with, say, target-cpu=x86-64, which points to this being a problem with the code generated for this specific CPU.
In my analysis, the failure happens because swizzle_dyn ends up calling the _mm256_permutexvar_epi8 intrinsic with the arguments in reverse order. According to the docs, the first parameter is the index vector, but in swizzle_dyn (via transize), the index vector is passed as the second argument. Oops!
This analysis is corroborated by the fact that I can make it work by calling said intrinsic directly :)
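To illustrate why the argument order matters, here is a minimal scalar sketch of the intrinsic's behavior. It assumes the documented semantics of _mm256_permutexvar_epi8(idx, a), where output byte i is a[idx[i] & 31]. The names permutexvar_epi8_model and report_inputs are hypothetical helpers made up for this illustration; this is not the code in swizzle_dyn and it does not use the real intrinsic, so it runs on any CPU.

```rust
/// Scalar model (hypothetical helper, not part of any library) of
/// `_mm256_permutexvar_epi8(idx, a)`: output byte `i` is `a[idx[i] & 31]`,
/// i.e. only the low 5 bits of each index byte select a source lane.
pub fn permutexvar_epi8_model(idx: [u8; 32], a: [u8; 32]) -> [u8; 32] {
    let mut dst = [0u8; 32];
    for i in 0..32 {
        dst[i] = a[usize::from(idx[i] & 31)];
    }
    dst
}

/// The table and index vector from the report, built the same way as in `x`.
/// Returns `(table, indices)`.
pub fn report_inputs() -> ([u8; 32], [u8; 32]) {
    let table: [u8; 32] = *b"abcdefghijklmnopqrstuvwxyz012345";
    let mut indices = [0u8; 32];
    indices[0..4].copy_from_slice(&[2, 14, 14, 11]);
    (table, indices)
}
```

With (table, indices) from report_inputs(), permutexvar_epi8_model(indices, table) starts with b"cool". With the arguments reversed, permutexvar_epi8_model(table, indices) starts with the bytes [14, 14, 11, 0]: the ASCII bytes of "abcd" are masked to 1, 2, 3, 4 and used as indices into the index vector, which is consistent with the test failing on this target.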
Meta
rustc --version --verbose:
rustc 1.77.0-nightly (3cdd004e5 2023-12-29)
binary: rustc
commit-hash: 3cdd004e55c869faa2b7b25efd3becf50346e7d6
commit-date: 2023-12-29
host: x86_64-unknown-linux-gnu
release: 1.77.0-nightly
LLVM version: 17.0.6