Description
I tried getting an AVX-512 intrinsic to work and ran into a bunch of difficulties. Some points:
- It looks like the combination of AVX512's masks and AVX512VL (which lets AVX512 instructions operate on 128/256-bit vectors) means that for most instructions there's one C intrinsic for each of {no mask, write mask, zero mask} x {xmm, ymm, zmm}.
- These would probably be good to generate with a macro?
- Because AVX512 uses mask registers, the `constify!` macro hacks are probably not needed for mask instructions.
- The list of intrinsics linked in the readme doesn't seem to have non-masked versions; I don't know if this is just an accident of how it was made.
- Trying to use the `int_x86_avx512_mask_pmul_dq_512` intrinsic from that list using

      #[link_name = "llvm.x86.avx512.mask.pmul.dq.512"]
      fn mask_pmul_dq_512(a: i32x16, b: i32x16, src: i64x8, k: i8) -> i64x8;

  didn't work, failing with

      rustc: /checkout/src/llvm/include/llvm/Support/Casting.h:236: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::VectorType; Y = llvm::Type; typename llvm::cast_retty<X, Y*>::ret_type = llvm::VectorType*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

  which I guess means I was linking to the intrinsic incorrectly?
@alexcrichton reduced this to a minimal example for the `vpmuldq` instruction (https://godbolt.org/g/VMCtYy) and found https://github.com/rust-lang/rust/blob/4c053db233d69519b548e5b8ed7192d0783e582a/src/librustc_trans/cabi_x86_64.rs#L30-L31, which hardcodes the biggest vector as 256 bits (the size of a ymm register).
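To make the macro idea from the list above concrete, here is a minimal sketch of generating the {no mask, write mask, zero mask} variants for each vector width from a single macro invocation. It is a scalar model over plain arrays, not real SIMD code: the macro name `mask_variants!` and the function names are hypothetical, the mask is assumed to be a `u8` with one bit per 64-bit lane, and the argument order (`src`, `k`, `a`, `b`) follows the Intel C intrinsics rather than the LLVM intrinsic quoted above. The lane arithmetic models `vpmuldq`: each 64-bit result is the product of the sign-extended even-indexed 32-bit elements.

```rust
// Hypothetical sketch: one macro call emits the three masking variants
// for a given lane count. Arrays stand in for xmm/ymm/zmm registers.
macro_rules! mask_variants {
    ($plain:ident, $mask:ident, $maskz:ident, $lanes:literal) => {
        // No mask: every 64-bit lane is computed.
        pub fn $plain(a: [i32; 2 * $lanes], b: [i32; 2 * $lanes]) -> [i64; $lanes] {
            let mut out = [0i64; $lanes];
            for i in 0..$lanes {
                // vpmuldq multiplies the even-indexed 32-bit elements,
                // sign-extended to 64 bits.
                out[i] = i64::from(a[2 * i]) * i64::from(b[2 * i]);
            }
            out
        }

        // Write mask: lanes whose mask bit is clear keep the value from `src`.
        pub fn $mask(
            src: [i64; $lanes],
            k: u8,
            a: [i32; 2 * $lanes],
            b: [i32; 2 * $lanes],
        ) -> [i64; $lanes] {
            let full = $plain(a, b);
            let mut out = src;
            for i in 0..$lanes {
                if ((k >> i) & 1) == 1 {
                    out[i] = full[i];
                }
            }
            out
        }

        // Zero mask: lanes whose mask bit is clear are set to zero.
        pub fn $maskz(k: u8, a: [i32; 2 * $lanes], b: [i32; 2 * $lanes]) -> [i64; $lanes] {
            $mask([0i64; $lanes], k, a, b)
        }
    };
}

mask_variants!(mul_x2, mask_mul_x2, maskz_mul_x2, 2); // 128-bit (xmm) shape
mask_variants!(mul_x4, mask_mul_x4, maskz_mul_x4, 4); // 256-bit (ymm) shape
mask_variants!(mul_x8, mask_mul_x8, maskz_mul_x8, 8); // 512-bit (zmm) shape

fn main() {
    let a = [3, 0, -2, 0, 5, 0, 7, 0];
    let b = [4, 0, 6, 0, -1, 0, 2, 0];
    let src = [100, 200, 300, 400];

    println!("{:?}", mul_x4(a, b)); // [12, -12, -5, 14]
    println!("{:?}", mask_mul_x4(src, 0b0101, a, b)); // [12, 200, -5, 400]
    println!("{:?}", maskz_mul_x4(0b0101, a, b)); // [12, 0, -5, 0]
}
```

In a real binding the three function bodies would presumably wrap the corresponding LLVM intrinsic instead of looping, but the shape of the macro (one invocation per register width, three functions per invocation) would stay the same.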