define dso_local <64 x i8> @compress(<64 x i8> %0, i64 %1) local_unnamed_addr {
Entry:
%2 = bitcast i64 %1 to <64 x i1>
%3 = tail call fastcc <64 x i8> @llvm.experimental.vector.compress.v64i8(<64 x i8> %0, <64 x i1> %2, <64 x i8> zeroinitializer)
ret <64 x i8> %3
}
declare fastcc <64 x i8> @llvm.experimental.vector.compress.v64i8(<64 x i8>, <64 x i1>, <64 x i8>) #1
Compiled for Zen 5, we get:
compress:
.Lcompress$local:
kmovq k1, rdi
vpxor xmm1, xmm1, xmm1
vpcompressb zmm1 {k1}, zmm0
vmovdqa64 zmm0, zmm1
ret
The vpxor is unnecessary. Since the passthru operand is zeroinitializer, we could just use the zero-masking {z} variant of vpcompressb.
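For illustration, here is a hand-written sketch of the output one might hope for. It is not actual compiler output, and it assumes the same calling convention as above (vector argument in zmm0, mask in rdi). With zero-masking, the destination lanes beyond the compressed elements are zeroed by the instruction itself, so the result could be written straight into zmm0, which would also make the trailing vmovdqa64 unnecessary:

```asm
compress:
        kmovq   k1, rdi                     # move the 64-bit mask into a mask register
        vpcompressb zmm0 {k1} {z}, zmm0     # compress selected bytes; remaining lanes are zeroed
        ret
```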