Closed
This LLVM IR (https://godbolt.org/z/cx8adPc9f)
define range(i32 0, -131070) <4 x i32> @manual_mule(<8 x i16> %a, <8 x i16> %b) unnamed_addr {
start:
  %0 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %1 = zext <4 x i16> %0 to <4 x i32>
  %2 = shufflevector <8 x i16> %b, <8 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %3 = zext <4 x i16> %2 to <4 x i32>
  %4 = mul nuw <4 x i32> %3, %1
  ret <4 x i32> %4
}
does not optimize to the expected output of vec_mule, a single vmleh instruction. The same is true for the other multiplication flavors (low, high, odd).
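
For reference, here is a rough scalar sketch in C of what the IR computes per lane: take the even-indexed 16-bit elements of each input, zero-extend them to 32 bits, and multiply. The function name manual_mule_model is hypothetical and only mirrors the IR function name; it is an illustration of the semantics, not code from the reproducer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model (not taken from the issue): multiply the
   even-indexed 16-bit lanes of a and b after zero-extending to 32 bits. */
void manual_mule_model(const uint16_t a[8], const uint16_t b[8], uint32_t out[4]) {
    for (size_t i = 0; i < 4; ++i) {
        /* Lanes 0, 2, 4, 6 correspond to the shufflevector mask;
           the casts to uint32_t correspond to the zext instructions. */
        out[i] = (uint32_t)a[2 * i] * (uint32_t)b[2 * i];
    }
}
```

The largest possible product, 65535 * 65535 = 4294836225, still fits in an unsigned 32-bit lane, which is consistent with the nuw flag on the mul and with the range attribute on the return value.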