https://godbolt.org/z/Wxc8x8Tax
This LLVM IR
define range(i32 -32768, 32768) <4 x i32> @unpackh(<8 x i16> %a) unnamed_addr {
start:
  %0 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %1 = sext <4 x i16> %0 to <4 x i32>
  ret <4 x i32> %1
}

define range(i32 -32768, 32768) <4 x i32> @unpackl(<8 x i16> %a) unnamed_addr {
start:
  %0 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %1 = sext <4 x i16> %0 to <4 x i32>
  ret <4 x i32> %1
}
optimizes to
unpackh:
        vuphh   %v24, %v24
        br      %r14
unpackl:
        vmrlg   %v0, %v24, %v24
        vuphh   %v24, %v0
        br      %r14
This is already very good, and optimal for `unpackh`, but not for `unpackl`:
https://godbolt.org/z/xfobea8ee
vector signed int foo(vector signed short a) {
    return vec_unpackh(a);
}

vector signed int unpack(vector signed short a) {
    return vec_unpackl(a);
}
optimizes to
foo:
        vuphh   %v24, %v24
        br      %r14
unpack:
        vuplhw  %v24, %v24
        br      %r14
So it looks like a final step is missed where `vmrlg` + `vuphh` can be rewritten to just `vuplhw` (and similarly for the other vector types). Or maybe the `shufflevector` should be recognized directly. Anyway, this seems achievable.
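To make the proposed rewrite concrete, here is a rough scalar model in C of the three instructions involved. This is only a sketch and not LLVM code: the `model_*` and `check_rewrite` names are hypothetical, and the semantics are my reading of the instructions, assuming big-endian element numbering (element 0 is leftmost, so the "low" half is halfword elements 4..7).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed model of vmrlg (merge low, doubleword elements): the result is the
   low doubleword of a followed by the low doubleword of b. Viewed as
   halfwords, the low doubleword is elements 4..7. */
void model_vmrlg(int16_t out[8], const int16_t a[8], const int16_t b[8]) {
    memcpy(out, a + 4, 4 * sizeof(int16_t));
    memcpy(out + 4, b + 4, 4 * sizeof(int16_t));
}

/* Assumed model of vuphh: sign-extend halfword elements 0..3 to words. */
void model_vuphh(int32_t out[4], const int16_t a[8]) {
    for (int i = 0; i < 4; i++)
        out[i] = a[i];
}

/* Assumed model of vuplhw: sign-extend halfword elements 4..7 to words. */
void model_vuplhw(int32_t out[4], const int16_t a[8]) {
    for (int i = 0; i < 4; i++)
        out[i] = a[i + 4];
}

/* Asserts that vmrlg + vuphh (the current codegen for unpackl) produces the
   same words as a single vuplhw on the original input. */
void check_rewrite(const int16_t a[8]) {
    int16_t merged[8];
    int32_t two_step[4];
    int32_t one_step[4];

    model_vmrlg(merged, a, a);
    model_vuphh(two_step, merged);
    model_vuplhw(one_step, a);

    assert(memcmp(two_step, one_step, sizeof two_step) == 0);
}
```

Calling `check_rewrite` on any 8-element `int16_t` array should pass under this model, since `vmrlg` only moves elements 4..7 into positions 0..3 before `vuphh` sign-extends them.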
This came up in the context of the Rust standard library.
cc @uweigand