s390x: missed optimization to `vec_unpackl`



https://godbolt.org/z/Wxc8x8Tax

This LLVM IR


```llvm
define range(i32 -32768, 32768) <4 x i32> @unpackh(<8 x i16> %a) unnamed_addr {
start:
  %0 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %1 = sext <4 x i16> %0 to <4 x i32>
  ret <4 x i32> %1
}

define range(i32 -32768, 32768) <4 x i32> @unpackl(<8 x i16> %a) unnamed_addr {
start:
  %0 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %1 = sext <4 x i16> %0 to <4 x i32>
  ret <4 x i32> %1
}
```

optimizes to 

```asm
unpackh:
        vuphh   %v24, %v24
        br      %r14

unpackl:
        vmrlg   %v0, %v24, %v24
        vuphh   %v24, %v0
        br      %r14
```

This is already very good, and optimal for `unpackh`, but not for `unpackl`. Compare with the equivalent C vector intrinsics:

https://godbolt.org/z/xfobea8ee

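```c
// Build with e.g. clang -O2 -march=z13 -mzvector (the `vector` keyword needs -mzvector).
#include <vecintrin.h>

// Sign-extends halfword elements 0..3 to words.
vector signed int foo(vector signed short a) {
    return vec_unpackh(a);
}

// Sign-extends halfword elements 4..7 to words.
vector signed int unpack(vector signed short a) {
    return vec_unpackl(a);
}
```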

optimizes to

```asm
foo:
        vuphh   %v24, %v24
        br      %r14

unpack:
        vuplhw  %v24, %v24
        br      %r14
```

So it looks like a final combine step is missing: `vmrlg` followed by `vuphh` could be rewritten to a single `vuplhw` (and similarly for the byte and word variants, `vuplb` and `vuplf`). Alternatively, the `shufflevector` + `sext` pattern could be recognized directly. Either way, this seems achievable.
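To sanity-check that the rewrite is sound, here is a small scalar model in plain C (no SystemZ intrinsics, so it runs on any host). It is a sketch under stated assumptions: a 128-bit vector register is modeled as 16 bytes with element 0 leftmost, matching z/Architecture's big-endian element numbering; `vmrlg(a, b)` is modeled as concatenating the right doubleword of `a` and of `b`; and unpack-high/unpack-low sign-extend the left/right half of the elements. The names `vreg`, `get_elem`, `set_elem`, `unpack`, `vmrlg` and `merge_then_unpackh_equals_unpackl` are hypothetical helpers for this illustration, not LLVM or compiler APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register: 16 bytes, byte 0
   leftmost (assumed big-endian element numbering, as on z/Architecture). */
typedef struct { uint8_t b[16]; } vreg;

/* Read element i of width esz bytes (1, 2 or 4) as a signed value. */
static int64_t get_elem(const vreg *v, int esz, int i) {
    uint64_t u = 0;
    for (int k = 0; k < esz; k++)
        u = (u << 8) | v->b[esz * i + k];
    uint64_t sign = 1ull << (8 * esz - 1);
    return (int64_t)(u ^ sign) - (int64_t)sign;   /* sign-extend */
}

/* Write the low 8*esz bits of x into element i of width esz bytes. */
static void set_elem(vreg *v, int esz, int i, int64_t x) {
    uint64_t u = (uint64_t)x;
    for (int k = esz - 1; k >= 0; k--) {
        v->b[esz * i + k] = (uint8_t)u;
        u >>= 8;
    }
}

/* Unpack high (low == 0, modeling vuphb/vuphh/vuphf) or unpack low
   (low == 1, modeling vuplb/vuplhw/vuplf): sign-extend the left or right
   half of the esz-byte elements of a into 2*esz-byte elements. */
static vreg unpack(vreg a, int esz, int low) {
    vreg r;
    int n = 8 / esz;                      /* number of result elements */
    for (int i = 0; i < n; i++)
        set_elem(&r, 2 * esz, i, get_elem(&a, esz, i + (low ? n : 0)));
    return r;
}

/* Model of vmrlg: result = { right dword of a, right dword of b }. */
static vreg vmrlg(vreg a, vreg b) {
    vreg r;
    memcpy(r.b, a.b + 8, 8);
    memcpy(r.b + 8, b.b + 8, 8);
    return r;
}

/* Returns 1 if unpack_high(vmrlg(a, a)) == unpack_low(a) for byte,
   halfword and word elements over 1000 pseudo-random inputs, else 0. */
static int merge_then_unpackh_equals_unpackl(void) {
    uint32_t seed = 12345u;
    for (int iter = 0; iter < 1000; iter++) {
        vreg a;
        for (int k = 0; k < 16; k++) {
            seed = seed * 1103515245u + 12345u;   /* simple LCG */
            a.b[k] = (uint8_t)(seed >> 24);
        }
        for (int esz = 1; esz <= 4; esz *= 2) {
            vreg lhs = unpack(vmrlg(a, a), esz, 0);   /* vmrlg + vuph* */
            vreg rhs = unpack(a, esz, 1);             /* vupl*         */
            if (memcmp(lhs.b, rhs.b, sizeof lhs.b) != 0)
                return 0;
        }
    }
    return 1;
}
```

In this model, `vmrlg(a, b)` puts the right doubleword of `a` into the left half of the result, and unpack-high reads only that left half, so the second operand does not matter. A combine could therefore presumably match `vuph*(vmrlg(x, y))` for any `y`, though where such a fold best belongs in the SystemZ backend is a separate question.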

This came up in the context of the Rust standard library.

cc @uweigand


s390x: missed optimization to `vec_unpackl` #129576
