Failure in run-pass/simd/simd-intrinsic-generic-select on Big-Endian Targets

This was found while building rustc 1.33.0 on `powerpc64-unknown-linux-musl`. Test `run-pass/simd/simd-intrinsic-generic-select` fails at the third `simd_select_bitmask` test:

```
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u32x8(8, 1, 10, 3, 12, 5, 14, 7)`,
 right: `u32x8(0, 9, 2, 11, 4, 13, 6, 15)`', src/test/run-pass/simd/simd-intrinsic-generic-select.rs:159:9
```

The LLVM IR generated for this test case is identical on `x86_64-unknown-linux-musl` (first listing) and `powerpc64-unknown-linux-musl` (second listing):

```llvm
  %1036 = load <8 x i32>, <8 x i32>* %a83, align 32
  %1037 = load <8 x i32>, <8 x i32>* %b84, align 32
  %1038 = select <8 x i1> bitcast (<1 x i8> <i8 -16> to <8 x i1>), <8 x i32> %1036, <8 x i32> %1037
  store <8 x i32> %1038, <8 x i32>* %r101, align 32
```
```llvm
  %1036 = load <8 x i32>, <8 x i32>* %a83, align 32
  %1037 = load <8 x i32>, <8 x i32>* %b84, align 32
  %1038 = select <8 x i1> bitcast (<1 x i8> <i8 -16> to <8 x i1>), <8 x i32> %1036, <8 x i32> %1037
  store <8 x i32> %1038, <8 x i32>* %r101, align 32
```

The test appears to expect the bitmask to be read in little-endian bit order (in other words, the LSB selects the first element of the vectors). However, the implementation bitcasts the mask to a vector of `i1`. On big-endian architectures such as powerpc64, the LSB becomes the last element of this `i1` vector, not the first.
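To make the two interpretations concrete, here is a minimal scalar sketch that models `simd_select_bitmask` on plain `[u32; 8]` arrays. It is not the rustc implementation. `select_lsb_first` models the order the test expects, and `select_msb_first` models the big-endian bitcast behaviour described above (MSB drives lane 0). Both function names are hypothetical. For the mask that yields the test's expected value, the second model reproduces the powerpc64 output exactly.

```rust
/// Reference model of what the test expects: bit `i` of the mask
/// (counting from the LSB) selects lane `i`; set = `a`, clear = `b`.
fn select_lsb_first(mask: u8, a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    std::array::from_fn(|i| if (mask >> i) & 1 == 1 { a[i] } else { b[i] })
}

/// Model of the behaviour observed on powerpc64: after the `<1 x i8>` to
/// `<8 x i1>` bitcast, the MSB drives lane 0 and the LSB drives lane 7.
fn select_msb_first(mask: u8, a: [u32; 8], b: [u32; 8]) -> [u32; 8] {
    std::array::from_fn(|i| if (mask >> (7 - i)) & 1 == 1 { a[i] } else { b[i] })
}

fn main() {
    let a = [0, 1, 2, 3, 4, 5, 6, 7];
    let b = [8, 9, 10, 11, 12, 13, 14, 15];
    let mask = 0b01010101u8;
    // What the assertion expects (and what x86_64 produces).
    println!("{:?}", select_lsb_first(mask, a, b)); // [0, 9, 2, 11, 4, 13, 6, 15]
    // What powerpc64 actually produced.
    println!("{:?}", select_msb_first(mask, a, b)); // [8, 1, 10, 3, 12, 5, 14, 7]
}
```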

Unfortunately, all of the upstream test cases are symmetric, so "choosing the wrong vector" and "reading the bitmask backwards" produce identical results and cannot be told apart. I added an additional test case:

```rust
        let r: u32x8 = simd_select_bitmask(0b11110101u8, a, b);
        let e = u32x8(0, 9, 2, 11, 4, 5, 6, 7);
        assert_eq!(r, e);
```

This passes on `x86_64-unknown-linux-musl`, but fails on `powerpc64-unknown-linux-musl` with:

```
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u32x8(0, 1, 2, 3, 12, 5, 14, 7)`,
 right: `u32x8(0, 9, 2, 11, 4, 5, 6, 7)`', src/test/run-pass/simd/simd-intrinsic-generic-select.rs:50:9
```

The two "unlike" elements (the lanes taken from `b`) ended up in lanes 4 and 6 instead of lanes 1 and 3, which is exactly what reading the bitmask backwards would produce.
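Why the new mask works while symmetric masks do not can be checked mechanically. Reading an 8-bit mask backwards is the same as using `mask.reverse_bits()`. If the reversed mask equals the original, the backwards read still gives the right answer; if it equals `!mask`, the wrong result looks exactly like `a` and `b` being swapped. The sketch below (the helper name `detects_bit_order` is hypothetical) flags masks that avoid both cases, such as `0b11110101u8`, while masks like `0b01010101u8` and `0b11110000u8` do not qualify.

```rust
/// Returns true if `mask` can distinguish "bitmask read backwards" from
/// both "correct" and "`a` and `b` swapped".
fn detects_bit_order(mask: u8) -> bool {
    // Reading the mask backwards is equivalent to using the bit-reversed mask.
    let reversed = mask.reverse_bits();
    // reversed == mask: a backwards read still gives the right answer.
    // reversed == !mask: the failure looks like `a` and `b` were swapped.
    reversed != mask && reversed != !mask
}

fn main() {
    for mask in [0x00u8, 0xff, 0b01010101, 0b10101010, 0b11110000, 0b11110101] {
        println!("{:#010b}: {}", mask, detects_bit_order(mask));
    }
}
```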

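Since the vectors are 256 bits wide and the POWER VSX registers are only 128 bits wide, LLVM must split each vector across two registers. The following powerpc64 disassembly is from the last test case (`0b11110000u8`). It contains no select at all, only two vector loads and two vector stores, so LLVM (not the hardware) has already resolved the constant mask at compile time. It picks the first half of `a` and the second half of `b`, the reverse of the `u32x8(8, 9, 10, 11, 4, 5, 6, 7)` that a little-endian reading of the mask calls for: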

```
   0x000000000000416c <+36>:    li      r3,0

36          unsafe {
37              let a = u32x8(0, 1, 2, 3, 4, 5, 6, 7);
   0x0000000000004170 <+40>:    stw     r3,288(r1)
   0x0000000000004174 <+44>:    li      r3,1
   0x0000000000004178 <+48>:    stw     r3,292(r1)
   0x000000000000417c <+52>:    li      r3,2
   0x0000000000004180 <+56>:    stw     r3,296(r1)
   0x0000000000004184 <+60>:    li      r3,3
   0x0000000000004188 <+64>:    stw     r3,300(r1)
   0x000000000000418c <+68>:    li      r3,4
   0x0000000000004190 <+72>:    stw     r3,304(r1)
   0x0000000000004194 <+76>:    li      r3,5
   0x0000000000004198 <+80>:    stw     r3,308(r1)
   0x000000000000419c <+84>:    li      r3,6
   0x00000000000041a0 <+88>:    stw     r3,312(r1)
   0x00000000000041a4 <+92>:    li      r3,7
   0x00000000000041a8 <+96>:    stw     r3,316(r1)
   0x00000000000041ac <+100>:   li      r3,8

38              let b = u32x8(8, 9, 10, 11, 12, 13, 14, 15);
   0x00000000000041b0 <+104>:   stw     r3,320(r1)
   0x00000000000041b4 <+108>:   li      r3,9
   0x00000000000041b8 <+112>:   stw     r3,324(r1)
   0x00000000000041bc <+116>:   li      r3,10
   0x00000000000041c0 <+120>:   stw     r3,328(r1)
   0x00000000000041c4 <+124>:   li      r3,11
   0x00000000000041c8 <+128>:   stw     r3,332(r1)
   0x00000000000041cc <+132>:   addi    r3,r1,336
   0x00000000000041d0 <+136>:   li      r4,12
   0x00000000000041d4 <+140>:   stw     r4,336(r1)
   0x00000000000041d8 <+144>:   li      r4,13
   0x00000000000041dc <+148>:   stw     r4,340(r1)
   0x00000000000041e0 <+152>:   li      r4,14
   0x00000000000041e4 <+156>:   stw     r4,344(r1)
   0x00000000000041e8 <+160>:   li      r4,15
   0x00000000000041ec <+164>:   stw     r4,348(r1)
...
60              let r: u32x8 = simd_select_bitmask(0b11110000u8, a, b);
   0x00000000000048cc <+1924>:  addi    r3,r1,288
   0x00000000000048d0 <+1928>:  lvx     v2,0,r3
   0x00000000000048d4 <+1932>:  addi    r3,r1,336
   0x00000000000048d8 <+1936>:  lvx     v3,0,r3
   0x00000000000048dc <+1940>:  addi    r3,r1,1488
   0x00000000000048e0 <+1944>:  stvx    v3,0,r3
   0x00000000000048e4 <+1948>:  addi    r3,r1,1472
   0x00000000000048e8 <+1952>:  stvx    v2,0,r3
```
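The listing can be reproduced with a small model of the split: treat the 8-lane result as two 4-lane (128-bit) halves and ask which input each half comes from. When a half comes entirely from one input, a whole-register copy suffices, which is what the `lvx`/`stvx` pairs show for this mask. This is a simplified sketch, not a description of LLVM's lowering in general; `Source` and `half_sources` are hypothetical names.

```rust
/// Where each 128-bit half of the 8 x u32 result comes from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    A,
    B,
    Mixed,
}

/// Splits an 8-lane select into two 4-lane (128-bit) halves and reports
/// the source of each half. `msb_first = false` is the LSB-first order the
/// test expects; `msb_first = true` models the big-endian bitcast.
fn half_sources(mask: u8, msb_first: bool) -> [Source; 2] {
    // True if lane `lane` is taken from `a`.
    let from_a = |lane: usize| {
        let bit = if msb_first { 7 - lane } else { lane };
        (mask >> bit) & 1 == 1
    };
    std::array::from_fn(|half| {
        let start = half * 4;
        if (start..start + 4).all(|l| from_a(l)) {
            Source::A
        } else if (start..start + 4).all(|l| !from_a(l)) {
            Source::B
        } else {
            Source::Mixed
        }
    })
}

fn main() {
    // LSB-first: [B, A], i.e. u32x8(8, 9, 10, 11, 4, 5, 6, 7).
    println!("{:?}", half_sources(0b11110000, false));
    // MSB-first: [A, B], matching the two lvx/stvx pairs in the listing.
    println!("{:?}", half_sources(0b11110000, true));
}
```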

So is this a bug in the test, which should ensure that the bitmask is in native vector/endian order? Or is it a bug in the implementation of `simd_select_bitmask`, which should always take a little-endian bitmask and reverse the bits as necessary?
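For illustration only, the first option (fixing the test) might convert each LSB-first mask to the order the bitcast uses on the current target before calling the intrinsic. The helper `native_bitmask` below is hypothetical, relies on the big-endian behaviour described above, and only covers the 8-lane, `u8`-mask case; the second option would amount to the same reversal performed inside `simd_select_bitmask` itself.

```rust
/// Hypothetical helper for the "fix the test" option: convert a mask written
/// LSB-first (lane 0 = least significant bit) into the bit order that the
/// `<8 x i1>` bitcast uses on the current target. Only valid for 8 lanes.
fn native_bitmask(lsb_first: u8) -> u8 {
    if cfg!(target_endian = "big") {
        // On big-endian targets lane 0 is driven by the MSB, so reverse the bits.
        lsb_first.reverse_bits()
    } else {
        lsb_first
    }
}

fn main() {
    let m = native_bitmask(0b11110101u8);
    println!("{:#010b}", m);
    // In the test, `m` would then be passed on, e.g.
    // let r: u32x8 = simd_select_bitmask(m, a, b);
}
```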

Failure in run-pass/simd/simd-intrinsic-generic-select on Big-Endian Targets #59356

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Failure in run-pass/simd/simd-intrinsic-generic-select on Big-Endian Targets #59356

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions