Skip to content

Failure in run-pass/simd/simd-intrinsic-generic-select on Big-Endian Targets #59356

Closed
@smaeul

Description

@smaeul

This was found while building rustc 1.33.0 on powerpc64-unknown-linux-musl. Test run-pass/simd/simd-intrinsic-generic-select fails at the third simd_select_bitmask test:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u32x8(8, 1, 10, 3, 12, 5, 14, 7)`,
 right: `u32x8(0, 9, 2, 11, 4, 13, 6, 15)`', src/test/run-pass/simd/simd-intrinsic-generic-select.rs:159:9

The LLVM IR generated for this test case on x86_64-unknown-linux-musl and powerpc64-unknown-linux-musl (respectively) is identical:

  %1036 = load <8 x i32>, <8 x i32>* %a83, align 32
  %1037 = load <8 x i32>, <8 x i32>* %b84, align 32
  %1038 = select <8 x i1> bitcast (<1 x i8> <i8 -16> to <8 x i1>), <8 x i32> %1036, <8 x i32> %1037
  store <8 x i32> %1038, <8 x i32>* %r101, align 32
  %1036 = load <8 x i32>, <8 x i32>* %a83, align 32
  %1037 = load <8 x i32>, <8 x i32>* %b84, align 32
  %1038 = select <8 x i1> bitcast (<1 x i8> <i8 -16> to <8 x i1>), <8 x i32> %1036, <8 x i32> %1037
  store <8 x i32> %1038, <8 x i32>* %r101, align 32

The test appears to expect that the bitmask is interpreted as bitwise little-endian (in other words, the LSB selects the first element from the vectors). However, the implementation uses a bitcast to a vector of i1. On big-endian architectures such as powerpc64, the LSB becomes the last element of this i1 vector, not the first.

Unfortunately, all of the upstream test cases are symmetrical, so "choosing the wrong vector" and "reading the bitmask backwards" are indistinguishable. I added an additional test case:

        let r: u32x8 = simd_select_bitmask(0b11110101u8, a, b);
        let e = u32x8(0, 9, 2, 11, 4, 5, 6, 7);
        assert_eq!(r, e);

This passes on x86_64-unknown-linux-musl, but fails on powerpc64-unknown-linux-musl with:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u32x8(0, 1, 2, 3, 12, 5, 14, 7)`,
 right: `u32x8(0, 9, 2, 11, 4, 5, 6, 7)`', src/test/run-pass/simd/simd-intrinsic-generic-select.rs:50:9

The two "unlike" elements were chosen from the wrong place in the vector.

Since the vectors are 256 bits, and the POWER VSX registers are only 128 bits wide, LLVM must split each vector across two registers. The following powerpc64 assembly is from the last test case (0b11110000u8), and clearly shows that LLVM (not the hardware) picks the first half of a and the second half of b:

   0x000000000000416c <+36>:    li      r3,0                                                                                                                                                   

36          unsafe {
37              let a = u32x8(0, 1, 2, 3, 4, 5, 6, 7);
   0x0000000000004170 <+40>:    stw     r3,288(r1)                                                                                                                                             
   0x0000000000004174 <+44>:    li      r3,1                                                                                                                                                   
   0x0000000000004178 <+48>:    stw     r3,292(r1)
   0x000000000000417c <+52>:    li      r3,2
   0x0000000000004180 <+56>:    stw     r3,296(r1)
   0x0000000000004184 <+60>:    li      r3,3
   0x0000000000004188 <+64>:    stw     r3,300(r1)
   0x000000000000418c <+68>:    li      r3,4
   0x0000000000004190 <+72>:    stw     r3,304(r1)                                                                                                                                             
   0x0000000000004194 <+76>:    li      r3,5
   0x0000000000004198 <+80>:    stw     r3,308(r1)
   0x000000000000419c <+84>:    li      r3,6
   0x00000000000041a0 <+88>:    stw     r3,312(r1)
   0x00000000000041a4 <+92>:    li      r3,7                                                                                                                                                   
   0x00000000000041a8 <+96>:    stw     r3,316(r1)                                                                                                                                             
   0x00000000000041ac <+100>:   li      r3,8

38              let b = u32x8(8, 9, 10, 11, 12, 13, 14, 15);
   0x00000000000041b0 <+104>:   stw     r3,320(r1)
   0x00000000000041b4 <+108>:   li      r3,9
   0x00000000000041b8 <+112>:   stw     r3,324(r1)                                                                                                                                             
   0x00000000000041bc <+116>:   li      r3,10
   0x00000000000041c0 <+120>:   stw     r3,328(r1)                                                                                                                                             
   0x00000000000041c4 <+124>:   li      r3,11
   0x00000000000041c8 <+128>:   stw     r3,332(r1)
   0x00000000000041cc <+132>:   addi    r3,r1,336
   0x00000000000041d0 <+136>:   li      r4,12
   0x00000000000041d4 <+140>:   stw     r4,336(r1)
   0x00000000000041d8 <+144>:   li      r4,13                                                                                                                                                  
   0x00000000000041dc <+148>:   stw     r4,340(r1)
   0x00000000000041e0 <+152>:   li      r4,14
   0x00000000000041e4 <+156>:   stw     r4,344(r1)
   0x00000000000041e8 <+160>:   li      r4,15
   0x00000000000041ec <+164>:   stw     r4,348(r1)
...
60              let r: u32x8 = simd_select_bitmask(0b11110000u8, a, b);
   0x00000000000048cc <+1924>:  addi    r3,r1,288
   0x00000000000048d0 <+1928>:  lvx     v2,0,r3
   0x00000000000048d4 <+1932>:  addi    r3,r1,336
   0x00000000000048d8 <+1936>:  lvx     v3,0,r3
   0x00000000000048dc <+1940>:  addi    r3,r1,1488
   0x00000000000048e0 <+1944>:  stvx    v3,0,r3
   0x00000000000048e4 <+1948>:  addi    r3,r1,1472
   0x00000000000048e8 <+1952>:  stvx    v2,0,r3

So is this a bug in the test, because it should be ensuring that the bitmask is in native vector/endian order? Or in the implementation of simd_select_bitmask, because it should always take a little-endian bitmask and reverse the bits as necessary?

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-SIMDArea: SIMD (Single Instruction Multiple Data)C-bugCategory: This is a bug.O-PowerPCTarget: PowerPC processors

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions