[RISCV] Merge shuffle sources if lanes are disjoint #119401

lukel97 · 2024-12-10T15:50:48Z

In x264, there are a few kernels with shuffles like this:

%41 = add nsw <16 x i32> %39, %40
%42 = sub nsw <16 x i32> %39, %40
%43 = shufflevector <16 x i32> %41, <16 x i32> %42, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 26, i32 30, i32 22, i32 18, i32 9, i32 13, i32 5, i32 1, i32 24, i32 28, i32 20, i32 16>

Because this is a complex two-source shuffle, this will get lowered as two vrgather.vvs that are blended together.

vadd.vv v20, v16, v12
vsub.vv v12, v16, v12
vrgatherei16.vv v24, v20, v10
vrgatherei16.vv v24, v12, v16, v0.t

However, the indices coming from each source are disjoint, so we can blend the two sources together and perform a single-source shuffle instead:

%41 = add nsw <16 x i32> %39, %40
%42 = sub nsw <16 x i32> %39, %40
%43 = select <0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1> %41, %42
%44 = shufflevector <16 x i32> %43, <16 x i32> poison, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 10, i32 14, i32 6, i32 2, i32 9, i32 13, i32 5, i32 1, i32 8, i32 12, i32 4, i32 0>

The select will likely get merged into the preceding instruction, and then we only have to do one vrgather.vv:

vadd.vv v20, v16, v12
vsub.vv v20, v16, v12, v0.t
vrgatherei16.vv v24, v20, v10

This patch bails if either of the sources is a broadcast/splat/identity shuffle, since that will usually already have some sort of cheaper lowering.

This improves performance on 525.x264_r by 4.12% with -O3 -flto -march=rva22u64_v on the spacemit-x60: https://lnt.lukelau.me/db_default/v4/nts/71?compare_to=70
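As a rough illustration of the transform described above, here is a minimal sketch on plain index vectors rather than SelectionDAG nodes. `MergedShuffle` and `mergeDisjointSources` are hypothetical names made up for this example and are not part of the patch. Note that in this sketch the select is indexed by source lane (which lane of the merged vector comes from which operand), not by result position.

```cpp
#include <cassert>
#include <vector>

// Result of merging a two-source shuffle into a select plus a single-source
// shuffle. All names here are illustrative only.
struct MergedShuffle {
  // Indexed by source lane: 0 = lane comes from the first source,
  // 1 = lane comes from the second source, -1 = lane is never read.
  std::vector<int> LaneSource;
  // Shuffle mask rewritten so every index refers to the merged vector.
  std::vector<int> NewMask;
};

// Returns false if some lane is needed from both sources (not disjoint).
bool mergeDisjointSources(const std::vector<int> &Mask, MergedShuffle &Out) {
  const int NumElts = static_cast<int>(Mask.size());
  Out.LaneSource.assign(NumElts, -1);
  Out.NewMask.assign(NumElts, -1);

  for (int I = 0; I < NumElts; ++I) {
    const int Idx = Mask[I];
    if (Idx == -1)
      continue; // Undefined result element: imposes no constraint.
    assert(Idx >= 0 && Idx < 2 * NumElts && "shuffle index out of range");

    const int Lane = Idx % NumElts;        // Lane within its source vector.
    const int Src = Idx < NumElts ? 0 : 1; // Which source the index names.

    if (Out.LaneSource[Lane] == -1)
      Out.LaneSource[Lane] = Src; // First use of this lane: claim it.
    else if (Out.LaneSource[Lane] != Src)
      return false; // The other source already uses this lane.

    Out.NewMask[I] = Lane; // Index relative to the merged vector.
  }
  return true;
}
```

For the x264 mask above, the sketch reports that even lanes come from the second source and odd lanes from the first, and rewrites the shuffle mask to `<11, 15, 7, 3, 10, 14, 6, 2, 9, 13, 5, 1, 8, 12, 4, 0>`. A mask such as `<0, 4, 1, 2>` on four elements is rejected, because lane 0 would be needed from both sources.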

llvmbot · 2024-12-10T15:51:26Z

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

In x264, there are a few kernels with shuffles like this:

%41 = add nsw <16 x i32> %39, %40
%42 = sub nsw <16 x i32> %39, %40
%43 = shufflevector <16 x i32> %41, <16 x i32> %42, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 26, i32 30, i32 22, i32 18, i32 9, i32 13, i32 5, i32 1, i32 24, i32 28, i32 20, i32 16>

Because this is a complex two-source shuffle, this will get lowered as two vrgather.vvs that are blended together.

vadd.vv v20, v16, v12
vsub.vv v12, v16, v12
vrgatherei16.vv v24, v20, v10
vrgatherei16.vv v24, v12, v16, v0.t

However, the indices coming from each source are disjoint, so we can blend the two sources together and perform a single-source shuffle instead:

%41 = add nsw <16 x i32> %39, %40
%42 = sub nsw <16 x i32> %39, %40
%43 = select <0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1> %41, %42
%44 = shufflevector <16 x i32> %43, <16 x i32> poison, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 10, i32 14, i32 6, i32 2, i32 9, i32 13, i32 5, i32 1, i32 8, i32 12, i32 4, i32 0>

The select will likely get merged into the preceding instruction, and then we only have to do one vrgather.vv:

vadd.vv v20, v16, v12
vsub.vv v20, v16, v12, v0.t
vrgatherei16.vv v24, v20, v10

This patch bails if either of the sources is a splat, however, since that will usually already have some sort of cheaper lowering via vrgather.vi.

This improves performance on 525.x264_r by 4.12% with -O3 -flto -march=rva22u64_v on the spacemit-x60: https://lnt.lukelau.me/db_default/v4/nts/71?compare_to=70

Patch is 80.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119401.diff

6 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+71)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+58-45)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll (+40-2)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll (+52-21)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll (+708-642)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll (+30-37)

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 46dedcc3e09cf2..ea53f6306b9c07 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -5197,6 +5197,67 @@ static bool isCompressMask(ArrayRef<int> Mask) {
   return true;
 }
 
+/// Given a shuffle where the indices are disjoint between the two sources,
+/// e.g.:
+///
+/// t2:v4i8 = vector_shuffle t0:v4i8, t1:v4i8, <2, 7, 1, 4>
+///
+/// Merge the two sources into one and do a single source shuffle:
+///
+/// t2:v4i8 = vselect t1:v4i8, t0:v4i8, <0, 1, 0, 1>
+/// t3:v4i8 = vector_shuffle t2:v4i8, undef, <2, 3, 1, 0>
+///
+/// A vselect will either be merged into a masked instruction or be lowered as a
+/// vmerge.vvm, which is cheaper than a vrgather.vv.
+static SDValue lowerDisjointIndicesShuffle(ShuffleVectorSDNode *SVN,
+                                           SelectionDAG &DAG,
+                                           const RISCVSubtarget &Subtarget) {
+  MVT VT = SVN->getSimpleValueType(0);
+  MVT XLenVT = Subtarget.getXLenVT();
+  SDLoc DL(SVN);
+
+  const ArrayRef<int> Mask = SVN->getMask();
+
+  // Work out which source each lane will come from.
+  SmallVector<int, 16> Srcs(Mask.size(), -1);
+
+  for (int Idx : Mask) {
+    if (Idx == -1)
+      continue;
+    unsigned SrcIdx = Idx % Mask.size();
+    int Src = (uint32_t)Idx < Mask.size() ? 0 : 1;
+    if (Srcs[SrcIdx] == -1)
+      // Mark this source as using this lane.
+      Srcs[SrcIdx] = Src;
+    else if (Srcs[SrcIdx] != Src)
+      // The other source is using this lane: not disjoint.
+      return SDValue();
+  }
+
+  SmallVector<SDValue> SelectMaskVals;
+  for (int Lane : Srcs) {
+    if (Lane == -1)
+      SelectMaskVals.push_back(DAG.getUNDEF(XLenVT));
+    else
+      SelectMaskVals.push_back(DAG.getConstant(Lane, DL, XLenVT));
+  }
+  MVT MaskVT = VT.changeVectorElementType(MVT::i1);
+  SDValue SelectMask = DAG.getBuildVector(MaskVT, DL, SelectMaskVals);
+  SDValue Select = DAG.getNode(ISD::VSELECT, DL, VT, SelectMask,
+                               SVN->getOperand(1), SVN->getOperand(0));
+
+  // Move all indices relative to the first source.
+  SmallVector<int> NewMask(Mask.size());
+  for (unsigned I = 0; I < Mask.size(); I++) {
+    if (Mask[I] == -1)
+      NewMask[I] = -1;
+    else
+      NewMask[I] = Mask[I] % Mask.size();
+  }
+
+  return DAG.getVectorShuffle(VT, DL, Select, DAG.getUNDEF(VT), NewMask);
+}
+
 static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
                                    const RISCVSubtarget &Subtarget) {
   SDValue V1 = Op.getOperand(0);
@@ -5540,6 +5601,16 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
     ShuffleMaskRHS.push_back(IsLHSOrUndefIndex ? -1 : (MaskIndex - NumElts));
   }
 
+  // If the mask indices are disjoint between the two sources, we can lower it
+  // as a vselect + a single source vrgather.vv. Don't do this if the operands
+  // will be splatted since they will be lowered to something cheaper like
+  // vrgather.vi anyway.
+  if (!DAG.isSplatValue(V2) && !DAG.isSplatValue(V1) &&
+      !ShuffleVectorSDNode::isSplatMask(ShuffleMaskLHS.data(), VT) &&
+      !ShuffleVectorSDNode::isSplatMask(ShuffleMaskRHS.data(), VT))
+    if (SDValue V = lowerDisjointIndicesShuffle(SVN, DAG, Subtarget))
+      return V;
+
   // Try to pick a profitable operand order.
   bool SwapOps = DAG.isSplatValue(V2) && !DAG.isSplatValue(V1);
   SwapOps = SwapOps ^ ShuffleVectorInst::isIdentityMask(ShuffleMaskRHS, NumElts);
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
index 1a5ca429b531fa..e16f998fd64017 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll
@@ -104,60 +104,73 @@ define <512 x i8> @two_source(<512 x i8> %a, <512 x i8> %b) {
 ; CHECK-NEXT:    addi s0, sp, 1536
 ; CHECK-NEXT:    .cfi_def_cfa s0, 0
 ; CHECK-NEXT:    andi sp, sp, -512
-; CHECK-NEXT:    vsetivli zero, 1, e8, m1, ta, ma
-; CHECK-NEXT:    vmv8r.v v24, v8
+; CHECK-NEXT:    vsetivli zero, 8, e64, m1, ta, ma
+; CHECK-NEXT:    vmv.v.i v0, 0
+; CHECK-NEXT:    lui a2, 16384
+; CHECK-NEXT:    li a3, 1
 ; CHECK-NEXT:    li a0, 512
 ; CHECK-NEXT:    addi a1, sp, 512
-; CHECK-NEXT:    vslidedown.vi v0, v24, 5
-; CHECK-NEXT:    vmv.x.s a2, v24
-; CHECK-NEXT:    li a3, 432
+; CHECK-NEXT:    li a4, 43
+; CHECK-NEXT:    slli a3, a3, 34
+; CHECK-NEXT:    vmv.s.x v24, a3
+; CHECK-NEXT:    li a3, 36
+; CHECK-NEXT:    addi a2, a2, 129
+; CHECK-NEXT:    slli a2, a2, 36
+; CHECK-NEXT:    vsetvli zero, zero, e64, m1, tu, ma
+; CHECK-NEXT:    vmv.s.x v0, a2
+; CHECK-NEXT:    vsetivli zero, 3, e64, m1, tu, ma
+; CHECK-NEXT:    vslideup.vi v0, v24, 2
+; CHECK-NEXT:    li a2, 399
 ; CHECK-NEXT:    vsetvli zero, a0, e8, m8, ta, ma
-; CHECK-NEXT:    vmv.v.x v8, a2
-; CHECK-NEXT:    li a2, 431
-; CHECK-NEXT:    vsetvli zero, a3, e8, m8, tu, ma
-; CHECK-NEXT:    vslideup.vx v8, v0, a2
+; CHECK-NEXT:    vmerge.vvm v16, v8, v16, v0
 ; CHECK-NEXT:    vsetivli zero, 1, e8, m1, ta, ma
-; CHECK-NEXT:    vslidedown.vi v0, v24, 4
+; CHECK-NEXT:    vslidedown.vx v8, v16, a4
+; CHECK-NEXT:    vmv.x.s a4, v16
+; CHECK-NEXT:    vslidedown.vx v24, v16, a3
+; CHECK-NEXT:    vmv.x.s a3, v8
+; CHECK-NEXT:    vsetvli zero, a0, e8, m8, ta, ma
+; CHECK-NEXT:    vmv.v.x v8, a4
+; CHECK-NEXT:    li a4, 398
+; CHECK-NEXT:    vslide1down.vx v8, v8, a3
+; CHECK-NEXT:    li a3, 432
+; CHECK-NEXT:    vsetvli zero, a2, e8, m8, tu, ma
+; CHECK-NEXT:    vslideup.vx v8, v24, a4
+; CHECK-NEXT:    li a4, 431
 ; CHECK-NEXT:    li a2, 466
+; CHECK-NEXT:    vsetivli zero, 1, e8, m1, ta, ma
+; CHECK-NEXT:    vslidedown.vi v24, v16, 5
+; CHECK-NEXT:    vsetvli zero, a3, e8, m8, tu, ma
+; CHECK-NEXT:    vslideup.vx v8, v24, a4
 ; CHECK-NEXT:    li a3, 465
+; CHECK-NEXT:    li a4, 62
 ; CHECK-NEXT:    vsetvli zero, a0, e8, m8, ta, ma
-; CHECK-NEXT:    vse8.v v24, (a1)
-; CHECK-NEXT:    lbu a1, 985(sp)
-; CHECK-NEXT:    vsetvli zero, a2, e8, m8, tu, ma
-; CHECK-NEXT:    vslideup.vx v8, v0, a3
-; CHECK-NEXT:    li a2, 478
-; CHECK-NEXT:    lbu a3, 1012(sp)
-; CHECK-NEXT:    vmv.s.x v24, a1
-; CHECK-NEXT:    li a1, 477
+; CHECK-NEXT:    vse8.v v16, (a1)
+; CHECK-NEXT:    li a0, 467
+; CHECK-NEXT:    vsetivli zero, 1, e8, m1, ta, ma
+; CHECK-NEXT:    vslidedown.vx v24, v16, a4
+; CHECK-NEXT:    li a1, 478
+; CHECK-NEXT:    vslidedown.vi v16, v16, 4
 ; CHECK-NEXT:    vsetvli zero, a2, e8, m8, tu, ma
-; CHECK-NEXT:    vslideup.vx v8, v24, a1
-; CHECK-NEXT:    li a1, 501
-; CHECK-NEXT:    vmv.s.x v24, a3
-; CHECK-NEXT:    li a2, 500
-; CHECK-NEXT:    vsetvli zero, a1, e8, m8, tu, ma
+; CHECK-NEXT:    vslideup.vx v8, v16, a3
+; CHECK-NEXT:    lbu a3, 985(sp)
+; CHECK-NEXT:    vsetvli zero, a0, e8, m8, tu, ma
 ; CHECK-NEXT:    vslideup.vx v8, v24, a2
-; CHECK-NEXT:    lui a1, 2761
-; CHECK-NEXT:    vsetivli zero, 8, e64, m1, ta, ma
-; CHECK-NEXT:    vmv.v.i v24, 0
-; CHECK-NEXT:    lui a2, 4
-; CHECK-NEXT:    vmv.s.x v25, a2
-; CHECK-NEXT:    lui a2, 1047552
-; CHECK-NEXT:    addi a2, a2, 1
-; CHECK-NEXT:    slli a2, a2, 23
-; CHECK-NEXT:    addi a2, a2, 1
-; CHECK-NEXT:    slli a2, a2, 18
-; CHECK-NEXT:    vslide1down.vx v0, v24, a2
-; CHECK-NEXT:    li a2, 64
-; CHECK-NEXT:    slli a1, a1, 25
-; CHECK-NEXT:    addi a1, a1, 501
-; CHECK-NEXT:    slli a1, a1, 13
-; CHECK-NEXT:    addi a1, a1, 512
-; CHECK-NEXT:    vsetivli zero, 7, e64, m1, tu, ma
-; CHECK-NEXT:    vslideup.vi v0, v25, 6
-; CHECK-NEXT:    vsetvli zero, a2, e64, m8, ta, ma
-; CHECK-NEXT:    vmv.v.x v24, a1
-; CHECK-NEXT:    vsetvli zero, a0, e8, m8, ta, mu
-; CHECK-NEXT:    vrgather.vv v8, v16, v24, v0.t
+; CHECK-NEXT:    li a0, 477
+; CHECK-NEXT:    lbu a2, 1012(sp)
+; CHECK-NEXT:    vmv.s.x v16, a3
+; CHECK-NEXT:    lbu a3, 674(sp)
+; CHECK-NEXT:    vsetvli zero, a1, e8, m8, tu, ma
+; CHECK-NEXT:    vslideup.vx v8, v16, a0
+; CHECK-NEXT:    vmv.s.x v24, a3
+; CHECK-NEXT:    li a0, 490
+; CHECK-NEXT:    vmv.s.x v16, a2
+; CHECK-NEXT:    li a1, 489
+; CHECK-NEXT:    li a2, 501
+; CHECK-NEXT:    vsetvli zero, a0, e8, m8, tu, ma
+; CHECK-NEXT:    vslideup.vx v8, v24, a1
+; CHECK-NEXT:    li a0, 500
+; CHECK-NEXT:    vsetvli zero, a2, e8, m8, tu, ma
+; CHECK-NEXT:    vslideup.vx v8, v16, a0
 ; CHECK-NEXT:    addi sp, s0, -1536
 ; CHECK-NEXT:    .cfi_def_cfa sp, 1536
 ; CHECK-NEXT:    ld ra, 1528(sp) # 8-byte Folded Reload
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
index 0db45ae71bc8ac..ccb0166040b168 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
@@ -29,10 +29,10 @@ define <4 x half> @shuffle_v4f16(<4 x half> %x, <4 x half> %y) {
 define <8 x float> @shuffle_v8f32(<8 x float> %x, <8 x float> %y) {
 ; CHECK-LABEL: shuffle_v8f32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a0, -20
+; CHECK-NEXT:    li a0, 19
 ; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; CHECK-NEXT:    vmv.s.x v0, a0
-; CHECK-NEXT:    vmerge.vvm v8, v10, v8, v0
+; CHECK-NEXT:    vmerge.vvm v8, v8, v10, v0
 ; CHECK-NEXT:    ret
   %s = shufflevector <8 x float> %x, <8 x float> %y, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 12, i32 5, i32 6, i32 7>
   ret <8 x float> %s
@@ -395,3 +395,41 @@ define <4 x half> @vrgather_shuffle_vx_v4f16_load(ptr %p) {
   %s = shufflevector <4 x half> %v, <4 x half> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   ret <4 x half> %s
 }
+
+define <16 x float> @shuffle_disjoint_lanes(<16 x float> %v, <16 x float> %w) {
+; CHECK-LABEL: shuffle_disjoint_lanes:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a0, %hi(.LCPI30_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI30_0)
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vle8.v v16, (a0)
+; CHECK-NEXT:    lui a0, 5
+; CHECK-NEXT:    addi a0, a0, 1365
+; CHECK-NEXT:    vmv.s.x v0, a0
+; CHECK-NEXT:    vmerge.vvm v12, v8, v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vsext.vf2 v18, v16
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v8, v12, v18
+; CHECK-NEXT:    ret
+  %out = shufflevector <16 x float> %v, <16 x float> %w, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 26, i32 30, i32 22, i32 18, i32 9, i32 13, i32 5, i32 1, i32 24, i32 28, i32 20, i32 16>
+  ret <16 x float> %out
+}
+
+define <16 x float> @shuffle_disjoint_lanes_one_splat(<16 x float> %v, <16 x float> %w) {
+; CHECK-LABEL: shuffle_disjoint_lanes_one_splat:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a0, %hi(.LCPI31_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI31_0)
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, mu
+; CHECK-NEXT:    vle16.v v20, (a0)
+; CHECK-NEXT:    lui a0, 15
+; CHECK-NEXT:    addi a0, a0, 240
+; CHECK-NEXT:    vmv.s.x v0, a0
+; CHECK-NEXT:    vrgather.vi v16, v8, 7
+; CHECK-NEXT:    vrgatherei16.vv v16, v12, v20, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %out = shufflevector <16 x float> %v, <16 x float> %w, <16 x i32> <i32 7, i32 7, i32 7, i32 7, i32 26, i32 30, i32 22, i32 18, i32 7, i32 7, i32 7, i32 7, i32 24, i32 28, i32 20, i32 16>
+  ret <16 x float> %out
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
index 1c6e1a37fa8af5..cb6ef7faf06a22 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
@@ -16,10 +16,10 @@ define <4 x i16> @shuffle_v4i16(<4 x i16> %x, <4 x i16> %y) {
 define <8 x i32> @shuffle_v8i32(<8 x i32> %x, <8 x i32> %y) {
 ; CHECK-LABEL: shuffle_v8i32:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a0, 203
+; CHECK-NEXT:    li a0, 52
 ; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; CHECK-NEXT:    vmv.s.x v0, a0
-; CHECK-NEXT:    vmerge.vvm v8, v10, v8, v0
+; CHECK-NEXT:    vmerge.vvm v8, v8, v10, v0
 ; CHECK-NEXT:    ret
   %s = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 0, i32 1, i32 10, i32 3, i32 12, i32 13, i32 6, i32 7>
   ret <8 x i32> %s
@@ -451,21 +451,14 @@ define <8 x i8> @splat_ve2_we0_ins_i2we4(<8 x i8> %v, <8 x i8> %w) {
 define <8 x i8> @splat_ve2_we0_ins_i2ve4_i5we6(<8 x i8> %v, <8 x i8> %w) {
 ; CHECK-LABEL: splat_ve2_we0_ins_i2ve4_i5we6:
 ; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a0, %hi(.LCPI26_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI26_0)
 ; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT:    vmv.v.i v10, 6
-; CHECK-NEXT:    vmv.v.i v11, 0
-; CHECK-NEXT:    lui a0, 8256
-; CHECK-NEXT:    addi a0, a0, 2
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT:    vmv.v.x v12, a0
-; CHECK-NEXT:    li a0, 98
-; CHECK-NEXT:    vsetivli zero, 6, e8, mf2, tu, ma
-; CHECK-NEXT:    vslideup.vi v11, v10, 5
+; CHECK-NEXT:    vle8.v v10, (a0)
+; CHECK-NEXT:    li a0, 65
 ; CHECK-NEXT:    vmv.s.x v0, a0
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, mu
-; CHECK-NEXT:    vrgather.vv v10, v8, v12
-; CHECK-NEXT:    vrgather.vv v10, v9, v11, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmerge.vvm v9, v8, v9, v0
+; CHECK-NEXT:    vrgather.vv v8, v9, v10
 ; CHECK-NEXT:    ret
   %shuff = shufflevector <8 x i8> %v, <8 x i8> %w, <8 x i32> <i32 2, i32 8, i32 4, i32 2, i32 2, i32 14, i32 8, i32 2>
   ret <8 x i8> %shuff
@@ -693,12 +686,12 @@ define <8 x i8> @unmergable(<8 x i8> %v, <8 x i8> %w) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    lui a0, %hi(.LCPI46_0)
 ; CHECK-NEXT:    addi a0, a0, %lo(.LCPI46_0)
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, mu
+; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
 ; CHECK-NEXT:    vle8.v v10, (a0)
-; CHECK-NEXT:    li a0, -22
+; CHECK-NEXT:    li a0, 171
 ; CHECK-NEXT:    vmv.s.x v0, a0
-; CHECK-NEXT:    vslidedown.vi v8, v8, 2
-; CHECK-NEXT:    vrgather.vv v8, v9, v10, v0.t
+; CHECK-NEXT:    vmerge.vvm v9, v8, v9, v0
+; CHECK-NEXT:    vrgather.vv v8, v9, v10
 ; CHECK-NEXT:    ret
   %res = shufflevector <8 x i8> %v, <8 x i8> %w, <8 x i32> <i32 2, i32 9, i32 4, i32 11, i32 6, i32 13, i32 8, i32 15>
   ret <8 x i8> %res
@@ -709,9 +702,9 @@ define <8 x i32> @shuffle_v8i32_2(<8 x i32> %x, <8 x i32> %y) {
 ; CHECK-LABEL: shuffle_v8i32_2:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 1, e8, mf8, ta, ma
-; CHECK-NEXT:    vmv.v.i v0, -13
+; CHECK-NEXT:    vmv.v.i v0, 12
 ; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; CHECK-NEXT:    vmerge.vvm v8, v10, v8, v0
+; CHECK-NEXT:    vmerge.vvm v8, v8, v10, v0
 ; CHECK-NEXT:    ret
   %s = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 0, i32 1, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>
   ret <8 x i32> %s
@@ -1021,3 +1014,41 @@ define <8 x i32> @shuffle_repeat4_singlesrc_e32(<8 x i32> %v) {
   %out = shufflevector <8 x i32> %v, <8 x i32> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1>
   ret <8 x i32> %out
 }
+
+define <16 x i32> @shuffle_disjoint_lanes(<16 x i32> %v, <16 x i32> %w) {
+; CHECK-LABEL: shuffle_disjoint_lanes:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a0, %hi(.LCPI70_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI70_0)
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vle8.v v16, (a0)
+; CHECK-NEXT:    lui a0, 5
+; CHECK-NEXT:    addi a0, a0, 1365
+; CHECK-NEXT:    vmv.s.x v0, a0
+; CHECK-NEXT:    vmerge.vvm v12, v8, v12, v0
+; CHECK-NEXT:    vsetvli zero, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vsext.vf2 v18, v16
+; CHECK-NEXT:    vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT:    vrgatherei16.vv v8, v12, v18
+; CHECK-NEXT:    ret
+  %out = shufflevector <16 x i32> %v, <16 x i32> %w, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 26, i32 30, i32 22, i32 18, i32 9, i32 13, i32 5, i32 1, i32 24, i32 28, i32 20, i32 16>
+  ret <16 x i32> %out
+}
+
+define <16 x i32> @shuffle_disjoint_lanes_one_splat(<16 x i32> %v, <16 x i32> %w) {
+; CHECK-LABEL: shuffle_disjoint_lanes_one_splat:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a0, %hi(.LCPI71_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI71_0)
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, mu
+; CHECK-NEXT:    vle16.v v20, (a0)
+; CHECK-NEXT:    lui a0, 15
+; CHECK-NEXT:    addi a0, a0, 240
+; CHECK-NEXT:    vmv.s.x v0, a0
+; CHECK-NEXT:    vrgather.vi v16, v8, 7
+; CHECK-NEXT:    vrgatherei16.vv v16, v12, v20, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %out = shufflevector <16 x i32> %v, <16 x i32> %w, <16 x i32> <i32 7, i32 7, i32 7, i32 7, i32 26, i32 30, i32 22, i32 18, i32 7, i32 7, i32 7, i32 7, i32 24, i32 28, i32 20, i32 16>
+  ret <16 x i32> %out
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
index 8833634be1a0ed..d4ae952325d6b3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll
@@ -183,406 +183,499 @@ define {<8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>} @load_
 ; RV32-NEXT:    addi sp, sp, -16
 ; RV32-NEXT:    .cfi_def_cfa_offset 16
 ; RV32-NEXT:    csrr a2, vlenb
-; RV32-NEXT:    slli a3, a2, 6
-; RV32-NEXT:    add a2, a3, a2
+; RV32-NEXT:    li a3, 96
+; RV32-NEXT:    mul a2, a2, a3
 ; RV32-NEXT:    sub sp, sp, a2
-; RV32-NEXT:    .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0xc1, 0x00, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 65 * vlenb
-; RV32-NEXT:    addi a3, a1, 256
-; RV32-NEXT:    addi a4, a1, 128
+; RV32-NEXT:    .cfi_escape 0x0f, 0x0e, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0xe0, 0x00, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 96 * vlenb
+; RV32-NEXT:    addi a3, a1, 128
+; RV32-NEXT:    addi a4, a1, 256
 ; RV32-NEXT:    li a2, 32
-; RV32-NEXT:    lui a5, 12291
-; RV32-NEXT:    vsetvli zero, a2, e32, m8, ta, mu
-; RV32-NEXT:    vle32.v v24, (a1)
-; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    li a6, 41
-; RV32-NEXT:    mul a1, a1, a6
-; RV32-NEXT:    add a1, sp, a1
-; RV32-NEXT:    addi a1, a1, 16
-; RV32-NEXT:    vs8r.v v24, (a1) # Unknown-size Folded Spill
-; RV32-NEXT:    lui a1, %hi(.LCPI8_0)
-; RV32-NEXT:    addi a1, a1, %lo(.LCPI8_0)
-; RV32-NEXT:    vle16.v v4, (a1)
-; RV32-NEXT:    lui a1, 1
-; RV32-NEXT:    addi a5, a5, 3
+; RV32-NEXT:    li a5, 48
+; RV32-NEXT:    lui a6, 196656
+; RV32-NEXT:    lui a7, %hi(.LCPI8_1)
+; RV32-NEXT:    addi a7, a7, %lo(.LCPI8_1)
+; RV32-NEXT:    vsetvli zero, a2, e32, m8, ta, ma
 ; RV32-NEXT:    vle32.v v8, (a4)
 ; RV32-NEXT:    csrr a4, vlenb
-; RV32-NEXT:    li a6, 57
-; RV32-NEXT:    mul a4, a4, a6
+; RV32-NEXT:    li t0, 88
+; RV32-NEXT:    mul a4, a4, t0
 ; RV32-NEXT:    add a4, sp, a4
 ; RV32-NEXT:    addi a4, a4, 16
 ; RV32-NEXT:    vs8r.v v8, (a4) # Unknown-size Folded Spill
-; RV32-NEXT:    addi a1, a1, -64
-; RV32-NEXT:    vle32.v v16, (a3)
-; RV32-NEXT:    vmv.s.x v3, a5
-; RV32-NEXT:    vmv.s.x v0, a1
+; RV32-NEXT:    vmv.s.x v0, a5
+; RV32-NEXT:    vle32.v v24, (a3)
+; RV32-NEXT:    csrr a3, vlenb
+; RV32-NEXT:    li a4, 72
+; RV32-NEXT:    mul a3, a3, a4
+; RV32-NEXT:    add a3, sp, a3
+; RV32-NEXT:    addi a3, a3, 16
+; RV32-NEXT:    vs8r.v v24, (a3) # Unknown-size Folded Spill
+; RV32-NEXT:    vle32.v v16, (a1)
+; RV32-NEXT:    csrr a1, vlenb
+; RV32-NEXT:    slli a1, a1, 6
+; RV32-NEXT:    add a1, sp, a1
+; RV32-NEXT:    addi a1, a1, 16
+; RV32-NEXT:    vs8r.v v16, (a1) # Unknown-size Folded Spill
+; RV32-NEXT:    addi a1, a6, 48
+; RV32-NEXT:    vle16.v v4, (a7)
+; RV32-NEXT:    vmv.s.x v3, a1
+; RV32-NEXT:    vsetivli zero, 16, e32, m8, ta, ma
+; RV32-NEXT:    vslidedown.vi v16, v8, 16
 ; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    li a3, 13
+; RV32-NEXT:    li a3, 80
 ; RV32-NEXT:    mul a1, a1, a3
 ; RV32-NEXT:    add a1, sp, a1
 ; RV32-NEXT:    addi a1, a1, 16
-; RV32-NEXT:    vs1r.v v0, (a1) # Unknown-size Folded Spill
-; RV32-NEXT:    vcompress.vm v8, v24, v3
+; RV32-NEXT:    vs8r.v v16, (a1) # Unknown-size Folded Spill
+; RV32-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; RV32-NEXT:    vmerge.vvm v8, v8, v16, v0
+; RV32-NEXT:    csrr a1, vlenb
+; RV32-NEXT:    li a3, 52
+; RV32-NEXT:    mul a1, a1, a3
+; RV32-NEXT:    add a1, sp, a1
+; RV32-NEXT:    addi a1, a1, 16
+; RV32-NEXT:    vs4r.v v8, (a1) # Unknown-size Folded Spill
+; RV32-NEXT:    vmv1r.v v0, v3
+; RV32-NEXT:    csrr a1, vlenb
+; RV32-NEXT:    ...
[truncated]

lukel97 · 2024-12-10T15:52:54Z

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll

+; CHECK-NEXT:    lui a0, %hi(.LCPI71_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI71_0)


I'm just noticing we don't seem to have any way of seeing the gather indices via update_llc_test_checks. I wonder if it's possible to print constant indices inline beside the vrgathers in an asm comment?

It's possible to add comments from RISCVAsmPrinter.cpp. X86 does it. Unfortunately, the constant pool reference belongs to the load and not the vrgather. On X86, shuffles can often fold the load so they become the same instruction.

Let me see if I can at least print the constant pool contents on a vector load.

Hmm. It's even harder because the constant pool reference belongs to an AUIPC or ADDI instruction that is the input to the load.

lukel97 · 2024-12-10T15:54:10Z

These shufflevectors come from the SLP vectorization mentioned in #119386

preames

Sometimes great minds think alike. I literally noticed this case last night, and was going to write up the patch this morning. :)

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll

preames · 2024-12-10T16:15:32Z

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll

+  ret <16 x float> %out
+}
+
+define <16 x float> @shuffle_disjoint_lanes_one_splat(<16 x float> %v, <16 x float> %w) {


Add an identity test please.

preames · 2024-12-10T16:20:52Z

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

+  }
+
+  SmallVector<SDValue> SelectMaskVals;
+  for (int Lane : Srcs) {


Can you reverse the select here so that it uses the same order as the generic fallthrough below? From prior experience, our vmerge.vxm matching is oddly fragile. I'd like to remove this unrelated change if possible.

Good point, I've reversed it to match the generic fallthrough when SwapOps is false. But it's worth pointing out that this simultaneously removes some mask diffs and introduces some others too
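One way to read the mask diffs being discussed: in the diff as posted above, the constant loaded for shuffle_v8i32 changes from 203 to 52 while the vmerge.vvm operands swap. Those two values are 8-bit complements of each other, which is what flipping the select polarity would produce. The sketch below packs a select-style shuffle mask into a bitmask under either polarity; `selectBits` is a hypothetical helper written for this illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Packs a select-style shuffle mask (result lane I reads lane I of one of the
// two sources) into a per-lane bitmask, as a vmerge would consume it in v0.
// If SetBitPicksSecond is true, bit I is set when lane I comes from the
// second source; otherwise bit I is set when it comes from the first.
// Undefined lanes (-1) leave their bit clear under both polarities.
uint32_t selectBits(const std::vector<int> &Mask, bool SetBitPicksSecond) {
  const int NumElts = static_cast<int>(Mask.size());
  assert(NumElts <= 32 && "sketch only handles up to 32 lanes");

  uint32_t Bits = 0;
  for (int I = 0; I < NumElts; ++I) {
    if (Mask[I] == -1)
      continue;
    assert(Mask[I] % NumElts == I && "not a select-style mask");
    const bool FromSecond = Mask[I] >= NumElts;
    if (FromSecond == SetBitPicksSecond)
      Bits |= uint32_t{1} << I;
  }
  return Bits;
}
```

With the shuffle_v8i32 mask `<0, 1, 10, 3, 12, 13, 6, 7>`, lanes 2, 4 and 5 come from the second source, so the "set bit picks second" polarity gives 52 and the opposite polarity gives 203. The same relationship holds for shuffle_v8i32_2 (12 versus -13 as an 8-bit immediate) and shuffle_v8f32 (19 versus -20).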

preames · 2024-12-10T16:23:14Z

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll

+; CHECK-NEXT:    vrgatherei16.vv v16, v12, v20, v0.t
+; CHECK-NEXT:    vmv.v.v v8, v16
+; CHECK-NEXT:    ret
+  %out = shufflevector <16 x float> %v, <16 x float> %w, <16 x i32> <i32 7, i32 7, i32 7, i32 7, i32 26, i32 30, i32 22, i32 18, i32 7, i32 7, i32 7, i32 7, i32 24, i32 28, i32 20, i32 16>


This test isn't exactly convincing that we want to avoid the splat case. I think maybe this hints at the difference between a splatvalue (from scalar) and the splat shuffle (from mask). The latter requires a vrgather.vi, whereas the former is a vmerge.vxm. Maybe rework the tests to cover both cases separately?

Here's a diff from a regression if we remove the isSplatValue guard:

 ; CHECK-LABEL: shuffle_vx_v4i16:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
-; CHECK-NEXT: vmv.v.i v0, 6
-; CHECK-NEXT: vmerge.vim v8, v8, 5, v0
+; CHECK-NEXT: vmv.v.i v0, 9
+; CHECK-NEXT: vmv.v.i v9, 5
+; CHECK-NEXT: vmerge.vvm v8, v9, v8, v0
 ; CHECK-NEXT: ret
 %s = shufflevector <4 x i16> %x, <4 x i16> <i16 5, i16 5, i16 5, i16 5>, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
 ret <4 x i16> %s

And here's one if we remove the isSplatMask guard:

--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
@@ -364,12 +364,14 @@ define <8 x i8> @splat_ve4_ins_i1ve3(<8 x i8> %v) {
 define <8 x i8> @splat_ve2_we0(<8 x i8> %v, <8 x i8> %w) {
 ; CHECK-LABEL: splat_ve2_we0:
 ; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT: vmv.v.i v0, 4
 ; CHECK-NEXT: li a0, 66
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, mu
+; CHECK-NEXT: vmerge.vvm v9, v9, v8, v0
 ; CHECK-NEXT: vmv.s.x v0, a0
-; CHECK-NEXT: vrgather.vi v10, v8, 2
-; CHECK-NEXT: vrgather.vi v10, v9, 0, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv.v.i v8, 2
+; CHECK-NEXT: vmerge.vim v10, v8, 0, v0
+; CHECK-NEXT: vrgather.vv v8, v9, v10
 ; CHECK-NEXT: ret
 %shuff = shufflevector <8 x i8> %v, <8 x i8> %w, <8 x i32> <i32 2, i32 8, i32 2, i32 2, i32 2, i32 2, i32 8, i32 2>
 ret <8 x i8> %shuff

I think ideally we would really only want to do this lowering when we know we're going to end up with at least a vrgather.vv. Maybe in a follow up this could be reworked as a late combine on VRGATHER_VV_VL instead? It might be a bit trickier because you would need to reconstruct the original mask.
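To make the "splat shuffle (from mask)" case in this exchange concrete, here is a small sketch that splits a two-source mask into per-source masks and checks whether each one is a splat. It is a simplified stand-in written for illustration, not LLVM's `ShuffleVectorSDNode::isSplatMask`. The left-hand split is assumed by symmetry with the `ShuffleMaskRHS` line visible in the diff context, and treating an all-undef mask as a splat is also an assumption.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Splits a two-source mask into per-source masks. An element is -1 in a
// per-source mask when the result lane is undefined or reads the other source.
std::pair<std::vector<int>, std::vector<int>>
splitMask(const std::vector<int> &Mask) {
  const int NumElts = static_cast<int>(Mask.size());
  std::vector<int> LHS, RHS;
  for (int Idx : Mask) {
    assert(Idx >= -1 && Idx < 2 * NumElts && "shuffle index out of range");
    const bool IsLHSOrUndef = Idx < NumElts;
    LHS.push_back(IsLHSOrUndef ? Idx : -1);
    RHS.push_back(IsLHSOrUndef ? -1 : Idx - NumElts);
  }
  return {LHS, RHS};
}

// True if every defined index is the same element. In this sketch an
// all-undef mask also counts as a splat (an assumption, see above).
bool isSplatMaskSketch(const std::vector<int> &Mask) {
  int First = -1;
  for (int Idx : Mask) {
    if (Idx == -1)
      continue;
    if (First == -1)
      First = Idx; // Remember the first defined index.
    else if (Idx != First)
      return false; // A second, different index: not a splat.
  }
  return true;
}
```

For the splat_ve2_we0 mask `<2, 8, 2, 2, 2, 2, 8, 2>`, both per-source masks are splats (element 2 of the first source, element 0 of the second), which matches the two vrgather.vi instructions in the original check lines. For the shuffle_disjoint_lanes_one_splat mask only the first source's mask is a splat, and for the x264 mask neither is.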

This is to prevent it from being caught up in the lowering in #119401

preames

LGTM w/aside comments.

preames · 2024-12-11T04:34:51Z

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll

@@ -183,406 +183,499 @@ define {<8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>} @load_
 ; RV32-NEXT:    addi sp, sp, -16
 ; RV32-NEXT:    .cfi_def_cfa_offset 16
 ; RV32-NEXT:    csrr a2, vlenb
-; RV32-NEXT:    slli a3, a2, 6


Just for the record, this test is a real regression. The new lowering appears to increase register pressure.

I'm not worried about this case because:

This test case is really stressing legalization, and isn't representative of real code.

The two argument shuffle could have been a vcompress to start with.

The "right" lowering probably would have been multiple seg6 loads with offset addresses, and vslideups on the results, but that's well beyond anything worth bothering with.

preames · 2024-12-11T04:37:44Z

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll

-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, mu
-; CHECK-NEXT:    vrgather.vv v11, v8, v10, v0.t
-; CHECK-NEXT:    vse8.v v11, (a1)
+; CHECK-NEXT:    vslidedown.vi v10, v8, 8


This is a fascinating accident, I hadn't considered this would catch the general deinterleave case. I'd been exploring a targeted change specific for deinterleave type cases, and this gets it generically. Nice!

(The new codegen is fairly neutral on m1, but if the result is > m1, the vslidedown and smaller shuffle helps a ton.)

preames · 2024-12-11T04:40:06Z

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll

-; CHECK-NEXT:    vse8.v v11, (a1)
+; CHECK-NEXT:    vslidedown.vi v10, v8, 8
+; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT:    vmerge.vvm v8, v10, v8, v0


I think this vmerge should fold into the previous vslidedown. Not sure why that didn't happen? (Separate issue)

Probably because the LMULs/VTs are mismatched? But in this case we might be able to get away with it because the vmerge has a smaller LMUL, i.e. we're only using the bottom elements

preames · 2024-12-11T17:33:55Z

Thought of a possible follow on extension - noting it for future reference.

In addition to the select case, we can merge the masks through other cheap (linear in LMUL) instructions. The two that occur are vslideup.vi and vslidedown.vi, but we might be able to apply other shuffle techniques (i.e. interleave). The vslidedown.vi subcase is potentially useful for e.g. deinterleave(N) where N is even.

github-actions · 2024-12-12T05:43:40Z

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:

git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' b26fe5b7e9833b7813459c6a0dc4577b350754f1 7a8309574e778a9db8f32557df37c83aff54b00c llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleaved-access.ll llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-deinterleave.ll

The following files introduce new uses of undef:

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

lukel97 requested review from preames, mikhailramalho, topperc and wangpc-pp December 10, 2024 15:50

llvmbot added the backend:RISC-V label Dec 10, 2024

lukel97 commented Dec 10, 2024

preames requested changes Dec 10, 2024

lukel97 added a commit that referenced this pull request Dec 11, 2024

[RISCV] Adjust vrgather.vv test to avoid disjoint indices. NFC

bc7449c

This is to prevent it from being caught up in the lowering in #119401

lukel97 force-pushed the shuffle-merge-distinct-lanes branch from 0ce99e4 to 337858a Compare December 11, 2024 03:49

preames approved these changes Dec 11, 2024

lukel97 added 5 commits December 12, 2024 13:29

Precommit tests

e6bda96

Swap operand order to match generic case

99e1f12

Bail on identity shuffles

d074e75

Update tests after rebase

7a83095

lukel97 force-pushed the shuffle-merge-distinct-lanes branch from 337858a to 7a83095 Compare December 12, 2024 05:40

lukel97 merged commit 088db86 into llvm:main Dec 12, 2024
4 of 8 checks passed

		; CHECK-NEXT: lui a0, %hi(.LCPI71_0)
		; CHECK-NEXT: addi a0, a0, %lo(.LCPI71_0)
