[LLVM][CodeGen][SVE] Improve custom lowering for EXTRACT_SUBVECTOR. #90963
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Paul Walker (paulwalker-arm)

Changes

We can extract any legal fixed length vector from a scalable vector by using VECTOR_SPLICE. I've also taken the time to simplify the code a little.

Full diff: https://github.com/llvm/llvm-project/pull/90963.diff

3 Files Affected:
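To make the change concrete, here is one of the cases from the updated tests below: previously it was lowered through a stack store/reload sequence, and with this patch it lowers to a single ext instruction.

```llvm
; Extracting a fixed-length vector at a non-zero index from a scalable
; vector (taken from sve-extract-fixed-from-scalable-vector.ll below).
define <2 x float> @extract_v2f32_nxv16f32_2(<vscale x 16 x float> %arg) {
  %ext = call <2 x float> @llvm.vector.extract.v2f32.nxv16f32(<vscale x 16 x float> %arg, i64 2)
  ret <2 x float> %ext
}
```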
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2af679e0755b54..62071641eb94eb 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -13897,7 +13897,8 @@ AArch64TargetLowering::LowerEXTRACT_VECTOR_ELT(SDValue Op,
SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op,
SelectionDAG &DAG) const {
- assert(Op.getValueType().isFixedLengthVector() &&
+ EVT VT = Op.getValueType();
+ assert(VT.isFixedLengthVector() &&
"Only cases that extract a fixed length vector are supported!");
EVT InVT = Op.getOperand(0).getValueType();
@@ -13905,15 +13906,18 @@ SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op,
unsigned Size = Op.getValueSizeInBits();
// If we don't have legal types yet, do nothing
- if (!DAG.getTargetLoweringInfo().isTypeLegal(InVT))
+ if (!isTypeLegal(InVT))
return SDValue();
if (InVT.isScalableVector()) {
// This will be matched by custom code during ISelDAGToDAG.
- if (Idx == 0 && isPackedVectorType(InVT, DAG))
+ if (Idx == 0)
return Op;
- return SDValue();
+ SDLoc DL(Op);
+ SDValue Splice = DAG.getNode(ISD::VECTOR_SPLICE, DL, InVT, Op.getOperand(0),
+ Op.getOperand(0), Op.getOperand(1));
+ return convertFromScalableVector(DAG, VT, Splice);
}
// This will get lowered to an appropriate EXTRACT_SUBREG in ISel.
@@ -13934,8 +13938,8 @@ SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op,
convertToScalableVector(DAG, ContainerVT, Op.getOperand(0));
SDValue Splice = DAG.getNode(ISD::VECTOR_SPLICE, DL, ContainerVT, NewInVec,
- NewInVec, DAG.getConstant(Idx, DL, MVT::i64));
- return convertFromScalableVector(DAG, Op.getValueType(), Splice);
+ NewInVec, Op.getOperand(1));
+ return convertFromScalableVector(DAG, VT, Splice);
}
return SDValue();
diff --git a/llvm/test/CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll b/llvm/test/CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll
index b9c531fe335261..f2bc795d6b3bdc 100644
--- a/llvm/test/CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll
+++ b/llvm/test/CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll
@@ -143,15 +143,8 @@ define <4 x float> @extract_v4f32_nxv16f32_12(<vscale x 16 x float> %arg) {
define <2 x float> @extract_v2f32_nxv16f32_2(<vscale x 16 x float> %arg) {
; CHECK-LABEL: extract_v2f32_nxv16f32_2:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
-; CHECK-NEXT: .cfi_offset w29, -16
-; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: st1w { z0.s }, p0, [sp]
-; CHECK-NEXT: ldr d0, [sp, #8]
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
%ext = call <2 x float> @llvm.vector.extract.v2f32.nxv16f32(<vscale x 16 x float> %arg, i64 2)
ret <2 x float> %ext
@@ -274,15 +267,8 @@ define <4 x i3> @extract_v4i3_nxv32i3_16(<vscale x 32 x i3> %arg) {
define <2 x i32> @extract_v2i32_nxv16i32_2(<vscale x 16 x i32> %arg) {
; CHECK-LABEL: extract_v2i32_nxv16i32_2:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
-; CHECK-NEXT: .cfi_offset w29, -16
-; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: st1w { z0.s }, p0, [sp]
-; CHECK-NEXT: ldr d0, [sp, #8]
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret
%ext = call <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32> %arg, i64 2)
ret <2 x i32> %ext
diff --git a/llvm/test/CodeGen/AArch64/sve-extract-fixed-vector.ll b/llvm/test/CodeGen/AArch64/sve-extract-fixed-vector.ll
index 88268104889fde..b05b46a75b698d 100644
--- a/llvm/test/CodeGen/AArch64/sve-extract-fixed-vector.ll
+++ b/llvm/test/CodeGen/AArch64/sve-extract-fixed-vector.ll
@@ -15,20 +15,8 @@ define <2 x i64> @extract_v2i64_nxv2i64(<vscale x 2 x i64> %vec) nounwind {
define <2 x i64> @extract_v2i64_nxv2i64_idx2(<vscale x 2 x i64> %vec) nounwind {
; CHECK-LABEL: extract_v2i64_nxv2i64_idx2:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: cntd x8
-; CHECK-NEXT: mov w9, #2 // =0x2
-; CHECK-NEXT: ptrue p0.d
-; CHECK-NEXT: sub x8, x8, #2
-; CHECK-NEXT: cmp x8, #2
-; CHECK-NEXT: st1d { z0.d }, p0, [sp]
-; CHECK-NEXT: csel x8, x8, x9, lo
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: lsl x8, x8, #3
-; CHECK-NEXT: ldr q0, [x9, x8]
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #16
+; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret
%retval = call <2 x i64> @llvm.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64> %vec, i64 2)
ret <2 x i64> %retval
@@ -48,20 +36,8 @@ define <4 x i32> @extract_v4i32_nxv4i32(<vscale x 4 x i32> %vec) nounwind {
define <4 x i32> @extract_v4i32_nxv4i32_idx4(<vscale x 4 x i32> %vec) nounwind {
; CHECK-LABEL: extract_v4i32_nxv4i32_idx4:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: cntw x8
-; CHECK-NEXT: mov w9, #4 // =0x4
-; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: sub x8, x8, #4
-; CHECK-NEXT: cmp x8, #4
-; CHECK-NEXT: st1w { z0.s }, p0, [sp]
-; CHECK-NEXT: csel x8, x8, x9, lo
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: lsl x8, x8, #2
-; CHECK-NEXT: ldr q0, [x9, x8]
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #16
+; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret
%retval = call <4 x i32> @llvm.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> %vec, i64 4)
ret <4 x i32> %retval
@@ -82,18 +58,9 @@ define <4 x i32> @extract_v4i32_nxv2i32(<vscale x 2 x i32> %vec) nounwind #1 {
define <4 x i32> @extract_v4i32_nxv2i32_idx4(<vscale x 2 x i32> %vec) nounwind #1 {
; CHECK-LABEL: extract_v4i32_nxv2i32_idx4:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: ptrue p0.d
-; CHECK-NEXT: mov x8, #4 // =0x4
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: ptrue p1.d, vl4
-; CHECK-NEXT: st1d { z0.d }, p0, [sp]
-; CHECK-NEXT: ld1d { z0.d }, p1/z, [x9, x8, lsl #3]
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #32
; CHECK-NEXT: uzp1 z0.s, z0.s, z0.s
; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
%retval = call <4 x i32> @llvm.vector.extract.v4i32.nxv2i32(<vscale x 2 x i32> %vec, i64 4)
ret <4 x i32> %retval
@@ -113,20 +80,8 @@ define <8 x i16> @extract_v8i16_nxv8i16(<vscale x 8 x i16> %vec) nounwind {
define <8 x i16> @extract_v8i16_nxv8i16_idx8(<vscale x 8 x i16> %vec) nounwind {
; CHECK-LABEL: extract_v8i16_nxv8i16_idx8:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: cnth x8
-; CHECK-NEXT: mov w9, #8 // =0x8
-; CHECK-NEXT: ptrue p0.h
-; CHECK-NEXT: sub x8, x8, #8
-; CHECK-NEXT: cmp x8, #8
-; CHECK-NEXT: st1h { z0.h }, p0, [sp]
-; CHECK-NEXT: csel x8, x8, x9, lo
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: lsl x8, x8, #1
-; CHECK-NEXT: ldr q0, [x9, x8]
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #16
+; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret
%retval = call <8 x i16> @llvm.vector.extract.v8i16.nxv8i16(<vscale x 8 x i16> %vec, i64 8)
ret <8 x i16> %retval
@@ -147,18 +102,9 @@ define <8 x i16> @extract_v8i16_nxv4i16(<vscale x 4 x i16> %vec) nounwind #1 {
define <8 x i16> @extract_v8i16_nxv4i16_idx8(<vscale x 4 x i16> %vec) nounwind #1 {
; CHECK-LABEL: extract_v8i16_nxv4i16_idx8:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: mov x8, #8 // =0x8
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: ptrue p1.s, vl8
-; CHECK-NEXT: st1w { z0.s }, p0, [sp]
-; CHECK-NEXT: ld1w { z0.s }, p1/z, [x9, x8, lsl #2]
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #32
; CHECK-NEXT: uzp1 z0.h, z0.h, z0.h
; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
%retval = call <8 x i16> @llvm.vector.extract.v8i16.nxv4i16(<vscale x 4 x i16> %vec, i64 8)
ret <8 x i16> %retval
@@ -180,19 +126,10 @@ define <8 x i16> @extract_v8i16_nxv2i16(<vscale x 2 x i16> %vec) nounwind #1 {
define <8 x i16> @extract_v8i16_nxv2i16_idx8(<vscale x 2 x i16> %vec) nounwind #1 {
; CHECK-LABEL: extract_v8i16_nxv2i16_idx8:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: ptrue p0.d
-; CHECK-NEXT: mov x8, #8 // =0x8
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: ptrue p1.d, vl8
-; CHECK-NEXT: st1d { z0.d }, p0, [sp]
-; CHECK-NEXT: ld1d { z0.d }, p1/z, [x9, x8, lsl #3]
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #64
; CHECK-NEXT: uzp1 z0.s, z0.s, z0.s
; CHECK-NEXT: uzp1 z0.h, z0.h, z0.h
; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
%retval = call <8 x i16> @llvm.vector.extract.v8i16.nxv2i16(<vscale x 2 x i16> %vec, i64 8)
ret <8 x i16> %retval
@@ -212,19 +149,8 @@ define <16 x i8> @extract_v16i8_nxv16i8(<vscale x 16 x i8> %vec) nounwind {
define <16 x i8> @extract_v16i8_nxv16i8_idx16(<vscale x 16 x i8> %vec) nounwind {
; CHECK-LABEL: extract_v16i8_nxv16i8_idx16:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: rdvl x8, #1
-; CHECK-NEXT: ptrue p0.b
-; CHECK-NEXT: mov w9, #16 // =0x10
-; CHECK-NEXT: sub x8, x8, #16
-; CHECK-NEXT: cmp x8, #16
-; CHECK-NEXT: st1b { z0.b }, p0, [sp]
-; CHECK-NEXT: csel x8, x8, x9, lo
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: ldr q0, [x9, x8]
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #16
+; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret
%retval = call <16 x i8> @llvm.vector.extract.v16i8.nxv16i8(<vscale x 16 x i8> %vec, i64 16)
ret <16 x i8> %retval
@@ -245,18 +171,9 @@ define <16 x i8> @extract_v16i8_nxv8i8(<vscale x 8 x i8> %vec) nounwind #1 {
define <16 x i8> @extract_v16i8_nxv8i8_idx16(<vscale x 8 x i8> %vec) nounwind #1 {
; CHECK-LABEL: extract_v16i8_nxv8i8_idx16:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: ptrue p0.h
-; CHECK-NEXT: mov x8, #16 // =0x10
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: ptrue p1.h, vl16
-; CHECK-NEXT: st1h { z0.h }, p0, [sp]
-; CHECK-NEXT: ld1h { z0.h }, p1/z, [x9, x8, lsl #1]
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #32
; CHECK-NEXT: uzp1 z0.b, z0.b, z0.b
; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
%retval = call <16 x i8> @llvm.vector.extract.v16i8.nxv8i8(<vscale x 8 x i8> %vec, i64 16)
ret <16 x i8> %retval
@@ -278,19 +195,10 @@ define <16 x i8> @extract_v16i8_nxv4i8(<vscale x 4 x i8> %vec) nounwind #1 {
define <16 x i8> @extract_v16i8_nxv4i8_idx16(<vscale x 4 x i8> %vec) nounwind #1 {
; CHECK-LABEL: extract_v16i8_nxv4i8_idx16:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: mov x8, #16 // =0x10
-; CHECK-NEXT: mov x9, sp
-; CHECK-NEXT: ptrue p1.s, vl16
-; CHECK-NEXT: st1w { z0.s }, p0, [sp]
-; CHECK-NEXT: ld1w { z0.s }, p1/z, [x9, x8, lsl #2]
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #64
; CHECK-NEXT: uzp1 z0.h, z0.h, z0.h
; CHECK-NEXT: uzp1 z0.b, z0.b, z0.b
; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
%retval = call <16 x i8> @llvm.vector.extract.v16i8.nxv4i8(<vscale x 4 x i8> %vec, i64 16)
ret <16 x i8> %retval
@@ -313,17 +221,11 @@ define <16 x i8> @extract_v16i8_nxv2i8(<vscale x 2 x i8> %vec) nounwind #1 {
define <16 x i8> @extract_v16i8_nxv2i8_idx16(<vscale x 2 x i8> %vec) nounwind #1 {
; CHECK-LABEL: extract_v16i8_nxv2i8_idx16:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: ptrue p0.d
-; CHECK-NEXT: st1d { z0.d }, p0, [sp]
-; CHECK-NEXT: ld1d { z0.d }, p0/z, [sp]
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #128
; CHECK-NEXT: uzp1 z0.s, z0.s, z0.s
; CHECK-NEXT: uzp1 z0.h, z0.h, z0.h
; CHECK-NEXT: uzp1 z0.b, z0.b, z0.b
; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
%retval = call <16 x i8> @llvm.vector.extract.v16i8.nxv2i8(<vscale x 2 x i8> %vec, i64 16)
ret <16 x i8> %retval
@@ -434,13 +336,8 @@ define <16 x i1> @extract_v16i1_nxv16i1(<vscale x 16 x i1> %inmask) {
define <2 x i64> @extract_fixed_v2i64_nxv2i64(<vscale x 2 x i64> %vec) nounwind #0 {
; CHECK-LABEL: extract_fixed_v2i64_nxv2i64:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
-; CHECK-NEXT: ptrue p0.d
-; CHECK-NEXT: st1d { z0.d }, p0, [sp]
-; CHECK-NEXT: ldr q0, [sp, #16]
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #16
+; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret
%retval = call <2 x i64> @llvm.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64> %vec, i64 2)
ret <2 x i64> %retval
@@ -449,14 +346,9 @@ define <2 x i64> @extract_fixed_v2i64_nxv2i64(<vscale x 2 x i64> %vec) nounwind
define void @extract_fixed_v4i64_nxv2i64(<vscale x 2 x i64> %vec, ptr %p) nounwind #0 {
; CHECK-LABEL: extract_fixed_v4i64_nxv2i64:
; CHECK: // %bb.0:
-; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT: addvl sp, sp, #-1
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #32
; CHECK-NEXT: ptrue p0.d
-; CHECK-NEXT: st1d { z0.d }, p0, [sp]
-; CHECK-NEXT: ld1d { z0.d }, p0/z, [sp]
; CHECK-NEXT: st1d { z0.d }, p0, [x0]
-; CHECK-NEXT: addvl sp, sp, #1
-; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
; CHECK-NEXT: ret
%retval = call <4 x i64> @llvm.vector.extract.v4i64.nxv2i64(<vscale x 2 x i64> %vec, i64 4)
store <4 x i64> %retval, ptr %p
Force-pushed from e01de76 to 6f0589b
The original commit was indeed broken for the cases where fixed length vectors are extracted from an unpacked scalable vector. Fixing this uncovered other issues (hence the VECTOR_SPLICE PR) and I've upstreamed additional tests that would have caught the issue. I've pushed the full chain to this PR for now but only the final commit is relevant. Feel free to review, but I'll keep the PR in draft until I can rebase to remove the dependent work.
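For reference, one of the unpacked cases in question, from sve-extract-fixed-vector.ll in the diff above (a <vscale x 2 x i32> is unpacked: each i32 element occupies a 64-bit container). The `#1` attribute group from the original test file is omitted here:

```llvm
; Extracting a fixed-length vector from an unpacked scalable vector.
define <4 x i32> @extract_v4i32_nxv2i32_idx4(<vscale x 2 x i32> %vec) {
  %retval = call <4 x i32> @llvm.vector.extract.v4i32.nxv2i32(<vscale x 2 x i32> %vec, i64 4)
  ret <4 x i32> %retval
}
```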
We can extract any legal fixed length vector from a scalable vector by using VECTOR_SPLICE.
Force-pushed from 6f0589b to e2affa3
; CHECK-NEXT: uunpklo z2.d, z1.s
; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
; CHECK-NEXT: uunpklo z3.d, z0.s
; CHECK-NEXT: mov z3.d, z1.d
I've investigated this (via --debug-only=isel) and the only difference in the DAG prior to instruction selection is a few nodes having different numerical IDs. The extra mov instruction disappears when using a Neoverse scheduling model, so this looks like an existing issue with in-order scheduling.
Given these are streaming-mode functions I believe the current output is acceptable, but please shout if you think otherwise.
Rebased to remove dependent commits.
// This will be matched by custom code during ISelDAGToDAG.
if (Idx == 0 && isPackedVectorType(InVT, DAG))
if (InVT.is128BitVector()) {
  assert(VT.is64BitVector() && "Extracting unexpected vector type!");
Is the rationale here that at this point all types should be legal and therefore the only possible result VTs are 64-bit and 128-bit? I guess we are assuming that for 128-bit result types we're relying on the EXTRACT_SUBVECTOR being folded away beforehand?
Yes. We can be sure that if the input is type legal then the result must also be type legal. That means for operation legalisation the only NEON-sized combination that can happen is extracting a 64-bit vector from a 128-bit vector. The only exception is when the result type matches the input type and the index is zero, which is a NOP and optimised by SelectionDAG::getNode() and so is not worth considering at this point in the pipeline.
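As an illustration of the two situations described above, here is a minimal sketch; the function names are made up for this example, and the intrinsic calls follow the llvm.vector.extract form used in the tests:

```llvm
; The only NEON-sized combination that reaches operation legalisation:
; extracting a 64-bit vector from a 128-bit vector.
define <2 x i32> @neon_case(<4 x i32> %v) {
  %r = call <2 x i32> @llvm.vector.extract.v2i32.v4i32(<4 x i32> %v, i64 2)
  ret <2 x i32> %r
}

; The exception: same result and input type at index 0 is a NOP that
; SelectionDAG::getNode() folds away before lowering ever sees it.
define <4 x i32> @nop_case(<4 x i32> %v) {
  %r = call <4 x i32> @llvm.vector.extract.v4i32.v4i32(<4 x i32> %v, i64 0)
  ret <4 x i32> %r
}
```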
LGTM!