[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV v16f32/v16i32 nodes if the upper elements are not demanded #134890
@llvm/pr-subscribers-backend-x86
Author: Simon Pilgrim (RKSimon)
Changes
Missed in #133923: even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents when the upper elements are not demanded.
Full diff: https://github.com/llvm/llvm-project/pull/134890.diff
2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 47ac1ee571269..908b81d896e34 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -43810,7 +43810,9 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
case X86ISD::VPERMV: {
SmallVector<int, 16> Mask;
SmallVector<SDValue, 2> Ops;
- if ((VT.is256BitVector() || Subtarget.hasVLX()) &&
+ // We can always split v16i32/v16f32 AVX512 to v8i32/v8f32 AVX2 variants.
+ if ((VT.is256BitVector() || Subtarget.hasVLX() || VT == MVT::v16i32 ||
+ VT == MVT::v16f32) &&
getTargetShuffleMask(Op, /*AllowSentinelZero=*/false, Ops, Mask)) {
// For lane-crossing shuffles, only split in half in case we're still
// referencing higher elements.
diff --git a/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll b/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
index b1efb416014b0..7df80ee9f175b 100644
--- a/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
+++ b/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
@@ -491,8 +491,8 @@ define <4 x float> @test_v16f32_0_1_3_6 (<16 x float> %v) {
; ALL-LABEL: test_v16f32_0_1_3_6:
; ALL: # %bb.0:
; ALL-NEXT: vpmovsxbd {{.*#+}} xmm1 = [0,1,3,6]
-; ALL-NEXT: vpermps %zmm0, %zmm1, %zmm0
-; ALL-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
+; ALL-NEXT: vpermps %ymm0, %ymm1, %ymm0
+; ALL-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
; ALL-NEXT: vzeroupper
; ALL-NEXT: retq
%res = shufflevector <16 x float> %v, <16 x float> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 6>
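The condition in the title can be checked by emulation. The following is a minimal sketch, not LLVM code: the helpers `permLanes`, `canNarrowToV8` and `narrowedMatches` are hypothetical names. It assumes a fully known shuffle mask with -1 marking undef lanes, and models the variable-index permute (VPERMPS/VPERMD) as selecting with the low 4 index bits in the 16-lane zmm form and the low 3 bits in the 8-lane ymm form.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates a variable-index permute over N float lanes: result lane i takes
// source lane (idx[i] & (N - 1)), i.e. the low 4 index bits for the 16-lane
// zmm form and the low 3 bits for the 8-lane ymm form.
template <std::size_t N>
std::array<float, N> permLanes(const std::array<float, N>& src,
                               const std::array<std::uint32_t, N>& idx) {
  static_assert(N == 8 || N == 16, "model only the ymm and zmm widths");
  std::array<float, N> dst{};
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = src[idx[i] & (N - 1)];
  return dst;
}

// Hypothetical legality check: a v16 permute may be replaced by a v8 one
// only if every demanded result lane is in the lower half and reads a
// source element from the lower half. Mask entries of -1 mean "undef".
bool canNarrowToV8(const std::vector<int>& mask,
                   const std::vector<bool>& demanded) {
  assert(mask.size() == demanded.size());
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (!demanded[i])
      continue;
    if (i >= 8 || mask[i] >= 8)
      return false;
  }
  return true;
}

// Cross-check by emulation: run the zmm permute and the narrowed ymm permute
// (lower 8 source and index lanes), then compare every demanded lane.
bool narrowedMatches(const std::array<float, 16>& src,
                     const std::array<std::uint32_t, 16>& idx,
                     const std::vector<bool>& demanded) {
  std::array<float, 8> srcLo{};
  std::array<std::uint32_t, 8> idxLo{};
  for (std::size_t i = 0; i < 8; ++i) {
    srcLo[i] = src[i];
    idxLo[i] = idx[i];
  }
  const auto wide = permLanes<16>(src, idx);
  const auto narrow = permLanes<8>(srcLo, idxLo);
  for (std::size_t i = 0; i < 16; ++i) {
    if (!demanded[i])
      continue;
    if (i >= 8 || wide[i] != narrow[i])
      return false;
  }
  return true;
}
```

With the test's mask [0,1,3,6] and only result lanes 0-3 demanded, both checks hold. If lane 3 instead read element 9, the ymm form would read element 1 (9 & 7), which is why every demanded lane must stay within the lower eight source elements.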
@@ -43810,7 +43810,9 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
case X86ISD::VPERMV: {
SmallVector<int, 16> Mask;
SmallVector<SDValue, 2> Ops;
- if ((VT.is256BitVector() || Subtarget.hasVLX()) &&
+ // We can always split v16i32/v16f32 AVX512 to v8i32/v8f32 AVX2 variants.
Can we do it for v8i64/v8f64?
Unfortunately not: AVX2 vpermq takes an immediate (X86ISD::VPERMI) rather than a variable index vector. We already handle 512->ymm X86ISD::VPERMI, though.
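To illustrate why there is no AVX2 variable-index equivalent for v8i64/v8f64: AVX2 VPERMQ/VPERMPD encode their four-element qword shuffle in an 8-bit immediate, two bits per result element. The sketch below packs and emulates that immediate. The helper names `encodeVPermQImm` and `vpermq` are hypothetical, and the code is an illustration of the instruction encoding, not of any LLVM lowering.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack a 4-element qword shuffle mask (each entry 0..3)
// into a VPERMQ/VPERMPD-style imm8, two bits per result element.
std::uint8_t encodeVPermQImm(const std::array<int, 4>& mask) {
  std::uint8_t imm = 0;
  for (int i = 0; i < 4; ++i) {
    assert(mask[i] >= 0 && mask[i] < 4);
    imm |= static_cast<std::uint8_t>(mask[i] << (2 * i));
  }
  return imm;
}

// Emulate the immediate permute: result qword i = source qword ((imm >> 2*i) & 3).
std::array<std::uint64_t, 4> vpermq(const std::array<std::uint64_t, 4>& src,
                                    std::uint8_t imm) {
  std::array<std::uint64_t, 4> dst{};
  for (int i = 0; i < 4; ++i)
    dst[i] = src[(imm >> (2 * i)) & 3];
  return dst;
}
```

Because the selector is fixed at encode time, a 64-bit permute whose indices live in a vector register has no AVX2 counterpart; only a constant mask can be expressed this way.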
OK, that's quite odd.
LGTM.
…ERMV v16f32/v16i32 nodes if the upper elements are not demanded (llvm#134890) Missed in llvm#133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.
Local branch origin/amd-gfx c4e1c89: Merged main:c1e95b2e5e61 into origin/amd-gfx:e60c0ac07789
Remote branch main 74f69c4: [X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV v16f32/v16i32 nodes if the upper elements are not demanded (llvm#134890)
Missed in #133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.