[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV v16f32/v16i32 nodes if the upper elements are not demanded #134890

RKSimon · 2025-04-08T17:31:31Z

Missed in #133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.


llvmbot · 2025-04-08T17:32:07Z

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

Missed in #133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.

Full diff: https://github.com/llvm/llvm-project/pull/134890.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+3-1)
(modified) llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll (+2-2)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 47ac1ee571269..908b81d896e34 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -43810,7 +43810,9 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
     case X86ISD::VPERMV: {
       SmallVector<int, 16> Mask;
       SmallVector<SDValue, 2> Ops;
-      if ((VT.is256BitVector() || Subtarget.hasVLX()) &&
+      // We can always split v16i32/v16f32 AVX512 to v8i32/v8f32 AVX2 variants.
+      if ((VT.is256BitVector() || Subtarget.hasVLX() || VT == MVT::v16i32 ||
+           VT == MVT::v16f32) &&
           getTargetShuffleMask(Op, /*AllowSentinelZero=*/false, Ops, Mask)) {
         // For lane-crossing shuffles, only split in half in case we're still
         // referencing higher elements.
diff --git a/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll b/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
index b1efb416014b0..7df80ee9f175b 100644
--- a/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
+++ b/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
@@ -491,8 +491,8 @@ define <4 x float> @test_v16f32_0_1_3_6 (<16 x float> %v) {
 ; ALL-LABEL: test_v16f32_0_1_3_6:
 ; ALL:       # %bb.0:
 ; ALL-NEXT:    vpmovsxbd {{.*#+}} xmm1 = [0,1,3,6]
-; ALL-NEXT:    vpermps %zmm0, %zmm1, %zmm0
-; ALL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
+; ALL-NEXT:    vpermps %ymm0, %ymm1, %ymm0
+; ALL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; ALL-NEXT:    vzeroupper
 ; ALL-NEXT:    retq
   %res = shufflevector <16 x float> %v, <16 x float> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 6>
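As a rough illustration of why the narrowing in this diff is sound, the sketch below models a variable permute outside of LLVM. If only the low results are demanded and their mask indices all point into the lower half of the source, permuting the lower halves gives the same demanded results. The names `permute`, `lowHalf`, `lowHalfSuffices` and `narrowedMatchesWide` are hypothetical and are not LLVM APIs; this is a model under those assumptions, not the code in X86ISelLowering.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Model of a variable permute: Result[I] = Src[Mask[I] mod N].
// N is assumed to be a power of two, as with 8 or 16 x 32-bit elements.
template <std::size_t N>
std::array<int, N> permute(const std::array<int, N> &Src,
                           const std::array<int, N> &Mask) {
  std::array<int, N> Result{};
  for (std::size_t I = 0; I != N; ++I)
    Result[I] = Src[static_cast<std::size_t>(Mask[I]) & (N - 1)];
  return Result;
}

// Extract the lower N/2 elements (the ymm half of a zmm value in this model).
template <std::size_t N>
std::array<int, N / 2> lowHalf(const std::array<int, N> &V) {
  std::array<int, N / 2> Low{};
  for (std::size_t I = 0; I != N / 2; ++I)
    Low[I] = V[I];
  return Low;
}

// Hypothetical check: the first `Demanded` results only read the lower half.
template <std::size_t N>
bool lowHalfSuffices(const std::array<int, N> &Mask, std::size_t Demanded) {
  assert(Demanded <= N / 2 && "only lower-half results may be demanded");
  for (std::size_t I = 0; I != Demanded; ++I)
    if (Mask[I] < 0 || static_cast<std::size_t>(Mask[I]) >= N / 2)
      return false;
  return true;
}

// Compare the demanded results of the wide permute and the half-width one.
template <std::size_t N>
bool narrowedMatchesWide(const std::array<int, N> &Src,
                         const std::array<int, N> &Mask,
                         std::size_t Demanded) {
  if (!lowHalfSuffices(Mask, Demanded))
    return false;
  const auto Wide = permute(Src, Mask);
  const auto Narrow = permute(lowHalf(Src), lowHalf(Mask));
  for (std::size_t I = 0; I != Demanded; ++I)
    if (Wide[I] != Narrow[I])
      return false;
  return true;
}
```

With the mask from `test_v16f32_0_1_3_6` (indices 0, 1, 3, 6 and four demanded results), `lowHalfSuffices` holds and the half-width permute agrees with the full-width one on every demanded element. A mask that reads index 9 in a demanded position would fail the check in this model.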

phoebewang · 2025-04-09T06:15:24Z

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ -43810,7 +43810,9 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
    case X86ISD::VPERMV: {
      SmallVector<int, 16> Mask;
      SmallVector<SDValue, 2> Ops;
-      if ((VT.is256BitVector() || Subtarget.hasVLX()) &&
+      // We can always split v16i32/v16f32 AVX512 to v8i32/v8f32 AVX2 variants.


Can we do it for v8i64/v8f64?

RKSimon

Unfortunately not. AVX2 vpermq takes an immediate (X86ISD::VPERMI) rather than an index vector. We already handle narrowing a 512-bit X86ISD::VPERMI to ymm, though.

phoebewang

Ok, it's quite queer.
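To make the distinction in this thread concrete, here is a small sketch of how an immediate-controlled 4 x 64-bit permute consumes its mask: two bits per result element, packed into an imm8, with element 0 in the low bits. Because the mask lives in the instruction encoding it must be a compile-time constant, unlike the index vector a VPERMV node carries. `encodePermImm8` and `permuteImm` are hypothetical names for illustration only, not LLVM helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack a constant 4-element mask into an imm8, two bits per result element.
std::uint8_t encodePermImm8(const std::array<int, 4> &Mask) {
  unsigned Imm = 0;
  for (unsigned I = 0; I != 4; ++I) {
    assert(Mask[I] >= 0 && Mask[I] < 4 && "index must fit in two bits");
    Imm |= static_cast<unsigned>(Mask[I]) << (2 * I);
  }
  return static_cast<std::uint8_t>(Imm);
}

// Model of the immediate form: Result[I] = Src[bits (2*I+1):(2*I) of Imm].
std::array<std::uint64_t, 4> permuteImm(const std::array<std::uint64_t, 4> &Src,
                                        std::uint8_t Imm) {
  std::array<std::uint64_t, 4> Result{};
  for (unsigned I = 0; I != 4; ++I)
    Result[I] = Src[(Imm >> (2 * I)) & 3u];
  return Result;
}
```

For example, the mask {0, 1, 3, 2} packs to 0xB4 in this model, and applying it swaps the top two elements of the source.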

phoebewang

LGTM.



[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VP…

c46aaed

…ERMV v16f32/v16i32 nodes if the upper elements are not demanded Missed in llvm#133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.

RKSimon assigned phoebewang Apr 8, 2025

llvmbot added the backend:X86 label Apr 8, 2025

RKSimon unassigned phoebewang Apr 8, 2025

RKSimon requested a review from phoebewang April 8, 2025 17:31

phoebewang reviewed Apr 9, 2025


phoebewang approved these changes Apr 9, 2025


Merge branch 'main' into x86-demandelts-avx2-vpermv

b31012e

RKSimon merged commit 74f69c4 into llvm:main Apr 9, 2025
11 checks passed

RKSimon deleted the x86-demandelts-avx2-vpermv branch April 9, 2025 10:14
