Skip to content

[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV v16f32/v16i32 nodes if the upper elements are not demanded #134890

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 9, 2025

Conversation

RKSimon
Copy link
Collaborator

@RKSimon RKSimon commented Apr 8, 2025

Missed in #133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.

…ERMV v16f32/v16i32 nodes if the upper elements are not demanded

Missed in llvm#133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.
@llvmbot
Copy link
Member

llvmbot commented Apr 8, 2025

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

Missed in #133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.


Full diff: https://github.com/llvm/llvm-project/pull/134890.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+3-1)
  • (modified) llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll (+2-2)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 47ac1ee571269..908b81d896e34 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -43810,7 +43810,9 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
     case X86ISD::VPERMV: {
       SmallVector<int, 16> Mask;
       SmallVector<SDValue, 2> Ops;
-      if ((VT.is256BitVector() || Subtarget.hasVLX()) &&
+      // We can always split v16i32/v16f32 AVX512 to v8i32/v8f32 AVX2 variants.
+      if ((VT.is256BitVector() || Subtarget.hasVLX() || VT == MVT::v16i32 ||
+           VT == MVT::v16f32) &&
           getTargetShuffleMask(Op, /*AllowSentinelZero=*/false, Ops, Mask)) {
         // For lane-crossing shuffles, only split in half in case we're still
         // referencing higher elements.
diff --git a/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll b/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
index b1efb416014b0..7df80ee9f175b 100644
--- a/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
+++ b/llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
@@ -491,8 +491,8 @@ define <4 x float> @test_v16f32_0_1_3_6 (<16 x float> %v) {
 ; ALL-LABEL: test_v16f32_0_1_3_6:
 ; ALL:       # %bb.0:
 ; ALL-NEXT:    vpmovsxbd {{.*#+}} xmm1 = [0,1,3,6]
-; ALL-NEXT:    vpermps %zmm0, %zmm1, %zmm0
-; ALL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
+; ALL-NEXT:    vpermps %ymm0, %ymm1, %ymm0
+; ALL-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
 ; ALL-NEXT:    vzeroupper
 ; ALL-NEXT:    retq
   %res = shufflevector <16 x float> %v, <16 x float> poison, <4 x i32> <i32 0, i32 1, i32 3, i32 6>

@@ -43810,7 +43810,9 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
case X86ISD::VPERMV: {
SmallVector<int, 16> Mask;
SmallVector<SDValue, 2> Ops;
if ((VT.is256BitVector() || Subtarget.hasVLX()) &&
// We can always split v16i32/v16f32 AVX512 to v8i32/v8f32 AVX2 variants.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do it for v8i64/v8f64?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not - avx2 vpermq takes an immediate (X86ISD::VPERMI) - we already handle 512->ymm X86ISD::VPERMI though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it's quite queer.

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@RKSimon RKSimon merged commit 74f69c4 into llvm:main Apr 9, 2025
11 checks passed
@RKSimon RKSimon deleted the x86-demandelts-avx2-vpermv branch April 9, 2025 10:14
AllinLeeYL pushed a commit to AllinLeeYL/llvm-project that referenced this pull request Apr 10, 2025
…ERMV v16f32/v16i32 nodes if the upper elements are not demanded (llvm#134890)

Missed in llvm#133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…ERMV v16f32/v16i32 nodes if the upper elements are not demanded (llvm#134890)

Missed in llvm#133923 - even without AVX512VL, we can replace VPERMV v16f32/v16i32 nodes with the AVX2 v8f32/v8i32 equivalents.
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Apr 29, 2025
Local branch origin/amd-gfx c4e1c89 Merged main:c1e95b2e5e61 into origin/amd-gfx:e60c0ac07789
Remote branch main 74f69c4 [X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV v16f32/v16i32 nodes if the upper elements are not demanded (llvm#134890)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants