[X86] Fix arithmetic error in extractVector #128052

daniel-zabawa · 2025-02-20T19:22:20Z

The computation of the element count for the result VT in extractSubVector is incorrect when vectorWidth does not divide VT.getSizeInBits(). This can occur when the source vector's element count is not a power of two, e.g. when extracting a 256-bit vector (vectorWidth = 256) from a 384-bit source.

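This rewrites the expression to multiply before dividing, so the division is exact given that vectorWidth is a multiple of the source element size: (NumElts * vectorWidth) / (NumElts * EltBits) reduces to vectorWidth / EltBits.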

llvmbot · 2025-02-20T21:32:14Z

@llvm/pr-subscribers-backend-x86

Author: Daniel Zabawa (daniel-zabawa)

Changes

The computation of the element count for the result VT in extractVector is incorrect when vector width does not divide VT.getSizeInBits(), which can occur when the source vector element count is not a power of two, e.g. extracting a vectorWidth 256b vector from a 384b source.

This rewrites the expression so the division is exact given that vectorWidth is a multiple of the source element size.

Full diff: https://github.com/llvm/llvm-project/pull/128052.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+3-3)
(added) llvm/test/CodeGen/X86/pr128052.ll (+30)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 1c9d43ce4c062..d79dd9d5cdd72 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -4066,9 +4066,9 @@ static SDValue extractSubVector(SDValue Vec, unsigned IdxVal, SelectionDAG &DAG,
                                 const SDLoc &dl, unsigned vectorWidth) {
   EVT VT = Vec.getValueType();
   EVT ElVT = VT.getVectorElementType();
-  unsigned Factor = VT.getSizeInBits() / vectorWidth;
-  EVT ResultVT = EVT::getVectorVT(*DAG.getContext(), ElVT,
-                                  VT.getVectorNumElements() / Factor);
+  unsigned ResultNumElts =
+      (VT.getVectorNumElements() * vectorWidth) / VT.getSizeInBits();
+  EVT ResultVT = EVT::getVectorVT(*DAG.getContext(), ElVT, ResultNumElts);
 
   // Extract the relevant vectorWidth bits.  Generate an EXTRACT_SUBVECTOR
   unsigned ElemsPerChunk = vectorWidth / ElVT.getSizeInBits();
diff --git a/llvm/test/CodeGen/X86/pr128052.ll b/llvm/test/CodeGen/X86/pr128052.ll
new file mode 100644
index 0000000000000..1a67e64b69832
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr128052.ll
@@ -0,0 +1,30 @@
+; Ensure assertion is not hit when folding concat of two contiguous extract_subvector operations
+; from a source with a non-power-of-two vector length.
+; RUN: llc -mattr=+avx2 < %s
+
+source_filename = "foo.c"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @foo(ptr noundef %pDst, ptr noundef %pSrc) {
+bb0:
+  %sptr1 = getelementptr i8, ptr %pSrc, i64 32
+  %load598 = load <12 x float>, ptr %sptr1, align 1
+  br label %bb1
+bb1:
+  %sptr0 = getelementptr i8, ptr %pSrc, i64 16
+  %load617 = load <12 x float>, ptr %sptr0, align 1
+  %42 = fsub contract <12 x float> %load617, %load598
+  %43 = shufflevector <12 x float> %42, <12 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %44 = fsub contract <12 x float> %load617, %load598
+  %45 = shufflevector <12 x float> %44, <12 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+  %46 = fsub contract <12 x float> %load617, %load598
+  %47 = shufflevector <12 x float> %46, <12 x float> poison, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
+  %dptr0 = getelementptr i8, ptr %pDst, i64 16
+  %dptr1 = getelementptr i8, ptr %pDst, i64 32 
+  %dptr2 = getelementptr i8, ptr %pDst, i64 48
+  store <4 x float> %43, ptr %dptr0, align 1
+  store <4 x float> %45, ptr %dptr1, align 1
+  store <4 x float> %47, ptr %dptr2, align 1
+  ret void
+}

phoebewang · 2025-02-21T07:17:53Z

llvm/test/CodeGen/X86/pr128052.ll

+source_filename = "foo.c"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"


These are not needed. Instead, add the triple to the RUN line, e.g.
; RUN: llc -mtriple=x86_64 -mattr=+avx2 < %s

phoebewang · 2025-02-21T07:19:01Z

llvm/test/CodeGen/X86/pr128052.ll

+; Ensure assertion is not hit when folding concat of two contiguous extract_subvector operations
+; from a source with a non-power-of-two vector length.


I think we should use utils/update_llc_test_checks.py to generate assembly checks rather than rely on comments.

Do not use pr128052 as the file name; that naming is used for issue numbers only.

RKSimon · 2025-02-21T09:30:36Z

llvm/test/CodeGen/X86/pr128052.ll

+  store <4 x float> %45, ptr %dptr1, align 1
+  store <4 x float> %47, ptr %dptr2, align 1
+  ret void
+}


This can probably be simplified further? And don't use numbered variables.

RKSimon · 2025-02-21T09:31:24Z

llvm/test/CodeGen/X86/pr128052.ll

+source_filename = "foo.c"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"


Use ; RUN: llc -mtriple=x86_64 -mattr=+avx2 < %s | FileCheck %s and regenerate the checks with update_llc_test_checks

RKSimon · 2025-02-21T09:31:57Z

llvm/lib/Target/X86/X86ISelLowering.cpp

-                                  VT.getVectorNumElements() / Factor);
+  unsigned ResultNumElts =
+      (VT.getVectorNumElements() * vectorWidth) / VT.getSizeInBits();
+  EVT ResultVT = EVT::getVectorVT(*DAG.getContext(), ElVT, ResultNumElts);


Add an assert to make sure it's actually working?
assert(ResultVT.getSizeInBits() == vectorWidth && "Illegal subvector extraction");

phoebewang

LGTM.

RKSimon

LGTM

daniel-zabawa force-pushed the review/x86-extract-subvector-bug branch from 9195c2b to 1df623a February 20, 2025 20:12

daniel-zabawa marked this pull request as ready for review February 20, 2025 21:31

llvmbot added the backend:X86 label Feb 20, 2025

phoebewang requested review from RKSimon and phoebewang February 21, 2025 07:16

phoebewang reviewed Feb 21, 2025

RKSimon reviewed Feb 21, 2025

reduce/rename test and add assertion

b338971

phoebewang approved these changes Feb 22, 2025

RKSimon approved these changes Feb 24, 2025

phoebewang merged commit d5148f0 into llvm:main Feb 24, 2025
11 checks passed
