Skip to content

[PowerPC] adjust cost for vector insert/extract with non const index #79092

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 19, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 37 additions & 25 deletions llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -697,39 +697,51 @@ InstructionCost PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,

return Cost;

} else if (Val->getScalarType()->isIntegerTy() && Index != -1U) {
} else if (Val->getScalarType()->isIntegerTy()) {
unsigned EltSize = Val->getScalarSizeInBits();
// Computing on 1 bit values requires extra mask or compare operations.
unsigned MaskCost = VecMaskCost && EltSize == 1 ? 1 : 0;
unsigned MaskCostForOneBitSize = (VecMaskCost && EltSize == 1) ? 1 : 0;
// Computing on non const index requires extra mask or compare operations.
unsigned MaskCostForIdx = (Index != -1U) ? 0 : 1;
if (ST->hasP9Altivec()) {
if (ISD == ISD::INSERT_VECTOR_ELT)
// A move-to VSR and a permute/insert. Assume vector operation cost
// for both (cost will be 2x on P9).
return 2 * CostFactor;

// It's an extract. Maybe we can do a cheap move-from VSR.
unsigned EltSize = Val->getScalarSizeInBits();
if (EltSize == 64) {
unsigned MfvsrdIndex = ST->isLittleEndian() ? 1 : 0;
if (Index == MfvsrdIndex)
return 1;
} else if (EltSize == 32) {
unsigned MfvsrwzIndex = ST->isLittleEndian() ? 2 : 1;
if (Index == MfvsrwzIndex)
return 1;
}

// We need a vector extract (or mfvsrld). Assume vector operation cost.
// The cost of the load constant for a vector extract is disregarded
// (invariant, easily schedulable).
return CostFactor + MaskCost;
// P10 has vxform insert which can handle non const index. The MaskCost is
Copy link
Collaborator

@RolandF77 RolandF77 Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: MaskCost => MaskCostForIdx

// for masking the index.
// P9 has insert for const index. A move-to VSR and a permute/insert.
// Assume vector operation cost for both (cost will be 2x on P9).
if (ISD == ISD::INSERT_VECTOR_ELT) {
if (ST->hasP10Vector())
return CostFactor + MaskCostForIdx;
else if (Index != -1U)
return 2 * CostFactor;
} else if (ISD == ISD::EXTRACT_VECTOR_ELT) {
// It's an extract. Maybe we can do a cheap move-from VSR.
unsigned EltSize = Val->getScalarSizeInBits();
if (EltSize == 64) {
// FIXME: no need to worry about endian, P9 has both mfvsrd/mfvsrld.
unsigned MfvsrdIndex = ST->isLittleEndian() ? 1 : 0;
if (Index == MfvsrdIndex)
return 1;
} else if (EltSize == 32) {
unsigned MfvsrwzIndex = ST->isLittleEndian() ? 2 : 1;
if (Index == MfvsrwzIndex)
return 1;

// For other indexs like non const, P9 has vxform extract. The
// MaskCostForIdx is for masking the index.
return CostFactor + MaskCostForIdx;
}

} else if (ST->hasDirectMove()) {
// We need a vector extract (or mfvsrld). Assume vector operation cost.
// The cost of the load constant for a vector extract is disregarded
// (invariant, easily schedulable).
return CostFactor + MaskCostForOneBitSize + MaskCostForIdx;
}
} else if (ST->hasDirectMove() && Index != -1U) {
// Assume permute has standard cost.
// Assume move-to/move-from VSR have 2x standard cost.
if (ISD == ISD::INSERT_VECTOR_ELT)
return 3;
return 3 + MaskCost;
return 3 + MaskCostForOneBitSize;
}
}

Expand Down
36 changes: 18 additions & 18 deletions llvm/test/Analysis/CostModel/PowerPC/insert_extract.ll
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ define i32 @insert(i32 %arg) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; CHECK-P10-LABEL: 'insert'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = insertelement <4 x i32> undef, i32 %arg, i32 0
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = insertelement <4 x i32> undef, i32 %arg, i32 0
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
%x = insertelement <4 x i32> undef, i32 %arg, i32 0
Expand Down Expand Up @@ -109,7 +109,7 @@ define void @test4xi32(<4 x i32> %v1, i32 %x1) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; CHECK-P10-LABEL: 'test4xi32'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = insertelement <4 x i32> %v1, i32 %x1, i32 2
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = insertelement <4 x i32> %v1, i32 %x1, i32 2
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%v2 = insertelement <4 x i32> %v1, i32 %x1, i32 2
Expand Down Expand Up @@ -239,7 +239,7 @@ define <2 x i64> @insert_i64_x(<2 x i64> %dest, i64 %arg, i32 %idx) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %x
;
; CHECK-P10-LABEL: 'insert_i64_x'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %x = insertelement <2 x i64> %dest, i64 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = insertelement <2 x i64> %dest, i64 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %x
;
%x = insertelement <2 x i64> %dest, i64 %arg, i32 %idx
Expand All @@ -264,7 +264,7 @@ define <4 x i32> @insert_i32_x(<4 x i32> %dest, i32 %arg, i32 %idx) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %x
;
; CHECK-P10-LABEL: 'insert_i32_x'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %x = insertelement <4 x i32> %dest, i32 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = insertelement <4 x i32> %dest, i32 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %x
;
%x = insertelement <4 x i32> %dest, i32 %arg, i32 %idx
Expand All @@ -289,7 +289,7 @@ define <8 x i16> @insert_i16_x(<8 x i16> %dest, i16 %arg, i32 %idx) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %x
;
; CHECK-P10-LABEL: 'insert_i16_x'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %x = insertelement <8 x i16> %dest, i16 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = insertelement <8 x i16> %dest, i16 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %x
;
%x = insertelement <8 x i16> %dest, i16 %arg, i32 %idx
Expand All @@ -314,7 +314,7 @@ define <16 x i8> @insert_i8_x(<16 x i8> %dest, i8 %arg, i32 %idx) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %x
;
; CHECK-P10-LABEL: 'insert_i8_x'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %x = insertelement <16 x i8> %dest, i8 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = insertelement <16 x i8> %dest, i8 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %x
;
%x = insertelement <16 x i8> %dest, i8 %arg, i32 %idx
Expand All @@ -331,15 +331,15 @@ define i64 @extract_i64_x(<2 x i64> %arg, i32 %idx) {
; CHECK-P8LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %x
;
; CHECK-P9BE-LABEL: 'extract_i64_x'
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %x
;
; CHECK-P9LE-LABEL: 'extract_i64_x'
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %x
;
; CHECK-P10-LABEL: 'extract_i64_x'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %x
;
%x = extractelement <2 x i64> %arg, i32 %idx
Expand All @@ -356,15 +356,15 @@ define i32 @extract_i32_x(<4 x i32> %arg, i32 %idx) {
; CHECK-P8LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %x
;
; CHECK-P9BE-LABEL: 'extract_i32_x'
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %x
;
; CHECK-P9LE-LABEL: 'extract_i32_x'
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %x
;
; CHECK-P10-LABEL: 'extract_i32_x'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %x
;
%x = extractelement <4 x i32> %arg, i32 %idx
Expand All @@ -381,15 +381,15 @@ define i16 @extract_i16_x(<8 x i16> %arg, i32 %idx) {
; CHECK-P8LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %x
;
; CHECK-P9BE-LABEL: 'extract_i16_x'
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %x
;
; CHECK-P9LE-LABEL: 'extract_i16_x'
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %x
;
; CHECK-P10-LABEL: 'extract_i16_x'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %x
;
%x = extractelement <8 x i16> %arg, i32 %idx
Expand All @@ -406,15 +406,15 @@ define i8 @extract_i8_x(<16 x i8> %arg, i32 %idx) {
; CHECK-P8LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %x
;
; CHECK-P9BE-LABEL: 'extract_i8_x'
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %x
;
; CHECK-P9LE-LABEL: 'extract_i8_x'
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %x
;
; CHECK-P10-LABEL: 'extract_i8_x'
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %x
;
%x = extractelement <16 x i8> %arg, i32 %idx
Expand Down