-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[RISCV][TTI] Adjust cost for extract/insert element when VLEN is known #108595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV][TTI] Adjust cost for extract/insert element when VLEN is known #108595
Conversation
If we know an exact VLEN, then the index is effectively modulo the number of elements in a single vector register. Our lowering performs this subvector optimization. A bit of context. This change may look a bit strange on it's own given we are currently *not* scaling insert/extract cost by LMUL. This costing decision needs to change, but is very intertwined with SLP profitability, and is thus a bit hard to adjust. I'm hoping that llvm#108419 will let me start to untangle this. This change is basically a case of finding a subset I can tackle before other dependencies are in place which does no real harm in the meantime.
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-analysis Author: Philip Reames (preames) ChangesIf we know an exact VLEN, then the index is effectively modulo the number of elements in a single vector register. Our lowering performs this subvector optimization. A bit of context. This change may look a bit strange on it's own given we are currently not scaling insert/extract cost by LMUL. This costing decision needs to change, but is very intertwined with SLP profitability, and is thus a bit hard to adjust. I'm hoping that #108419 will let me start to untangle this. This change is basically a case of finding a subset I can tackle before other dependencies are in place which does no real harm in the meantime. Full diff: https://github.com/llvm/llvm-project/pull/108595.diff 3 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 2b5e7c47279284..470ca63ba5c875 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1756,6 +1756,14 @@ InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
Index = Index % Width;
}
+ // If exact VLEN is known, we will insert/extract into the appropriate
+ // subvector with no additional subvector insert/extract cost.
+ if (auto VLEN = ST->getRealVLen()) {
+ unsigned EltSize = LT.second.getScalarSizeInBits();
+ unsigned M1Max = *VLEN / EltSize;
+ Index = Index % M1Max;
+ }
+
// We could extract/insert the first element without vslidedown/vslideup.
if (Index == 0)
SlideCost = 0;
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
index 4a9e30888cdd1f..618b7bc8945a50 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
@@ -1501,3 +1501,55 @@ define void @extractelement_int_nonpoweroftwo(i32 %x) {
ret void
}
+
+define void @extractelement_vls(i32 %x) vscale_range(2,2) {
+; RV32V-LABEL: 'extractelement_vls'
+; RV32V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = extractelement <32 x i32> undef, i32 0
+; RV32V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = extractelement <32 x i32> undef, i32 4
+; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = extractelement <32 x i32> undef, i32 5
+; RV32V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = extractelement <32 x i32> undef, i32 8
+; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = extractelement <32 x i32> undef, i32 9
+; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = extractelement <32 x i32> undef, i32 11
+; RV32V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = extractelement <32 x i32> undef, i32 12
+; RV32V-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64V-LABEL: 'extractelement_vls'
+; RV64V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = extractelement <32 x i32> undef, i32 0
+; RV64V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = extractelement <32 x i32> undef, i32 4
+; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = extractelement <32 x i32> undef, i32 5
+; RV64V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = extractelement <32 x i32> undef, i32 8
+; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = extractelement <32 x i32> undef, i32 9
+; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = extractelement <32 x i32> undef, i32 11
+; RV64V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = extractelement <32 x i32> undef, i32 12
+; RV64V-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV32ZVE64X-LABEL: 'extractelement_vls'
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = extractelement <32 x i32> undef, i32 0
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = extractelement <32 x i32> undef, i32 4
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = extractelement <32 x i32> undef, i32 5
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = extractelement <32 x i32> undef, i32 8
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = extractelement <32 x i32> undef, i32 9
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = extractelement <32 x i32> undef, i32 11
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = extractelement <32 x i32> undef, i32 12
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64ZVE64X-LABEL: 'extractelement_vls'
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = extractelement <32 x i32> undef, i32 0
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = extractelement <32 x i32> undef, i32 4
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = extractelement <32 x i32> undef, i32 5
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = extractelement <32 x i32> undef, i32 8
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = extractelement <32 x i32> undef, i32 9
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = extractelement <32 x i32> undef, i32 11
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = extractelement <32 x i32> undef, i32 12
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+ %v32i32_0 = extractelement <32 x i32> undef, i32 0
+ %v32i32_4 = extractelement <32 x i32> undef, i32 4
+ %v32i32_5 = extractelement <32 x i32> undef, i32 5
+ %v32i32_8 = extractelement <32 x i32> undef, i32 8
+ %v32i32_9 = extractelement <32 x i32> undef, i32 9
+ %v32i32_11 = extractelement <32 x i32> undef, i32 11
+ %v32i32_12 = extractelement <32 x i32> undef, i32 12
+
+ ret void
+}
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll
index 0616e0919b9d9a..c240a75066b10c 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll
@@ -1491,3 +1491,55 @@ define void @insertelement_int_nonpoweroftwo(i32 %x) {
ret void
}
+
+define void @insertelement_vls(i32 %x) vscale_range(2,2) {
+; RV32V-LABEL: 'insertelement_vls'
+; RV32V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+; RV32V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+; RV32V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+; RV32V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+; RV32V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+; RV32V-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64V-LABEL: 'insertelement_vls'
+; RV64V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+; RV64V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+; RV64V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+; RV64V-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+; RV64V-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+; RV64V-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV32ZVE64X-LABEL: 'insertelement_vls'
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+; RV32ZVE64X-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64ZVE64X-LABEL: 'insertelement_vls'
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+; RV64ZVE64X-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+ %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+ %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+ %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+ %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+ %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+ %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+ %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+
+ ret void
+}
|
llvm#108595) If we know an exact VLEN, then the index is effectively modulo the number of elements in a single vector register. Our lowering performs this subvector optimization. A bit of context. This change may look a bit strange on it's own given we are currently *not* scaling insert/extract cost by LMUL. This costing decision needs to change, but is very intertwined with SLP profitability, and is thus a bit hard to adjust. I'm hoping that llvm#108419 will let me start to untangle this. This change is basically a case of finding a subset I can tackle before other dependencies are in place which does no real harm in the meantime.
If we know an exact VLEN, then the index is effectively modulo the number of elements in a single vector register. Our lowering performs this subvector optimization.
A bit of context. This change may look a bit strange on it's own given we are currently not scaling insert/extract cost by LMUL. This costing decision needs to change, but is very intertwined with SLP profitability, and is thus a bit hard to adjust. I'm hoping that #108419 will let me start to untangle this. This change is basically a case of finding a subset I can tackle before other dependencies are in place which does no real harm in the meantime.