Skip to content

[RISCV][TTI] Adjust cost for extract/insert element when VLEN is known #108595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 17, 2024

Conversation

preames
Copy link
Collaborator

@preames preames commented Sep 13, 2024

If we know an exact VLEN, then the index is effectively modulo the number of elements in a single vector register. Our lowering performs this subvector optimization.

A bit of context. This change may look a bit strange on it's own given we are currently not scaling insert/extract cost by LMUL. This costing decision needs to change, but is very intertwined with SLP profitability, and is thus a bit hard to adjust. I'm hoping that #108419 will let me start to untangle this. This change is basically a case of finding a subset I can tackle before other dependencies are in place which does no real harm in the meantime.

If we know an exact VLEN, then the index is effectively modulo the
number of elements in a single vector register.  Our lowering performs
this subvector optimization.

A bit of context.  This change may look a bit strange on it's own
given we are currently *not* scaling insert/extract cost by LMUL.
This costing decision needs to change, but is very intertwined with
SLP profitability, and is thus a bit hard to adjust.  I'm hoping
that llvm#108419 will let me
start to untangle this.  This change is basically a case of finding
a subset I can tackle before other dependencies are in place which
does no real harm in the meantime.
@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Changes

If we know an exact VLEN, then the index is effectively modulo the number of elements in a single vector register. Our lowering performs this subvector optimization.

A bit of context. This change may look a bit strange on it's own given we are currently not scaling insert/extract cost by LMUL. This costing decision needs to change, but is very intertwined with SLP profitability, and is thus a bit hard to adjust. I'm hoping that #108419 will let me start to untangle this. This change is basically a case of finding a subset I can tackle before other dependencies are in place which does no real harm in the meantime.


Full diff: https://github.com/llvm/llvm-project/pull/108595.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+8)
  • (modified) llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll (+52)
  • (modified) llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll (+52)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 2b5e7c47279284..470ca63ba5c875 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1756,6 +1756,14 @@ InstructionCost RISCVTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
       Index = Index % Width;
     }
 
+    // If exact VLEN is known, we will insert/extract into the appropriate
+    // subvector with no additional subvector insert/extract cost.
+    if (auto VLEN = ST->getRealVLen()) {
+      unsigned EltSize = LT.second.getScalarSizeInBits();
+      unsigned M1Max = *VLEN / EltSize;
+      Index = Index % M1Max;
+    }
+
     // We could extract/insert the first element without vslidedown/vslideup.
     if (Index == 0)
       SlideCost = 0;
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
index 4a9e30888cdd1f..618b7bc8945a50 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-extractelement.ll
@@ -1501,3 +1501,55 @@ define void @extractelement_int_nonpoweroftwo(i32 %x) {
 
   ret void
 }
+
+define void @extractelement_vls(i32 %x) vscale_range(2,2) {
+; RV32V-LABEL: 'extractelement_vls'
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = extractelement <32 x i32> undef, i32 0
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = extractelement <32 x i32> undef, i32 4
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = extractelement <32 x i32> undef, i32 5
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = extractelement <32 x i32> undef, i32 8
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = extractelement <32 x i32> undef, i32 9
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = extractelement <32 x i32> undef, i32 11
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = extractelement <32 x i32> undef, i32 12
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64V-LABEL: 'extractelement_vls'
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = extractelement <32 x i32> undef, i32 0
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = extractelement <32 x i32> undef, i32 4
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = extractelement <32 x i32> undef, i32 5
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = extractelement <32 x i32> undef, i32 8
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = extractelement <32 x i32> undef, i32 9
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = extractelement <32 x i32> undef, i32 11
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = extractelement <32 x i32> undef, i32 12
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV32ZVE64X-LABEL: 'extractelement_vls'
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = extractelement <32 x i32> undef, i32 0
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = extractelement <32 x i32> undef, i32 4
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = extractelement <32 x i32> undef, i32 5
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = extractelement <32 x i32> undef, i32 8
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = extractelement <32 x i32> undef, i32 9
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = extractelement <32 x i32> undef, i32 11
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = extractelement <32 x i32> undef, i32 12
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64ZVE64X-LABEL: 'extractelement_vls'
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = extractelement <32 x i32> undef, i32 0
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = extractelement <32 x i32> undef, i32 4
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = extractelement <32 x i32> undef, i32 5
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = extractelement <32 x i32> undef, i32 8
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = extractelement <32 x i32> undef, i32 9
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = extractelement <32 x i32> undef, i32 11
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = extractelement <32 x i32> undef, i32 12
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+  %v32i32_0 = extractelement <32 x i32> undef, i32 0
+  %v32i32_4 = extractelement <32 x i32> undef, i32 4
+  %v32i32_5 = extractelement <32 x i32> undef, i32 5
+  %v32i32_8 = extractelement <32 x i32> undef, i32 8
+  %v32i32_9 = extractelement <32 x i32> undef, i32 9
+  %v32i32_11 = extractelement <32 x i32> undef, i32 11
+  %v32i32_12 = extractelement <32 x i32> undef, i32 12
+
+  ret void
+}
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll
index 0616e0919b9d9a..c240a75066b10c 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-insertelement.ll
@@ -1491,3 +1491,55 @@ define void @insertelement_int_nonpoweroftwo(i32 %x) {
 
   ret void
 }
+
+define void @insertelement_vls(i32 %x) vscale_range(2,2) {
+; RV32V-LABEL: 'insertelement_vls'
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+; RV32V-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64V-LABEL: 'insertelement_vls'
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+; RV64V-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV32ZVE64X-LABEL: 'insertelement_vls'
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+; RV32ZVE64X-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; RV64ZVE64X-LABEL: 'insertelement_vls'
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+; RV64ZVE64X-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+  %v32i32_0 = insertelement <32 x i32> undef, i32 undef, i32 0
+  %v32i32_4 = insertelement <32 x i32> undef, i32 undef, i32 4
+  %v32i32_5 = insertelement <32 x i32> undef, i32 undef, i32 5
+  %v32i32_8 = insertelement <32 x i32> undef, i32 undef, i32 8
+  %v32i32_9 = insertelement <32 x i32> undef, i32 undef, i32 9
+  %v32i32_11 = insertelement <32 x i32> undef, i32 undef, i32 11
+  %v32i32_12 = insertelement <32 x i32> undef, i32 undef, i32 12
+
+  ret void
+}

@preames preames merged commit 2e7c7d2 into llvm:main Sep 17, 2024
11 checks passed
@preames preames deleted the pr-riscv-tti-extractinsert-exact-vlen branch September 17, 2024 15:43
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
llvm#108595)

If we know an exact VLEN, then the index is effectively modulo the
number of elements in a single vector register. Our lowering performs
this subvector optimization.

A bit of context. This change may look a bit strange on it's own given
we are currently *not* scaling insert/extract cost by LMUL. This costing
decision needs to change, but is very intertwined with SLP
profitability, and is thus a bit hard to adjust. I'm hoping that
llvm#108419 will let me start to
untangle this. This change is basically a case of finding a subset I can
tackle before other dependencies are in place which does no real harm in
the meantime.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants