[RISCV] Add +optimized-nfN-segment-load-store #114414

lukel97 · 2024-10-31T15:09:13Z

This is a follow up to #111511, where after benchmarking we learnt that the Banana Pi F3 has fast segmented loads for not just NF=2, but also NF=3 and NF=4: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput

This adds tuning features to allow these segment loads and stores to be costed cheaper and enables it for the spacemit-x60.

It also enables +optimized-nf2-segment-load-store by default in the generic tuning to maintain the previous behaviour when compiled without -mcpu or -mtune.

This is a follow up to llvm#111511, where after benchmarking we learnt that the Banana Pi F3 has fast segmented loads for not just NF=2, but also NF=3 and NF=4: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput This adds a tuning feature to allow these segment loads and stores to be costed cheaper and enables it for the spacemit-x60.

llvmbot · 2024-10-31T15:09:47Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This is a follow up to #111511, where after benchmarking we learnt that the Banana Pi F3 has fast segmented loads for not just NF=2, but also NF=3 and NF=4: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput

This adds a tuning feature to allow these segment loads and stores to be costed cheaper and enables it for the spacemit-x60.

Full diff: https://github.com/llvm/llvm-project/pull/114414.diff

4 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVFeatures.td (+7)
(modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+3-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+5-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-cost.ll (+47-23)

diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 1e4bf1b8830bcc..719a39944c6836 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -1386,6 +1386,13 @@ def TuneOptimizedZeroStrideLoad
                       "true", "Optimized (perform fewer memory operations)"
                       "zero-stride vector load">;
 
+foreach nf = {3-4} in
+  def TuneOptimizedNF#nf#SegmentLoadStore :
+      SubtargetFeature<"optimized-nf"#nf#"-segment-load-store",
+                       "HasOptimizedNF"#nf#"SegmentLoadStore",
+                       "true", "vlseg"#nf#"eN.v and vsseg"#nf#"eN.v are"
+                       "implemented as a wide memory op and shuffle">;
+
 def Experimental
    : SubtargetFeature<"experimental", "HasExperimental",
                       "true", "Experimental intrinsics">;
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 364aa35f09453b..c73bd5e6a1f7ce 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -472,7 +472,9 @@ def SPACEMIT_X60 : RISCVProcessorModel<"spacemit-x60",
                                         FeatureStdExtZvfh,
                                         FeatureStdExtZvkt,
                                         FeatureStdExtZvl256b]),
-                                       [TuneDLenFactor2]>;
+                                       [TuneDLenFactor2,
+                                        TuneOptimizedNF3SegmentLoadStore,
+                                        TuneOptimizedNF4SegmentLoadStore]>;
 
 def RP2350_HAZARD3 : RISCVProcessorModel<"rp2350-hazard3",
                                          NoSchedModel,
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index f050fb569946d6..936e0a9bfdffcc 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -738,8 +738,11 @@ InstructionCost RISCVTTIImpl::getInterleavedMemoryOpCost(
                                             AddressSpace, DL)) {
 
         // Most available hardware today optimizes NF=2 as as one wide memory op
-        // + Factor * LMUL shuffle ops.
-        if (Factor == 2) {
+        // + Factor * LMUL shuffle ops. Some processors may also optimize NF=3
+        // and NF=4.
+        if (Factor == 2 ||
+            (Factor == 3 && ST->hasOptimizedNF3SegmentLoadStore()) ||
+            (Factor == 4 && ST->hasOptimizedNF4SegmentLoadStore())) {
           InstructionCost Cost =
               getMemoryOpCost(Opcode, VTy, Alignment, AddressSpace, CostKind);
           MVT SubVecVT = getTLI()->getValueType(DL, SubVecTy).getSimpleVT();
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-cost.ll b/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-cost.ll
index 6477f14e3c6984..b5f26dd001f884 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-cost.ll
@@ -1,5 +1,7 @@
 ; REQUIRES: asserts
-; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s
+; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,NO-OPT
+; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v,+optimized-nf3-segment-load-store -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,OPT-NF3
+; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v,+optimized-nf4-segment-load-store -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,OPT-NF4
 
 %i8.2 = type {i8, i8}
 define void @i8_factor_2(ptr %data, i64 %n) {
@@ -48,17 +50,28 @@ for.end:
 define void @i8_factor_3(ptr %data, i64 %n) {
 entry:
   br label %for.body
-; CHECK-LABEL: Checking a loop in 'i8_factor_3'
-; CHECK: Cost of 6 for VF 2: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
-; CHECK: Cost of 6 for VF 2: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
-; CHECK: Cost of 12 for VF 4: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
-; CHECK: Cost of 12 for VF 4: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
-; CHECK: Cost of 24 for VF 8: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
-; CHECK: Cost of 24 for VF 8: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
-; CHECK: Cost of 48 for VF 16: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
-; CHECK: Cost of 48 for VF 16: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
-; CHECK: Cost of 96 for VF 32: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
-; CHECK: Cost of 96 for VF 32: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; OPT-NF3-LABEL: Checking a loop in 'i8_factor_3'
+; OPT-NF3: Cost of 4 for VF 2: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; OPT-NF3: Cost of 4 for VF 2: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; OPT-NF3: Cost of 4 for VF 4: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; OPT-NF3: Cost of 4 for VF 4: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; OPT-NF3: Cost of 5 for VF 8: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; OPT-NF3: Cost of 5 for VF 8: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; OPT-NF3: Cost of 7 for VF 16: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; OPT-NF3: Cost of 7 for VF 16: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; OPT-NF3: Cost of 14 for VF 32: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; OPT-NF3: Cost of 14 for VF 32: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; NO-OPT-LABEL: Checking a loop in 'i8_factor_3'
+; NO-OPT: Cost of 6 for VF 2: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; NO-OPT: Cost of 6 for VF 2: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; NO-OPT: Cost of 12 for VF 4: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; NO-OPT: Cost of 12 for VF 4: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; NO-OPT: Cost of 24 for VF 8: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; NO-OPT: Cost of 24 for VF 8: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; NO-OPT: Cost of 48 for VF 16: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; NO-OPT: Cost of 48 for VF 16: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
+; NO-OPT: Cost of 96 for VF 32: INTERLEAVE-GROUP with factor 3 at %l0, ir<%p0>
+; NO-OPT: Cost of 96 for VF 32: INTERLEAVE-GROUP with factor 3 at <badref>, ir<%p0>
 for.body:
   %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
   %p0 = getelementptr inbounds %i8.3, ptr %data, i64 %i, i32 0
@@ -85,17 +98,28 @@ for.end:
 define void @i8_factor_4(ptr %data, i64 %n) {
 entry:
   br label %for.body
-; CHECK-LABEL: Checking a loop in 'i8_factor_4'
-; CHECK: Cost of 8 for VF 2: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
-; CHECK: Cost of 8 for VF 2: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
-; CHECK: Cost of 16 for VF 4: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
-; CHECK: Cost of 16 for VF 4: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
-; CHECK: Cost of 32 for VF 8: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
-; CHECK: Cost of 32 for VF 8: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
-; CHECK: Cost of 64 for VF 16: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
-; CHECK: Cost of 64 for VF 16: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
-; CHECK: Cost of 128 for VF 32: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
-; CHECK: Cost of 128 for VF 32: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; OPT-NF4-LABEL: Checking a loop in 'i8_factor_4'
+; OPT-NF4: Cost of 5 for VF 2: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; OPT-NF4: Cost of 5 for VF 2: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; OPT-NF4: Cost of 5 for VF 4: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; OPT-NF4: Cost of 5 for VF 4: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; OPT-NF4: Cost of 6 for VF 8: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; OPT-NF4: Cost of 6 for VF 8: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; OPT-NF4: Cost of 8 for VF 16: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; OPT-NF4: Cost of 8 for VF 16: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; OPT-NF4: Cost of 16 for VF 32: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; OPT-NF4: Cost of 16 for VF 32: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; NO-OPT-LABEL: Checking a loop in 'i8_factor_4'
+; NO-OPT: Cost of 8 for VF 2: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; NO-OPT: Cost of 8 for VF 2: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; NO-OPT: Cost of 16 for VF 4: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; NO-OPT: Cost of 16 for VF 4: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; NO-OPT: Cost of 32 for VF 8: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; NO-OPT: Cost of 32 for VF 8: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; NO-OPT: Cost of 64 for VF 16: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; NO-OPT: Cost of 64 for VF 16: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
+; NO-OPT: Cost of 128 for VF 32: INTERLEAVE-GROUP with factor 4 at %l0, ir<%p0>
+; NO-OPT: Cost of 128 for VF 32: INTERLEAVE-GROUP with factor 4 at <badref>, ir<%p0>
 for.body:
   %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
   %p0 = getelementptr inbounds %i8.4, ptr %data, i64 %i, i32 0

lukel97 · 2024-10-31T15:10:16Z

cc @camel-cdr

llvm/lib/Target/RISCV/RISCVFeatures.td

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

preames · 2024-10-31T18:09:33Z

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-cost.ll

-; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v,+optimized-nf3-segment-load-store -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,OPT-NF3
-; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v,+optimized-nf4-segment-load-store -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,OPT-NF4
+; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v,-optimized-nf2-segment-load-store -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=NO-OPT
+; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s --check-prefix=OPT-NF2


I think you swapped your runlines vs checks for this line and the one above.

+optimized-nf2-segment-load-store is enabled by default with the generic tuning, so the first run line is a negative feature to turn it off. The second run line actually has it enabled but I could make it explicit here if prefered

wangpc-pp · 2024-11-01T03:20:29Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+static bool hasOptimizedSegmentLoadStore(unsigned NF,
+                                         const RISCVSubtarget *ST) {
+  switch (NF) {
+  case 2:


Save some lines via using macros?

I considered it but remembered coming across some PRs where people were actively trying to remove uses of macros in LLVM, e.g. #105551

I'm not strongly opinionated on this, is there a general stance we should take here?

is there a general stance we should take here?

Not found in https://llvm.org/docs/CodingStandards.html. I think it's OK not to use macros here, just a weak suggestion.

wangpc-pp · 2024-11-01T03:21:48Z

llvm/lib/Target/RISCV/RISCVFeatures.td

@@ -1386,6 +1386,13 @@ def TuneOptimizedZeroStrideLoad
                      "true", "Optimized (perform fewer memory operations)"
                      "zero-stride vector load">;

+foreach nf = {2-8} in
+  def TuneOptimizedNF#nf#SegmentLoadStore :


A random thought: There are 250 features for RISCV now, we may need to increase MAX_SUBTARGET_FEATURES soon.

Isn't the limit 5*64 or 320?

Yes, but that won't last long, especially if we add some fusions (like XiangShan).

wangpc-pp

LGTM.

preames

LGTM

zqb-all · 2024-11-03T00:12:03Z

LGTM

llvm-ci · 2024-11-03T23:20:51Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 10 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/7969

Here is the relevant piece of the build log for the reference

Step 10 (Add check check-offload) failure: 1200 seconds without output running [b'ninja', b'-j 32', b'check-offload'], attempting to kill
...
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/wtime.c (866 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49779.cpp (867 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug47654.cpp (868 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/test_libc.cpp (869 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug50022.cpp (870 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp (871 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp (872 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp (873 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp (874 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp (875 of 879)
command timed out: 1200 seconds without output running [b'ninja', b'-j 32', b'check-offload'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1238.569488

llvm-ci · 2024-11-04T02:01:19Z

LLVM Buildbot has detected a new failure on builder clang-ppc64-aix running on aix-ppc64 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/1363

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lit :: googletest-timeout.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 9
not env -u FILECHECK_OPTS "/opt/freeware/bin/python3.9" /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout    --param gtest_filter=InfiniteLoopSubTest --timeout=1 > /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/Output/googletest-timeout.py.tmp.cmd.out
# executed command: not env -u FILECHECK_OPTS /opt/freeware/bin/python3.9 /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout --param gtest_filter=InfiniteLoopSubTest --timeout=1
# .---command stderr------------
# | lit.py: /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1 seconds was requested on the command line. Forcing timeout to be 1 seconds.
# `-----------------------------
# RUN: at line 11
FileCheck --check-prefix=CHECK-INF < /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/Output/googletest-timeout.py.tmp.cmd.out /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/googletest-timeout.py
# executed command: FileCheck --check-prefix=CHECK-INF /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/googletest-timeout.py
# RUN: at line 16
not env -u FILECHECK_OPTS "/opt/freeware/bin/python3.9" /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout   --param gtest_filter=InfiniteLoopSubTest  --param set_timeout=1   > /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/Output/googletest-timeout.py.tmp.cfgset.out
# executed command: not env -u FILECHECK_OPTS /opt/freeware/bin/python3.9 /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout --param gtest_filter=InfiniteLoopSubTest --param set_timeout=1
# .---command stderr------------
# | Traceback (most recent call last):
# |   File "/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 304, in post_process_shard_results
# |     testsuites = json.load(f)["testsuites"]
# |   File "/opt/freeware/lib64/python3.9/json/__init__.py", line 293, in load
# |     return loads(fp.read(),
# |   File "/opt/freeware/lib64/python3.9/json/__init__.py", line 346, in loads
# |     return _default_decoder.decode(s)
# |   File "/opt/freeware/lib64/python3.9/json/decoder.py", line 337, in decode
# |     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
# |   File "/opt/freeware/lib64/python3.9/json/decoder.py", line 355, in raw_decode
# |     raise JSONDecodeError("Expecting value", s, err.value) from None
# | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
# | 
# | During handling of the above exception, another exception occurred:
# | 
# | Traceback (most recent call last):
# |   File "/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py", line 6, in <module>
# |     main()
# |   File "/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit/main.py", line 130, in main
# |     selected_tests, discovered_tests = GoogleTest.post_process_shard_results(
# |   File "/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 306, in post_process_shard_results
# |     raise RuntimeError(
# | RuntimeError: Failed to parse json file: /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/Inputs/googletest-timeout/DummySubDir/OneTest.py-googletest-timeout-58393082-1-2.json
# | 
# `-----------------------------
# RUN: at line 19
FileCheck --check-prefix=CHECK-INF < /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/Output/googletest-timeout.py.tmp.cfgset.out /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/googletest-timeout.py
# executed command: FileCheck --check-prefix=CHECK-INF /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/googletest-timeout.py
# .---command stderr------------
# | /home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/googletest-timeout.py:34:14: error: CHECK-INF: expected string not found in input
# | # CHECK-INF: Timed Out: 1
# |              ^
...

This is a follow up to llvm#111511, where after benchmarking we learnt that the Banana Pi F3 has fast segmented loads for not just NF=2, but also NF=3 and NF=4: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput This adds tuning features to allow these segment loads and stores to be costed cheaper and enables it for the spacemit-x60. It also enables +optimized-nf2-segment-load-store by default in the generic tuning to maintain the previous behaviour when compiled without -mcpu or -mtune.

lukel97 requested review from preames, topperc and wangpc-pp October 31, 2024 15:09

llvmbot added backend:RISC-V llvm:transforms labels Oct 31, 2024

preames reviewed Oct 31, 2024

View reviewed changes

llvm/lib/Target/RISCV/RISCVFeatures.td Outdated Show resolved Hide resolved

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp Outdated Show resolved Hide resolved

Add other NFs, including NF2. Add NF2 to the generic processor model

3984711

lukel97 changed the title ~~[RISCV] Add +optimized-nf{3,4}-segment-load-store~~ [RISCV] Add +optimized-nfN-segment-load-store Oct 31, 2024

preames reviewed Oct 31, 2024

View reviewed changes

wangpc-pp reviewed Nov 1, 2024

View reviewed changes

wangpc-pp approved these changes Nov 1, 2024

View reviewed changes

preames approved these changes Nov 1, 2024

View reviewed changes

lukel97 merged commit beb12f9 into llvm:main Nov 3, 2024
8 checks passed

[RISCV] Add +optimized-nfN-segment-load-store #114414

[RISCV] Add +optimized-nfN-segment-load-store #114414

Uh oh!

Conversation

lukel97 commented Oct 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lukel97 commented Oct 31, 2024

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wangpc-pp left a comment

Choose a reason for hiding this comment

Uh oh!

preames left a comment

Choose a reason for hiding this comment

Uh oh!

zqb-all commented Nov 3, 2024

Uh oh!

Uh oh!

llvm-ci commented Nov 3, 2024

Uh oh!

llvm-ci commented Nov 4, 2024

Uh oh!

Uh oh!

lukel97 commented Oct 31, 2024 •

edited

Loading

llvmbot commented Oct 31, 2024 •

edited

Loading