[RISCV] Add searchable table for tune information #66193

wangpc-pp · 2023-09-13T11:25:40Z

There are many information that can be used for tuning, like
alignments, cache line size, etc. But we can't make all of them
SubtargetFeature because some of them are not with enumerable
value, for example, PrefetchDistance used by LoopDataPrefetch.

In this patch, a searchable table RISCVTuneInfoTable is added,
in which each entry contains the CPU name and all tune information
defined in RISCVTuneInfo. Each field of RISCVTuneInfo should
have a default value and processor definitions can override the
default value via let statements.

We don't need to define a RISCVTuneInfo for each processor and
it will use the default value (which is for generic) if no
RISCVTuneInfo defined.

For processors in the same series, a subclass can inherit from
RISCVTuneInfo and override the fields. And we can also override
the fields in processor definitions if there are some differences
in the same processor series.

When initilizing RISCVSubtarget, we will use TuneCPU as the
key to serach the tune info table. So, the behavior here is if
we don't specify the tune CPU, we will use specified CPU, which
is expected I think.

This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.

llvmbot · 2023-09-13T11:26:50Z

@llvm/pr-subscribers-backend-risc-v

Changes

There are many information that can be used for tuning, like alignments, cache line size, etc. But we can't make all of them `SubtargetFeature` because some of them are not with enumerable value, for example, `PrefetchDistance` used by `LoopDataPrefetch`.

In this patch, a searchable table RISCVTuneInfoTable is added,
in which each entry contains the CPU name and all tune information
defined in RISCVTuneInfo. Each field of RISCVTuneInfo should
have a default value and processor definitions can override the
default value via let statements.

When initilizing RISCVSubtarget, we will use TuneCPU as the
key to serach the tune info table. So, the behavior here is if
we don't specify the tune CPU, we will use specified CPU, which
is expected I think.

This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.

--
Full diff: https://github.com/llvm/llvm-project/pull/66193.diff

6 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVFeatures.td (-9)
(modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+18-2)
(modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+7)
(modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+19-4)
(modified) llvm/test/CodeGen/RISCV/align-loops.ll (-2)
(modified) llvm/test/CodeGen/RISCV/align.ll (-4)

diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 6381263b37613b3..367f0fbbe44801b 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -950,12 +950,3 @@ def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
     "AllowTaggedGlobals",
     "true", "Use an instruction sequence for taking the address of a global "
     "that allows a memory tag in the upper address bits">;
-
-foreach align = [2, 4, 8, 16, 32, 64] in {
-  def TunePrefFunctionAlignment # align :
-      SubtargetFeature<"pref-func-align-" # align, "PrefFunctionAlignment",
-                       "Align(" # align # ")", "Set preferred function alignment to " # align # " bytes">;
-  def TunePrefLoopAlignment # align :
-      SubtargetFeature<"pref-loop-align-" # align, "PrefLoopAlignment",
-                       "Align(" # align # ")", "Set preferred loop alignment to " # align # " bytes">;
-}
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 01291001cd7ca24..11443811fe068f0 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -10,12 +10,28 @@
 // RISC-V processors supported.
 //===----------------------------------------------------------------------===//
 
+class RISCVTuneInfo {
+  bits<8> PrefFunctionAlignment = 1;
+  bits<8> PrefLoopAlignment = 1;
+}
+
+def RISCVTuneInfoTable : GenericTable {
+  let FilterClass = "RISCVTuneInfo";
+  let CppTypeName = "RISCVTuneInfo";
+  let Fields = ["Name", "PrefFunctionAlignment", "PrefLoopAlignment"];
+}
+
+def getRISCVTuneInfo : SearchIndex {
+  let Table = RISCVTuneInfoTable;
+  let Key = ["Name"];
+}
+
 class RISCVProcessorModel<string n,
                           SchedMachineModel m,
                           list<SubtargetFeature> f,
                           list<SubtargetFeature> tunef = [],
                           string default_march = "">
-      :  ProcessorModel<n, m, f, tunef> {
+      :  ProcessorModel<n, m, f, tunef>, RISCVTuneInfo {
   string DefaultMarch = default_march;
 }
 
@@ -23,7 +39,7 @@ class RISCVTuneProcessorModel<string n,
                               SchedMachineModel m,
                               list<SubtargetFeature> tunef = [],
                               list<SubtargetFeature> f = []>
-      : ProcessorModel<n, m, f,tunef>;
+      : ProcessorModel<n, m, f,tunef>, RISCVTuneInfo;
 
 def GENERIC_RV32 : RISCVProcessorModel<"generic-rv32",
                                        NoSchedModel,
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index aa0275830e2a87a..572aa676edbbef4 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -29,6 +29,12 @@ using namespace llvm;
 #define GET_SUBTARGETINFO_CTOR
 #include "RISCVGenSubtargetInfo.inc"
 
+namespace llvm::RISCVTuneInfoTable {
+
+#define GET_RISCVTuneInfoTable_IMPL
+#include "RISCVGenSearchableTables.inc"
+} // namespace llvm::RISCVTuneInfoTable
+
 static cl::opt<bool> EnableSubRegLiveness("riscv-enable-subreg-liveness",
                                           cl::init(true), cl::Hidden);
 
@@ -66,6 +72,7 @@ RISCVSubtarget::initializeSubtargetDependencies(const Triple &TT, StringRef CPU,
     TuneCPU = CPU;
 
   ParseSubtargetFeatures(CPU, TuneCPU, FS);
+  TuneInfo = RISCVTuneInfoTable::getRISCVTuneInfo(TuneCPU);
   TargetABI = RISCVABI::computeTargetABI(TT, getFeatureBits(), ABIName);
   RISCVFeatures::validate(TT, getFeatureBits());
   return *this;
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index cf64dbc21bd8a8b..027d32d54160793 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -32,6 +32,18 @@
 namespace llvm {
 class StringRef;
 
+namespace RISCVTuneInfoTable {
+
+struct RISCVTuneInfo {
+  const char *Name;
+  uint8_t PrefFunctionAlignment;
+  uint8_t PrefLoopAlignment;
+};
+
+#define GET_RISCVTuneInfoTable_DECL
+#include "RISCVGenSearchableTables.inc"
+} // namespace RISCVTuneInfoTable
+
 class RISCVSubtarget : public RISCVGenSubtargetInfo {
 public:
   enum RISCVProcFamilyEnum : uint8_t {
@@ -54,8 +66,7 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   uint8_t MaxInterleaveFactor = 2;
   RISCVABI::ABI TargetABI = RISCVABI::ABI_Unknown;
   std::bitset<RISCV::NUM_TARGET_REGS> UserReservedRegister;
-  Align PrefFunctionAlignment;
-  Align PrefLoopAlignment;
+  const RISCVTuneInfoTable::RISCVTuneInfo *TuneInfo;
 
   RISCVFrameLowering FrameLowering;
   RISCVInstrInfo InstrInfo;
@@ -96,8 +107,12 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   }
   bool enableMachineScheduler() const override { return true; }
 
-  Align getPrefFunctionAlignment() const { return PrefFunctionAlignment; }
-  Align getPrefLoopAlignment() const { return PrefLoopAlignment; }
+  Align getPrefFunctionAlignment() const {
+    return Align(TuneInfo->PrefFunctionAlignment);
+  }
+  Align getPrefLoopAlignment() const {
+    return Align(TuneInfo->PrefLoopAlignment);
+  }
 
   /// Returns RISC-V processor family.
   /// Avoid this function! CPU specifics should be kept local to this class
diff --git a/llvm/test/CodeGen/RISCV/align-loops.ll b/llvm/test/CodeGen/RISCV/align-loops.ll
index 5ef78c74d03532b..efa03992b6277f6 100644
--- a/llvm/test/CodeGen/RISCV/align-loops.ll
+++ b/llvm/test/CodeGen/RISCV/align-loops.ll
@@ -1,8 +1,6 @@
 ; RUN: llc < %s -mtriple=riscv64 | FileCheck %s
 ; RUN: llc < %s -mtriple=riscv64 -align-loops=16 | FileCheck %s -check-prefix=ALIGN_16
 ; RUN: llc < %s -mtriple=riscv64 -align-loops=32 | FileCheck %s -check-prefix=ALIGN_32
-; RUN: llc < %s -mtriple=riscv64 -mattr=+pref-loop-align-16 | FileCheck %s -check-prefix=ALIGN_16
-; RUN: llc < %s -mtriple=riscv64 -mattr=+pref-loop-align-32 | FileCheck %s -check-prefix=ALIGN_32
 
 declare void @foo()
 
diff --git a/llvm/test/CodeGen/RISCV/align.ll b/llvm/test/CodeGen/RISCV/align.ll
index 1fb4585f8422aa4..5807fc14efc292d 100644
--- a/llvm/test/CodeGen/RISCV/align.ll
+++ b/llvm/test/CodeGen/RISCV/align.ll
@@ -2,8 +2,6 @@
 ; RUN:   | FileCheck %s -check-prefix=RV32I
 ; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV32C
-; RUN: llc -mtriple=riscv32 -mattr=+pref-func-align-32 -verify-machineinstrs < %s \
-; RUN:   | FileCheck %s -check-prefix=ALIGN-32
 ; RUN: llc -filetype=obj -mtriple=riscv32 < %s -o %t
 ; RUN: llvm-readelf -S %t | FileCheck %s --check-prefixes=SEC,SEC-I
 ; RUN: llc -filetype=obj -mtriple=riscv32 -mattr=+c < %s -o %t
@@ -18,8 +16,6 @@ define void @foo() {
 ;RV32I: foo:
 ;RV32C: .p2align 1
 ;RV32C: foo:
-;ALIGN-32: .p2align 5
-;ALIGN-32: foo:
 entry:
   ret void
 }

michaelmaitland · 2023-09-14T14:22:14Z

llvm/lib/Target/RISCV/RISCVProcessors.td

 class RISCVProcessorModel<string n,
                          SchedMachineModel m,
                          list<SubtargetFeature> f,
                          list<SubtargetFeature> tunef = [],
                          string default_march = "">
-      :  ProcessorModel<n, m, f, tunef> {
+      :  ProcessorModel<n, m, f, tunef>, RISCVTuneInfo {


Should RISCVTuneInfo be an optional argument to ProcessorModel? Should RISCVTuneInfo be an optional argument to RISCVProcessorModel?

If we organize it like this, it becomes more clear to clients of ProcessorModel and RISCVProcessorModel that they have the ability to specify these tunings.

What would overriding these for a specific CPU look like right now. Would we need to add let for each field to each CPU?

I'd kind of like a way to define common sets that can be used by multiple CPUs without being repeating.

I changed the implementation.
We don't need to define a RISCVTuneInfo for each processor and it will use the default value (which is for generic) if no RISCVTuneInfo defined.
For processors in the same series, a subclass can inherit from RISCVTuneInfo and override the fields. And we can also override the fields in processor definitions if there are some differences in the same processor series.
It can be like:

class SiFive7TuneInfo: RISCVTuneInfo { let XXX = xxx; // Override the field in class definition. // ... }; def SIFIVE_7 : RISCVTuneProcessorModel<"sifive-7-series", SiFive7Model, [TuneSiFive7]>, SiFive7TuneInfo; def SIFIVE_U74 : RISCVProcessorModel<"sifive-u74", SiFive7Model, [Feature64Bit, FeatureStdExtZifencei, FeatureStdExtM, FeatureStdExtA, FeatureStdExtF, FeatureStdExtD, FeatureStdExtC], [TuneSiFive7]>, SiFive7TuneInfo { let XXX = xxx; // Override the field in processor definition. // ... }

michaelmaitland · 2023-09-14T14:28:47Z

llvm/test/CodeGen/RISCV/align.ll

@@ -2,8 +2,6 @@
 ; RUN:   | FileCheck %s -check-prefix=RV32I
 ; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV32C
-; RUN: llc -mtriple=riscv32 -mattr=+pref-func-align-32 -verify-machineinstrs < %s \


Are there any processors that have non-default alignment? We could pass -mcpu and check for the .p2align N line so that we have test coverage.

I don't know one. Maybe you can provide more information about SiFive processors in upstream?

We haven't noticed any performance issues from instruction alignment on SiFive7 and are relying on linker to get this right. I'm not so sure what else we can do to have test coverage for this without an upstream processor that uses this TuneInfo 🤷

I've just learned that different alignment values have different results on different workloads for us. Therefore, I think we should have way to control this value at compile time.

There are IR attributes that override the CPU defaults. falign-loops and falign-functions uses those attributes.

I found some descriptions in https://starfivetech.com/uploads/u74mc_core_complex_manual_21G1.pdf:

3.2.6 Instruction Fetch Unit
The S7 instruction fetch unit is responsible for keeping the pipeline fed with instructions from
memory. The instruction fetch unit delivers up to 8 bytes of instructions per clock cycle to sup-
port superscalar instruction execution. Fetches are always word-aligned and there is a one-
cycle penalty for branching to a 32-bit instruction that is not word-aligned.

It seems we'd better to align functions and hot loops to 4-byte to get best performance. But C extension is supported so the alignment would be 2-byte.

// Function alignments. const Align FunctionAlignment(Subtarget.hasStdExtCOrZca() ? 2 : 4); setMinFunctionAlignment(FunctionAlignment); // Set preferred alignments. setPrefFunctionAlignment(Subtarget.getPrefFunctionAlignment()); setPrefLoopAlignment(Subtarget.getPrefLoopAlignment());

I don't know much about S7 series, so I think you SiFive guys may do the change of alignments in upstream?

wangpc-pp · 2023-09-25T03:24:07Z

Ping.

michaelmaitland · 2023-09-25T12:44:47Z

We collected some data that shows that different values of these variables are better on different benchmarks. Do we have the ability to override these values at compile time so that different benchmarks can specify which alignment to use?

topperc · 2023-09-25T14:22:41Z

We collected some data that shows that different values of these variables are better on different benchmarks. Do we have the ability to override these values at compile time so that different benchmarks can specify which alignment to use?

Alignment can be overridden by frontend options -falign-loops and -falign-functions. This sets an attribute on each function or loop in IR. if those attributes are present, they are maxed with the information in TargetLowering/Subtarget.

wangpc-pp · 2023-09-25T15:10:16Z

We collected some data that shows that different values of these variables are better on different benchmarks. Do we have the ability to override these values at compile time so that different benchmarks can specify which alignment to use?

Alignment can be overridden by frontend options -falign-loops and -falign-functions. This sets an attribute on each function or loop in IR. if those attributes are present, they are maxed with the information in TargetLowering/Subtarget.

Besides, user can set alignment per function/loop via __attribute__(aligned(n)) or something like __attribute__((optimize("align-loops=n"))).

topperc

LGTM other than that comment.

llvm/lib/Target/RISCV/RISCVSubtarget.cpp

There are many information that can be used for tuning, like alignments, cache line size, etc. But we can't make all of them `SubtargetFeature` because some of them are not with enumerable value, for example, `PrefetchDistance` used by `LoopDataPrefetch`. In this patch, a searchable table `RISCVTuneInfoTable` is added, in which each entry contains the CPU name and all tune information defined in `RISCVTuneInfo`. Each field of `RISCVTuneInfo` should have a default value and processor definitions can override the default value via `let` statements. When initilizing `RISCVSubtarget`, we will use `TuneCPU` as the key to serach the tune info table. So, the behavior here is if we don't specify the tune CPU, we will use specified `CPU`, which is expected I think. This patch almost undoes 61ab106, in which I added tune features of preferred function/loop alignments. More tune information can be added in the future.

Define RISCVTuneInfo by need and fail back to generic if no related RISCVTuneInfo existed

Add assertion

wangpc-pp requested a review from a team as a code owner September 13, 2023 11:25

llvmbot added the backend:RISC-V label Sep 13, 2023

wangpc-pp requested review from asb, jrtc27, preames, kito-cheng and topperc September 13, 2023 11:26

wangpc-pp requested a review from michaelmaitland September 13, 2023 11:28

wangpc-pp mentioned this pull request Sep 13, 2023

[RISCV] Enable LoopDataPrefetch pass #66201

Merged

michaelmaitland reviewed Sep 14, 2023

View reviewed changes

topperc approved these changes Sep 25, 2023

View reviewed changes

llvm/lib/Target/RISCV/RISCVSubtarget.cpp Show resolved Hide resolved

wangpc-pp added 3 commits September 26, 2023 12:01

fixup! [RISCV] Add searchable table for tune information

ed7c7ea

Define RISCVTuneInfo by need and fail back to generic if no related RISCVTuneInfo existed

fixup! [RISCV] Add searchable table for tune information

0a2703d

Add assertion

wangpc-pp force-pushed the main-tune-info-table branch from 46879f2 to 0a2703d Compare September 26, 2023 04:04

wangpc-pp merged commit 08165c4 into llvm:main Sep 26, 2023

wangpc-pp deleted the main-tune-info-table branch November 24, 2023 04:30

[RISCV] Add searchable table for tune information #66193

[RISCV] Add searchable table for tune information #66193

Uh oh!

Conversation

wangpc-pp commented Sep 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Sep 13, 2023

Uh oh!

michaelmaitland Sep 14, 2023

Choose a reason for hiding this comment

Uh oh!

topperc Sep 15, 2023

Choose a reason for hiding this comment

Uh oh!

wangpc-pp Sep 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michaelmaitland Sep 14, 2023

Choose a reason for hiding this comment

Uh oh!

wangpc-pp Sep 15, 2023

Choose a reason for hiding this comment

Uh oh!

michaelmaitland Sep 15, 2023

Choose a reason for hiding this comment

Uh oh!

michaelmaitland Sep 15, 2023

Choose a reason for hiding this comment

Uh oh!

topperc Sep 15, 2023

Choose a reason for hiding this comment

Uh oh!

wangpc-pp Sep 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wangpc-pp commented Sep 25, 2023

Uh oh!

michaelmaitland commented Sep 25, 2023

Uh oh!

topperc commented Sep 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wangpc-pp commented Sep 25, 2023

Uh oh!

topperc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wangpc-pp commented Sep 13, 2023 •

edited

Loading

wangpc-pp Sep 20, 2023 •

edited

Loading

wangpc-pp Sep 20, 2023 •

edited

Loading

topperc commented Sep 25, 2023 •

edited

Loading