Skip to content

[RISCV] Add searchable table for tune information #66193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 26, 2023

Conversation

wangpc-pp
Copy link
Contributor

@wangpc-pp wangpc-pp commented Sep 13, 2023

There are many information that can be used for tuning, like
alignments, cache line size, etc. But we can't make all of them
SubtargetFeature because some of them are not with enumerable
value, for example, PrefetchDistance used by LoopDataPrefetch.

In this patch, a searchable table RISCVTuneInfoTable is added,
in which each entry contains the CPU name and all tune information
defined in RISCVTuneInfo. Each field of RISCVTuneInfo should
have a default value and processor definitions can override the
default value via let statements.

We don't need to define a RISCVTuneInfo for each processor and
it will use the default value (which is for generic) if no
RISCVTuneInfo defined.

For processors in the same series, a subclass can inherit from
RISCVTuneInfo and override the fields. And we can also override
the fields in processor definitions if there are some differences
in the same processor series.

When initilizing RISCVSubtarget, we will use TuneCPU as the
key to serach the tune info table. So, the behavior here is if
we don't specify the tune CPU, we will use specified CPU, which
is expected I think.

This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.

@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2023

@llvm/pr-subscribers-backend-risc-v

Changes There are many information that can be used for tuning, like alignments, cache line size, etc. But we can't make all of them `SubtargetFeature` because some of them are not with enumerable value, for example, `PrefetchDistance` used by `LoopDataPrefetch`.

In this patch, a searchable table RISCVTuneInfoTable is added,
in which each entry contains the CPU name and all tune information
defined in RISCVTuneInfo. Each field of RISCVTuneInfo should
have a default value and processor definitions can override the
default value via let statements.

When initilizing RISCVSubtarget, we will use TuneCPU as the
key to serach the tune info table. So, the behavior here is if
we don't specify the tune CPU, we will use specified CPU, which
is expected I think.

This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.

--
Full diff: https://github.com/llvm/llvm-project/pull/66193.diff

6 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFeatures.td (-9)
  • (modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+18-2)
  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+7)
  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+19-4)
  • (modified) llvm/test/CodeGen/RISCV/align-loops.ll (-2)
  • (modified) llvm/test/CodeGen/RISCV/align.ll (-4)
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 6381263b37613b3..367f0fbbe44801b 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -950,12 +950,3 @@ def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
     "AllowTaggedGlobals",
     "true", "Use an instruction sequence for taking the address of a global "
     "that allows a memory tag in the upper address bits">;
-
-foreach align = [2, 4, 8, 16, 32, 64] in {
-  def TunePrefFunctionAlignment # align :
-      SubtargetFeature<"pref-func-align-" # align, "PrefFunctionAlignment",
-                       "Align(" # align # ")", "Set preferred function alignment to " # align # " bytes">;
-  def TunePrefLoopAlignment # align :
-      SubtargetFeature<"pref-loop-align-" # align, "PrefLoopAlignment",
-                       "Align(" # align # ")", "Set preferred loop alignment to " # align # " bytes">;
-}
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 01291001cd7ca24..11443811fe068f0 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -10,12 +10,28 @@
 // RISC-V processors supported.
 //===----------------------------------------------------------------------===//
 
+class RISCVTuneInfo {
+  bits<8> PrefFunctionAlignment = 1;
+  bits<8> PrefLoopAlignment = 1;
+}
+
+def RISCVTuneInfoTable : GenericTable {
+  let FilterClass = "RISCVTuneInfo";
+  let CppTypeName = "RISCVTuneInfo";
+  let Fields = ["Name", "PrefFunctionAlignment", "PrefLoopAlignment"];
+}
+
+def getRISCVTuneInfo : SearchIndex {
+  let Table = RISCVTuneInfoTable;
+  let Key = ["Name"];
+}
+
 class RISCVProcessorModel<string n,
                           SchedMachineModel m,
                           list<SubtargetFeature> f,
                           list<SubtargetFeature> tunef = [],
                           string default_march = "">
-      :  ProcessorModel<n, m, f, tunef> {
+      :  ProcessorModel<n, m, f, tunef>, RISCVTuneInfo {
   string DefaultMarch = default_march;
 }
 
@@ -23,7 +39,7 @@ class RISCVTuneProcessorModel<string n,
                               SchedMachineModel m,
                               list<SubtargetFeature> tunef = [],
                               list<SubtargetFeature> f = []>
-      : ProcessorModel<n, m, f,tunef>;
+      : ProcessorModel<n, m, f,tunef>, RISCVTuneInfo;
 
 def GENERIC_RV32 : RISCVProcessorModel<"generic-rv32",
                                        NoSchedModel,
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index aa0275830e2a87a..572aa676edbbef4 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -29,6 +29,12 @@ using namespace llvm;
 #define GET_SUBTARGETINFO_CTOR
 #include "RISCVGenSubtargetInfo.inc"
 
+namespace llvm::RISCVTuneInfoTable {
+
+#define GET_RISCVTuneInfoTable_IMPL
+#include "RISCVGenSearchableTables.inc"
+} // namespace llvm::RISCVTuneInfoTable
+
 static cl::opt<bool> EnableSubRegLiveness("riscv-enable-subreg-liveness",
                                           cl::init(true), cl::Hidden);
 
@@ -66,6 +72,7 @@ RISCVSubtarget::initializeSubtargetDependencies(const Triple &TT, StringRef CPU,
     TuneCPU = CPU;
 
   ParseSubtargetFeatures(CPU, TuneCPU, FS);
+  TuneInfo = RISCVTuneInfoTable::getRISCVTuneInfo(TuneCPU);
   TargetABI = RISCVABI::computeTargetABI(TT, getFeatureBits(), ABIName);
   RISCVFeatures::validate(TT, getFeatureBits());
   return *this;
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index cf64dbc21bd8a8b..027d32d54160793 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -32,6 +32,18 @@
 namespace llvm {
 class StringRef;
 
+namespace RISCVTuneInfoTable {
+
+struct RISCVTuneInfo {
+  const char *Name;
+  uint8_t PrefFunctionAlignment;
+  uint8_t PrefLoopAlignment;
+};
+
+#define GET_RISCVTuneInfoTable_DECL
+#include "RISCVGenSearchableTables.inc"
+} // namespace RISCVTuneInfoTable
+
 class RISCVSubtarget : public RISCVGenSubtargetInfo {
 public:
   enum RISCVProcFamilyEnum : uint8_t {
@@ -54,8 +66,7 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   uint8_t MaxInterleaveFactor = 2;
   RISCVABI::ABI TargetABI = RISCVABI::ABI_Unknown;
   std::bitset<RISCV::NUM_TARGET_REGS> UserReservedRegister;
-  Align PrefFunctionAlignment;
-  Align PrefLoopAlignment;
+  const RISCVTuneInfoTable::RISCVTuneInfo *TuneInfo;
 
   RISCVFrameLowering FrameLowering;
   RISCVInstrInfo InstrInfo;
@@ -96,8 +107,12 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   }
   bool enableMachineScheduler() const override { return true; }
 
-  Align getPrefFunctionAlignment() const { return PrefFunctionAlignment; }
-  Align getPrefLoopAlignment() const { return PrefLoopAlignment; }
+  Align getPrefFunctionAlignment() const {
+    return Align(TuneInfo->PrefFunctionAlignment);
+  }
+  Align getPrefLoopAlignment() const {
+    return Align(TuneInfo->PrefLoopAlignment);
+  }
 
   /// Returns RISC-V processor family.
   /// Avoid this function! CPU specifics should be kept local to this class
diff --git a/llvm/test/CodeGen/RISCV/align-loops.ll b/llvm/test/CodeGen/RISCV/align-loops.ll
index 5ef78c74d03532b..efa03992b6277f6 100644
--- a/llvm/test/CodeGen/RISCV/align-loops.ll
+++ b/llvm/test/CodeGen/RISCV/align-loops.ll
@@ -1,8 +1,6 @@
 ; RUN: llc < %s -mtriple=riscv64 | FileCheck %s
 ; RUN: llc < %s -mtriple=riscv64 -align-loops=16 | FileCheck %s -check-prefix=ALIGN_16
 ; RUN: llc < %s -mtriple=riscv64 -align-loops=32 | FileCheck %s -check-prefix=ALIGN_32
-; RUN: llc < %s -mtriple=riscv64 -mattr=+pref-loop-align-16 | FileCheck %s -check-prefix=ALIGN_16
-; RUN: llc < %s -mtriple=riscv64 -mattr=+pref-loop-align-32 | FileCheck %s -check-prefix=ALIGN_32
 
 declare void @foo()
 
diff --git a/llvm/test/CodeGen/RISCV/align.ll b/llvm/test/CodeGen/RISCV/align.ll
index 1fb4585f8422aa4..5807fc14efc292d 100644
--- a/llvm/test/CodeGen/RISCV/align.ll
+++ b/llvm/test/CodeGen/RISCV/align.ll
@@ -2,8 +2,6 @@
 ; RUN:   | FileCheck %s -check-prefix=RV32I
 ; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV32C
-; RUN: llc -mtriple=riscv32 -mattr=+pref-func-align-32 -verify-machineinstrs < %s \
-; RUN:   | FileCheck %s -check-prefix=ALIGN-32
 ; RUN: llc -filetype=obj -mtriple=riscv32 < %s -o %t
 ; RUN: llvm-readelf -S %t | FileCheck %s --check-prefixes=SEC,SEC-I
 ; RUN: llc -filetype=obj -mtriple=riscv32 -mattr=+c < %s -o %t
@@ -18,8 +16,6 @@ define void @foo() {
 ;RV32I: foo:
 ;RV32C: .p2align 1
 ;RV32C: foo:
-;ALIGN-32: .p2align 5
-;ALIGN-32: foo:
 entry:
   ret void
 }

class RISCVProcessorModel<string n,
SchedMachineModel m,
list<SubtargetFeature> f,
list<SubtargetFeature> tunef = [],
string default_march = "">
: ProcessorModel<n, m, f, tunef> {
: ProcessorModel<n, m, f, tunef>, RISCVTuneInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should RISCVTuneInfo be an optional argument to ProcessorModel? Should RISCVTuneInfo be an optional argument to RISCVProcessorModel?

If we organize it like this, it becomes more clear to clients of ProcessorModel and RISCVProcessorModel that they have the ability to specify these tunings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would overriding these for a specific CPU look like right now. Would we need to add let for each field to each CPU?

I'd kind of like a way to define common sets that can be used by multiple CPUs without being repeating.

Copy link
Contributor Author

@wangpc-pp wangpc-pp Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the implementation.
We don't need to define a RISCVTuneInfo for each processor and it will use the default value (which is for generic) if no RISCVTuneInfo defined.
For processors in the same series, a subclass can inherit from RISCVTuneInfo and override the fields. And we can also override the fields in processor definitions if there are some differences in the same processor series.
It can be like:

class SiFive7TuneInfo: RISCVTuneInfo {
  let XXX = xxx; // Override the field in class definition.
  // ...
};

def SIFIVE_7 : RISCVTuneProcessorModel<"sifive-7-series",
                                       SiFive7Model,
                                       [TuneSiFive7]>, SiFive7TuneInfo;

def SIFIVE_U74 : RISCVProcessorModel<"sifive-u74",
                                     SiFive7Model,
                                     [Feature64Bit,
                                      FeatureStdExtZifencei,
                                      FeatureStdExtM,
                                      FeatureStdExtA,
                                      FeatureStdExtF,
                                      FeatureStdExtD,
                                      FeatureStdExtC],
                                     [TuneSiFive7]>, SiFive7TuneInfo {
  let XXX = xxx; // Override the field in processor definition.
  // ...
}

@@ -2,8 +2,6 @@
; RUN: | FileCheck %s -check-prefix=RV32I
; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \
; RUN: | FileCheck %s -check-prefix=RV32C
; RUN: llc -mtriple=riscv32 -mattr=+pref-func-align-32 -verify-machineinstrs < %s \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any processors that have non-default alignment? We could pass -mcpu and check for the .p2align N line so that we have test coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know one. Maybe you can provide more information about SiFive processors in upstream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't noticed any performance issues from instruction alignment on SiFive7 and are relying on linker to get this right. I'm not so sure what else we can do to have test coverage for this without an upstream processor that uses this TuneInfo 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just learned that different alignment values have different results on different workloads for us. Therefore, I think we should have way to control this value at compile time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are IR attributes that override the CPU defaults. falign-loops and falign-functions uses those attributes.

Copy link
Contributor Author

@wangpc-pp wangpc-pp Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some descriptions in https://starfivetech.com/uploads/u74mc_core_complex_manual_21G1.pdf:

3.2.6 Instruction Fetch Unit
The S7 instruction fetch unit is responsible for keeping the pipeline fed with instructions from
memory. The instruction fetch unit delivers up to 8 bytes of instructions per clock cycle to sup-
port superscalar instruction execution. Fetches are always word-aligned and there is a one-
cycle penalty for branching to a 32-bit instruction that is not word-aligned.

It seems we'd better to align functions and hot loops to 4-byte to get best performance. But C extension is supported so the alignment would be 2-byte.

  // Function alignments.
  const Align FunctionAlignment(Subtarget.hasStdExtCOrZca() ? 2 : 4);
  setMinFunctionAlignment(FunctionAlignment);
  // Set preferred alignments.
  setPrefFunctionAlignment(Subtarget.getPrefFunctionAlignment());
  setPrefLoopAlignment(Subtarget.getPrefLoopAlignment());

I don't know much about S7 series, so I think you SiFive guys may do the change of alignments in upstream?

@wangpc-pp
Copy link
Contributor Author

Ping.

@michaelmaitland
Copy link
Contributor

We collected some data that shows that different values of these variables are better on different benchmarks. Do we have the ability to override these values at compile time so that different benchmarks can specify which alignment to use?

@topperc
Copy link
Collaborator

topperc commented Sep 25, 2023

We collected some data that shows that different values of these variables are better on different benchmarks. Do we have the ability to override these values at compile time so that different benchmarks can specify which alignment to use?

Alignment can be overridden by frontend options -falign-loops and -falign-functions. This sets an attribute on each function or loop in IR. if those attributes are present, they are maxed with the information in TargetLowering/Subtarget.

@wangpc-pp
Copy link
Contributor Author

We collected some data that shows that different values of these variables are better on different benchmarks. Do we have the ability to override these values at compile time so that different benchmarks can specify which alignment to use?

Alignment can be overridden by frontend options -falign-loops and -falign-functions. This sets an attribute on each function or loop in IR. if those attributes are present, they are maxed with the information in TargetLowering/Subtarget.

Besides, user can set alignment per function/loop via __attribute__(aligned(n)) or something like __attribute__((optimize("align-loops=n"))).

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than that comment.

There are many information that can be used for tuning, like
alignments, cache line size, etc. But we can't make all of them
`SubtargetFeature` because some of them are not with enumerable
value, for example, `PrefetchDistance` used by `LoopDataPrefetch`.

In this patch, a searchable table `RISCVTuneInfoTable` is added,
in which each entry contains the CPU name and all tune information
defined in `RISCVTuneInfo`. Each field of `RISCVTuneInfo` should
have a default value and processor definitions can override the
default value via `let` statements.

When initilizing `RISCVSubtarget`, we will use `TuneCPU` as the
key to serach the tune info table. So, the behavior here is if
we don't specify the tune CPU, we will use specified `CPU`, which
is expected I think.

This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.
Define RISCVTuneInfo by need and fail back to generic if no related RISCVTuneInfo existed
@wangpc-pp wangpc-pp merged commit 08165c4 into llvm:main Sep 26, 2023
@wangpc-pp wangpc-pp deleted the main-tune-info-table branch November 24, 2023 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants