-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[RISCV] Add searchable table for tune information #66193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-risc-v ChangesThere are many information that can be used for tuning, like alignments, cache line size, etc. But we can't make all of them `SubtargetFeature` because some of them are not with enumerable value, for example, `PrefetchDistance` used by `LoopDataPrefetch`.In this patch, a searchable table When initilizing This patch almost undoes 61ab106, in which I added tune features -- 6 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td index 6381263b37613b3..367f0fbbe44801b 100644 --- a/llvm/lib/Target/RISCV/RISCVFeatures.td +++ b/llvm/lib/Target/RISCV/RISCVFeatures.td @@ -950,12 +950,3 @@ def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals", "AllowTaggedGlobals", "true", "Use an instruction sequence for taking the address of a global " "that allows a memory tag in the upper address bits">; - -foreach align = [2, 4, 8, 16, 32, 64] in { - def TunePrefFunctionAlignment # align : - SubtargetFeature<"pref-func-align-" # align, "PrefFunctionAlignment", - "Align(" # align # ")", "Set preferred function alignment to " # align # " bytes">; - def TunePrefLoopAlignment # align : - SubtargetFeature<"pref-loop-align-" # align, "PrefLoopAlignment", - "Align(" # align # ")", "Set preferred loop alignment to " # align # " bytes">; -} diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td index 01291001cd7ca24..11443811fe068f0 100644 --- a/llvm/lib/Target/RISCV/RISCVProcessors.td +++ b/llvm/lib/Target/RISCV/RISCVProcessors.td @@ -10,12 +10,28 @@ // RISC-V processors supported. //===----------------------------------------------------------------------===// +class RISCVTuneInfo { + bits<8> PrefFunctionAlignment = 1; + bits<8> PrefLoopAlignment = 1; +} + +def RISCVTuneInfoTable : GenericTable { + let FilterClass = "RISCVTuneInfo"; + let CppTypeName = "RISCVTuneInfo"; + let Fields = ["Name", "PrefFunctionAlignment", "PrefLoopAlignment"]; +} + +def getRISCVTuneInfo : SearchIndex { + let Table = RISCVTuneInfoTable; + let Key = ["Name"]; +} + class RISCVProcessorModel<string n, SchedMachineModel m, list<SubtargetFeature> f, list<SubtargetFeature> tunef = [], string default_march = ""> - : ProcessorModel<n, m, f, tunef> { + : ProcessorModel<n, m, f, tunef>, RISCVTuneInfo { string DefaultMarch = default_march; } @@ -23,7 +39,7 @@ class RISCVTuneProcessorModel<string n, SchedMachineModel m, list<SubtargetFeature> tunef = [], list<SubtargetFeature> f = []> - : ProcessorModel<n, m, f,tunef>; + : ProcessorModel<n, m, f,tunef>, RISCVTuneInfo; def GENERIC_RV32 : RISCVProcessorModel<"generic-rv32", NoSchedModel, diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp index aa0275830e2a87a..572aa676edbbef4 100644 --- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp +++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp @@ -29,6 +29,12 @@ using namespace llvm; #define GET_SUBTARGETINFO_CTOR #include "RISCVGenSubtargetInfo.inc" +namespace llvm::RISCVTuneInfoTable { + +#define GET_RISCVTuneInfoTable_IMPL +#include "RISCVGenSearchableTables.inc" +} // namespace llvm::RISCVTuneInfoTable + static cl::opt<bool> EnableSubRegLiveness("riscv-enable-subreg-liveness", cl::init(true), cl::Hidden); @@ -66,6 +72,7 @@ RISCVSubtarget::initializeSubtargetDependencies(const Triple &TT, StringRef CPU, TuneCPU = CPU; ParseSubtargetFeatures(CPU, TuneCPU, FS); + TuneInfo = RISCVTuneInfoTable::getRISCVTuneInfo(TuneCPU); TargetABI = RISCVABI::computeTargetABI(TT, getFeatureBits(), ABIName); RISCVFeatures::validate(TT, getFeatureBits()); return *this; diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h index cf64dbc21bd8a8b..027d32d54160793 100644 --- a/llvm/lib/Target/RISCV/RISCVSubtarget.h +++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h @@ -32,6 +32,18 @@ namespace llvm { class StringRef; +namespace RISCVTuneInfoTable { + +struct RISCVTuneInfo { + const char *Name; + uint8_t PrefFunctionAlignment; + uint8_t PrefLoopAlignment; +}; + +#define GET_RISCVTuneInfoTable_DECL +#include "RISCVGenSearchableTables.inc" +} // namespace RISCVTuneInfoTable + class RISCVSubtarget : public RISCVGenSubtargetInfo { public: enum RISCVProcFamilyEnum : uint8_t { @@ -54,8 +66,7 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo { uint8_t MaxInterleaveFactor = 2; RISCVABI::ABI TargetABI = RISCVABI::ABI_Unknown; std::bitset<RISCV::NUM_TARGET_REGS> UserReservedRegister; - Align PrefFunctionAlignment; - Align PrefLoopAlignment; + const RISCVTuneInfoTable::RISCVTuneInfo *TuneInfo; RISCVFrameLowering FrameLowering; RISCVInstrInfo InstrInfo; @@ -96,8 +107,12 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo { } bool enableMachineScheduler() const override { return true; } - Align getPrefFunctionAlignment() const { return PrefFunctionAlignment; } - Align getPrefLoopAlignment() const { return PrefLoopAlignment; } + Align getPrefFunctionAlignment() const { + return Align(TuneInfo->PrefFunctionAlignment); + } + Align getPrefLoopAlignment() const { + return Align(TuneInfo->PrefLoopAlignment); + } /// Returns RISC-V processor family. /// Avoid this function! CPU specifics should be kept local to this class diff --git a/llvm/test/CodeGen/RISCV/align-loops.ll b/llvm/test/CodeGen/RISCV/align-loops.ll index 5ef78c74d03532b..efa03992b6277f6 100644 --- a/llvm/test/CodeGen/RISCV/align-loops.ll +++ b/llvm/test/CodeGen/RISCV/align-loops.ll @@ -1,8 +1,6 @@ ; RUN: llc < %s -mtriple=riscv64 | FileCheck %s ; RUN: llc < %s -mtriple=riscv64 -align-loops=16 | FileCheck %s -check-prefix=ALIGN_16 ; RUN: llc < %s -mtriple=riscv64 -align-loops=32 | FileCheck %s -check-prefix=ALIGN_32 -; RUN: llc < %s -mtriple=riscv64 -mattr=+pref-loop-align-16 | FileCheck %s -check-prefix=ALIGN_16 -; RUN: llc < %s -mtriple=riscv64 -mattr=+pref-loop-align-32 | FileCheck %s -check-prefix=ALIGN_32 declare void @foo() diff --git a/llvm/test/CodeGen/RISCV/align.ll b/llvm/test/CodeGen/RISCV/align.ll index 1fb4585f8422aa4..5807fc14efc292d 100644 --- a/llvm/test/CodeGen/RISCV/align.ll +++ b/llvm/test/CodeGen/RISCV/align.ll @@ -2,8 +2,6 @@ ; RUN: | FileCheck %s -check-prefix=RV32I ; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \ ; RUN: | FileCheck %s -check-prefix=RV32C -; RUN: llc -mtriple=riscv32 -mattr=+pref-func-align-32 -verify-machineinstrs < %s \ -; RUN: | FileCheck %s -check-prefix=ALIGN-32 ; RUN: llc -filetype=obj -mtriple=riscv32 < %s -o %t ; RUN: llvm-readelf -S %t | FileCheck %s --check-prefixes=SEC,SEC-I ; RUN: llc -filetype=obj -mtriple=riscv32 -mattr=+c < %s -o %t @@ -18,8 +16,6 @@ define void @foo() { ;RV32I: foo: ;RV32C: .p2align 1 ;RV32C: foo: -;ALIGN-32: .p2align 5 -;ALIGN-32: foo: entry: ret void } |
class RISCVProcessorModel<string n, | ||
SchedMachineModel m, | ||
list<SubtargetFeature> f, | ||
list<SubtargetFeature> tunef = [], | ||
string default_march = ""> | ||
: ProcessorModel<n, m, f, tunef> { | ||
: ProcessorModel<n, m, f, tunef>, RISCVTuneInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should RISCVTuneInfo
be an optional argument to ProcessorModel
? Should RISCVTuneInfo
be an optional argument to RISCVProcessorModel
?
If we organize it like this, it becomes more clear to clients of ProcessorModel
and RISCVProcessorModel
that they have the ability to specify these tunings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would overriding these for a specific CPU look like right now. Would we need to add let
for each field to each CPU?
I'd kind of like a way to define common sets that can be used by multiple CPUs without being repeating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the implementation.
We don't need to define a RISCVTuneInfo
for each processor and it will use the default value (which is for generic
) if no RISCVTuneInfo
defined.
For processors in the same series, a subclass can inherit from RISCVTuneInfo
and override the fields. And we can also override the fields in processor definitions if there are some differences in the same processor series.
It can be like:
class SiFive7TuneInfo: RISCVTuneInfo {
let XXX = xxx; // Override the field in class definition.
// ...
};
def SIFIVE_7 : RISCVTuneProcessorModel<"sifive-7-series",
SiFive7Model,
[TuneSiFive7]>, SiFive7TuneInfo;
def SIFIVE_U74 : RISCVProcessorModel<"sifive-u74",
SiFive7Model,
[Feature64Bit,
FeatureStdExtZifencei,
FeatureStdExtM,
FeatureStdExtA,
FeatureStdExtF,
FeatureStdExtD,
FeatureStdExtC],
[TuneSiFive7]>, SiFive7TuneInfo {
let XXX = xxx; // Override the field in processor definition.
// ...
}
@@ -2,8 +2,6 @@ | |||
; RUN: | FileCheck %s -check-prefix=RV32I | |||
; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \ | |||
; RUN: | FileCheck %s -check-prefix=RV32C | |||
; RUN: llc -mtriple=riscv32 -mattr=+pref-func-align-32 -verify-machineinstrs < %s \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any processors that have non-default alignment? We could pass -mcpu and check for the .p2align N
line so that we have test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know one. Maybe you can provide more information about SiFive processors in upstream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't noticed any performance issues from instruction alignment on SiFive7 and are relying on linker to get this right. I'm not so sure what else we can do to have test coverage for this without an upstream processor that uses this TuneInfo 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just learned that different alignment values have different results on different workloads for us. Therefore, I think we should have way to control this value at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are IR attributes that override the CPU defaults. falign-loops and falign-functions uses those attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some descriptions in https://starfivetech.com/uploads/u74mc_core_complex_manual_21G1.pdf:
3.2.6 Instruction Fetch Unit
The S7 instruction fetch unit is responsible for keeping the pipeline fed with instructions from
memory. The instruction fetch unit delivers up to 8 bytes of instructions per clock cycle to sup-
port superscalar instruction execution. Fetches are always word-aligned and there is a one-
cycle penalty for branching to a 32-bit instruction that is not word-aligned.
It seems we'd better to align functions and hot loops to 4-byte to get best performance. But C
extension is supported so the alignment would be 2-byte.
// Function alignments.
const Align FunctionAlignment(Subtarget.hasStdExtCOrZca() ? 2 : 4);
setMinFunctionAlignment(FunctionAlignment);
// Set preferred alignments.
setPrefFunctionAlignment(Subtarget.getPrefFunctionAlignment());
setPrefLoopAlignment(Subtarget.getPrefLoopAlignment());
I don't know much about S7 series, so I think you SiFive guys may do the change of alignments in upstream?
Ping. |
We collected some data that shows that different values of these variables are better on different benchmarks. Do we have the ability to override these values at compile time so that different benchmarks can specify which alignment to use? |
Alignment can be overridden by frontend options |
Besides, user can set alignment per function/loop via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than that comment.
There are many information that can be used for tuning, like alignments, cache line size, etc. But we can't make all of them `SubtargetFeature` because some of them are not with enumerable value, for example, `PrefetchDistance` used by `LoopDataPrefetch`. In this patch, a searchable table `RISCVTuneInfoTable` is added, in which each entry contains the CPU name and all tune information defined in `RISCVTuneInfo`. Each field of `RISCVTuneInfo` should have a default value and processor definitions can override the default value via `let` statements. When initilizing `RISCVSubtarget`, we will use `TuneCPU` as the key to serach the tune info table. So, the behavior here is if we don't specify the tune CPU, we will use specified `CPU`, which is expected I think. This patch almost undoes 61ab106, in which I added tune features of preferred function/loop alignments. More tune information can be added in the future.
Define RISCVTuneInfo by need and fail back to generic if no related RISCVTuneInfo existed
46879f2
to
0a2703d
Compare
There are many information that can be used for tuning, like
alignments, cache line size, etc. But we can't make all of them
SubtargetFeature
because some of them are not with enumerablevalue, for example,
PrefetchDistance
used byLoopDataPrefetch
.In this patch, a searchable table
RISCVTuneInfoTable
is added,in which each entry contains the CPU name and all tune information
defined in
RISCVTuneInfo
. Each field ofRISCVTuneInfo
shouldhave a default value and processor definitions can override the
default value via
let
statements.We don't need to define a
RISCVTuneInfo
for each processor andit will use the default value (which is for
generic
) if noRISCVTuneInfo
defined.For processors in the same series, a subclass can inherit from
RISCVTuneInfo
and override the fields. And we can also overridethe fields in processor definitions if there are some differences
in the same processor series.
When initilizing
RISCVSubtarget
, we will useTuneCPU
as thekey to serach the tune info table. So, the behavior here is if
we don't specify the tune CPU, we will use specified
CPU
, whichis expected I think.
This patch almost undoes 61ab106, in which I added tune features
of preferred function/loop alignments. More tune information can
be added in the future.