-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[CodeGen] Introduce Static Data Splitter pass #122183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-x86 Author: Mingming Liu (mingmingl-llvm) Changeshttps://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744 proposes to partition static data sections. This patch introduces a codegen pass. This patch produces jump table hotness in the in-memory states (machine jump table info and entries). Target-lowering and asm-printer consume the states and produce Patch is 20.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122183.diff 13 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/MachineBasicBlock.h b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
index 7fe33c3913f2dd..9ed356c2ab2331 100644
--- a/llvm/include/llvm/CodeGen/MachineBasicBlock.h
+++ b/llvm/include/llvm/CodeGen/MachineBasicBlock.h
@@ -997,6 +997,12 @@ class MachineBasicBlock
/// no changes occurred in the meantime.
bool canSplitCriticalEdge(const MachineBasicBlock *Succ) const;
+ /// Return an index for MachineJumpTableInfo if \p this basic block ends with
+ /// an indirect jump using a jump table, otherwise -1.
+ /// This function is a thin wrapper and forward calls to the per-target method
+ /// `TargetInstrInfo::getjumpTableIndex`.
+ int getJumpTableIndex() const;
+
void pop_front() { Insts.pop_front(); }
void pop_back() { Insts.pop_back(); }
void push_back(MachineInstr *MI) { Insts.push_back(MI); }
diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index d696add8a1af53..c0f983d1c6787b 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -88,6 +88,15 @@ template <> struct ilist_callback_traits<MachineBasicBlock> {
}
};
+// The hotness of static data tracked by a MachineFunction and not represented
+// as a global object in the module IR / MIR. Typical examples are
+// MachineJumpTableInfo and MachineConstantPool.
+enum class DataHotness {
+ Unknown,
+ Cold,
+ Hot,
+};
+
/// MachineFunctionInfo - This class can be derived from and used by targets to
/// hold private target-specific information for each MachineFunction. Objects
/// of type are accessed/created with MF::getInfo and destroyed when the
diff --git a/llvm/include/llvm/CodeGen/MachineJumpTableInfo.h b/llvm/include/llvm/CodeGen/MachineJumpTableInfo.h
index e8e9c2f6338e06..cc1f54a81b9bb4 100644
--- a/llvm/include/llvm/CodeGen/MachineJumpTableInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineJumpTableInfo.h
@@ -28,6 +28,7 @@ namespace llvm {
class MachineBasicBlock;
class DataLayout;
class raw_ostream;
+enum class DataHotness;
/// MachineJumpTableEntry - One jump table in the jump table info.
///
@@ -35,8 +36,9 @@ struct MachineJumpTableEntry {
/// MBBs - The vector of basic blocks from which to create the jump table.
std::vector<MachineBasicBlock*> MBBs;
- explicit MachineJumpTableEntry(const std::vector<MachineBasicBlock*> &M)
- : MBBs(M) {}
+ DataHotness Hotness;
+
+ explicit MachineJumpTableEntry(const std::vector<MachineBasicBlock *> &M);
};
class MachineJumpTableInfo {
@@ -107,6 +109,8 @@ class MachineJumpTableInfo {
return JumpTables;
}
+ void updateJumpTableHotness(size_t JTI, DataHotness Hotness);
+
/// RemoveJumpTable - Mark the specific index as being dead. This will
/// prevent it from being emitted.
void RemoveJumpTable(unsigned Idx) {
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index d1fac4a304cffe..16423d03ff7018 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -71,6 +71,10 @@ namespace llvm {
/// using profile information.
MachineFunctionPass *createMachineFunctionSplitterPass();
+ /// createStaticDataSplitterPass - This pass partions static data sections
+ /// into a hot and cold section using profile information.
+ MachineFunctionPass *createStaticDataSplitterPass();
+
/// MachineFunctionPrinter pass - This pass prints out the machine function to
/// the given stream as a debugging tool.
MachineFunctionPass *
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 1cb9013bc48cc5..8111afcc1fb20f 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -293,6 +293,7 @@ void initializeSpeculativeExecutionLegacyPassPass(PassRegistry &);
void initializeSpillPlacementWrapperLegacyPass(PassRegistry &);
void initializeStackColoringLegacyPass(PassRegistry &);
void initializeStackFrameLayoutAnalysisPassPass(PassRegistry &);
+void initializeStaticDataSplitterPass(PassRegistry &);
void initializeStackMapLivenessPass(PassRegistry &);
void initializeStackProtectorPass(PassRegistry &);
void initializeStackSafetyGlobalInfoWrapperPassPass(PassRegistry &);
diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def
index 5a4e79d7225db1..f4cddafa009711 100644
--- a/llvm/include/llvm/Passes/MachinePassRegistry.def
+++ b/llvm/include/llvm/Passes/MachinePassRegistry.def
@@ -236,6 +236,7 @@ DUMMY_MACHINE_FUNCTION_PASS("livedebugvalues", LiveDebugValuesPass)
DUMMY_MACHINE_FUNCTION_PASS("lrshrink", LiveRangeShrinkPass)
DUMMY_MACHINE_FUNCTION_PASS("machine-combiner", MachineCombinerPass)
DUMMY_MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass)
+DUMMY_MACHINE_FUNCTION_PASS("static-data-splitter", StaticDataSplitter)
DUMMY_MACHINE_FUNCTION_PASS("machine-function-splitter", MachineFunctionSplitterPass)
DUMMY_MACHINE_FUNCTION_PASS("machine-latecleanup", MachineLateInstrsCleanupPass)
DUMMY_MACHINE_FUNCTION_PASS("machine-sanmd", MachineSanitizerBinaryMetadata)
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index 145fd2fac8b564..88f863d8204d09 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -226,6 +226,7 @@ add_llvm_component_library(LLVMCodeGen
StackMaps.cpp
StackProtector.cpp
StackSlotColoring.cpp
+ StaticDataSplitter.cpp
SwiftErrorValueTracking.cpp
SwitchLoweringUtils.cpp
TailDuplication.cpp
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 8efe540770913a..84d92705de0223 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -130,6 +130,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
initializeStackMapLivenessPass(Registry);
initializeStackProtectorPass(Registry);
initializeStackSlotColoringPass(Registry);
+ initializeStaticDataSplitterPass(Registry);
initializeStripDebugMachineModulePass(Registry);
initializeTailDuplicateLegacyPass(Registry);
initializeTargetPassConfigPass(Registry);
diff --git a/llvm/lib/CodeGen/MachineBasicBlock.cpp b/llvm/lib/CodeGen/MachineBasicBlock.cpp
index 5ac6472a01e9fc..af298c5994fbb6 100644
--- a/llvm/lib/CodeGen/MachineBasicBlock.cpp
+++ b/llvm/lib/CodeGen/MachineBasicBlock.cpp
@@ -1426,6 +1426,10 @@ bool MachineBasicBlock::canSplitCriticalEdge(
return true;
}
+int MachineBasicBlock::getJumpTableIndex() const {
+ return findJumpTableIndex(*this);
+}
+
/// Prepare MI to be removed from its bundle. This fixes bundle flags on MI's
/// neighboring instructions so the bundle won't be broken by removing MI.
static void unbundleSingleMI(MachineInstr *MI) {
diff --git a/llvm/lib/CodeGen/MachineFunction.cpp b/llvm/lib/CodeGen/MachineFunction.cpp
index e6b9538fe9a02c..b5a89f3bcf42f1 100644
--- a/llvm/lib/CodeGen/MachineFunction.cpp
+++ b/llvm/lib/CodeGen/MachineFunction.cpp
@@ -1291,6 +1291,10 @@ const unsigned MachineFunction::DebugOperandMemNumber = 1000000;
// MachineJumpTableInfo implementation
//===----------------------------------------------------------------------===//
+MachineJumpTableEntry::MachineJumpTableEntry(
+ const std::vector<MachineBasicBlock *> &MBBs)
+ : MBBs(MBBs), Hotness(DataHotness::Unknown) {}
+
/// Return the size of each entry in the jump table.
unsigned MachineJumpTableInfo::getEntrySize(const DataLayout &TD) const {
// The size of a jump table entry is 4 bytes unless the entry is just the
@@ -1340,6 +1344,15 @@ unsigned MachineJumpTableInfo::createJumpTableIndex(
return JumpTables.size()-1;
}
+void MachineJumpTableInfo::updateJumpTableHotness(size_t JTI,
+ DataHotness Hotness) {
+ assert(JTI < JumpTables.size() && "Invalid JTI!");
+ // Note record the largest hotness is important for mergable data (constant
+ // pools). Even if jump table instances are not merged, record the largest
+ // value seen fwiw.
+ JumpTables[JTI].Hotness = std::max(JumpTables[JTI].Hotness, Hotness);
+}
+
/// If Old is the target of any jump tables, update the jump tables to branch
/// to New instead.
bool MachineJumpTableInfo::ReplaceMBBInJumpTables(MachineBasicBlock *Old,
diff --git a/llvm/lib/CodeGen/StaticDataSplitter.cpp b/llvm/lib/CodeGen/StaticDataSplitter.cpp
new file mode 100644
index 00000000000000..14b9c1b3394d2e
--- /dev/null
+++ b/llvm/lib/CodeGen/StaticDataSplitter.cpp
@@ -0,0 +1,153 @@
+//===- StaticDataSplitter.cpp ---------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass uses profile information to partition static data sections into
+// hot and cold ones. It begins to split jump tables based on profile, and
+// subsequent patches will handle constant pools and other module internal data.
+//
+// For the original RFC of this pass please see
+// https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744.
+
+#include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/CodeGen/MBFIWrapper.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
+#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
+#include "llvm/CodeGen/MachineConstantPool.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineJumpTableInfo.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/CommandLine.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "static-data-splitter"
+
+STATISTIC(NumHotJumpTables, "Number of hot jump tables seen");
+STATISTIC(NumColdJumpTables, "Number of cold jump tables seen");
+STATISTIC(NumUnknownJumpTables,
+ "Number of jump tables with unknown hotness. Such jump tables will "
+ "be placed in the hot-suffixed section by default.");
+
+class StaticDataSplitter : public MachineFunctionPass {
+ const MachineBranchProbabilityInfo *MBPI = nullptr;
+ const MachineBlockFrequencyInfo *MBFI = nullptr;
+ const ProfileSummaryInfo *PSI = nullptr;
+
+ // Returns true iff any jump table is hot-cold categorized.
+ bool splitJumpTables(MachineFunction &MF);
+
+ // Same as above but works on functions with profile information.
+ bool splitJumpTablesWithProfiles(MachineFunction &MF,
+ MachineJumpTableInfo &MJTI);
+
+public:
+ static char ID;
+
+ StaticDataSplitter() : MachineFunctionPass(ID) {
+ initializeStaticDataSplitterPass(*PassRegistry::getPassRegistry());
+ }
+
+ StringRef getPassName() const override { return "Static Data Splitter"; }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ MachineFunctionPass::getAnalysisUsage(AU);
+ AU.addRequired<MachineBranchProbabilityInfoWrapperPass>();
+ AU.addRequired<MachineBlockFrequencyInfoWrapperPass>();
+ AU.addRequired<ProfileSummaryInfoWrapperPass>();
+ }
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+};
+
+bool StaticDataSplitter::runOnMachineFunction(MachineFunction &MF) {
+ MBPI = &getAnalysis<MachineBranchProbabilityInfoWrapperPass>().getMBPI();
+ MBFI = &getAnalysis<MachineBlockFrequencyInfoWrapperPass>().getMBFI();
+ PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
+
+ // Split jump tables based on profile information. Subsequent patches will
+ // handle other data types like constant pools, module-internal data, etc.
+ return splitJumpTables(MF);
+}
+
+bool StaticDataSplitter::splitJumpTablesWithProfiles(
+ MachineFunction &MF, MachineJumpTableInfo &MJTI) {
+ int NumChangedJumpTables = 0;
+ // Regard a jump table as hot by default. If the source and all of destination
+ // blocks are cold, regard the jump table as cold.
+ DataHotness Hotness = DataHotness::Hot;
+ for (const auto &MBB : MF) {
+ // IMPORTANT, `getJumpTableIndex` is a thin wrapper around per-target
+ // interface `TargetInstrInfo::getjumpTableIndex`, and only X86 implements
+ // it so far.
+ const int JTI = MBB.getJumpTableIndex();
+ // This is not a source block of jump table.
+ if (JTI == -1)
+ continue;
+
+ bool AllBlocksCold = true;
+
+ if (!PSI->isColdBlock(&MBB, MBFI))
+ AllBlocksCold = false;
+
+ for (const MachineBasicBlock *MBB : MJTI.getJumpTables()[JTI].MBBs)
+ if (!PSI->isColdBlock(MBB, MBFI))
+ AllBlocksCold = false;
+
+ if (AllBlocksCold) {
+ Hotness = DataHotness::Cold;
+ ++NumColdJumpTables;
+ } else {
+ ++NumHotJumpTables;
+ }
+
+ MF.getJumpTableInfo()->updateJumpTableHotness(JTI, Hotness);
+ ++NumChangedJumpTables;
+ }
+ return NumChangedJumpTables > 0;
+}
+
+bool StaticDataSplitter::splitJumpTables(MachineFunction &MF) {
+ MachineJumpTableInfo *MJTI = MF.getJumpTableInfo();
+ if (!MJTI || MJTI->getJumpTables().empty())
+ return false;
+
+ // Place jump tables according to block hotness if block counters are
+ // available. Check function entry count because BFI depends on it to derive
+ // block counters.
+ if (PSI && PSI->hasProfileSummary() && MBFI &&
+ MF.getFunction().getEntryCount())
+ return splitJumpTablesWithProfiles(MF, *MJTI);
+
+ // Conservatively place all jump tables in the hot-suffixed section if profile
+ // information for the function is not available, or the target doesn't
+ // implement `TargetInstrInfo::getJumpTableIndex` yet.
+ for (size_t JTI = 0; JTI < MJTI->getJumpTables().size(); JTI++)
+ MF.getJumpTableInfo()->updateJumpTableHotness(JTI, DataHotness::Hot);
+
+ NumUnknownJumpTables += MJTI->getJumpTables().size();
+ return true;
+}
+
+char StaticDataSplitter::ID = 0;
+
+INITIALIZE_PASS_BEGIN(StaticDataSplitter, DEBUG_TYPE, "Split static data",
+ false, false)
+INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfoWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfoWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
+INITIALIZE_PASS_END(StaticDataSplitter, DEBUG_TYPE, "Split static data", false,
+ false)
+
+MachineFunctionPass *llvm::createStaticDataSplitterPass() {
+ return new StaticDataSplitter();
+}
diff --git a/llvm/lib/CodeGen/TargetPassConfig.cpp b/llvm/lib/CodeGen/TargetPassConfig.cpp
index d407e9f0871d4c..23929672f11d68 100644
--- a/llvm/lib/CodeGen/TargetPassConfig.cpp
+++ b/llvm/lib/CodeGen/TargetPassConfig.cpp
@@ -1256,6 +1256,7 @@ void TargetPassConfig::addMachinePasses() {
"performance.\n";
}
}
+ addPass(createStaticDataSplitterPass());
addPass(createMachineFunctionSplitterPass());
}
// We run the BasicBlockSections pass if either we need BB sections or BB
diff --git a/llvm/test/CodeGen/X86/jump-table-partition.ll b/llvm/test/CodeGen/X86/jump-table-partition.ll
new file mode 100644
index 00000000000000..3a8f1395f6b283
--- /dev/null
+++ b/llvm/test/CodeGen/X86/jump-table-partition.ll
@@ -0,0 +1,163 @@
+;; -stats requires asserts
+; requires: asserts
+
+; RUN: llc -stop-after=block-placement %s -o - | llc --run-pass=static-data-splitter -stats -x mir -o - 2>&1 | FileCheck %s --check-prefix=STAT
+
+; `func_with_hot_jumptable` contains a hot jump table and `func_with_cold_jumptable` contains a cold one.
+; `func_without_entry_count` simulates the functions without profile information (e.g., not instrumented or not profiled),
+; it's jump table hotness is unknown and regarded as hot conservatively.
+;
+; Tests stat messages are expected.
+; TODO: Update test to verify section suffixes when target-lowering and assembler changes are implemented.
+;
+; STAT-DAG: 1 static-data-splitter - Number of cold jump tables seen
+; STAT-DAG: 1 static-data-splitter - Number of hot jump tables seen
+; STAT-DAG: 1 static-data-splitter - Number of jump tables with unknown hotness
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+@.str.2 = private constant [7 x i8] c"case 3\00"
+@.str.3 = private constant [7 x i8] c"case 4\00"
+@.str.4 = private constant [7 x i8] c"case 5\00"
+@str.9 = private constant [7 x i8] c"case 2\00"
+@str.10 = private constant [7 x i8] c"case 1\00"
+@str.11 = private constant [8 x i8] c"default\00"
+
+define i32 @func_with_hot_jumptable(i32 %num) !prof !13 {
+entry:
+ switch i32 %num, label %sw.default [
+ i32 1, label %sw.bb
+ i32 2, label %sw.bb1
+ i32 3, label %sw.bb3
+ i32 4, label %sw.bb5
+ i32 5, label %sw.bb7
+ ], !prof !14
+
+sw.bb: ; preds = %entry
+ %puts11 = tail call i32 @puts(ptr @str.10)
+ br label %sw.epilog
+
+sw.bb1: ; preds = %entry
+ %puts = tail call i32 @puts(ptr @str.9)
+ br label %sw.epilog
+
+sw.bb3: ; preds = %entry
+ %call4 = tail call i32 (ptr, ...) @printf(ptr @.str.2)
+ br label %sw.bb5
+
+sw.bb5: ; preds = %entry, %sw.bb3
+ %call6 = tail call i32 (ptr, ...) @printf(ptr @.str.3)
+ br label %sw.bb7
+
+sw.bb7: ; preds = %entry, %sw.bb5
+ %call8 = tail call i32 (ptr, ...) @printf(ptr @.str.4)
+ br label %sw.epilog
+
+sw.default: ; preds = %entry
+ %puts12 = tail call i32 @puts(ptr @str.11)
+ br label %sw.epilog
+
+sw.epilog: ; preds = %sw.default, %sw.bb7, %sw.bb1, %sw.bb
+ %div = sdiv i32 %num, 3
+ ret i32 %div
+}
+
+define void @func_with_cold_jumptable(i32 %num) !prof !15 {
+entry:
+ switch i32 %num, label %sw.default [
+ i32 1, label %sw.bb
+ i32 2, label %sw.bb1
+ i32 3, label %sw.bb3
+ i32 4, label %sw.bb5
+ i32 5, label %sw.bb7
+ ], !prof !16
+
+sw.bb: ; preds = %entry
+ %puts10 = tail call i32 @puts(ptr @str.10)
+ br label %sw.epilog
+
+sw.bb1: ; preds = %entry
+ %puts = tail call i32 @puts(ptr @str.9)
+ br label %sw.epilog
+
+sw.bb3: ; preds = %entry
+ %call4 = tail call i32 (ptr, ...) @printf(ptr @.str.2)
+ br label %sw.bb5
+
+sw.bb5: ; preds = %entry, %sw.bb3
+ %call6 = tail call i32 (ptr, ...) @printf(ptr @.str.3)
+ br label %sw.bb7
+
+sw.bb7: ; preds = %entry, %sw.bb5
+ %call8 = tail call i32 (ptr, ...) @printf(ptr @.str.4)
+ br label %sw.epilog
+
+sw.default: ; preds = %entry
+ %puts11 = tail call i32 @puts(ptr @str.11)
+ br label %sw.epilog
+
+sw.epilog: ; preds = %sw.default, %sw.bb7, %sw.bb1, %sw.bb
+ ret void
+}
+
+define void @func_without_entry_count(i32 %num) {
+entry:
+ switch i32 %num, label %sw.default [
+ i32 1, label %sw.bb
+ i32 2, label %sw.bb1
+ i32 3, label %sw.bb3
+ i32 4, label %sw.bb5
+ i32 5, label %sw.bb7
+ ]
+
+sw.bb: ; preds = %entry
+ %puts10 = tail call i32 @puts(ptr @str.10)
+ br label %sw.epilog
+
+sw.bb1: ; preds = %entry
+ %puts = tail call i32 @puts(ptr @str.9)
+ br label %sw.epilog
+
+sw.bb3: ; preds = %entry
+ %call4 = tail call i32 (ptr, ...) @printf(ptr @.str.2)
+ br label %sw.bb5
+
+sw.bb5: ; preds = %entry, %sw.bb3
+ %call6 = tail call i32 (ptr, ...) @printf(ptr @.str.3)
+ br label %sw.bb7
+
+sw.bb7: ; preds = %entry, %sw.bb5
+ %call8 = tail call i32 (ptr, ...) @printf(ptr @.str.4)
+ br label %sw.epilog
+
+sw.default: ; preds = %entry
+ %puts11 = tail call i32 @puts(ptr @str.11)
+ br label %sw.epilog
+
+sw.epilog: ; preds = %sw.default, %sw.bb7, %sw.bb1, %sw.bb
+ ret void
+}
+
+declare i32 @puts(ptr)
+declare i32 @printf(ptr, ....
[truncated]
|
cc @Colibrow It seems I cannot request review from you in the Reviewers list, so tagging you here for feedback and thoughts! |
// available. Check function entry count because BFI depends on it to derive | ||
// block counters. | ||
if (PSI && PSI->hasProfileSummary() && MBFI && | ||
MF.getFunction().getEntryCount()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps use interface Function:hasProfileData()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
if (!PSI->isColdBlock(&MBB, MBFI)) | ||
AllBlocksCold = false; | ||
|
||
for (const MachineBasicBlock *MBB : MJTI.getJumpTables()[JTI].MBBs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the profile count of the source BB the sum of all target BB's count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the profile count of the source BB the sum of all target BB's count?
I expect this is the case for most cases, but a target BB can in theory end with a conditional branch to itself (as code transformation goes on).
While a destination block does not read a jump table (unless it's also a source block), a hot destination heuristically makes its jump table hot to accommodate for potential profile data skews (caused by debug information loss in the sampled binary for example). I added this in the code comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I received multiple feedback about the rationale to count for destination block hotness (thanks @snehasish and @weiguozhi ), and it become clearer to me the argument to make up for profile inaccuracy doesn't really hold very well.
For now I updated the patch to use source block hotness only. I will do a sanity check (with and without destination block hotness) using a PGO binary, and update the data early next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do a sanity check (with and without destination block hotness) using a PGO binary, and update the data early next week.
The TL,DR is that counting destination blocks' hotness on top of counting source block hotness on hot / cold jump table size (as shown by the first table below), but it doesn't change the cold/hot jump table size ratio of a binary too much (as shown by the second table)
Moreover, I printed the function names in which destination block changed jump table hotness for two iFDO-optimized binaries, and manually checked a couple of such function's cycle percentage out of the whole binary. They are mostly cold functions themselves. So it's fine to not account for destination block hotness.
Data size
Binary | hot jump table size (in bytes) before | hot jump table size (in bytes) after | cold jump table size (in bytes) before | cold jump table size (in bytes) after |
---|---|---|---|---|
binary1 | 298736 | 326496 | 1510720 | 1482960 |
binary2 | 73912 | 81448 | 1298560 | 1291024 |
binary | cold / hot jump table size ratio before | cold / hot jump table size ratio after |
---|---|---|
binary 1 | 5.05704033 | 4.542046457 |
binary 2 | 17.56900097 | 15.85089873 |
cc @SharonXSharon It seems I cannot request review from you in the Reviewers list, so tagging you here for feedback and thoughts! |
if (!PSI->isColdBlock(MBB, MBFI)) | ||
AllBlocksCold = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to exit quickly upon finding the hot block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense to exit quickly for the original code.
In the updated patch, only source block hotness is used.
llvm/include/llvm/CodeGen/Passes.h
Outdated
@@ -71,6 +71,10 @@ namespace llvm { | |||
/// using profile information. | |||
MachineFunctionPass *createMachineFunctionSplitterPass(); | |||
|
|||
/// createStaticDataSplitterPass - This pass partions static data sections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: partitions
Also should it be "partitions static data objects into hot and cold sections"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -107,6 +109,8 @@ class MachineJumpTableInfo { | |||
return JumpTables; | |||
} | |||
|
|||
void updateJumpTableHotness(size_t JTI, DataHotness Hotness); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining this is for the jump table static data object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment for this method, renamed it to updateJumpTableEntryHotness
.
this is for the jump table static data object
It makes sense to comment about how the information is used. The comment in MachineJumpTableEntry
field does this.
|
||
/// MachineJumpTableEntry - One jump table in the jump table info. | ||
/// | ||
struct MachineJumpTableEntry { | ||
/// MBBs - The vector of basic blocks from which to create the jump table. | ||
std::vector<MachineBasicBlock*> MBBs; | ||
|
||
explicit MachineJumpTableEntry(const std::vector<MachineBasicBlock*> &M) | ||
: MBBs(M) {} | ||
DataHotness Hotness; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment for the new member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, and described its usage in the comment.
llvm/lib/CodeGen/MachineFunction.cpp
Outdated
@@ -1291,6 +1291,10 @@ const unsigned MachineFunction::DebugOperandMemNumber = 1000000; | |||
// MachineJumpTableInfo implementation | |||
//===----------------------------------------------------------------------===// | |||
|
|||
MachineJumpTableEntry::MachineJumpTableEntry( | |||
const std::vector<MachineBasicBlock *> &MBBs) | |||
: MBBs(MBBs), Hotness(DataHotness::Unknown) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you initialize DataHotness Hotness = DataHotness::Unknown;
on L39 in MachineJumpTableInfo then you won't need to add this constructor here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor is moved here for coding style considerations in the specific [1] code paths. The code is written in a way that MachineFunctionDataHotness
enum is declared but not defined in header files, and only cpp files see its definition if needed (e.g., llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp uses this field in line 672 in the follow up patch)
[1] For CodeGen, its header directory has {MachineJumpTableInfo.h, MachineConstantPool.h. MachineFunction.h}, and MachineFunction.cpp
defines functions that are declared in the three header files.
//===----------------------------------------------------------------------===// | ||
// | ||
// This pass uses profile information to partition static data sections into | ||
// hot and cold ones. It begins to split jump tables based on profile, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "It begins... internal data." maybe have a short bullet list like "The pass uses branch profile data to assign hotness based section qualifiers for the following types of static data:
- Jump Tables
- Constant pools (TODO)
- Constant strings (TODO)
- Other module internal data (TODO)"
Then we can update the TODOs as we go along.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
// subsequent patches will handle constant pools and other module internal data. | ||
// | ||
// For the original RFC of this pass please see | ||
// https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop the period at the end so it's easier to copy paste etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Co-authored-by: Ellis Hoag <[email protected]>
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated test case is simpler by tuning down -min-jump-table-entries
(from a default of 4 to 2 for x86), and function @foo
has four jump tables (hot and cold interleaved to make {LJTI0_0
, LJTI0_2
} hot and the other two cold) for section testing.
Thanks for all the reviews and please take a another look!
@@ -107,6 +109,8 @@ class MachineJumpTableInfo { | |||
return JumpTables; | |||
} | |||
|
|||
void updateJumpTableHotness(size_t JTI, DataHotness Hotness); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment for this method, renamed it to updateJumpTableEntryHotness
.
this is for the jump table static data object
It makes sense to comment about how the information is used. The comment in MachineJumpTableEntry
field does this.
//===----------------------------------------------------------------------===// | ||
// | ||
// This pass uses profile information to partition static data sections into | ||
// hot and cold ones. It begins to split jump tables based on profile, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
// subsequent patches will handle constant pools and other module internal data. | ||
// | ||
// For the original RFC of this pass please see | ||
// https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
if (!PSI->isColdBlock(MBB, MBFI)) | ||
AllBlocksCold = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense to exit quickly for the original code.
In the updated patch, only source block hotness is used.
|
||
/// MachineJumpTableEntry - One jump table in the jump table info. | ||
/// | ||
struct MachineJumpTableEntry { | ||
/// MBBs - The vector of basic blocks from which to create the jump table. | ||
std::vector<MachineBasicBlock*> MBBs; | ||
|
||
explicit MachineJumpTableEntry(const std::vector<MachineBasicBlock*> &M) | ||
: MBBs(M) {} | ||
DataHotness Hotness; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, and described its usage in the comment.
llvm/lib/CodeGen/MachineFunction.cpp
Outdated
@@ -1291,6 +1291,10 @@ const unsigned MachineFunction::DebugOperandMemNumber = 1000000; | |||
// MachineJumpTableInfo implementation | |||
//===----------------------------------------------------------------------===// | |||
|
|||
MachineJumpTableEntry::MachineJumpTableEntry( | |||
const std::vector<MachineBasicBlock *> &MBBs) | |||
: MBBs(MBBs), Hotness(DataHotness::Unknown) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor is moved here for coding style considerations in the specific [1] code paths. The code is written in a way that MachineFunctionDataHotness
enum is declared but not defined in header files, and only cpp files see its definition if needed (e.g., llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp uses this field in line 672 in the follow up patch)
[1] For CodeGen, its header directory has {MachineJumpTableInfo.h, MachineConstantPool.h. MachineFunction.h}, and MachineFunction.cpp
defines functions that are declared in the three header files.
auto Hotness = MachineFunctionDataHotness::Hot; | ||
|
||
// Hotness is based on source basic block hotness. | ||
if (PSI->isColdBlock(&MBB, MBFI)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PSI->isColdBlock(..) is about instruction coldness, it might be worth decouple this and introduce a new API to query data access hotness with option control. Can be done a a follow up patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
if (PSI->isColdBlock(&MBB, MBFI)) | ||
Hotness = MachineFunctionDataHotness::Cold; | ||
|
||
if (MF.getJumpTableInfo()->updateJumpTableEntryHotness(JTI, Hotness)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use 'MJTI' instead of 'MF.getJumpTableInfo()'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
%zero = icmp eq i32 %num, 0 | ||
br i1 %zero, label %hot, label %cold, !prof !15 | ||
|
||
hot: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"hot:" --> "cold:"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. And fixed comment in line 17 and line 18 (jt{0,1} are hot and jt{2,3} are cold) accordingly.
%c2cmp = icmp ne i32 %c2, 0 | ||
br i1 %c2cmp, label %return, label %jt3.prologue, !prof !16 | ||
|
||
cold: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"cold:" --> "hot:"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but maybe wait to see if anyone else has comments
Co-authored-by: Ellis Hoag <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with a few minor comments. Thanks!
; RUN: llc --run-pass=static-data-splitter -stats -x mir %t.mir -o - 2>&1 | FileCheck %s --check-prefix=STAT | ||
|
||
; Tests stat messages are expected. | ||
; TODO: Update test to verify section suffixes when target-lowering and assembler changes are implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO:
looks like a directive here, maybe use COM:
and then note the todo?
https://llvm.org/docs/CommandGuide/FileCheck.html#the-com-directive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
||
/// MachineJumpTableEntry - One jump table in the jump table info. | ||
/// | ||
struct MachineJumpTableEntry { | ||
/// MBBs - The vector of basic blocks from which to create the jump table. | ||
std::vector<MachineBasicBlock*> MBBs; | ||
|
||
explicit MachineJumpTableEntry(const std::vector<MachineBasicBlock*> &M) | ||
: MBBs(M) {} | ||
MachineFunctionDataHotness Hotness; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment for this member to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
llvm/lib/CodeGen/MachineFunction.cpp
Outdated
assert(JTI < JumpTables.size() && "Invalid JTI!"); | ||
// Note record the largest hotness is important for mergable data (constant | ||
// pools). Even if jump table instances are not merged, record the largest | ||
// value seen fwiw. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop 'fwiw`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
} | ||
|
||
bool StaticDataSplitter::splitJumpTablesWithProfiles( | ||
MachineFunction &MF, MachineJumpTableInfo &MJTI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think MachineFunction can be const.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/11/builds/11471 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/63/builds/3635 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/127/builds/2128 Here is the relevant piece of the build log for the reference
|
This patch seems to have broken a number of build bots. Would you mind reverting until a fix is ready? From the error message, it seems like the new test is failing to match correctly in some cases, so maybe its an issue w/
|
Thanks for the workaround in 96410ed :) |
Thanks for the message. Yeah more buildbot failures were observed overnight.. As it's new code path and not enabled by default, I decided to just disable the new test in 96410ed this morning. I pushed 3dec24d to check stat messages in order and re-enable the test everywhere. I'll keep an eye on it (and I still need to figure out how |
@@ -0,0 +1,177 @@ | |||
; -stats requires asserts | |||
; requires: asserts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be REQUIRES
and not requires
. I observed the test being run on a bot I specifically setup to run without assertions enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'm able to reproduce https://lab.llvm.org/buildbot/#/builders/127/builds/2148 and the fix 1688c87 was submitted.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/49/builds/1212 Here is the relevant piece of the build log for the reference
|
…es' (read-only) data sections (#122215) #122183 adds a codegen pass to infer machine jump table entry's hotness from the MBB hotness. This is a follow-up PR to produce `.hot` and or `.unlikely` section prefix for jump table's (read-only) data sections in the relocatable `.o` files. When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`, `.rodata.unlikely`} in input sections. It can map `.rodata.hot` and `.rodata` in the input sections to `.rodata.hot` in the executable, and map `.rodata.unlikely` into `.rodata` with a pending extension to `--keep-text-section-prefix` like 059e7cb, or with a linker script. 1. To partition hot and jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group. 2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement `TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on MJTE's hotness. - The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does. 3. The new code path is off by default. - Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it readable through `TargetMachine`. - To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in `CodeGen/CommandFlags.h/cpp`. - A subsequent patch ([draft](8f36a13)) will add a clang option to enable the new code path. --------- Co-authored-by: Ellis Hoag <[email protected]>
…r jump tables' (read-only) data sections (#122215) llvm/llvm-project#122183 adds a codegen pass to infer machine jump table entry's hotness from the MBB hotness. This is a follow-up PR to produce `.hot` and or `.unlikely` section prefix for jump table's (read-only) data sections in the relocatable `.o` files. When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`, `.rodata.unlikely`} in input sections. It can map `.rodata.hot` and `.rodata` in the input sections to `.rodata.hot` in the executable, and map `.rodata.unlikely` into `.rodata` with a pending extension to `--keep-text-section-prefix` like llvm/llvm-project@059e7cb, or with a linker script. 1. To partition hot and jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group. 2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement `TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on MJTE's hotness. - The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does. 3. The new code path is off by default. - Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it readable through `TargetMachine`. - To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in `CodeGen/CommandFlags.h/cpp`. - A subsequent patch ([draft](llvm/llvm-project@8f36a13)) will add a clang option to enable the new code path. --------- Co-authored-by: Ellis Hoag <[email protected]>
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744 proposes to partition static data sections.
This patch introduces a codegen pass. This patch produces jump table hotness in the in-memory states (machine jump table info and entries). Target-lowering and asm-printer consume the states and produce
.hot
section suffix. The follow up PR #122215 implements such changes.