Skip to content

[CodeGen] Introduce Static Data Splitter pass #122183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 23, 2025
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions llvm/include/llvm/CodeGen/MachineBasicBlock.h
Original file line number Diff line number Diff line change
Expand Up @@ -997,6 +997,12 @@ class MachineBasicBlock
/// no changes occurred in the meantime.
bool canSplitCriticalEdge(const MachineBasicBlock *Succ) const;

/// Return an index for MachineJumpTableInfo if \p this basic block ends with
/// an indirect jump using a jump table, otherwise -1.
/// This function is a thin wrapper and forward calls to the per-target method
/// `TargetInstrInfo::getjumpTableIndex`.
int getJumpTableIndex() const;

void pop_front() { Insts.pop_front(); }
void pop_back() { Insts.pop_back(); }
void push_back(MachineInstr *MI) { Insts.push_back(MI); }
Expand Down
9 changes: 9 additions & 0 deletions llvm/include/llvm/CodeGen/MachineFunction.h
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,15 @@ template <> struct ilist_callback_traits<MachineBasicBlock> {
}
};

// The hotness of static data tracked by a MachineFunction and not represented
// as a global object in the module IR / MIR. Typical examples are
// MachineJumpTableInfo and MachineConstantPool.
enum class DataHotness {
Unknown,
Cold,
Hot,
};

/// MachineFunctionInfo - This class can be derived from and used by targets to
/// hold private target-specific information for each MachineFunction. Objects
/// of type are accessed/created with MF::getInfo and destroyed when the
Expand Down
8 changes: 6 additions & 2 deletions llvm/include/llvm/CodeGen/MachineJumpTableInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,15 +28,17 @@ namespace llvm {
class MachineBasicBlock;
class DataLayout;
class raw_ostream;
enum class DataHotness;

/// MachineJumpTableEntry - One jump table in the jump table info.
///
struct MachineJumpTableEntry {
/// MBBs - The vector of basic blocks from which to create the jump table.
std::vector<MachineBasicBlock*> MBBs;

explicit MachineJumpTableEntry(const std::vector<MachineBasicBlock*> &M)
: MBBs(M) {}
DataHotness Hotness;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment for the new member.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, and described its usage in the comment.


explicit MachineJumpTableEntry(const std::vector<MachineBasicBlock *> &M);
};

class MachineJumpTableInfo {
Expand Down Expand Up @@ -107,6 +109,8 @@ class MachineJumpTableInfo {
return JumpTables;
}

void updateJumpTableHotness(size_t JTI, DataHotness Hotness);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining this is for the jump table static data object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment for this method, renamed it to updateJumpTableEntryHotness.

this is for the jump table static data object

It makes sense to comment about how the information is used. The comment in MachineJumpTableEntry field does this.


/// RemoveJumpTable - Mark the specific index as being dead. This will
/// prevent it from being emitted.
void RemoveJumpTable(unsigned Idx) {
Expand Down
4 changes: 4 additions & 0 deletions llvm/include/llvm/CodeGen/Passes.h
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,10 @@ namespace llvm {
/// using profile information.
MachineFunctionPass *createMachineFunctionSplitterPass();

/// createStaticDataSplitterPass - This pass partions static data sections
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: partitions

Also should it be "partitions static data objects into hot and cold sections"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// into a hot and cold section using profile information.
MachineFunctionPass *createStaticDataSplitterPass();

/// MachineFunctionPrinter pass - This pass prints out the machine function to
/// the given stream as a debugging tool.
MachineFunctionPass *
Expand Down
1 change: 1 addition & 0 deletions llvm/include/llvm/InitializePasses.h
Original file line number Diff line number Diff line change
Expand Up @@ -293,6 +293,7 @@ void initializeSpeculativeExecutionLegacyPassPass(PassRegistry &);
void initializeSpillPlacementWrapperLegacyPass(PassRegistry &);
void initializeStackColoringLegacyPass(PassRegistry &);
void initializeStackFrameLayoutAnalysisPassPass(PassRegistry &);
void initializeStaticDataSplitterPass(PassRegistry &);
void initializeStackMapLivenessPass(PassRegistry &);
void initializeStackProtectorPass(PassRegistry &);
void initializeStackSafetyGlobalInfoWrapperPassPass(PassRegistry &);
Expand Down
1 change: 1 addition & 0 deletions llvm/include/llvm/Passes/MachinePassRegistry.def
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,7 @@ DUMMY_MACHINE_FUNCTION_PASS("livedebugvalues", LiveDebugValuesPass)
DUMMY_MACHINE_FUNCTION_PASS("lrshrink", LiveRangeShrinkPass)
DUMMY_MACHINE_FUNCTION_PASS("machine-combiner", MachineCombinerPass)
DUMMY_MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass)
DUMMY_MACHINE_FUNCTION_PASS("static-data-splitter", StaticDataSplitter)
DUMMY_MACHINE_FUNCTION_PASS("machine-function-splitter", MachineFunctionSplitterPass)
DUMMY_MACHINE_FUNCTION_PASS("machine-latecleanup", MachineLateInstrsCleanupPass)
DUMMY_MACHINE_FUNCTION_PASS("machine-sanmd", MachineSanitizerBinaryMetadata)
Expand Down
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,7 @@ add_llvm_component_library(LLVMCodeGen
StackMaps.cpp
StackProtector.cpp
StackSlotColoring.cpp
StaticDataSplitter.cpp
SwiftErrorValueTracking.cpp
SwitchLoweringUtils.cpp
TailDuplication.cpp
Expand Down
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/CodeGen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
initializeStackMapLivenessPass(Registry);
initializeStackProtectorPass(Registry);
initializeStackSlotColoringPass(Registry);
initializeStaticDataSplitterPass(Registry);
initializeStripDebugMachineModulePass(Registry);
initializeTailDuplicateLegacyPass(Registry);
initializeTargetPassConfigPass(Registry);
Expand Down
4 changes: 4 additions & 0 deletions llvm/lib/CodeGen/MachineBasicBlock.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1426,6 +1426,10 @@ bool MachineBasicBlock::canSplitCriticalEdge(
return true;
}

int MachineBasicBlock::getJumpTableIndex() const {
return findJumpTableIndex(*this);
}

/// Prepare MI to be removed from its bundle. This fixes bundle flags on MI's
/// neighboring instructions so the bundle won't be broken by removing MI.
static void unbundleSingleMI(MachineInstr *MI) {
Expand Down
13 changes: 13 additions & 0 deletions llvm/lib/CodeGen/MachineFunction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1291,6 +1291,10 @@ const unsigned MachineFunction::DebugOperandMemNumber = 1000000;
// MachineJumpTableInfo implementation
//===----------------------------------------------------------------------===//

MachineJumpTableEntry::MachineJumpTableEntry(
const std::vector<MachineBasicBlock *> &MBBs)
: MBBs(MBBs), Hotness(DataHotness::Unknown) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you initialize DataHotness Hotness = DataHotness::Unknown; on L39 in MachineJumpTableInfo then you won't need to add this constructor here.

Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor is moved here for coding style considerations in the specific [1] code paths. The code is written in a way that MachineFunctionDataHotness enum is declared but not defined in header files, and only cpp files see its definition if needed (e.g., llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp uses this field in line 672 in the follow up patch)

[1] For CodeGen, its header directory has {MachineJumpTableInfo.h, MachineConstantPool.h. MachineFunction.h}, and MachineFunction.cpp defines functions that are declared in the three header files.


/// Return the size of each entry in the jump table.
unsigned MachineJumpTableInfo::getEntrySize(const DataLayout &TD) const {
// The size of a jump table entry is 4 bytes unless the entry is just the
Expand Down Expand Up @@ -1340,6 +1344,15 @@ unsigned MachineJumpTableInfo::createJumpTableIndex(
return JumpTables.size()-1;
}

void MachineJumpTableInfo::updateJumpTableHotness(size_t JTI,
DataHotness Hotness) {
assert(JTI < JumpTables.size() && "Invalid JTI!");
// Note record the largest hotness is important for mergable data (constant
// pools). Even if jump table instances are not merged, record the largest
// value seen fwiw.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop 'fwiw`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

JumpTables[JTI].Hotness = std::max(JumpTables[JTI].Hotness, Hotness);
}

/// If Old is the target of any jump tables, update the jump tables to branch
/// to New instead.
bool MachineJumpTableInfo::ReplaceMBBInJumpTables(MachineBasicBlock *Old,
Expand Down
154 changes: 154 additions & 0 deletions llvm/lib/CodeGen/StaticDataSplitter.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
//===- StaticDataSplitter.cpp ---------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass uses profile information to partition static data sections into
// hot and cold ones. It begins to split jump tables based on profile, and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "It begins... internal data." maybe have a short bullet list like "The pass uses branch profile data to assign hotness based section qualifiers for the following types of static data:

  • Jump Tables
  • Constant pools (TODO)
  • Constant strings (TODO)
  • Other module internal data (TODO)"

Then we can update the TODOs as we go along.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

// subsequent patches will handle constant pools and other module internal data.
//
// For the original RFC of this pass please see
// https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop the period at the end so it's easier to copy paste etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/CodeGen/MBFIWrapper.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineJumpTableInfo.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"

using namespace llvm;

#define DEBUG_TYPE "static-data-splitter"

STATISTIC(NumHotJumpTables, "Number of hot jump tables seen");
STATISTIC(NumColdJumpTables, "Number of cold jump tables seen");
STATISTIC(NumUnknownJumpTables,
"Number of jump tables with unknown hotness. Such jump tables will "
"be placed in the hot-suffixed section by default.");

class StaticDataSplitter : public MachineFunctionPass {
const MachineBranchProbabilityInfo *MBPI = nullptr;
const MachineBlockFrequencyInfo *MBFI = nullptr;
const ProfileSummaryInfo *PSI = nullptr;

// Returns true iff any jump table is hot-cold categorized.
bool splitJumpTables(MachineFunction &MF);

// Same as above but works on functions with profile information.
bool splitJumpTablesWithProfiles(MachineFunction &MF,
MachineJumpTableInfo &MJTI);

public:
static char ID;

StaticDataSplitter() : MachineFunctionPass(ID) {
initializeStaticDataSplitterPass(*PassRegistry::getPassRegistry());
}

StringRef getPassName() const override { return "Static Data Splitter"; }

void getAnalysisUsage(AnalysisUsage &AU) const override {
MachineFunctionPass::getAnalysisUsage(AU);
AU.addRequired<MachineBranchProbabilityInfoWrapperPass>();
AU.addRequired<MachineBlockFrequencyInfoWrapperPass>();
AU.addRequired<ProfileSummaryInfoWrapperPass>();
}

bool runOnMachineFunction(MachineFunction &MF) override;
};

bool StaticDataSplitter::runOnMachineFunction(MachineFunction &MF) {
MBPI = &getAnalysis<MachineBranchProbabilityInfoWrapperPass>().getMBPI();
MBFI = &getAnalysis<MachineBlockFrequencyInfoWrapperPass>().getMBFI();
PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();

// Split jump tables based on profile information. Subsequent patches will
// handle other data types like constant pools, module-internal data, etc.
return splitJumpTables(MF);
}

bool StaticDataSplitter::splitJumpTablesWithProfiles(
MachineFunction &MF, MachineJumpTableInfo &MJTI) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MachineFunction can be const.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

int NumChangedJumpTables = 0;
// Regard a jump table as hot by default. If the source and all of destination
// blocks are cold, regard the jump table as cold. While a destination block
// does not read a jump table (unless it's also a source block), a hot
// destination heuristically makes its jump table hot to accommodate for
// potential profile data skews (from sampled profiles, for example).
DataHotness Hotness = DataHotness::Hot;
for (const auto &MBB : MF) {
// IMPORTANT, `getJumpTableIndex` is a thin wrapper around per-target
// interface `TargetInstrInfo::getjumpTableIndex`, and only X86 implements
// it so far.
const int JTI = MBB.getJumpTableIndex();
// This is not a source block of jump table.
if (JTI == -1)
continue;

bool AllBlocksCold = true;

if (!PSI->isColdBlock(&MBB, MBFI))
AllBlocksCold = false;

for (const MachineBasicBlock *MBB : MJTI.getJumpTables()[JTI].MBBs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the profile count of the source BB the sum of all target BB's count?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the profile count of the source BB the sum of all target BB's count?

I expect this is the case for most cases, but a target BB can in theory end with a conditional branch to itself (as code transformation goes on).

While a destination block does not read a jump table (unless it's also a source block), a hot destination heuristically makes its jump table hot to accommodate for potential profile data skews (caused by debug information loss in the sampled binary for example). I added this in the code comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I received multiple feedback about the rationale to count for destination block hotness (thanks @snehasish and @weiguozhi ), and it become clearer to me the argument to make up for profile inaccuracy doesn't really hold very well.

For now I updated the patch to use source block hotness only. I will do a sanity check (with and without destination block hotness) using a PGO binary, and update the data early next week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do a sanity check (with and without destination block hotness) using a PGO binary, and update the data early next week.

The TL,DR is that counting destination blocks' hotness on top of counting source block hotness on hot / cold jump table size (as shown by the first table below), but it doesn't change the cold/hot jump table size ratio of a binary too much (as shown by the second table)

Moreover, I printed the function names in which destination block changed jump table hotness for two iFDO-optimized binaries, and manually checked a couple of such function's cycle percentage out of the whole binary. They are mostly cold functions themselves. So it's fine to not account for destination block hotness.


Data size

Binary hot jump table size (in bytes) before hot jump table size (in bytes) after cold jump table size (in bytes) before cold jump table size (in bytes) after
binary1 298736 326496 1510720 1482960
binary2 73912 81448 1298560 1291024
binary cold / hot jump table size ratio before cold / hot jump table size ratio after
binary 1 5.05704033 4.542046457
binary 2 17.56900097 15.85089873

if (!PSI->isColdBlock(MBB, MBFI))
AllBlocksCold = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to exit quickly upon finding the hot block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make sense to exit quickly for the original code.

In the updated patch, only source block hotness is used.


if (AllBlocksCold) {
Hotness = DataHotness::Cold;
++NumColdJumpTables;
} else {
++NumHotJumpTables;
}

MF.getJumpTableInfo()->updateJumpTableHotness(JTI, Hotness);
++NumChangedJumpTables;
}
return NumChangedJumpTables > 0;
}

bool StaticDataSplitter::splitJumpTables(MachineFunction &MF) {
MachineJumpTableInfo *MJTI = MF.getJumpTableInfo();
if (!MJTI || MJTI->getJumpTables().empty())
return false;

// Place jump tables according to block hotness if function has profile data.
if (PSI && PSI->hasProfileSummary() && MBFI &&
MF.getFunction().hasProfileData())
return splitJumpTablesWithProfiles(MF, *MJTI);

// Conservatively place all jump tables in the hot-suffixed section if profile
// information for the function is not available, or the target doesn't
// implement `TargetInstrInfo::getJumpTableIndex` yet.
for (size_t JTI = 0; JTI < MJTI->getJumpTables().size(); JTI++)
MF.getJumpTableInfo()->updateJumpTableHotness(JTI, DataHotness::Hot);

NumUnknownJumpTables += MJTI->getJumpTables().size();
return true;
}

char StaticDataSplitter::ID = 0;

INITIALIZE_PASS_BEGIN(StaticDataSplitter, DEBUG_TYPE, "Split static data",
false, false)
INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_END(StaticDataSplitter, DEBUG_TYPE, "Split static data", false,
false)

MachineFunctionPass *llvm::createStaticDataSplitterPass() {
return new StaticDataSplitter();
}
7 changes: 7 additions & 0 deletions llvm/lib/CodeGen/TargetPassConfig.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,11 @@ static cl::opt<bool>
GCEmptyBlocks("gc-empty-basic-blocks", cl::init(false), cl::Hidden,
cl::desc("Enable garbage-collecting empty basic blocks"));

static cl::opt<bool>
SplitStaticData("split-static-data", cl::Hidden, cl::init(false),
cl::desc("Split static data sections into hot and cold "
"section ones using profile information"));

/// Allow standard passes to be disabled by command line options. This supports
/// simple binary flags that either suppress the pass or do nothing.
/// i.e. -disable-mypass=false has no effect.
Expand Down Expand Up @@ -1257,6 +1262,8 @@ void TargetPassConfig::addMachinePasses() {
}
}
addPass(createMachineFunctionSplitterPass());
if (SplitStaticData)
addPass(createStaticDataSplitterPass());
}
// We run the BasicBlockSections pass if either we need BB sections or BB
// address map (or both).
Expand Down
Loading
Loading