[BOLT] Support pre-aggregated basic sample profile #140196

Merged
45 changes: 29 additions & 16 deletions bolt/include/bolt/Profile/DataAggregator.h
@@ -370,33 +370,46 @@ class DataAggregator : public DataReader {
/// memory.
///
/// File format syntax:
/// {B|F|f|T} [<start_id>:]<start_offset> [<end_id>:]<end_offset> [<ft_end>]
/// <count> [<mispred_count>]
/// E <event>
[Review comment, Contributor] Can you specify multiple events per file?

[Reply, Contributor Author] Yes. It is possible to group samples by event name, but currently we don't have any logic based on the event names other than passing them through to fdata and YAML.

/// S <start> <count>
/// T <start> <end> <ft_end> <count>
/// B <start> <end> <count> <mispred_count>
/// [Ff] <start> <end> <count>
///
/// B - indicates an aggregated branch
/// F - an aggregated fall-through
/// where <start>, <end>, <ft_end> have the format [<id>:]<offset>
///
/// E - name of the sampling event used for subsequent entries
/// S - indicates an aggregated basic sample at <start>
/// B - indicates an aggregated branch from <start> to <end>
/// F - an aggregated fall-through from <start> to <end>
/// f - an aggregated fall-through with external origin - used to disambiguate
/// between a return hitting a basic block head and a regular internal
/// jump to the block
/// T - an aggregated trace: branch with a fall-through (from, to, ft_end)
///
/// <start_id> - build id of the object containing the start address. We can
/// skip it for the main binary and use "X" for an unknown object. This will
/// save some space and facilitate human parsing.
///
/// <start_offset> - hex offset from the object base load address (0 for the
/// main executable unless it's PIE) to the start address.
/// T - an aggregated trace: branch from <start> to <end> with a fall-through
/// to <ft_end>
///
/// <end_id>, <end_offset> - same for the end address.
/// <id> - build id of the object containing the address. We can skip it for
/// the main binary and use "X" for an unknown object. This will save some
/// space and facilitate human parsing.
///
/// <ft_end> - same for the fallthrough_end address.
/// <offset> - hex offset from the object base load address (0 for the
/// main executable unless it's PIE) to the address.
///
/// <count> - total aggregated count of the branch or a fall-through.
/// <count> - total aggregated count.
///
/// <mispred_count> - the number of times the branch was mispredicted.
/// Omitted for fall-throughs.
///
/// Example:
/// Basic samples profile:
/// E cycles
/// S 41be50 3
/// E br_inst_retired.near_taken
/// S 41be60 6
///
/// Trace profile combining branches and fall-throughs:
/// T 4b196f 4b19e0 4b19ef 2
///
/// Legacy branch profile with separate branches and fall-throughs:
/// F 41be50 41be50 3
/// F 41be90 41be90 4
/// B 4b1942 39b57f0 3 0
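
The file format described in the comment above can be sketched as a small line parser. This is an illustrative Python sketch, not BOLT's implementation: the names `FIELD_COUNTS`, `Location`, `Entry`, `parse_location` and `parse_entry` are hypothetical, and counts are assumed to be decimal while offsets are hexadecimal.

```python
from typing import NamedTuple, Optional

# (number of addresses, number of counters) per entry type, mirroring the
# format comment: T has three addresses, S one, B two counters, and so on.
FIELD_COUNTS = {"T": (3, 1), "S": (1, 1), "B": (2, 2), "F": (2, 1), "f": (2, 1)}


class Location(NamedTuple):
    build_id: Optional[str]  # None for the main binary, "X" for an unknown object
    offset: int


class Entry(NamedTuple):
    kind: str
    addrs: tuple
    counters: tuple


def parse_location(text: str) -> Location:
    """Parse [<id>:]<offset>, where <offset> is hexadecimal."""
    build_id, sep, offset = text.rpartition(":")
    return Location(build_id if sep else None, int(offset, 16))


def parse_entry(line: str):
    """Return the event name (str) for an E line, or an Entry otherwise."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    kind, rest = fields[0], fields[1:]
    if kind == "E":
        if len(rest) != 1:
            raise ValueError("expected a single event name")
        return rest[0]
    if kind not in FIELD_COUNTS:
        raise ValueError("expected T, S, E, B, F or f")
    n_addr, n_cnt = FIELD_COUNTS[kind]
    if len(rest) != n_addr + n_cnt:
        raise ValueError("unexpected number of fields")
    addrs = tuple(parse_location(f) for f in rest[:n_addr])
    counters = tuple(int(f) for f in rest[n_addr:])  # assumed decimal
    return Entry(kind, addrs, counters)
```

For instance, `parse_entry("S X:7f36d18d60c0 2")` yields an `S` entry whose only address has build id `"X"`, while `parse_entry("E cycles")` returns the event name.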
130 changes: 80 additions & 50 deletions bolt/lib/Profile/DataAggregator.cpp
@@ -1204,77 +1204,106 @@ ErrorOr<Location> DataAggregator::parseLocationOrOffset() {
}

std::error_code DataAggregator::parseAggregatedLBREntry() {
while (checkAndConsumeFS()) {
}
enum AggregatedLBREntry : char {
INVALID = 0,
EVENT_NAME, // E
TRACE, // T
SAMPLE, // S
BRANCH, // B
FT, // F
FT_EXTERNAL_ORIGIN // f
} Type = INVALID;

// The number of fields to parse, set based on Type.
int AddrNum = 0;
int CounterNum = 0;
// Storage for parsed fields.
StringRef EventName;
std::optional<Location> Addr[3];
int64_t Counters[2];

while (Type == INVALID || Type == EVENT_NAME) {
while (checkAndConsumeFS()) {
}
ErrorOr<StringRef> StrOrErr =
parseString(FieldSeparator, Type == EVENT_NAME);
if (std::error_code EC = StrOrErr.getError())
return EC;
StringRef Str = StrOrErr.get();

ErrorOr<StringRef> TypeOrErr = parseString(FieldSeparator);
if (std::error_code EC = TypeOrErr.getError())
return EC;
enum AggregatedLBREntry { TRACE, BRANCH, FT, FT_EXTERNAL_ORIGIN, INVALID };
auto Type = StringSwitch<AggregatedLBREntry>(TypeOrErr.get())
.Case("T", TRACE)
.Case("B", BRANCH)
.Case("F", FT)
.Case("f", FT_EXTERNAL_ORIGIN)
.Default(INVALID);
if (Type == INVALID) {
reportError("expected T, B, F or f");
return make_error_code(llvm::errc::io_error);
}
if (Type == EVENT_NAME) {
EventName = Str;
break;
}

while (checkAndConsumeFS()) {
}
ErrorOr<Location> From = parseLocationOrOffset();
if (std::error_code EC = From.getError())
return EC;
Type = StringSwitch<AggregatedLBREntry>(Str)
.Case("T", TRACE)
.Case("S", SAMPLE)
.Case("E", EVENT_NAME)
.Case("B", BRANCH)
.Case("F", FT)
.Case("f", FT_EXTERNAL_ORIGIN)
.Default(INVALID);

if (Type == INVALID) {
reportError("expected T, S, E, B, F or f");
return make_error_code(llvm::errc::io_error);
}

while (checkAndConsumeFS()) {
using SSI = StringSwitch<int>;
AddrNum = SSI(Str).Case("T", 3).Case("S", 1).Case("E", 0).Default(2);
CounterNum = SSI(Str).Case("B", 2).Case("E", 0).Default(1);
}
ErrorOr<Location> To = parseLocationOrOffset();
if (std::error_code EC = To.getError())
return EC;

ErrorOr<Location> TraceFtEnd = std::error_code();
if (Type == AggregatedLBREntry::TRACE) {
for (int I = 0; I < AddrNum; ++I) {
while (checkAndConsumeFS()) {
}
TraceFtEnd = parseLocationOrOffset();
if (std::error_code EC = TraceFtEnd.getError())
ErrorOr<Location> AddrOrErr = parseLocationOrOffset();
if (std::error_code EC = AddrOrErr.getError())
return EC;
Addr[I] = AddrOrErr.get();
}

while (checkAndConsumeFS()) {
}
ErrorOr<int64_t> Frequency =
parseNumberField(FieldSeparator, Type != AggregatedLBREntry::BRANCH);
if (std::error_code EC = Frequency.getError())
return EC;

uint64_t Mispreds = 0;
if (Type == AggregatedLBREntry::BRANCH) {
for (int I = 0; I < CounterNum; ++I) {
while (checkAndConsumeFS()) {
}
ErrorOr<int64_t> MispredsOrErr = parseNumberField(FieldSeparator, true);
if (std::error_code EC = MispredsOrErr.getError())
ErrorOr<int64_t> CountOrErr =
parseNumberField(FieldSeparator, I + 1 == CounterNum);
if (std::error_code EC = CountOrErr.getError())
return EC;
Mispreds = static_cast<uint64_t>(MispredsOrErr.get());
Counters[I] = CountOrErr.get();
}

if (!checkAndConsumeNewLine()) {
reportError("expected end of line");
return make_error_code(llvm::errc::io_error);
}

BinaryFunction *FromFunc = getBinaryFunctionContainingAddress(From->Offset);
BinaryFunction *ToFunc = getBinaryFunctionContainingAddress(To->Offset);
if (Type == EVENT_NAME) {
EventNames.insert(EventName);
return std::error_code();
}

for (BinaryFunction *BF : {FromFunc, ToFunc})
if (BF)
BF->setHasProfileAvailable();
const uint64_t FromOffset = Addr[0]->Offset;
BinaryFunction *FromFunc = getBinaryFunctionContainingAddress(FromOffset);
if (FromFunc)
FromFunc->setHasProfileAvailable();

int64_t Count = Counters[0];
int64_t Mispreds = Counters[1];

if (Type == SAMPLE) {
BasicSamples[FromOffset] += Count;
NumTotalSamples += Count;
return std::error_code();
}

uint64_t Count = static_cast<uint64_t>(Frequency.get());
const uint64_t ToOffset = Addr[1]->Offset;
BinaryFunction *ToFunc = getBinaryFunctionContainingAddress(ToOffset);
if (ToFunc)
ToFunc->setHasProfileAvailable();

Trace Trace(From->Offset, To->Offset);
Trace Trace(FromOffset, ToOffset);
// Taken trace
if (Type == TRACE || Type == BRANCH) {
TakenBranchInfo &Info = BranchLBRs[Trace];
@@ -1285,8 +1314,9 @@ std::error_code DataAggregator::parseAggregatedLBREntry() {
}
// Construct fallthrough part of the trace
if (Type == TRACE) {
Trace.From = To->Offset;
Trace.To = TraceFtEnd->Offset;
const uint64_t TraceFtEndOffset = Addr[2]->Offset;
Trace.From = ToOffset;
Trace.To = TraceFtEndOffset;
Type = FromFunc == ToFunc ? FT : FT_EXTERNAL_ORIGIN;
}
// Add fallthrough trace
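
The way a `T` entry is split into a taken branch plus a fall-through can be sketched in Python. This is a simplified illustration under stated assumptions: the dictionaries `branches` and `fallthroughs` and the function `record` are hypothetical stand-ins, the counter updates reflect the collapsed parts of the diff only by guess, and the sketch does not distinguish `F` from `f` (the real code picks one based on whether the source and target functions match).

```python
from collections import defaultdict

# Hypothetical stand-ins for the aggregator's maps.
branches = defaultdict(lambda: [0, 0])  # (from, to) -> [taken, mispredicted]
fallthroughs = defaultdict(int)         # (from, to) -> count


def record(kind, addrs, count, mispreds=0):
    """Fold one parsed B, F, f or T entry into the two maps."""
    start, end = addrs[0], addrs[1]
    if kind in ("T", "B"):
        # Taken branch from <start> to <end>.
        info = branches[(start, end)]
        info[0] += count
        info[1] += mispreds
    if kind == "T":
        # The fall-through part of a trace runs from the branch target
        # (<end>) to <ft_end>.
        fallthroughs[(end, addrs[2])] += count
    elif kind in ("F", "f"):
        fallthroughs[(start, end)] += count


# Entries taken from the examples in the DataAggregator.h comment.
record("T", (0x4B196F, 0x4B19E0, 0x4B19EF), 2)
record("B", (0x4B1942, 0x39B57F0), 3, 0)
record("F", (0x41BE50, 0x41BE50), 3)
```

After these calls, the single `T` line has contributed to both maps: one taken branch `4b196f -> 4b19e0` and one fall-through `4b19e0 -> 4b19ef`, each with count 2.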
19 changes: 19 additions & 0 deletions bolt/test/X86/Inputs/pre-aggregated-basic.txt
@@ -0,0 +1,19 @@
E cycles
S 4005f0 1
S 4005f0 1
S 400610 1
S 400ad1 2
S 400b10 1
S 400bb7 1
S 400bbc 2
S 400d90 1
S 400dae 1
S 400e00 2
S 401170 22
S 401180 58
S 4011a0 33
S 4011a9 33
S 4011ad 58
S 4011b2 22
S X:7f36d18d60c0 2
S X:7f36d18f2ce0 1
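
Repeated `S` lines for the same address, such as the two `S 4005f0 1` lines above, accumulate rather than overwrite, matching `BasicSamples[FromOffset] += Count` in the parser. A minimal Python sketch of that accumulation follows, using a subset of this input file; `PROFILE` and `aggregate_samples` are hypothetical names, and the build id prefix is simply dropped here for brevity.

```python
from collections import defaultdict

# A subset of pre-aggregated-basic.txt from this diff.
PROFILE = """\
E cycles
S 4005f0 1
S 4005f0 1
S 400610 1
S X:7f36d18d60c0 2
"""


def aggregate_samples(text):
    """Sum S counts per offset and collect the E event names in order."""
    event_names = []
    samples = defaultdict(int)
    total = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "E":
            if fields[1] not in event_names:
                event_names.append(fields[1])
        elif fields[0] == "S":
            # Keep only the hex offset after an optional "<id>:" prefix.
            offset = int(fields[1].rpartition(":")[2], 16)
            count = int(fields[2])
            samples[offset] += count
            total += count
    return event_names, dict(samples), total


events, samples, total = aggregate_samples(PROFILE)
```

Here the two samples at `4005f0` are merged into a single entry with count 2, and the total over the subset is 5.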
10 changes: 10 additions & 0 deletions bolt/test/X86/pre-aggregated-perf.test
@@ -57,6 +57,16 @@ RUN: llvm-bolt %t.exe -o %t.bolt.yaml --pa -p %p/Inputs/pre-aggregated.txt \
RUN: --aggregate-only --profile-format=yaml --profile-use-dfs
RUN: cat %t.bolt.yaml | FileCheck %s -check-prefix=NEWFORMAT

## Test pre-aggregated basic profile
RUN: perf2bolt %t.exe -o %t --pa -p %p/Inputs/pre-aggregated-basic.txt -o %t.ba \
RUN: 2>&1 | FileCheck %s --check-prefix=BASIC-ERROR
RUN: perf2bolt %t.exe -o %t --pa -p %p/Inputs/pre-aggregated-basic.txt -o %t.ba.nl \
RUN: -nl 2>&1 | FileCheck %s --check-prefix=BASIC-SUCCESS
RUN: FileCheck %s --input-file %t.ba.nl --check-prefix CHECK-BASIC-NL
BASIC-ERROR: BOLT-INFO: 0 out of 7 functions in the binary (0.0%) have non-empty execution profile
BASIC-SUCCESS: BOLT-INFO: 4 out of 7 functions in the binary (57.1%) have non-empty execution profile
CHECK-BASIC-NL: no_lbr cycles

PERF2BOLT: 0 [unknown] 7f36d18d60c0 1 main 53c 0 2
PERF2BOLT: 1 main 451 1 SolveCubic 0 0 2
PERF2BOLT: 1 main 490 0 [unknown] 4005f0 0 1
6 changes: 3 additions & 3 deletions bolt/test/link_fdata.py
@@ -36,9 +36,9 @@
fdata_pat = re.compile(r"([01].*) (?P<exec>\d+) (?P<mispred>\d+)")

# Pre-aggregated profile:
# {T|B|F|f} [<start_id>:]<start_offset> [<end_id>:]<end_offset> [<ft_end>]
# <count> [<mispred_count>]
preagg_pat = re.compile(r"(?P<type>[TBFf]) (?P<offsets_count>.*)")
# {T|S|E|B|F|f} <start> [<end>] [<ft_end>] <count> [<mispred_count>]
# <loc>: [<id>:]<offset>
preagg_pat = re.compile(r"(?P<type>[TSBFf]) (?P<offsets_count>.*)")

# No-LBR profile:
# <is symbol?> <closest elf symbol or DSO name> <relative address> <count>
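
To see what the updated `preagg_pat` accepts, here is a short Python check. The pattern is copied from the diff; the helper `classify` is hypothetical, and the use of `match` (anchored at the start of the line) is an assumption, since the diff does not show how the script applies the pattern. Note that the character class adds `S` but not `E`, so event lines are not matched by this pattern as written.

```python
import re

# Pattern copied from the updated link_fdata.py in this diff.
preagg_pat = re.compile(r"(?P<type>[TSBFf]) (?P<offsets_count>.*)")


def classify(line):
    """Return (type, remaining fields) for a pre-aggregated line, else None."""
    m = preagg_pat.match(line)
    if m is None:
        return None
    return m.group("type"), m.group("offsets_count")


sample = classify("S 4005f0 1")
trace = classify("T 4b196f 4b19e0 4b19ef 2")
event = classify("E cycles")
```

With this pattern, `sample` and `trace` carry the type letter and the rest of the line, while `event` is `None`.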