[BOLT] Match functions with name similarity #95884

Merged
39 commits
fab60ab
[𝘀𝗽𝗿] changes to main this commit is based on
sayhaan Jun 18, 2024
7ac4c1a
[𝘀𝗽𝗿] initial version
shawbyoung Jun 18, 2024
0a5a829
[𝘀𝗽𝗿] changes introduced through rebase
shawbyoung Jun 21, 2024
190949c
spr amend
shawbyoung Jun 21, 2024
34652b2
spr amend
shawbyoung Jun 21, 2024
2d23bbd
spr amend
shawbyoung Jun 21, 2024
9e6bb26
spr amend
shawbyoung Jun 21, 2024
ff561cb
[𝘀𝗽𝗿] changes introduced through rebase
shawbyoung Jun 21, 2024
58993f9
spr amend
shawbyoung Jun 21, 2024
669afca
spr amend
shawbyoung Jun 21, 2024
9c021f8
spr amend
shawbyoung Jun 21, 2024
9fc1899
spr amend
shawbyoung Jun 21, 2024
d687bc0
Added test
shawbyoung Jun 24, 2024
bba68e6
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jun 24, 2024
2d99875
Changed DeriveNameSpace argument
shawbyoung Jun 24, 2024
550d6ac
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jun 25, 2024
be21785
spr diff --update-message
shawbyoung Jun 25, 2024
5493cd4
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jun 25, 2024
0ddd72b
Refactored DeriveNamespace and expanded flag description
shawbyoung Jun 25, 2024
5d3e8db
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jun 26, 2024
0eeeeb6
Removed tmp file
shawbyoung Jun 26, 2024
85c74bd
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jun 26, 2024
5da6a09
Small optimization
shawbyoung Jun 26, 2024
c40b612
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jun 28, 2024
e9974a9
Refactored into function
shawbyoung Jun 28, 2024
3885a7f
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jun 28, 2024
d00d613
Changed verbosity
shawbyoung Jun 28, 2024
60d7d92
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jun 28, 2024
c65bce0
Formatting
shawbyoung Jun 28, 2024
4c63f17
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jul 3, 2024
873afb2
Using BufferSize from getContextDeclCxtName
shawbyoung Jul 3, 2024
d61c640
[𝘀𝗽𝗿] changes introduced through rebase
maksfb Jul 3, 2024
a801ff3
Comments
shawbyoung Jul 3, 2024
32cce08
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Jul 3, 2024
1dab537
Rebase
shawbyoung Jul 3, 2024
60047ab
Changing buffer initialization in DeriveNamespace
shawbyoung Jul 3, 2024
df20265
[𝘀𝗽𝗿] changes introduced through rebase
aaupov Jul 3, 2024
2e0eae4
Rebase
shawbyoung Jul 3, 2024
39a6901
Rebase
shawbyoung Jul 3, 2024
4 changes: 4 additions & 0 deletions bolt/docs/CommandLineArgumentReference.md
@@ -688,6 +688,10 @@

Use a modified clustering algorithm geared towards minimizing branches

- `--name-similarity-function-matching-threshold=<uint>`

Match functions using namespace and edit distance.

- `--no-inline`

Disable all inlining (overrides other inlining options)
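
In the option above, "namespace" appears to mean the declaration context of a demangled function name, which the patch obtains from `ItaniumPartialDemangler::getFunctionDeclContextName`. The sketch below is only a rough, standalone stand-in for that idea: `declContext` is a hypothetical helper, not part of BOLT or LLVM. It assumes simple demangled names with no leading return type, no `(anonymous namespace)` component, and no overloaded comparison or shift operators.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a BOLT/LLVM API): returns the enclosing scope of a
// simple demangled function name, or "" when the name is unqualified.
std::string declContext(const std::string &Demangled) {
  // Drop the parameter list, if any: "ns::f(int)" -> "ns::f".
  std::string Name = Demangled.substr(0, Demangled.find('('));

  // Scan backwards for the last "::" that is not nested inside template
  // arguments, tracking angle-bracket depth along the way.
  int Depth = 0;
  for (std::size_t I = Name.size(); I-- > 1;) {
    if (Name[I] == '>')
      ++Depth;
    else if (Name[I] == '<')
      --Depth;
    else if (Depth == 0 && Name[I] == ':' && Name[I - 1] == ':')
      return Name.substr(0, I - 1); // everything before the final "::"
  }
  return ""; // unqualified name: global scope
}
```

Under these assumptions, `declContext("ns::Widget::draw(int)")` yields `"ns::Widget"` and `declContext("main")` yields an empty string, so unqualified functions such as `main` and `main2` all share the empty namespace bucket.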
3 changes: 3 additions & 0 deletions bolt/include/bolt/Profile/YAMLProfileReader.h
@@ -93,6 +93,9 @@ class YAMLProfileReader : public ProfileReaderBase {
ProfiledFunctions.emplace(&BF);
}

/// Matches functions with similarly named profiled functions.
uint64_t matchWithNameSimilarity(BinaryContext &BC);

/// Check if the profile uses an event with a given \p Name.
bool usesEvent(StringRef Name) const;
};
121 changes: 121 additions & 0 deletions bolt/lib/Profile/YAMLProfileReader.cpp
@@ -11,8 +11,11 @@
#include "bolt/Core/BinaryFunction.h"
#include "bolt/Passes/MCF.h"
#include "bolt/Profile/ProfileYAMLMapping.h"
#include "bolt/Utils/NameResolver.h"
#include "bolt/Utils/Utils.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/edit_distance.h"
#include "llvm/Demangle/Demangle.h"
#include "llvm/Support/CommandLine.h"

using namespace llvm;
@@ -24,6 +27,11 @@ extern cl::OptionCategory BoltOptCategory;
extern cl::opt<bool> InferStaleProfile;
extern cl::opt<bool> Lite;

cl::opt<unsigned> NameSimilarityFunctionMatchingThreshold(
"name-similarity-function-matching-threshold",
cl::desc("Match functions using namespace and edit distance"), cl::init(0),
cl::Hidden, cl::cat(BoltOptCategory));

static llvm::cl::opt<bool>
IgnoreHash("profile-ignore-hash",
cl::desc("ignore hash while reading function profile"),
@@ -350,6 +358,111 @@ bool YAMLProfileReader::mayHaveProfileData(const BinaryFunction &BF) {
return false;
}

uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext &BC) {
uint64_t MatchedWithNameSimilarity = 0;
ItaniumPartialDemangler Demangler;

// Demangle and derive namespace from function name.
auto DemangleName = [&](std::string &FunctionName) {
StringRef RestoredName = NameResolver::restore(FunctionName);
return demangle(RestoredName);
};
auto DeriveNameSpace = [&](std::string &DemangledName) {
if (Demangler.partialDemangle(DemangledName.c_str()))
return std::string("");
std::vector<char> Buffer(DemangledName.begin(), DemangledName.end());
size_t BufferSize;
char *NameSpace =
Demangler.getFunctionDeclContextName(&Buffer[0], &BufferSize);
return std::string(NameSpace, BufferSize);
};

// Maps namespaces to associated function block counts and gets profile
// function names and namespaces to minimize the number of BFs to process and
// avoid repeated name demangling/namespace derivation.
StringMap<std::set<uint32_t>> NamespaceToProfiledBFSizes;
std::vector<std::string> ProfileBFDemangledNames;
ProfileBFDemangledNames.reserve(YamlBP.Functions.size());
std::vector<std::string> ProfiledBFNamespaces;
ProfiledBFNamespaces.reserve(YamlBP.Functions.size());

for (auto &YamlBF : YamlBP.Functions) {
std::string YamlBFDemangledName = DemangleName(YamlBF.Name);
ProfileBFDemangledNames.push_back(YamlBFDemangledName);
std::string YamlBFNamespace = DeriveNameSpace(YamlBFDemangledName);
ProfiledBFNamespaces.push_back(YamlBFNamespace);
NamespaceToProfiledBFSizes[YamlBFNamespace].insert(YamlBF.NumBasicBlocks);
}

StringMap<std::vector<BinaryFunction *>> NamespaceToBFs;

// Maps namespaces to BFs excluding binary functions with no equal sized
// profiled functions belonging to the same namespace.
for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
std::string DemangledName = BF->getDemangledName();
std::string Namespace = DeriveNameSpace(DemangledName);

auto NamespaceToProfiledBFSizesIt =
NamespaceToProfiledBFSizes.find(Namespace);
// Skip if there are no ProfileBFs with a given \p Namespace.
if (NamespaceToProfiledBFSizesIt == NamespaceToProfiledBFSizes.end())
continue;
// Skip if there are no ProfileBFs in a given \p Namespace with
// equal number of blocks.
if (NamespaceToProfiledBFSizesIt->second.count(BF->size()) == 0)
continue;
auto NamespaceToBFsIt = NamespaceToBFs.find(Namespace);
if (NamespaceToBFsIt == NamespaceToBFs.end())
NamespaceToBFs[Namespace] = {BF};
else
NamespaceToBFsIt->second.push_back(BF);
}

// Iterates through all profiled functions and binary functions belonging to
// the same namespace and matches based on edit distance threshold.
assert(YamlBP.Functions.size() == ProfiledBFNamespaces.size() &&
ProfiledBFNamespaces.size() == ProfileBFDemangledNames.size());
for (size_t I = 0; I < YamlBP.Functions.size(); ++I) {
yaml::bolt::BinaryFunctionProfile &YamlBF = YamlBP.Functions[I];
std::string &YamlBFNamespace = ProfiledBFNamespaces[I];
if (YamlBF.Used)
continue;
// Skip if there are no BFs in a given \p Namespace.
auto It = NamespaceToBFs.find(YamlBFNamespace);
if (It == NamespaceToBFs.end())
continue;

std::string &YamlBFDemangledName = ProfileBFDemangledNames[I];
std::vector<BinaryFunction *> BFs = It->second;
unsigned MinEditDistance = UINT_MAX;
BinaryFunction *ClosestNameBF = nullptr;

// Finds the BF in the same namespace whose name is closest to the
// profiled function's name.
for (BinaryFunction *BF : BFs) {
if (ProfiledFunctions.count(BF))
continue;
if (BF->size() != YamlBF.NumBasicBlocks)
continue;
std::string BFDemangledName = BF->getDemangledName();
unsigned BFEditDistance =
StringRef(BFDemangledName).edit_distance(YamlBFDemangledName);
if (BFEditDistance < MinEditDistance) {
MinEditDistance = BFEditDistance;
ClosestNameBF = BF;
}
}

if (ClosestNameBF &&
MinEditDistance <= opts::NameSimilarityFunctionMatchingThreshold) {
matchProfileToFunction(YamlBF, *ClosestNameBF);
++MatchedWithNameSimilarity;
}
}

return MatchedWithNameSimilarity;
}

Error YAMLProfileReader::readProfile(BinaryContext &BC) {
if (opts::Verbosity >= 1) {
outs() << "BOLT-INFO: YAML profile with hash: ";
@@ -461,6 +574,12 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
matchProfileToFunction(YamlBF, *BF);

// Uses name similarity to match functions that were not matched by name.
uint64_t MatchedWithNameSimilarity =
opts::NameSimilarityFunctionMatchingThreshold > 0
? matchWithNameSimilarity(BC)
: 0;

for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
if (!YamlBF.Used && opts::Verbosity >= 1)
errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name
@@ -473,6 +592,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
<< " functions with hash\n";
outs() << "BOLT-INFO: matched " << MatchedWithLTOCommonName
<< " functions with matching LTO common names\n";
outs() << "BOLT-INFO: matched " << MatchedWithNameSimilarity
<< " functions with similar names\n";
}

// Set for parseFunctionProfile().
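
The matching strategy in `matchWithNameSimilarity` can be summarized in a standalone sketch: bucket candidate functions by namespace, keep only candidates with the same basic block count, pick the one with the smallest edit distance, and accept it only if that distance is within the threshold. Everything below is illustrative. `Func`, `editDistance`, and `matchBySimilarName` are hypothetical names, the `Matched` flag stands in for `YamlBF.Used` and `ProfiledFunctions`, and the patch's up-front pruning of binary functions by block count is folded into the inner loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for both a binary function and a profile entry.
struct Func {
  std::string Namespace; // declaration context, "" for the global scope
  std::string Name;      // demangled name
  unsigned NumBlocks;    // number of basic blocks
  bool Matched;          // already paired by an earlier matching stage
};

// Levenshtein distance using a single rolling row.
unsigned editDistance(const std::string &A, const std::string &B) {
  std::vector<unsigned> Row(B.size() + 1);
  for (std::size_t J = 0; J <= B.size(); ++J)
    Row[J] = static_cast<unsigned>(J);
  for (std::size_t I = 1; I <= A.size(); ++I) {
    unsigned Diag = Row[0]; // D[I-1][J-1]
    Row[0] = static_cast<unsigned>(I);
    for (std::size_t J = 1; J <= B.size(); ++J) {
      unsigned Up = Row[J]; // D[I-1][J]
      unsigned Subst = Diag + (A[I - 1] == B[J - 1] ? 0u : 1u);
      Row[J] = std::min({Up + 1, Row[J - 1] + 1, Subst});
      Diag = Up;
    }
  }
  return Row[B.size()];
}

// Returns (profile index, binary index) pairs matched by name similarity and
// marks both sides as matched.
std::vector<std::pair<std::size_t, std::size_t>>
matchBySimilarName(std::vector<Func> &Profile, std::vector<Func> &Binary,
                   unsigned Threshold) {
  // Bucket unmatched binary functions by namespace.
  std::map<std::string, std::vector<std::size_t>> NamespaceToBinary;
  for (std::size_t I = 0; I < Binary.size(); ++I)
    if (!Binary[I].Matched)
      NamespaceToBinary[Binary[I].Namespace].push_back(I);

  std::vector<std::pair<std::size_t, std::size_t>> Matches;
  for (std::size_t P = 0; P < Profile.size(); ++P) {
    if (Profile[P].Matched)
      continue;
    auto It = NamespaceToBinary.find(Profile[P].Namespace);
    if (It == NamespaceToBinary.end())
      continue;

    unsigned Best = std::numeric_limits<unsigned>::max();
    std::size_t BestIdx = Binary.size(); // sentinel: no candidate yet
    for (std::size_t B : It->second) {
      // Candidates must be unmatched and have the same block count.
      if (Binary[B].Matched || Binary[B].NumBlocks != Profile[P].NumBlocks)
        continue;
      unsigned D = editDistance(Binary[B].Name, Profile[P].Name);
      if (D < Best) {
        Best = D;
        BestIdx = B;
      }
    }
    if (BestIdx != Binary.size() && Best <= Threshold) {
      Profile[P].Matched = true;
      Binary[BestIdx].Matched = true;
      Matches.emplace_back(P, BestIdx);
    }
  }
  return Matches;
}
```

With a profile entry named `main2` and a binary function named `main`, both in the global scope with five blocks, a threshold of 1 is enough for a match because the two names differ by a single inserted character.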
63 changes: 63 additions & 0 deletions bolt/test/X86/name-similarity-function-matching.test
@@ -0,0 +1,63 @@
## Tests function matching in YAMLProfileReader by name similarity.

# REQUIRES: system-linux
# RUN: split-file %s %t
# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown %t/main.s -o %t.o
# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q -nostdlib
# RUN: llvm-bolt %t.exe -o %t.out --data %t/yaml -v=2 \
# RUN: --print-cfg --name-similarity-function-matching-threshold=1 --funcs=main --profile-ignore-hash=0 2>&1 | FileCheck %s

# CHECK: BOLT-INFO: matched 1 functions with similar names

#--- main.s
.globl main
.type main, @function
main:
.cfi_startproc
.LBB00:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
testq %rax, %rax
js .LBB03
.LBB01:
jne .LBB04
.LBB02:
nop
.LBB03:
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
.LBB04:
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
## For relocations against .text
.reloc 0, R_X86_64_NONE
.cfi_endproc
.size main, .-main

#--- yaml
---
header:
profile-version: 1
binary-name: 'hashing-based-function-matching.s.tmp.exe'
binary-build-id: '<unknown>'
profile-flags: [ lbr ]
profile-origin: branch profile reader
profile-events: ''
dfs-order: false
hash-func: xxh3
functions:
- name: main2
fid: 0
hash: 0x0000000000000001
exec: 1
nblocks: 5
blocks:
- bid: 1
insns: 1
succ: [ { bid: 3, cnt: 1} ]
...