-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[memprof] Use CallStackRadixTreeBuilder in the V3 format #94708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[memprof] Use CallStackRadixTreeBuilder in the V3 format #94708
Conversation
This patch integrates CallStackRadixTreeBuilder into the V3 format, reducing the profile size to about 27% of the V2 profile size. - Serialization: writeMemProfCallStackArray just needs to write out the radix tree array prepared by CallStackRadixTreeBuilder. Mappings from CallStackIds to LinearCallStackIds are moved by new function CallStackRadixTreeBuilder::takeCallStackPos. - Deserialization: Deserializing a call stack is the same as deserializing an array encoded in the obvious manner -- the length followed by the payload, except that we need to follow a pointer to the parent to take advantage of common prefixes once in a while. This patch teaches LinearCallStackIdConverter to how to handle those pointers.
@llvm/pr-subscribers-pgo Author: Kazu Hirata (kazutakahirata) ChangesThis patch integrates CallStackRadixTreeBuilder into the V3 format,
Full diff: https://github.com/llvm/llvm-project/pull/94708.diff 2 Files Affected:
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index 6b0fa6cd6541..9db0113be5de 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -900,9 +900,17 @@ struct LinearCallStackIdConverter {
Frames.reserve(NumFrames);
for (; NumFrames; --NumFrames) {
LinearFrameId Elem =
- support::endian::readNext<LinearFrameId, llvm::endianness::little>(
- Ptr);
+ support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
+ // Follow a pointer to the parent, if any.
+ if (static_cast<std::make_signed_t<LinearFrameId>>(Elem) < 0) {
+ Ptr += (-Elem) * sizeof(LinearFrameId);
+ Elem =
+ support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
+ }
+ // We shouldn't encounter another pointer.
+ assert(static_cast<std::make_signed_t<LinearFrameId>>(Elem) >= 0);
Frames.push_back(FrameIdToFrame(Elem));
+ Ptr += sizeof(LinearFrameId);
}
return Frames;
@@ -1028,6 +1036,10 @@ class CallStackRadixTreeBuilder {
getCallStackPos() const {
return CallStackPos;
}
+
+ llvm::DenseMap<CallStackId, LinearCallStackId> takeCallStackPos() {
+ return std::move(CallStackPos);
+ }
};
// Verify that each CallStackId is computed with hashCallStack. This function
diff --git a/llvm/lib/ProfileData/InstrProfWriter.cpp b/llvm/lib/ProfileData/InstrProfWriter.cpp
index e58e6b8acfc8..a73f72a534f1 100644
--- a/llvm/lib/ProfileData/InstrProfWriter.cpp
+++ b/llvm/lib/ProfileData/InstrProfWriter.cpp
@@ -547,19 +547,11 @@ writeMemProfCallStackArray(
llvm::DenseMap<memprof::CallStackId, memprof::LinearCallStackId>
MemProfCallStackIndexes;
- MemProfCallStackIndexes.reserve(MemProfCallStackData.size());
- uint64_t CallStackBase = OS.tell();
- for (const auto &[CSId, CallStack] : MemProfCallStackData) {
- memprof::LinearCallStackId CallStackIndex =
- (OS.tell() - CallStackBase) / sizeof(memprof::LinearCallStackId);
- MemProfCallStackIndexes.insert({CSId, CallStackIndex});
- const llvm::SmallVector<memprof::FrameId> CS = CallStack;
- OS.write32(CS.size());
- for (const auto F : CS) {
- assert(MemProfFrameIndexes.contains(F));
- OS.write32(MemProfFrameIndexes[F]);
- }
- }
+ memprof::CallStackRadixTreeBuilder Builder;
+ Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes);
+ for (auto I : Builder.getRadixArray())
+ OS.write32(I);
+ MemProfCallStackIndexes = Builder.takeCallStackPos();
// Release the memory of this vector as it is no longer needed.
MemProfCallStackData.clear();
@@ -695,8 +687,8 @@ static Error writeMemProfV2(ProfOStream &OS,
// uint64_t Schema entry 1
// ....
// uint64_t Schema entry N - 1
-// OnDiskChainedHashTable MemProfFrameData
-// OnDiskChainedHashTable MemProfCallStackData
+// Frames serialized one after another
+// Call stacks encoded as a radix tree
// OnDiskChainedHashTable MemProfRecordData
static Error writeMemProfV3(ProfOStream &OS,
memprof::IndexedMemProfData &MemProfData,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with a couple of suggestions
@@ -1028,6 +1036,10 @@ class CallStackRadixTreeBuilder { | |||
getCallStackPos() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove getCallStackPos and replace its users with takeCallStackPos? Looks like existing callers are all unit tests where we could do this.
support::endian::readNext<LinearFrameId, llvm::endianness::little>( | ||
Ptr); | ||
support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr); | ||
// Follow a pointer to the parent, if any. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe point to the comment block further down that describes the radix tree format.
Expand a comment to point to CallStackRadixTreeBuilder.
This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.
Serialization: writeMemProfCallStackArray just needs to write out
the radix tree array prepared by CallStackRadixTreeBuilder.
Mappings from CallStackIds to LinearCallStackIds are moved by new
function CallStackRadixTreeBuilder::takeCallStackPos.
Deserialization: Deserializing a call stack is the same as
deserializing an array encoded in the obvious manner -- the length
followed by the payload, except that we need to follow a pointer to
the parent to take advantage of common prefixes once in a while.
This patch teaches LinearCallStackIdConverter to how to handle those
pointers.