Skip to content

[memprof] Use CallStackRadixTreeBuilder in the V3 format #94708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 7, 2024

Conversation

kazutakahirata
Copy link
Contributor

This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.

  • Serialization: writeMemProfCallStackArray just needs to write out
    the radix tree array prepared by CallStackRadixTreeBuilder.
    Mappings from CallStackIds to LinearCallStackIds are moved by new
    function CallStackRadixTreeBuilder::takeCallStackPos.

  • Deserialization: Deserializing a call stack is the same as
    deserializing an array encoded in the obvious manner -- the length
    followed by the payload, except that we need to follow a pointer to
    the parent to take advantage of common prefixes once in a while.
    This patch teaches LinearCallStackIdConverter to how to handle those
    pointers.

This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.

- Serialization: writeMemProfCallStackArray just needs to write out
  the radix tree array prepared by CallStackRadixTreeBuilder.
  Mappings from CallStackIds to LinearCallStackIds are moved by new
  function CallStackRadixTreeBuilder::takeCallStackPos.

- Deserialization: Deserializing a call stack is the same as
  deserializing an array encoded in the obvious manner -- the length
  followed by the payload, except that we need to follow a pointer to
  the parent to take advantage of common prefixes once in a while.
  This patch teaches LinearCallStackIdConverter to how to handle those
  pointers.
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Jun 7, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 7, 2024

@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Changes

This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.

  • Serialization: writeMemProfCallStackArray just needs to write out
    the radix tree array prepared by CallStackRadixTreeBuilder.
    Mappings from CallStackIds to LinearCallStackIds are moved by new
    function CallStackRadixTreeBuilder::takeCallStackPos.

  • Deserialization: Deserializing a call stack is the same as
    deserializing an array encoded in the obvious manner -- the length
    followed by the payload, except that we need to follow a pointer to
    the parent to take advantage of common prefixes once in a while.
    This patch teaches LinearCallStackIdConverter to how to handle those
    pointers.


Full diff: https://github.com/llvm/llvm-project/pull/94708.diff

2 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/MemProf.h (+14-2)
  • (modified) llvm/lib/ProfileData/InstrProfWriter.cpp (+7-15)
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index 6b0fa6cd6541..9db0113be5de 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -900,9 +900,17 @@ struct LinearCallStackIdConverter {
     Frames.reserve(NumFrames);
     for (; NumFrames; --NumFrames) {
       LinearFrameId Elem =
-          support::endian::readNext<LinearFrameId, llvm::endianness::little>(
-              Ptr);
+          support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
+      // Follow a pointer to the parent, if any.
+      if (static_cast<std::make_signed_t<LinearFrameId>>(Elem) < 0) {
+        Ptr += (-Elem) * sizeof(LinearFrameId);
+        Elem =
+            support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
+      }
+      // We shouldn't encounter another pointer.
+      assert(static_cast<std::make_signed_t<LinearFrameId>>(Elem) >= 0);
       Frames.push_back(FrameIdToFrame(Elem));
+      Ptr += sizeof(LinearFrameId);
     }
 
     return Frames;
@@ -1028,6 +1036,10 @@ class CallStackRadixTreeBuilder {
   getCallStackPos() const {
     return CallStackPos;
   }
+
+  llvm::DenseMap<CallStackId, LinearCallStackId> takeCallStackPos() {
+    return std::move(CallStackPos);
+  }
 };
 
 // Verify that each CallStackId is computed with hashCallStack.  This function
diff --git a/llvm/lib/ProfileData/InstrProfWriter.cpp b/llvm/lib/ProfileData/InstrProfWriter.cpp
index e58e6b8acfc8..a73f72a534f1 100644
--- a/llvm/lib/ProfileData/InstrProfWriter.cpp
+++ b/llvm/lib/ProfileData/InstrProfWriter.cpp
@@ -547,19 +547,11 @@ writeMemProfCallStackArray(
   llvm::DenseMap<memprof::CallStackId, memprof::LinearCallStackId>
       MemProfCallStackIndexes;
 
-  MemProfCallStackIndexes.reserve(MemProfCallStackData.size());
-  uint64_t CallStackBase = OS.tell();
-  for (const auto &[CSId, CallStack] : MemProfCallStackData) {
-    memprof::LinearCallStackId CallStackIndex =
-        (OS.tell() - CallStackBase) / sizeof(memprof::LinearCallStackId);
-    MemProfCallStackIndexes.insert({CSId, CallStackIndex});
-    const llvm::SmallVector<memprof::FrameId> CS = CallStack;
-    OS.write32(CS.size());
-    for (const auto F : CS) {
-      assert(MemProfFrameIndexes.contains(F));
-      OS.write32(MemProfFrameIndexes[F]);
-    }
-  }
+  memprof::CallStackRadixTreeBuilder Builder;
+  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes);
+  for (auto I : Builder.getRadixArray())
+    OS.write32(I);
+  MemProfCallStackIndexes = Builder.takeCallStackPos();
 
   // Release the memory of this vector as it is no longer needed.
   MemProfCallStackData.clear();
@@ -695,8 +687,8 @@ static Error writeMemProfV2(ProfOStream &OS,
 // uint64_t Schema entry 1
 // ....
 // uint64_t Schema entry N - 1
-// OnDiskChainedHashTable MemProfFrameData
-// OnDiskChainedHashTable MemProfCallStackData
+// Frames serialized one after another
+// Call stacks encoded as a radix tree
 // OnDiskChainedHashTable MemProfRecordData
 static Error writeMemProfV3(ProfOStream &OS,
                             memprof::IndexedMemProfData &MemProfData,

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with a couple of suggestions

@@ -1028,6 +1036,10 @@ class CallStackRadixTreeBuilder {
getCallStackPos() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove getCallStackPos and replace its users with takeCallStackPos? Looks like existing callers are all unit tests where we could do this.

support::endian::readNext<LinearFrameId, llvm::endianness::little>(
Ptr);
support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
// Follow a pointer to the parent, if any.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe point to the comment block further down that describes the radix tree format.

Expand a comment to point to CallStackRadixTreeBuilder.
@kazutakahirata kazutakahirata merged commit c348e26 into llvm:main Jun 7, 2024
7 checks passed
@kazutakahirata kazutakahirata deleted the memprof_v3_use_radix_tree branch June 7, 2024 14:19
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants