
[BOLT][heatmap] Produce zoomed-out heatmaps #140153



Merged
12 commits merged on May 30, 2025


aaupov
Contributor

@aaupov aaupov commented May 15, 2025

Add a capability to produce multiple heatmaps with given bucket sizes.

The default heatmap block size (64B) can be too fine-grained for
large binaries. Extend the option block-size to accept a list of
bucket sizes for additional heatmaps with coarser granularity. The
heatmap is simply rescaled, so each provided size should be a
multiple of the preceding one. Human-readable suffixes can be used,
e.g. 4K, 16kb, 1MiB.
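As a rough illustration of the human-readable size syntax, here is a minimal, stdlib-only C++ sketch of a bucket-size parser. The name `parseBucketSize` is hypothetical and this is not BOLT's actual parser. It assumes suffixes are case-insensitive binary multiples (K = 2^10, M = 2^20, G = 2^30), which is consistent with 4KB being described as the default page size, and it does not guard against overflow.

```cpp
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not BOLT's implementation): converts a size such as
// "64", "64B", "4K", "16kb" or "1MiB" into a byte count. Suffixes are
// case-insensitive and treated as binary multiples (K = 2^10, M = 2^20,
// G = 2^30); an optional trailing "B" or "iB" is accepted.
// Returns std::nullopt for malformed input. Overflow is not checked.
std::optional<uint64_t> parseBucketSize(std::string_view Str) {
  // Consume the leading decimal digits.
  size_t Pos = 0;
  uint64_t Value = 0;
  while (Pos < Str.size() &&
         std::isdigit(static_cast<unsigned char>(Str[Pos]))) {
    Value = Value * 10 + static_cast<uint64_t>(Str[Pos] - '0');
    ++Pos;
  }
  if (Pos == 0)
    return std::nullopt; // the size must start with a number

  // Lowercase the rest so that "KB", "kb" and "Kb" are treated alike.
  std::string Suffix;
  for (char C : Str.substr(Pos))
    Suffix += static_cast<char>(std::tolower(static_cast<unsigned char>(C)));

  // An optional unit letter selects the multiplier.
  uint64_t Multiplier = 1;
  if (!Suffix.empty() && Suffix.front() != 'b') {
    switch (Suffix.front()) {
    case 'k': Multiplier = uint64_t(1) << 10; break;
    case 'm': Multiplier = uint64_t(1) << 20; break;
    case 'g': Multiplier = uint64_t(1) << 30; break;
    default: return std::nullopt; // unknown unit letter (also rejects "64ib")
    }
    Suffix.erase(0, 1);
  }

  // What remains may only be empty, "b", or "ib".
  if (!(Suffix.empty() || Suffix == "b" || Suffix == "ib"))
    return std::nullopt;
  return Value * Multiplier;
}
```

For example, `parseBucketSize("16kb")` yields 16384 and `parseBucketSize("1MiB")` yields 1048576, while an unknown suffix such as `"4X"` yields `std::nullopt`.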

New defaults: 64B (base bucket size), 4KB (default page size),
256KB (for large binaries).

Test Plan: updated heatmap-preagg.test

Created using spr 1.3.4
@llvmbot llvmbot added the BOLT label May 15, 2025
@aaupov aaupov changed the title [BOLT][heatmap] Produce zoomed-out heatmap [BOLT][heatmap] Produce zoomed-out heatmaps May 15, 2025
@llvmbot
Member

llvmbot commented May 15, 2025

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)

Changes

Add an option --heatmap-zoom-out=bucket_size1,bucket_size2,... to
print additional heatmaps with coarser granularity. This makes it easier
to navigate the heatmap, compared to the default setting of 64B buckets
that could be too fine-grained.

The option rescales an existing heatmap in place, so each provided bucket
size should be a multiple of the preceding one (starting from the original
bucket size, --block-size), and sizes should be provided in ascending
order. If rescaling is impossible, no heatmap is produced.
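To see why each size should be a multiple of the previous one when heatmaps are produced by successive in-place rescaling, here is a small stdlib-only C++ sketch that mirrors the loop body of `Heatmap::resizeBucket` from the diff below. The names `BucketMap`, `rescale` and `canRescaleExactly` are hypothetical; as in the diff, the map is assumed to be keyed by bucket index (`Address / BucketSize`).

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the heatmap's Map member: bucket index
// (Address / BucketSize) -> number of samples in that bucket.
using BucketMap = std::map<uint64_t, uint64_t>;

// Merges BucketSize-byte buckets into TargetSize-byte buckets. Like the loop
// in resizeBucket, each old bucket is attributed entirely to the new bucket
// that contains its start address.
BucketMap rescale(const BucketMap &Map, uint64_t BucketSize,
                  uint64_t TargetSize) {
  BucketMap NewMap;
  for (const auto &[Bucket, Count] : Map) {
    const uint64_t Address = Bucket * BucketSize; // start of the old bucket
    NewMap[Address / TargetSize] += Count;
  }
  return NewMap;
}

// The merge is exact only if every old bucket fits inside a single new
// bucket, i.e. the target is strictly larger and a multiple of the old size.
bool canRescaleExactly(uint64_t BucketSize, uint64_t TargetSize) {
  return BucketSize != 0 && TargetSize > BucketSize &&
         TargetSize % BucketSize == 0;
}
```

With 64-byte buckets holding counts 1, 2, 4 and 8, rescaling to 128 bytes gives {3, 12}. Going directly from 64 to 192 bytes gives the exact {7, 8}, but going from 128 to 192 collapses everything into a single bucket of 15, because the 128-byte bucket starting at address 128 straddles the 192-byte boundary and is attributed entirely to the first bucket.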

Suggested values to use: 4096 (default page size), 16384 (16k page),
1048576 (1MB for XL workloads).

Test Plan: updated heatmap-preagg.test


Full diff: https://github.com/llvm/llvm-project/pull/140153.diff

4 Files Affected:

  • (modified) bolt/include/bolt/Profile/Heatmap.h (+3)
  • (modified) bolt/lib/Profile/DataAggregator.cpp (+15)
  • (modified) bolt/lib/Profile/Heatmap.cpp (+14-1)
  • (modified) bolt/test/X86/heatmap-preagg.test (+13-1)
diff --git a/bolt/include/bolt/Profile/Heatmap.h b/bolt/include/bolt/Profile/Heatmap.h
index 9813e7fed486d..bf3d1c91c0aa5 100644
--- a/bolt/include/bolt/Profile/Heatmap.h
+++ b/bolt/include/bolt/Profile/Heatmap.h
@@ -85,6 +85,9 @@ class Heatmap {
   void printSectionHotness(raw_ostream &OS) const;
 
   size_t size() const { return Map.size(); }
+
+  /// Increase bucket size to \p TargetSize, recomputing the heatmap.
+  bool resizeBucket(uint64_t TargetSize);
 };
 
 } // namespace bolt
diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index 6beb60741406e..aa681e633c0d8 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -68,6 +68,12 @@ FilterPID("pid",
   cl::Optional,
   cl::cat(AggregatorCategory));
 
+static cl::list<uint64_t>
+    HeatmapZoomOut("heatmap-zoom-out", cl::CommaSeparated,
+                   cl::desc("print secondary heatmaps with given bucket sizes"),
+                   cl::value_desc("bucket_size"), cl::Optional,
+                   cl::cat(HeatmapCategory));
+
 static cl::opt<bool>
 IgnoreBuildID("ignore-build-id",
   cl::desc("continue even if build-ids in input binary and perf.data mismatch"),
@@ -1365,6 +1371,15 @@ std::error_code DataAggregator::printLBRHeatMap() {
     HM.printCDF(opts::HeatmapOutput + ".csv");
     HM.printSectionHotness(opts::HeatmapOutput + "-section-hotness.csv");
   }
+  // Provide coarse-grained heatmap if requested via --heatmap-zoom-out
+  for (const uint64_t NewBucketSize : opts::HeatmapZoomOut) {
+    if (!HM.resizeBucket(NewBucketSize))
+      break;
+    if (opts::HeatmapOutput == "-")
+      HM.print(opts::HeatmapOutput);
+    else
+      HM.print(formatv("{0}-{1}", opts::HeatmapOutput, NewBucketSize).str());
+  }
 
   return std::error_code();
 }
diff --git a/bolt/lib/Profile/Heatmap.cpp b/bolt/lib/Profile/Heatmap.cpp
index c66c2e5487613..4aaf6dc344a85 100644
--- a/bolt/lib/Profile/Heatmap.cpp
+++ b/bolt/lib/Profile/Heatmap.cpp
@@ -81,7 +81,7 @@ void Heatmap::print(raw_ostream &OS) const {
   // the Address.
   auto startLine = [&](uint64_t Address, bool Empty = false) {
     changeColor(DefaultColor);
-    const uint64_t LineAddress = Address / BytesPerLine * BytesPerLine;
+    const uint64_t LineAddress = alignTo(Address, BytesPerLine);
 
     if (MaxAddress > 0xffffffff)
       OS << format("0x%016" PRIx64 ": ", LineAddress);
@@ -364,5 +364,18 @@ void Heatmap::printSectionHotness(raw_ostream &OS) const {
     OS << formatv("[unmapped], 0x0, 0x0, {0:f4}, 0, 0\n",
                   100.0 * UnmappedHotness / NumTotalCounts);
 }
+
+bool Heatmap::resizeBucket(uint64_t TargetSize) {
+  if (TargetSize <= BucketSize)
+    return false;
+  std::map<uint64_t, uint64_t> NewMap;
+  for (const auto [Bucket, Count] : Map) {
+    const uint64_t Address = Bucket * BucketSize;
+    NewMap[Address / TargetSize] += Count;
+  }
+  Map = NewMap;
+  BucketSize = TargetSize;
+  return true;
+}
 } // namespace bolt
 } // namespace llvm
diff --git a/bolt/test/X86/heatmap-preagg.test b/bolt/test/X86/heatmap-preagg.test
index 306e74800a353..9539269ff0d47 100644
--- a/bolt/test/X86/heatmap-preagg.test
+++ b/bolt/test/X86/heatmap-preagg.test
@@ -3,8 +3,11 @@
 RUN: yaml2obj %p/Inputs/blarge_new.yaml &> %t.exe
 ## Non-BOLTed input binary
 RUN: llvm-bolt-heatmap %t.exe -o %t --pa -p %p/Inputs/blarge_new.preagg.txt \
-RUN:   2>&1 | FileCheck --check-prefix CHECK-HEATMAP %s
+RUN:   --heatmap-zoom-out 128,1024 2>&1 | FileCheck --check-prefix CHECK-HEATMAP %s
 RUN: FileCheck %s --check-prefix CHECK-SEC-HOT --input-file %t-section-hotness.csv
+RUN: FileCheck %s --check-prefix CHECK-HM-64 --input-file %t
+RUN: FileCheck %s --check-prefix CHECK-HM-128 --input-file %t-128
+RUN: FileCheck %s --check-prefix CHECK-HM-1024 --input-file %t-1024
 
 ## BOLTed input binary
 RUN: llvm-bolt %t.exe -o %t.out --pa -p %p/Inputs/blarge_new.preagg.txt \
@@ -24,6 +27,15 @@ CHECK-SEC-HOT-NEXT: .plt, 0x401020, 0x4010b0, 4.7583, 66.6667, 0.0317
 CHECK-SEC-HOT-NEXT: .text, 0x4010b0, 0x401c25, 78.3872, 85.1064, 0.6671
 CHECK-SEC-HOT-NEXT: .fini, 0x401c28, 0x401c35, 0.0000, 0.0000, 0.0000
 
+# Only check start addresses – can't check colors, and FileCheck doesn't strip
+# color codes by default. Reference output:
+# HM-64:   0x00404000: ABBcccccccccccccccCCCCCCCCCccccCCCCCCCCcc....CC
+# HM-128:  0x00408000: ABCCCCCCCCCCCCCCCCCCc.CC
+# HM-1024: 0x00440000: ACC
+CHECK-HM-64:   0x00404000:
+CHECK-HM-128:  0x00408000:
+CHECK-HM-1024: 0x00440000:
+
 CHECK-HEATMAP-BAT: PERF2BOLT: read 79 aggregated LBR entries
 CHECK-HEATMAP-BAT: HEATMAP: invalid traces: 2
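The `startLine` change in Heatmap.cpp replaces `Address / BytesPerLine * BytesPerLine`, which rounds down, with `alignTo(Address, BytesPerLine)`. `llvm::alignTo` rounds up to the next multiple, so the two differ whenever `Address` is not already line-aligned (`llvm::alignDown` is the round-down counterpart). The stdlib-only C++ sketch below re-implements both roundings under hypothetical names. Assuming 256 buckets per line (so `BytesPerLine` is 0x4000, 0x8000 and 0x40000 for 64-, 128- and 1024-byte buckets) and a first sampled address near the `.plt` start 0x401020 from the section-hotness check, rounding up yields the 0x00404000, 0x00408000 and 0x00440000 line addresses in the test's reference output, whereas rounding down yields 0x00400000 in all three cases.

```cpp
#include <cstdint>

// Rounds Value up to the next multiple of Align (Align must be non-zero).
// For non-zero Align this matches what llvm::alignTo(Value, Align) returns;
// re-implemented here so the sketch needs no LLVM headers.
uint64_t roundUpTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Rounds Value down to a multiple of Align, which is what the removed
// expression `Address / BytesPerLine * BytesPerLine` computes.
uint64_t roundDownTo(uint64_t Value, uint64_t Align) {
  return Value / Align * Align;
}
```

Both helpers agree when the address is already aligned, e.g. for 0x404000 with a 0x4000-byte line.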
 

aaupov added 2 commits May 15, 2025 15:14
Created using spr 1.3.4
Created using spr 1.3.4
Member

@paschalis-mpeis paschalis-mpeis left a comment


Great option! It may be worth listing this in Heatmaps.md, alongside the suggested usage.

> Suggested values to use: 4096 (default page size), 16384 (16k page),
> 1048576 (1MB for XL workloads).

aaupov added 2 commits May 16, 2025 11:09
Created using spr 1.3.4
Created using spr 1.3.4
@maksfb
Contributor

maksfb commented May 23, 2025

Instead of introducing the new option, can we reuse --block-size= and make it accept multiple values or a different format?

> If rescaling is impossible, no heatmap is produced.

Without a printed warning? I understand that the size limitation is driven by implementation, but at the very least we should report that there is no expected output.

Alternatively, use a different option format that encapsulates the limitations. E.g., --block-size=<initial_size>{:<scale>:<count>} or --block-size=<initial_size>{:<scale1>,<scale2>...}.

@aaupov
Contributor Author

aaupov commented May 23, 2025

> Instead of introducing the new option, can we reuse --block-size= and make it accept multiple values or a different format?
>
> > If rescaling is impossible, no heatmap is produced.
>
> Without a printed warning? I understand that the size limitation is driven by implementation, but at the very least we should report that there is no expected output.
>
> Alternatively, use a different option format that encapsulates the limitations. E.g., --block-size=<initial_size>{:<scale>:<count>} or --block-size=<initial_size>{:<scale1>,<scale2>...}.

Good call. I prefer the latter approach with explicit scales.

aaupov added 2 commits May 23, 2025 15:36
Created using spr 1.3.4
Created using spr 1.3.4
Contributor

@maksfb maksfb left a comment


The implementation looks good to me.

A couple comments regarding the interface:

  • If we are producing multiple files by default, we might have to change docs in more places.
  • I find the power-of-two limitation for the scale unnecessary.
  • The variance in default scale steps looks a bit unnatural (64, 4, 64 again). I would keep just 64 and 64 making the default sizes 64 bytes, 4 KB, 256 KB.

@aaupov
Contributor Author

aaupov commented May 27, 2025

@maksfb:

> • If we are producing multiple files by default, we might have to change docs in more places.

Makes sense. I'll update heatmap doc, anything else?

> • I find the power-of-two limitation for the scale unnecessary.

This made it impossible to enter invalid scales. While working on a custom parser, I realized that a better UX would be to allow specifying bucket sizes in natural units (e.g. 4KB, 1MB):

--block-size=default_size[,size1,...]

If you agree, I'll switch to that and print warnings in case of invalid sizes/rescales.

> • The variance in default scale steps looks a bit unnatural (64, 4, 64 again). I would keep just 64 and 64 making the default sizes 64 bytes, 4 KB, 256 KB.

It's the downstream of bucket size selection. 1MB works really well for XL binaries, to have a high-level view of the whole address space. I'd like to keep 1MB for that reason. We can add 256KB for sure.

LMK if that sounds good.

@maksfb
Contributor

maksfb commented May 28, 2025

> @maksfb:
>
> > • If we are producing multiple files by default, we might have to change docs in more places.
>
> Makes sense. I'll update heatmap doc, anything else?

llvm-bolt-heatmap usage message if possible.

> > • I find the power-of-two limitation for the scale unnecessary.
>
> This made it impossible to enter invalid scales. While working on a custom parser, I realized that a better UX would be to allow specifying bucket sizes in natural units (e.g. 4KB, 1MB):
>
> --block-size=default_size[,size1,...]
>
> If you agree, I'll switch to that and print warnings in case of invalid sizes/rescales.

Okay. Then the burden falls on the proper UI messaging and error/warning diagnostics.

> > • The variance in default scale steps looks a bit unnatural (64, 4, 64 again). I would keep just 64 and 64 making the default sizes 64 bytes, 4 KB, 256 KB.
>
> It's the downstream of bucket size selection. 1MB works really well for XL binaries, to have a high-level view of the whole address space. I'd like to keep 1MB for that reason. We can add 256KB for sure.
>
> LMK if that sounds good.

If 256KB is difficult to read for large binaries, 1MB works for me.

aaupov added 4 commits May 30, 2025 10:38
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.4
@aaupov
Contributor Author

aaupov commented May 30, 2025

> > @maksfb:
> >
> > > • If we are producing multiple files by default, we might have to change docs in more places.
> >
> > Makes sense. I'll update heatmap doc, anything else?
>
> llvm-bolt-heatmap usage message if possible.

Added usage message:

OVERVIEW:  BOLT Code Heatmap tool

  Produces code heatmaps using sampled profile

  Inputs:
  - Binary (supports BOLT-optimized binaries),
  - Sampled profile collected from the binary:
    - perf data or pre-aggregated profile data (instrumentation profile not supported)
    - perf data can have basic (IP) or branch-stack (LBR) samples

  Outputs:
  - Heatmaps: colored ASCII (requires a color-capable terminal or a conversion tool like `aha`)
    Multiple heatmaps are produced by default with different granularities (set by `block-size` option)
  - Section hotness: per-section samples% and utilization%
  - Cumulative distribution: working set size corresponding to a given percentile of samples

> > > • I find the power-of-two limitation for the scale unnecessary.
> >
> > This made it impossible to enter invalid scales. While working on a custom parser, I realized that a better UX would be to allow specifying bucket sizes in natural units (e.g. 4KB, 1MB):
> >
> > --block-size=default_size[,size1,...]
> >
> > If you agree, I'll switch to that and print warnings in case of invalid sizes/rescales.
>
> Okay. Then the burden falls on the proper UI messaging and error/warning diagnostics.

Added check for sorted values. Not checking the provided values though.

> > > • The variance in default scale steps looks a bit unnatural (64, 4, 64 again). I would keep just 64 and 64 making the default sizes 64 bytes, 4 KB, 256 KB.
> >
> > It's the downstream of bucket size selection. 1MB works really well for XL binaries, to have a high-level view of the whole address space. I'd like to keep 1MB for that reason. We can add 256KB for sure.
> > LMK if that sounds good.
>
> If 256KB is difficult to read for large binaries, 1MB works for me.

256KB works well for large binaries.

Created using spr 1.3.4
Contributor

@maksfb maksfb left a comment


Thanks for addressing comments.

@aaupov aaupov merged commit 5047a33 into main May 30, 2025
8 of 9 checks passed
@aaupov aaupov deleted the users/aaupov/spr/boltheatmap-produce-zoomed-out-heatmap branch May 30, 2025 23:20