Skip to content

[DWARF] Refactor .debug_names bucket count computation #88087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Apr 9, 2024

getDebugNamesBucketAndHashCount lures users to provide an array to
compute the bucket count using an O(n log n) sort. This is inefficient
as hash table based uniquifying is faster.

The performance issue matters less for Clang as the number of names is
relatively small. For ld.lld --debug-names, I plan to compute the
unique hash count as a side product of parallel entry pool computation,
and I just need a function to suggest a bucket count.

Created using spr 1.3.5-bogner
@llvmbot
Copy link
Member

llvmbot commented Apr 9, 2024

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-llvm-binary-utilities

Author: Fangrui Song (MaskRay)

Changes

getDebugNamesBucketAndHashCount lures users to provide an array to
compute the bucket count using an O(n log n) sort. This is inefficient
as hash table based uniquifying is faster.

The performance issue matters less for Clang as the number of names is
relatively small. For ld.lld --debug-names, I plan to compute the
unique hash count as a side product of parallel entry pool computation,
and I just need a function to suggest a bucket count.


Full diff: https://github.com/llvm/llvm-project/pull/88087.diff

2 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/Dwarf.h (+6-16)
  • (modified) llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp (+3-4)
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index a53e79bf6e39c1..298700c8941eec 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -613,23 +613,13 @@ enum AcceleratorTable {
   DW_hash_function_djb = 0u
 };
 
-// Uniquify the string hashes and calculate the bucket count for the
-// DWARF v5 Accelerator Table. NOTE: This function effectively consumes the
-// 'Hashes' input parameter.
-inline std::pair<uint32_t, uint32_t>
-getDebugNamesBucketAndHashCount(MutableArrayRef<uint32_t> Hashes) {
-  uint32_t BucketCount = 0;
-
-  sort(Hashes);
-  uint32_t UniqueHashCount = llvm::unique(Hashes) - Hashes.begin();
+// Return a suggested bucket count for the DWARF v5 Accelerator Table.
+inline uint32_t getDebugNamesBucketCount(uint32_t UniqueHashCount) {
   if (UniqueHashCount > 1024)
-    BucketCount = UniqueHashCount / 4;
-  else if (UniqueHashCount > 16)
-    BucketCount = UniqueHashCount / 2;
-  else
-    BucketCount = std::max<uint32_t>(UniqueHashCount, 1);
-
-  return {BucketCount, UniqueHashCount};
+    return UniqueHashCount / 4;
+  if (UniqueHashCount > 16)
+    return UniqueHashCount / 2;
+  return std::max<uint32_t>(UniqueHashCount, 1);
 }
 
 // Constants for the GNU pubnames/pubtypes extensions supporting gdb index.
diff --git a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
index 2e8e7d0a88af03..5b679fd3b9f92b 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
@@ -32,14 +32,13 @@
 using namespace llvm;
 
 void AccelTableBase::computeBucketCount() {
-  // First get the number of unique hashes.
   SmallVector<uint32_t, 0> Uniques;
   Uniques.reserve(Entries.size());
   for (const auto &E : Entries)
     Uniques.push_back(E.second.HashValue);
-
-  std::tie(BucketCount, UniqueHashCount) =
-      llvm::dwarf::getDebugNamesBucketAndHashCount(Uniques);
+  llvm::sort(Uniques);
+  UniqueHashCount = llvm::unique(Uniques) - Uniques.begin();
+  BucketCount = dwarf::getDebugNamesBucketCount(UniqueHashCount);
 }
 
 void AccelTableBase::finalize(AsmPrinter *Asm, StringRef Prefix) {

Copy link
Contributor

@felipepiovezan felipepiovezan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, especially because we're getting rid of sorting an input array, instead sorting it where it was created.

@MaskRay MaskRay merged commit d3016aa into main Apr 9, 2024
@MaskRay MaskRay deleted the users/MaskRay/spr/dwarf-refactor-debug_names-bucket-count-computation branch April 9, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants