Skip to content

[ThinLTO][Bitcode] Generate import type in bitcode #87600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
May 22, 2024

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented Apr 4, 2024

For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization.

This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly.

…eSummary::GVFlags

- This change doesn't set the bit in the summaries, just make the code changes.
Base automatically changed from users/minglotus-6/spr/summary to main April 11, 2024 02:46
mingmingl-llvm added a commit that referenced this pull request Apr 11, 2024
The motivating use case is to support import the function declaration
across modules to construct call graph edges for indirect calls [1]
when importing the function definition costs too much compile time
(e.g., the function is too large has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function
definition but its postlink combined summary contains the function
summary or a global alias summary with this function as aliasee, the
function definition will be imported from source module by IRMover. The
implementation is in FunctionImporter::importFunctions [2]
2. In order for FunctionImporter to import a declaration of a function,
both function summary and alias summary need to carry the def / decl
state. Specifically, all existing summary fields doesn't differ across
import modules, but the def / decl state of is decided by
`<ImportModule, Function>`.

This change encodes the def/decl state in `GlobalValueSummary::GVFlags`.

In the subsequent changes
1. The indexing step `computeImportForModule` [3]
will compute the set of definitions and the set of declarations for each
module, and passing on the information to bitcode writer.
2. Bitcode writer will look up the def/decl state and sets the state
when it writes out the flag value. This is demonstrated in
#87600
3. Function importer will read the def/decl state when reading the
combined summary to figure out two sets of global values, and IRMover
will be updated to import the declaration (aka linkGlobalValuePrototype [4])
into the destination module.

- The next change is #87600

[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Transforms/IPO/FunctionImport.cpp#L1608-L1764
[3] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Transforms/IPO/FunctionImport.cpp#L856
[4] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Linker/IRMover.cpp#L605
@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO]Generate import type in bitcode writer [ThinLTO]Generate import type in postlink combined summary Apr 16, 2024
@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO]Generate import type in postlink combined summary [ThinLTO]Generate import type in per-module combined summary Apr 16, 2024
@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO]Generate import type in per-module combined summary [ThinLTO][BitcodeWriter] Write import type in per-module combined summary Apr 16, 2024
@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review May 7, 2024 02:00
@llvmbot llvmbot added the llvm:ir label May 7, 2024
@llvmbot
Copy link
Member

llvmbot commented May 7, 2024

@llvm/pr-subscribers-lto
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Mingming Liu (minglotus-6)

Changes

Given a list of modules and a function f, there are multiple combinations of imports (each module may need f as definition, declaration, or doesn't reference it), and GVFlags.ImportType should be a function of &lt;module, value-summary&gt;.

When writing the per-module combined summary, bitcode writer looks up a map to write to the ImportType bit in the given module.

Tests:

This is a follow-up patch of #87597.


Full diff: https://github.com/llvm/llvm-project/pull/87600.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Bitcode/BitcodeWriter.h (+10-5)
  • (modified) llvm/include/llvm/IR/ModuleSummaryIndex.h (+6)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+39-11)
diff --git a/llvm/include/llvm/Bitcode/BitcodeWriter.h b/llvm/include/llvm/Bitcode/BitcodeWriter.h
index 248d33f4502ef0..e94b074d9788e6 100644
--- a/llvm/include/llvm/Bitcode/BitcodeWriter.h
+++ b/llvm/include/llvm/Bitcode/BitcodeWriter.h
@@ -102,7 +102,8 @@ class raw_ostream;
 
     void writeIndex(
         const ModuleSummaryIndex *Index,
-        const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex);
+        const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex,
+        const ModuleToGVSummaryPtrSet *ModuleToDecSummaries);
   };
 
   /// Write the specified module to the specified raw output stream.
@@ -147,10 +148,14 @@ class raw_ostream;
   /// where it will be written in a new bitcode block. This is used when
   /// writing the combined index file for ThinLTO. When writing a subset of the
   /// index for a distributed backend, provide the \p ModuleToSummariesForIndex
-  /// map.
-  void writeIndexToFile(const ModuleSummaryIndex &Index, raw_ostream &Out,
-                        const std::map<std::string, GVSummaryMapTy>
-                            *ModuleToSummariesForIndex = nullptr);
+  /// map. For each module, \p ModuleToDecSummaries specifies the set of
+  /// summaries for which the corresponding value should be imported as a
+  /// declaration (prototype).
+  void writeIndexToFile(
+      const ModuleSummaryIndex &Index, raw_ostream &Out,
+      const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex =
+          nullptr,
+      const ModuleToGVSummaryPtrSet *ModuleToDecSummaries = nullptr);
 
   /// If EmbedBitcode is set, save a copy of the llvm IR as data in the
   ///  __LLVM,__bitcode section (.llvmbc on non-MacOS).
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 5d137d4b3553cf..66209b8cf3ea2d 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -1272,6 +1272,12 @@ using ModulePathStringTableTy = StringMap<ModuleHash>;
 /// a particular module, and provide efficient access to their summary.
 using GVSummaryMapTy = DenseMap<GlobalValue::GUID, GlobalValueSummary *>;
 
+/// A set of global value summary pointers.
+using GVSummaryPtrSet = SmallPtrSet<GlobalValueSummary *, 4>;
+
+/// The key is module path, and value is a set of global value summary pointers.
+using ModuleToGVSummaryPtrSet = std::map<std::string, GVSummaryPtrSet>;
+
 /// Map of a type GUID to type id string and summary (multimap used
 /// in case of GUID conflicts).
 using TypeIdSummaryMapTy =
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 6d01e3b4d82189..a060265a9bc8dd 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -428,6 +428,10 @@ class IndexBitcodeWriter : public BitcodeWriterBase {
   /// The combined index to write to bitcode.
   const ModuleSummaryIndex &Index;
 
+  /// For each module, provides the set of global value summaries for which the
+  /// value (function, function alias, etc) should be imported as a declaration.
+  const ModuleToGVSummaryPtrSet *ModuleToDecSummaries = nullptr;
+
   /// When writing a subset of the index for distributed backends, client
   /// provides a map of modules to the corresponding GUIDs/summaries to write.
   const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex;
@@ -452,11 +456,17 @@ class IndexBitcodeWriter : public BitcodeWriterBase {
   /// Constructs a IndexBitcodeWriter object for the given combined index,
   /// writing to the provided \p Buffer. When writing a subset of the index
   /// for a distributed backend, provide a \p ModuleToSummariesForIndex map.
-  IndexBitcodeWriter(BitstreamWriter &Stream, StringTableBuilder &StrtabBuilder,
-                     const ModuleSummaryIndex &Index,
-                     const std::map<std::string, GVSummaryMapTy>
-                         *ModuleToSummariesForIndex = nullptr)
+  /// If provided, \p ModuleToDecSummaries specifies the set of summaries for
+  /// which the corresponding functions or aliased functions should be imported
+  /// as a declaration (but not definition) for each module.
+  IndexBitcodeWriter(
+      BitstreamWriter &Stream, StringTableBuilder &StrtabBuilder,
+      const ModuleSummaryIndex &Index,
+      const ModuleToGVSummaryPtrSet *ModuleToDecSummaries = nullptr,
+      const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex =
+          nullptr)
       : BitcodeWriterBase(Stream, StrtabBuilder), Index(Index),
+        ModuleToDecSummaries(ModuleToDecSummaries),
         ModuleToSummariesForIndex(ModuleToSummariesForIndex) {
     // Assign unique value ids to all summaries to be written, for use
     // in writing out the call graph edges. Save the mapping from GUID
@@ -1202,7 +1212,8 @@ static uint64_t getEncodedFFlags(FunctionSummary::FFlags Flags) {
 
 // Decode the flags for GlobalValue in the summary. See getDecodedGVSummaryFlags
 // in BitcodeReader.cpp.
-static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags) {
+static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags,
+                                         bool ImportAsDecl = false) {
   uint64_t RawFlags = 0;
 
   RawFlags |= Flags.NotEligibleToImport; // bool
@@ -1217,7 +1228,8 @@ static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags) {
 
   RawFlags |= (Flags.Visibility << 8); // 2 bits
 
-  RawFlags |= (Flags.ImportType << 10); // 1 bit
+  unsigned ImportType = Flags.ImportType | ImportAsDecl;
+  RawFlags |= (ImportType << 10); // 1 bit
 
   return RawFlags;
 }
@@ -4543,6 +4555,17 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
   Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
   unsigned AllocAbbrev = Stream.EmitAbbrev(std::move(Abbv));
 
+  auto shouldImportValueAsDecl = [&](GlobalValueSummary *GVS) -> bool {
+    if (ModuleToDecSummaries == nullptr)
+      return false;
+    auto Iter = ModuleToDecSummaries->find(GVS->modulePath().str());
+    if (Iter == ModuleToDecSummaries->end())
+      return false;
+    // For the current module, the value for GVS should be imported as a
+    // declaration.
+    return Iter->second.contains(GVS);
+  };
+
   // The aliases are emitted as a post-pass, and will point to the value
   // id of the aliasee. Save them in a vector for post-processing.
   SmallVector<AliasSummary *, 64> Aliases;
@@ -4653,7 +4676,8 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
     NameVals.push_back(*ValueId);
     assert(ModuleIdMap.count(FS->modulePath()));
     NameVals.push_back(ModuleIdMap[FS->modulePath()]);
-    NameVals.push_back(getEncodedGVSummaryFlags(FS->flags()));
+    NameVals.push_back(
+        getEncodedGVSummaryFlags(FS->flags(), shouldImportValueAsDecl(FS)));
     NameVals.push_back(FS->instCount());
     NameVals.push_back(getEncodedFFlags(FS->fflags()));
     NameVals.push_back(FS->entryCount());
@@ -4702,7 +4726,8 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
     NameVals.push_back(AliasValueId);
     assert(ModuleIdMap.count(AS->modulePath()));
     NameVals.push_back(ModuleIdMap[AS->modulePath()]);
-    NameVals.push_back(getEncodedGVSummaryFlags(AS->flags()));
+    NameVals.push_back(
+        getEncodedGVSummaryFlags(AS->flags(), shouldImportValueAsDecl(AS)));
     auto AliaseeValueId = SummaryToValueIdMap[&AS->getAliasee()];
     assert(AliaseeValueId);
     NameVals.push_back(AliaseeValueId);
@@ -5036,8 +5061,10 @@ void BitcodeWriter::writeModule(const Module &M,
 
 void BitcodeWriter::writeIndex(
     const ModuleSummaryIndex *Index,
-    const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex) {
+    const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex,
+    const ModuleToGVSummaryPtrSet *ModuleToDecSummaries) {
   IndexBitcodeWriter IndexWriter(*Stream, StrtabBuilder, *Index,
+                                 ModuleToDecSummaries,
                                  ModuleToSummariesForIndex);
   IndexWriter.write();
 }
@@ -5090,12 +5117,13 @@ void IndexBitcodeWriter::write() {
 // index for a distributed backend, provide a \p ModuleToSummariesForIndex map.
 void llvm::writeIndexToFile(
     const ModuleSummaryIndex &Index, raw_ostream &Out,
-    const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex) {
+    const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex,
+    const ModuleToGVSummaryPtrSet *ModuleToDecSummaries) {
   SmallVector<char, 0> Buffer;
   Buffer.reserve(256 * 1024);
 
   BitcodeWriter Writer(Buffer);
-  Writer.writeIndex(&Index, ModuleToSummariesForIndex);
+  Writer.writeIndex(&Index, ModuleToSummariesForIndex, ModuleToDecSummaries);
   Writer.writeStrtab();
 
   Out.write((char *)&Buffer.front(), Buffer.size());

@mingmingl-llvm
Copy link
Contributor Author

Tagging @jvoung for review since I cannot add you, thanks!

/// A set of global value summary pointers.
using GVSummaryPtrSet = SmallPtrSet<GlobalValueSummary *, 4>;

/// The key is module path, and value is a set of global value summary pointers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big issue, but was wondering if the module path is needed or if a set instead of map would work? It seemed like each GlobalValueSummary* would only be imported from one module for declaration. Or is it possible that for a particular GVS_i ?

mod_A -> GVS_i
mod_B -> GVS_i

Otherwise, the map does match the gatherImportedSummariesForModule more closely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think it makes sense to use a set of pointers as suggested, given ModuleToGVSummaryPtrSet is computed using pointers as opposed to GUIDs. Going to make the change.

While it's possible to have {mod_A, GUID_foo} and {mod_B, GUID_foo} entries because of comdats, each GlobalValueSummary* will be associated with exactly one module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Going to update the subsequent patch after updating this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, thanks for the clarification!

@@ -428,6 +428,10 @@ class IndexBitcodeWriter : public BitcodeWriterBase {
/// The combined index to write to bitcode.
const ModuleSummaryIndex &Index;

/// For each module, provides the set of global value summaries for which the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe indicate "When writing a subset of the index" also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment to "When writing combined summaries, provides the set of global value summaries...".

While we write a subset of the index only for combined summaries, 'combined summary' is slightly preferred as it hints this map doesn't apply for per-module summary or the index.

@mingmingl-llvm mingmingl-llvm marked this pull request as draft May 20, 2024 05:07
Base automatically changed from users/minglotus-6/spr/summary2 to main May 20, 2024 05:22
mingmingl-llvm added a commit that referenced this pull request May 20, 2024
…hinLTO under a default-off new option (#88024)

The goal is to populate `declaration` import status if a new flag`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases (better call-graph sort and cross-module auto-init)
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, #87600
will do this.

[1] https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] #87597 (comment)
@mingmingl-llvm mingmingl-llvm changed the base branch from main to users/minglotus-6/spr/summary2 May 20, 2024 05:23
@mingmingl-llvm mingmingl-llvm changed the base branch from users/minglotus-6/spr/summary2 to main May 20, 2024 05:24
@mingmingl-llvm mingmingl-llvm changed the base branch from main to users/minglotus-6/spr/summary2 May 20, 2024 05:25
@mingmingl-llvm mingmingl-llvm changed the base branch from users/minglotus-6/spr/summary2 to main May 20, 2024 05:33
@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review May 20, 2024 05:34
mingmingl-llvm added a commit that referenced this pull request May 20, 2024
…ibuted ThinLTO under a default-off new option" (#92718)

The original PR is reviewed in
#88024, and this PR adds one
line (b9f04d1)
to fix test

Limit to one thread for in-process ThinLTO to test `LLVM_DEBUG` log.
- This should fix build bot failure like
https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and
https://lab.llvm.org/buildbot/#/builders/9/builds/43876
- I could repro the failure and see interleaved log messages by using
`-thinlto-threads=all`

**Original Commit Message:**

The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, #87600
will do this.
1. Set 'import-instr-evolution-factor' to 1.0 (default 0.7) to make sure
  import limit remains 7 (otherwise import limit decays through each
  edge)
2. Updated bar* function names
3. Explain why aliasee is null for bar* functions
@mingmingl-llvm mingmingl-llvm merged commit 6262763 into main May 22, 2024
4 checks passed
@mingmingl-llvm mingmingl-llvm deleted the users/minglotus-6/spr/summary1 branch May 22, 2024 16:52
mingmingl-llvm added a commit to mingmingl-llvm/llvm-project that referenced this pull request Jun 5, 2024
mingmingl-llvm added a commit that referenced this pull request Jun 5, 2024
…94502)

This reverts commit 6262763, to prepare
for the revert of #92718.


#92718 causes LTO indexing OOM
in some applications.
mingmingl-llvm added a commit that referenced this pull request Jun 20, 2024
…ibuted ThinLTO under a default-off new option" (#95482)

Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`.
Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This
is a reland of #92718

* `DenseMap` allocates space for a large number of key/value pairs and
wastes space when the number of elements are small.
* While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2]
when the number of elements is small (for example, 3 or 4 elements). The programmer
manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for
multiple binaries with high indexing ram usage. `unordered_map` grows
factor is at most 2 in llvm libc [4] for insert operations.
 
With this change, `ComputeCrossModuleImport` ram increase is smaller
than 0.5G on a couple of binaries with high indexing ram usage. A wider
range of (pre-release) tests pass.

[1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 
[2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526

**Original commit message** 
The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, #87600
will do this.
mingmingl-llvm added a commit that referenced this pull request Jun 30, 2024
@mingmingl-llvm mingmingl-llvm restored the users/minglotus-6/spr/summary1 branch July 1, 2024 04:36
mingmingl-llvm added a commit that referenced this pull request Jul 9, 2024
#87600 was reverted in order to
revert
6262763.
Now #95482 is fix forward for
6262763.
This patch is a reland for
#87600

**Changes on top of original patch**
In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of
`GVSummaryPtrSet` an `unordered_set` which is more memory efficient when
the number of elements is smaller than 128 [1]

**Original commit message**

For distributed ThinLTO, the LTO indexing step generates combined
summary for each module, and postlink pipeline reads the combined
summary which stores the information for link-time optimization.

This patch populates the 'import type' of a summary in bitcode, and
updates bitcode reader to parse the bit correctly.

[1]
https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…ibuted ThinLTO under a default-off new option" (llvm#95482)

Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`.
Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This
is a reland of llvm#92718

* `DenseMap` allocates space for a large number of key/value pairs and
wastes space when the number of elements are small.
* While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2]
when the number of elements is small (for example, 3 or 4 elements). The programmer
manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for
multiple binaries with high indexing ram usage. `unordered_map` grows
factor is at most 2 in llvm libc [4] for insert operations.
 
With this change, `ComputeCrossModuleImport` ram increase is smaller
than 0.5G on a couple of binaries with high indexing ram usage. A wider
range of (pre-release) tests pass.

[1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 
[2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526

**Original commit message** 
The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](llvm#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, llvm#87600
will do this.
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
llvm#87600 was reverted in order to
revert
llvm@6262763.
Now llvm#95482 is fix forward for
llvm@6262763.
This patch is a reland for
llvm#87600

**Changes on top of original patch**
In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of
`GVSummaryPtrSet` an `unordered_set` which is more memory efficient when
the number of elements is smaller than 128 [1]

**Original commit message**

For distributed ThinLTO, the LTO indexing step generates combined
summary for each module, and postlink pipeline reads the combined
summary which stores the information for link-time optimization.

This patch populates the 'import type' of a summary in bitcode, and
updates bitcode reader to parse the bit correctly.

[1]
https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:ir llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants