-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[ThinLTO][Bitcode] Generate import type in bitcode #87600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…eSummary::GVFlags - This change doesn't set the bit in the summaries, just make the code changes.
The motivating use case is to support import the function declaration across modules to construct call graph edges for indirect calls [1] when importing the function definition costs too much compile time (e.g., the function is too large has no `noinline` attribute). 1. Currently, when the compiled IR module doesn't have a function definition but its postlink combined summary contains the function summary or a global alias summary with this function as aliasee, the function definition will be imported from source module by IRMover. The implementation is in FunctionImporter::importFunctions [2] 2. In order for FunctionImporter to import a declaration of a function, both function summary and alias summary need to carry the def / decl state. Specifically, all existing summary fields doesn't differ across import modules, but the def / decl state of is decided by `<ImportModule, Function>`. This change encodes the def/decl state in `GlobalValueSummary::GVFlags`. In the subsequent changes 1. The indexing step `computeImportForModule` [3] will compute the set of definitions and the set of declarations for each module, and passing on the information to bitcode writer. 2. Bitcode writer will look up the def/decl state and sets the state when it writes out the flag value. This is demonstrated in #87600 3. Function importer will read the def/decl state when reading the combined summary to figure out two sets of global values, and IRMover will be updated to import the declaration (aka linkGlobalValuePrototype [4]) into the destination module. - The next change is #87600 [1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5 [2] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Transforms/IPO/FunctionImport.cpp#L1608-L1764 [3] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Transforms/IPO/FunctionImport.cpp#L856 [4] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Linker/IRMover.cpp#L605
@llvm/pr-subscribers-lto @llvm/pr-subscribers-llvm-ir Author: Mingming Liu (minglotus-6) ChangesGiven a list of modules and a function When writing the per-module combined summary, bitcode writer looks up a map to write to the Tests:
This is a follow-up patch of #87597. Full diff: https://github.com/llvm/llvm-project/pull/87600.diff 3 Files Affected:
diff --git a/llvm/include/llvm/Bitcode/BitcodeWriter.h b/llvm/include/llvm/Bitcode/BitcodeWriter.h
index 248d33f4502ef0..e94b074d9788e6 100644
--- a/llvm/include/llvm/Bitcode/BitcodeWriter.h
+++ b/llvm/include/llvm/Bitcode/BitcodeWriter.h
@@ -102,7 +102,8 @@ class raw_ostream;
void writeIndex(
const ModuleSummaryIndex *Index,
- const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex);
+ const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex,
+ const ModuleToGVSummaryPtrSet *ModuleToDecSummaries);
};
/// Write the specified module to the specified raw output stream.
@@ -147,10 +148,14 @@ class raw_ostream;
/// where it will be written in a new bitcode block. This is used when
/// writing the combined index file for ThinLTO. When writing a subset of the
/// index for a distributed backend, provide the \p ModuleToSummariesForIndex
- /// map.
- void writeIndexToFile(const ModuleSummaryIndex &Index, raw_ostream &Out,
- const std::map<std::string, GVSummaryMapTy>
- *ModuleToSummariesForIndex = nullptr);
+ /// map. For each module, \p ModuleToDecSummaries specifies the set of
+ /// summaries for which the corresponding value should be imported as a
+ /// declaration (prototype).
+ void writeIndexToFile(
+ const ModuleSummaryIndex &Index, raw_ostream &Out,
+ const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex =
+ nullptr,
+ const ModuleToGVSummaryPtrSet *ModuleToDecSummaries = nullptr);
/// If EmbedBitcode is set, save a copy of the llvm IR as data in the
/// __LLVM,__bitcode section (.llvmbc on non-MacOS).
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 5d137d4b3553cf..66209b8cf3ea2d 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -1272,6 +1272,12 @@ using ModulePathStringTableTy = StringMap<ModuleHash>;
/// a particular module, and provide efficient access to their summary.
using GVSummaryMapTy = DenseMap<GlobalValue::GUID, GlobalValueSummary *>;
+/// A set of global value summary pointers.
+using GVSummaryPtrSet = SmallPtrSet<GlobalValueSummary *, 4>;
+
+/// The key is module path, and value is a set of global value summary pointers.
+using ModuleToGVSummaryPtrSet = std::map<std::string, GVSummaryPtrSet>;
+
/// Map of a type GUID to type id string and summary (multimap used
/// in case of GUID conflicts).
using TypeIdSummaryMapTy =
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index 6d01e3b4d82189..a060265a9bc8dd 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -428,6 +428,10 @@ class IndexBitcodeWriter : public BitcodeWriterBase {
/// The combined index to write to bitcode.
const ModuleSummaryIndex &Index;
+ /// For each module, provides the set of global value summaries for which the
+ /// value (function, function alias, etc) should be imported as a declaration.
+ const ModuleToGVSummaryPtrSet *ModuleToDecSummaries = nullptr;
+
/// When writing a subset of the index for distributed backends, client
/// provides a map of modules to the corresponding GUIDs/summaries to write.
const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex;
@@ -452,11 +456,17 @@ class IndexBitcodeWriter : public BitcodeWriterBase {
/// Constructs a IndexBitcodeWriter object for the given combined index,
/// writing to the provided \p Buffer. When writing a subset of the index
/// for a distributed backend, provide a \p ModuleToSummariesForIndex map.
- IndexBitcodeWriter(BitstreamWriter &Stream, StringTableBuilder &StrtabBuilder,
- const ModuleSummaryIndex &Index,
- const std::map<std::string, GVSummaryMapTy>
- *ModuleToSummariesForIndex = nullptr)
+ /// If provided, \p ModuleToDecSummaries specifies the set of summaries for
+ /// which the corresponding functions or aliased functions should be imported
+ /// as a declaration (but not definition) for each module.
+ IndexBitcodeWriter(
+ BitstreamWriter &Stream, StringTableBuilder &StrtabBuilder,
+ const ModuleSummaryIndex &Index,
+ const ModuleToGVSummaryPtrSet *ModuleToDecSummaries = nullptr,
+ const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex =
+ nullptr)
: BitcodeWriterBase(Stream, StrtabBuilder), Index(Index),
+ ModuleToDecSummaries(ModuleToDecSummaries),
ModuleToSummariesForIndex(ModuleToSummariesForIndex) {
// Assign unique value ids to all summaries to be written, for use
// in writing out the call graph edges. Save the mapping from GUID
@@ -1202,7 +1212,8 @@ static uint64_t getEncodedFFlags(FunctionSummary::FFlags Flags) {
// Decode the flags for GlobalValue in the summary. See getDecodedGVSummaryFlags
// in BitcodeReader.cpp.
-static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags) {
+static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags,
+ bool ImportAsDecl = false) {
uint64_t RawFlags = 0;
RawFlags |= Flags.NotEligibleToImport; // bool
@@ -1217,7 +1228,8 @@ static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags) {
RawFlags |= (Flags.Visibility << 8); // 2 bits
- RawFlags |= (Flags.ImportType << 10); // 1 bit
+ unsigned ImportType = Flags.ImportType | ImportAsDecl;
+ RawFlags |= (ImportType << 10); // 1 bit
return RawFlags;
}
@@ -4543,6 +4555,17 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
Abbv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8));
unsigned AllocAbbrev = Stream.EmitAbbrev(std::move(Abbv));
+ auto shouldImportValueAsDecl = [&](GlobalValueSummary *GVS) -> bool {
+ if (ModuleToDecSummaries == nullptr)
+ return false;
+ auto Iter = ModuleToDecSummaries->find(GVS->modulePath().str());
+ if (Iter == ModuleToDecSummaries->end())
+ return false;
+ // For the current module, the value for GVS should be imported as a
+ // declaration.
+ return Iter->second.contains(GVS);
+ };
+
// The aliases are emitted as a post-pass, and will point to the value
// id of the aliasee. Save them in a vector for post-processing.
SmallVector<AliasSummary *, 64> Aliases;
@@ -4653,7 +4676,8 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
NameVals.push_back(*ValueId);
assert(ModuleIdMap.count(FS->modulePath()));
NameVals.push_back(ModuleIdMap[FS->modulePath()]);
- NameVals.push_back(getEncodedGVSummaryFlags(FS->flags()));
+ NameVals.push_back(
+ getEncodedGVSummaryFlags(FS->flags(), shouldImportValueAsDecl(FS)));
NameVals.push_back(FS->instCount());
NameVals.push_back(getEncodedFFlags(FS->fflags()));
NameVals.push_back(FS->entryCount());
@@ -4702,7 +4726,8 @@ void IndexBitcodeWriter::writeCombinedGlobalValueSummary() {
NameVals.push_back(AliasValueId);
assert(ModuleIdMap.count(AS->modulePath()));
NameVals.push_back(ModuleIdMap[AS->modulePath()]);
- NameVals.push_back(getEncodedGVSummaryFlags(AS->flags()));
+ NameVals.push_back(
+ getEncodedGVSummaryFlags(AS->flags(), shouldImportValueAsDecl(AS)));
auto AliaseeValueId = SummaryToValueIdMap[&AS->getAliasee()];
assert(AliaseeValueId);
NameVals.push_back(AliaseeValueId);
@@ -5036,8 +5061,10 @@ void BitcodeWriter::writeModule(const Module &M,
void BitcodeWriter::writeIndex(
const ModuleSummaryIndex *Index,
- const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex) {
+ const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex,
+ const ModuleToGVSummaryPtrSet *ModuleToDecSummaries) {
IndexBitcodeWriter IndexWriter(*Stream, StrtabBuilder, *Index,
+ ModuleToDecSummaries,
ModuleToSummariesForIndex);
IndexWriter.write();
}
@@ -5090,12 +5117,13 @@ void IndexBitcodeWriter::write() {
// index for a distributed backend, provide a \p ModuleToSummariesForIndex map.
void llvm::writeIndexToFile(
const ModuleSummaryIndex &Index, raw_ostream &Out,
- const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex) {
+ const std::map<std::string, GVSummaryMapTy> *ModuleToSummariesForIndex,
+ const ModuleToGVSummaryPtrSet *ModuleToDecSummaries) {
SmallVector<char, 0> Buffer;
Buffer.reserve(256 * 1024);
BitcodeWriter Writer(Buffer);
- Writer.writeIndex(&Index, ModuleToSummariesForIndex);
+ Writer.writeIndex(&Index, ModuleToSummariesForIndex, ModuleToDecSummaries);
Writer.writeStrtab();
Out.write((char *)&Buffer.front(), Buffer.size());
|
Tagging @jvoung for review since I cannot add you, thanks! |
/// A set of global value summary pointers. | ||
using GVSummaryPtrSet = SmallPtrSet<GlobalValueSummary *, 4>; | ||
|
||
/// The key is module path, and value is a set of global value summary pointers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big issue, but was wondering if the module path is needed or if a set instead of map would work? It seemed like each GlobalValueSummary* would only be imported from one module for declaration. Or is it possible that for a particular GVS_i ?
mod_A -> GVS_i
mod_B -> GVS_i
Otherwise, the map does match the gatherImportedSummariesForModule more closely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think it makes sense to use a set of pointers as suggested, given ModuleToGVSummaryPtrSet
is computed using pointers as opposed to GUIDs. Going to make the change.
While it's possible to have {mod_A, GUID_foo} and {mod_B, GUID_foo} entries because of comdats, each GlobalValueSummary*
will be associated with exactly one module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Going to update the subsequent patch after updating this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it, thanks for the clarification!
@@ -428,6 +428,10 @@ class IndexBitcodeWriter : public BitcodeWriterBase { | |||
/// The combined index to write to bitcode. | |||
const ModuleSummaryIndex &Index; | |||
|
|||
/// For each module, provides the set of global value summaries for which the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe indicate "When writing a subset of the index" also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment to "When writing combined summaries, provides the set of global value summaries...".
While we write a subset of the index only for combined summaries, 'combined summary' is slightly preferred as it hints this map doesn't apply for per-module summary or the index.
…hinLTO under a default-off new option (#88024) The goal is to populate `declaration` import status if a new flag`-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases (better call-graph sort and cross-module auto-init) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, #87600 will do this. [1] https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5 [2] #87597 (comment)
…ibuted ThinLTO under a default-off new option" (#92718) The original PR is reviewed in #88024, and this PR adds one line (b9f04d1) to fix test Limit to one thread for in-process ThinLTO to test `LLVM_DEBUG` log. - This should fix build bot failure like https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and https://lab.llvm.org/buildbot/#/builders/9/builds/43876 - I could repro the failure and see interleaved log messages by using `-thinlto-threads=all` **Original Commit Message:** The goal is to populate `declaration` import status if a new flag `-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](#87597 (comment))) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, #87600 will do this.
1. Set 'import-instr-evolution-factor' to 1.0 (default 0.7) to make sure import limit remains 7 (otherwise import limit decays through each edge) 2. Updated bar* function names 3. Explain why aliasee is null for bar* functions
This reverts commit 6262763.
…ibuted ThinLTO under a default-off new option" (#95482) Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`. Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This is a reland of #92718 * `DenseMap` allocates space for a large number of key/value pairs and wastes space when the number of elements are small. * While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2] when the number of elements is small (for example, 3 or 4 elements). The programmer manual [3] also mentions it could waste space. * Experiments show `FunctionsToImportTy.size()` is smaller than 4 for multiple binaries with high indexing ram usage. `unordered_map` grows factor is at most 2 in llvm libc [4] for insert operations. With this change, `ComputeCrossModuleImport` ram increase is smaller than 0.5G on a couple of binaries with high indexing ram usage. A wider range of (pre-release) tests pass. [1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 [2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849 [3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h [4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526 **Original commit message** The goal is to populate `declaration` import status if a new flag `-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](#87597 (comment))) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, #87600 will do this.
#87600 was reverted in order to revert 6262763. Now #95482 is fix forward for 6262763. This patch is a reland for #87600 **Changes on top of original patch** In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of `GVSummaryPtrSet` an `unordered_set` which is more memory efficient when the number of elements is smaller than 128 [1] **Original commit message** For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly. [1] https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
…ibuted ThinLTO under a default-off new option" (llvm#95482) Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`. Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This is a reland of llvm#92718 * `DenseMap` allocates space for a large number of key/value pairs and wastes space when the number of elements are small. * While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2] when the number of elements is small (for example, 3 or 4 elements). The programmer manual [3] also mentions it could waste space. * Experiments show `FunctionsToImportTy.size()` is smaller than 4 for multiple binaries with high indexing ram usage. `unordered_map` grows factor is at most 2 in llvm libc [4] for insert operations. With this change, `ComputeCrossModuleImport` ram increase is smaller than 0.5G on a couple of binaries with high indexing ram usage. A wider range of (pre-release) tests pass. [1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 [2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849 [3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h [4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526 **Original commit message** The goal is to populate `declaration` import status if a new flag `-import-declaration` is on. * For in-process ThinLTO, the `declaration` status is visible to backend `function-import` pass, so `FunctionImporter::importFunctions` should read the import status and be no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle `declaration` summaries. Two use cases ([better call-graph sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5) or [cross-module auto-init](llvm#87597 (comment))) would use this bit differently. * For distributed ThinLTO, the `declaration` status is not serialized to bitcode. As discussed, llvm#87600 will do this.
llvm#87600 was reverted in order to revert llvm@6262763. Now llvm#95482 is fix forward for llvm@6262763. This patch is a reland for llvm#87600 **Changes on top of original patch** In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of `GVSummaryPtrSet` an `unordered_set` which is more memory efficient when the number of elements is smaller than 128 [1] **Original commit message** For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly. [1] https://github.com/llvm/llvm-project/blob/393eff4e02e7ab3d234d246a8d6912c8e745e6f9/llvm/lib/Support/SmallPtrSet.cpp#L43
For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization.
This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly.