Skip to content

Reland "[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option" #92718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
May 20, 2024

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented May 20, 2024

The original PR is reviewed in #88024, and this PR adds one line to fix test

Limit to one thread for in-process ThinLTO to test LLVM_DEBUG log.

   llvm/test/ThinLTO/X86/import_callee_declaration.ll:89:19: error: IMPORTDUMP-DAG: expected string not found in input
; IMPORTDUMP-DAG: Is importing function definition 13568239288960714650 small_indirect_callee from lib.cc

    45: Starting import for Module main.bc 
dag:89'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         46: Not importing function 11825436545918268459 callee from lib.cc 
dag:89'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         47: Is importing function declaration 14343440786664691134 large_indirect_callee from lib.cc 
dag:89'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dag:89'1     ?                                                                                         possible intended match
         48: Is importing functionStarting import for Module definition 13568239288960714650lib.bc  
dag:89'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         49: small_indirect_callee from lib.cc 

Original Commit Message:

The goal is to populate declaration import status if a new flag -import-declaration is on.

  • For in-process ThinLTO, the declaration status is visible to backend function-import pass, so FunctionImporter::importFunctions should read the import status and be no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle declaration summaries. Two use cases (better call-graph sort or cross-module auto-init) would use this bit differently.

  • For distributed ThinLTO, the declaration status is not serialized to bitcode. As discussed, [ThinLTO][Bitcode] Generate import type in bitcode #87600 will do this.

@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review May 20, 2024 06:17
@mingmingl-llvm mingmingl-llvm merged commit e33db24 into main May 20, 2024
7 checks passed
@mingmingl-llvm mingmingl-llvm deleted the users/minglotus-6/spr/summary2 branch May 20, 2024 15:55
mingmingl-llvm added a commit that referenced this pull request Jun 5, 2024
…94502)

This reverts commit 6262763, to prepare
for the revert of #92718.


#92718 causes LTO indexing OOM
in some applications.
mingmingl-llvm added a commit to mingmingl-llvm/llvm-project that referenced this pull request Jun 5, 2024
…or distributed ThinLTO under a default-off new option" (llvm#92718)"

This reverts commit e33db24.
mingmingl-llvm added a commit that referenced this pull request Jun 5, 2024
…or distributed ThinLTO under a default-off new option" (#92718) (#94503)

This reverts commit e33db24.

The change from *set to *map increases memory usage, and caused indexing
OOM in some applications. Need to profile offline to bring the memory
usage down.
mingmingl-llvm added a commit that referenced this pull request Jun 20, 2024
…ibuted ThinLTO under a default-off new option" (#95482)

Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`.
Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This
is a reland of #92718

* `DenseMap` allocates space for a large number of key/value pairs and
wastes space when the number of elements are small.
* While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2]
when the number of elements is small (for example, 3 or 4 elements). The programmer
manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for
multiple binaries with high indexing ram usage. `unordered_map` grows
factor is at most 2 in llvm libc [4] for insert operations.
 
With this change, `ComputeCrossModuleImport` ram increase is smaller
than 0.5G on a couple of binaries with high indexing ram usage. A wider
range of (pre-release) tests pass.

[1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 
[2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526

**Original commit message** 
The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, #87600
will do this.
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…ibuted ThinLTO under a default-off new option" (llvm#95482)

Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`.
Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This
is a reland of llvm#92718

* `DenseMap` allocates space for a large number of key/value pairs and
wastes space when the number of elements are small.
* While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2]
when the number of elements is small (for example, 3 or 4 elements). The programmer
manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for
multiple binaries with high indexing ram usage. `unordered_map` grows
factor is at most 2 in llvm libc [4] for insert operations.
 
With this change, `ComputeCrossModuleImport` ram increase is smaller
than 0.5G on a couple of binaries with high indexing ram usage. A wider
range of (pre-release) tests pass.

[1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 
[2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526

**Original commit message** 
The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](llvm#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, llvm#87600
will do this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant