-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Import small cold functions #82980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import small cold functions #82980
Conversation
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 95b36e9bc49fc971dbbbc328f77136be7f753b3b with merge df0df906bd16e3d83f3f52a984101dc638c848e3... |
☀️ Try build successful - checks-actions |
Queued df0df906bd16e3d83f3f52a984101dc638c848e3 with parent a4d9624, future comparison URL. |
Finished benchmarking try commit (df0df906bd16e3d83f3f52a984101dc638c848e3): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
The perf result look quite promising, with small improvements in instruction counts, wall-time and max-rss. The change also reduced size of librustc_driver.so from 102.5 MiB to 97.1 MiB. The main premise behind the change is that with PGO, we will have cold functions with negative inline cost, and we should continue inlining them. I don't know if this is something we would like to land in this form, but could be interesting from Rust PGO perspective, so |
The Rust code is often written under an assumption that for generic methods inline attribute is mostly unnecessary, since for optimized builds using ThinLTO, a method will be generated in at least one CGU and available for import. For example, deref implementations for Box, Vec, MutexGuard, and MutexGuard are not currently marked as inline, neither is identity implementation of From trait. In PGO builds, when functions are determined to be cold, the default multiplier of zero will stop the import, even for completely trivial functions. Increase slightly the default multiplier from 0 to 0.1 to import them regardless.
95b36e9
to
1aee808
Compare
I'll take a look at this next week. Thanks for the PR, @tmiasko! |
Hm, the numbers look like an improvement overall. However, I'm wondering, in the PGO case, wouldn't the compiler have accurate information about the coldness of a function? In which case it should actually be beneficial not to import cold functions? On the other hand, as you say, a multiplier of I think we can just merge this. It's just a small tweak and does not seem to a negative impact on LLVM times. @bors r+ |
📌 Commit 1aee808 has been approved by |
☀️ Test successful - checks-actions |
Inlining trivial functions often reduces caller code size, and inlining heuristics for cold functions try to avoid inlining otherwise, so yes it should be beneficial even with completely accurate profile information. For example, an error handling code path using
and after:
|
@tmiasko @michaelwoerister Can you talk a bit more about the performance regressions seen here? Some of the regressions are somewhat severe, so it would be nice to talk through them a bit more. Why is the performance regression worth it? |
The main change here is that we allow LLVM to import functions across module boundaries during ThinLTO even if a given function has been marked as Because this PR re-jiggers LLVM inlining decisions it can be expected to have a mixed influence on compile times. Since that effect seems to be positive overall (with the exception of two small synthetic test cases) I think this change is a net win. |
Thanks @michaelwoerister! That makes sense to me. Would it be worth gating this on --release only? |
No, I don't think that would make a difference since all the non-release builds don't run ThinLTO anyway. The change only affects one of the ThinLTO phases. |
Makes sense. OK. |
The Rust code is often written under an assumption that for generic
methods inline attribute is mostly unnecessary, since for optimized
builds using ThinLTO, a method will be code generated in at least one
CGU and available for import.
For example, deref implementations for Box, Vec, MutexGuard, and
MutexGuard are not currently marked as inline, neither is identity
implementation of From trait.
In PGO builds, when functions are determined to be cold, the default
multiplier of zero will stop the import, no matter how trivial the
implementation.
Increase slightly the default multiplier from 0 to 0.1.
r? @ghost