-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Add more BOLT optimizations #121902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more BOLT optimizations #121902
Conversation
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Add more BOLT optimizations This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential). As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/ r? ghost
FWIW, we tried the peephole flag several times, and it never showed any performance improvement. I don't think that we have tried the other flags though. |
I chose the flags that we use by asking the BOLT maintainers on Discord which flags would they suggest, and went with their suggestions :) But that was many months ago, so it's possible that there are new/better flags. One of the BOLT maintainers is suggesting new flags (#119418), however this is currently waiting on the LLVM 18 release, which should happen in a few days. |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e454644): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 652.672s -> 653.308s (0.10%) |
Looks like this increases memory usage, but doesn't seem to improve perf. |
src/tools/opt-dist/src/bolt.rs
Outdated
@@ -66,11 +66,18 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<( | |||
// Split function code into hot and code regions | |||
.arg("-split-functions") | |||
// Split as many basic blocks as possible | |||
.arg("-split-all-cold") | |||
// Move jump tables to a separate section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you removed an incorrect comment.
src/tools/opt-dist/src/bolt.rs
Outdated
@@ -66,11 +66,18 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<( | |||
// Split function code into hot and code regions | |||
.arg("-split-functions") | |||
// Split as many basic blocks as possible | |||
.arg("-split-all-cold") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we really need to remove this optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was not intended. I'll revert this line and change the flags based on feedback.
I'll change this slightly based on feedback. I think it's pretty weird that no optimizations at all are being displayed, being that inlining small functions tends to be pretty powerful. I'll do another benchmark with the fixed code (removing peepholes and restoring a flag I accidentally removed) and if that doesn't work, we'll just have to let this one die. |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Add more BOLT optimizations This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential). As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/ r? ghost
@@ -76,8 +78,6 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<( | |||
// Inline functions smaller than 32 bytes | |||
.arg("-inline-small-functions") | |||
.arg("-inline-small-functions-bytes=32") | |||
// Perform peephole optimizations | |||
.arg("-peepholes=all") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it useless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on Kobzol comments:
we tried the peephole flag several times, and it never showed any performance improvement
If it's already tried, I won't test it again on Rust's infra. (And sadly I'm having problems of benchmarking it in my own servers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Android maintainers (that use BOLT in their own Rust toolchain build) had some success with it, but we weren't able to replicate it.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e1d8daa): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 645.218s -> 644.71s (-0.08%) |
Hmmm... I think that I'll close this, that time improvement isn't enough compared to the regressions. I'll try making that CI change we talked about in Zulip |
This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential).
As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/
r? ghost