Skip to content

Add more BOLT optimizations #121902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

blyxyas
Copy link
Member

@blyxyas blyxyas commented Mar 2, 2024

This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential).

As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/
r? ghost

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 2, 2024
@blyxyas
Copy link
Member Author

blyxyas commented Mar 2, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 2, 2024
@bors
Copy link
Collaborator

bors commented Mar 2, 2024

⌛ Trying commit 4ebb694 with merge e454644...

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 2, 2024
Add more BOLT optimizations

This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential).

As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/
r? ghost
@Kobzol
Copy link
Contributor

Kobzol commented Mar 2, 2024

FWIW, we tried the peephole flag several times, and it never showed any performance improvement. I don't think that we have tried the other flags though.

@Kobzol
Copy link
Contributor

Kobzol commented Mar 2, 2024

I chose the flags that we use by asking the BOLT maintainers on Discord which flags would they suggest, and went with their suggestions :) But that was many months ago, so it's possible that there are new/better flags. One of the BOLT maintainers is suggesting new flags (#119418), however this is currently waiting on the LLVM 18 release, which should happen in a few days.

@bors
Copy link
Collaborator

bors commented Mar 2, 2024

☀️ Try build successful - checks-actions
Build commit: e454644 (e454644c8c94778b7d1d731e6718023d1963ff07)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e454644): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.7% [0.8%, 6.3%] 56
Regressions ❌
(secondary)
3.9% [1.9%, 9.0%] 56
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.7% [0.8%, 6.3%] 56

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 652.672s -> 653.308s (0.10%)
Artifact size: 175.51 MiB -> 175.58 MiB (0.04%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 2, 2024
@Kobzol
Copy link
Contributor

Kobzol commented Mar 2, 2024

Looks like this increases memory usage, but doesn't seem to improve perf.

@@ -66,11 +66,18 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<(
// Split function code into hot and code regions
.arg("-split-functions")
// Split as many basic blocks as possible
.arg("-split-all-cold")
// Move jump tables to a separate section
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you removed an incorrect comment.

@@ -66,11 +66,18 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<(
// Split function code into hot and code regions
.arg("-split-functions")
// Split as many basic blocks as possible
.arg("-split-all-cold")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really need to remove this optimization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was not intended. I'll revert this line and change the flags based on feedback.

@blyxyas
Copy link
Member Author

blyxyas commented Mar 4, 2024

I'll change this slightly based on feedback. I think it's pretty weird that no optimizations at all are being displayed, being that inlining small functions tends to be pretty powerful. I'll do another benchmark with the fixed code (removing peepholes and restoring a flag I accidentally removed) and if that doesn't work, we'll just have to let this one die.

@blyxyas
Copy link
Member Author

blyxyas commented Mar 4, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 4, 2024
@bors
Copy link
Collaborator

bors commented Mar 4, 2024

⌛ Trying commit e67a236 with merge e1d8daa...

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 4, 2024
Add more BOLT optimizations

This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential).

As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/
r? ghost
@@ -76,8 +78,6 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<(
// Inline functions smaller than 32 bytes
.arg("-inline-small-functions")
.arg("-inline-small-functions-bytes=32")
// Perform peephole optimizations
.arg("-peepholes=all")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useless?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on Kobzol comments:

we tried the peephole flag several times, and it never showed any performance improvement

If it's already tried, I won't test it again on Rust's infra. (And sadly I'm having problems of benchmarking it in my own servers)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Android maintainers (that use BOLT in their own Rust toolchain build) had some success with it, but we weren't able to replicate it.

@bors
Copy link
Collaborator

bors commented Mar 4, 2024

☀️ Try build successful - checks-actions
Build commit: e1d8daa (e1d8daacc3eeaf4c59296f02f9fa2edb31d57a9b)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e1d8daa): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.2% [0.1%, 0.3%] 5
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.8% [0.6%, 5.9%] 86
Regressions ❌
(secondary)
3.6% [0.4%, 7.2%] 71
Improvements ✅
(primary)
-1.9% [-1.9%, -1.9%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.8% [-1.9%, 5.9%] 87

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 645.218s -> 644.71s (-0.08%)
Artifact size: 175.01 MiB -> 175.07 MiB (0.04%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 4, 2024
@blyxyas
Copy link
Member Author

blyxyas commented Mar 5, 2024

Hmmm... I think that I'll close this, that time improvement isn't enough compared to the regressions. I'll try making that CI change we talked about in Zulip

@blyxyas blyxyas closed this Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants