Add more BOLT optimizations #121902

blyxyas · 2024-03-02T15:45:51Z

This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential).

As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/
r? ghost

blyxyas · 2024-03-02T15:47:55Z

@bors try @rust-timer queue

bors · 2024-03-02T15:49:06Z

⌛ Trying commit 4ebb694 with merge e454644...

Add more BOLT optimizations This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential). As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/ r? ghost

Kobzol · 2024-03-02T16:50:07Z

FWIW, we tried the peephole flag several times, and it never showed any performance improvement. I don't think that we have tried the other flags though.

Kobzol · 2024-03-02T16:55:55Z

I chose the flags that we use by asking the BOLT maintainers on Discord which flags would they suggest, and went with their suggestions :) But that was many months ago, so it's possible that there are new/better flags. One of the BOLT maintainers is suggesting new flags (#119418), however this is currently waiting on the LLVM 18 release, which should happen in a few days.

bors · 2024-03-02T17:19:23Z

☀️ Try build successful - checks-actions
Build commit: e454644 (e454644c8c94778b7d1d731e6718023d1963ff07)

rust-timer · 2024-03-02T18:32:58Z

Finished benchmarking commit (e454644): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.7%	[0.8%, 6.3%]	56
Regressions ❌ (secondary)	3.9%	[1.9%, 9.0%]	56
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.7%	[0.8%, 6.3%]	56

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 652.672s -> 653.308s (0.10%)
Artifact size: 175.51 MiB -> 175.58 MiB (0.04%)

Kobzol · 2024-03-02T20:50:23Z

Looks like this increases memory usage, but doesn't seem to improve perf.

AJIOB · 2024-03-03T10:42:53Z

src/tools/opt-dist/src/bolt.rs

@@ -66,11 +66,18 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<(
        // Split function code into hot and code regions
        .arg("-split-functions")
        // Split as many basic blocks as possible
-        .arg("-split-all-cold")
-        // Move jump tables to a separate section


Looks like you removed an incorrect comment.

AJIOB · 2024-03-03T10:43:19Z

src/tools/opt-dist/src/bolt.rs

@@ -66,11 +66,18 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<(
        // Split function code into hot and code regions
        .arg("-split-functions")
        // Split as many basic blocks as possible
-        .arg("-split-all-cold")


Should we really need to remove this optimization?

This change was not intended. I'll revert this line and change the flags based on feedback.

blyxyas · 2024-03-04T13:22:25Z

I'll change this slightly based on feedback. I think it's pretty weird that no optimizations at all are being displayed, being that inlining small functions tends to be pretty powerful. I'll do another benchmark with the fixed code (removing peepholes and restoring a flag I accidentally removed) and if that doesn't work, we'll just have to let this one die.

blyxyas · 2024-03-04T13:22:56Z

@bors try @rust-timer queue

bors · 2024-03-04T13:24:10Z

⌛ Trying commit e67a236 with merge e1d8daa...

Add more BOLT optimizations This PR adds some more BOLT flags. I'm not sure why they were not added on the first place. I sadly cannot benchmark the changes on my machines (I'm having some problems). To not expand the build time too much, I only added the most influential on final times (inlining small functions and peepholes being the most influential). As this is just a experiment, let's not assign anyone yet. Let's hope that this one PR improves runtimes. Improving performance is hard :/ r? ghost

AJIOB · 2024-03-04T13:29:25Z

src/tools/opt-dist/src/bolt.rs

@@ -76,8 +78,6 @@ pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<(
        // Inline functions smaller than 32 bytes
        .arg("-inline-small-functions")
        .arg("-inline-small-functions-bytes=32")
-        // Perform peephole optimizations
-        .arg("-peepholes=all")


Is it useless?

Based on Kobzol comments:

we tried the peephole flag several times, and it never showed any performance improvement

If it's already tried, I won't test it again on Rust's infra. (And sadly I'm having problems of benchmarking it in my own servers)

The Android maintainers (that use BOLT in their own Rust toolchain build) had some success with it, but we weren't able to replicate it.

bors · 2024-03-04T14:52:59Z

☀️ Try build successful - checks-actions
Build commit: e1d8daa (e1d8daacc3eeaf4c59296f02f9fa2edb31d57a9b)

rust-timer · 2024-03-04T16:06:37Z

Finished benchmarking commit (e1d8daa): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.2%	[0.1%, 0.3%]	5
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.8%	[0.6%, 5.9%]	86
Regressions ❌ (secondary)	3.6%	[0.4%, 7.2%]	71
Improvements ✅ (primary)	-1.9%	[-1.9%, -1.9%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.8%	[-1.9%, 5.9%]	87

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 645.218s -> 644.71s (-0.08%)
Artifact size: 175.01 MiB -> 175.07 MiB (0.04%)

blyxyas · 2024-03-05T15:09:52Z

Hmmm... I think that I'll close this, that time improvement isn't enough compared to the regressions. I'll try making that CI change we talked about in Zulip

Add more optimizations

4ebb694

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 2, 2024

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 2, 2024

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 2, 2024

AJIOB suggested changes Mar 3, 2024

View reviewed changes

Restore -split-all-cold and remove peepholes based on feedback

e67a236

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 4, 2024

AJIOB reviewed Mar 4, 2024

View reviewed changes

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 4, 2024

blyxyas closed this Mar 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add more BOLT optimizations #121902

Add more BOLT optimizations #121902

blyxyas commented Mar 2, 2024

blyxyas commented Mar 2, 2024

This comment has been minimized.

bors commented Mar 2, 2024

Kobzol commented Mar 2, 2024 •

edited

Loading

Kobzol commented Mar 2, 2024

bors commented Mar 2, 2024

This comment has been minimized.

rust-timer commented Mar 2, 2024

Kobzol commented Mar 2, 2024

AJIOB Mar 3, 2024

AJIOB Mar 3, 2024

blyxyas Mar 3, 2024

blyxyas commented Mar 4, 2024

blyxyas commented Mar 4, 2024

This comment has been minimized.

bors commented Mar 4, 2024

AJIOB Mar 4, 2024

blyxyas Mar 4, 2024

Kobzol Mar 4, 2024

bors commented Mar 4, 2024

This comment has been minimized.

rust-timer commented Mar 4, 2024

blyxyas commented Mar 5, 2024

Add more BOLT optimizations #121902

Add more BOLT optimizations #121902

Conversation

blyxyas commented Mar 2, 2024

blyxyas commented Mar 2, 2024

This comment has been minimized.

bors commented Mar 2, 2024

Kobzol commented Mar 2, 2024 • edited Loading

Kobzol commented Mar 2, 2024

bors commented Mar 2, 2024

This comment has been minimized.

rust-timer commented Mar 2, 2024

Overall result: no relevant changes - no action needed

Kobzol commented Mar 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blyxyas commented Mar 4, 2024

blyxyas commented Mar 4, 2024

This comment has been minimized.

bors commented Mar 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bors commented Mar 4, 2024

This comment has been minimized.

rust-timer commented Mar 4, 2024

Overall result: ❌ regressions - no action needed

blyxyas commented Mar 5, 2024

Kobzol commented Mar 2, 2024 •

edited

Loading