Skip to content

Ensure strings created with const_str get the unnamed_addr attribute #117930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 16, 2023

Conversation

thomcc
Copy link
Member

@thomcc thomcc commented Nov 15, 2023

This function (const_str) is only used when we need to invent a string during codegen -- for example, for a panic message to pass when codegening some of the assert/panic/etc terminators (for stuff like divide by zero).

AFAICT all other consts, such as the user-defined ones from const eval, should already be getting this attribute (things that come from a ConstAllocation do, for example). Which means that the "unnamed" part is even more true than usual here, these aren't strings that even exist as far as the user can tell.

Setting this attribute allows LLVM to merge these constants, leading to significant binary size savings (much more than I would expect). On x86_64-unknown-linux-gnu, t takes a build of ripgrep (release without debug info) from 9.7MiB to 6.0MiB (a savings of over 30%!?), and a build of rustc_driver's shared object from 123MiB to 112MiB (less drastic, but still over 10% reduced).

The effect on ripgrep is substantially reduced on macOS for reasons beyond me (I may have fucked up the test), only saving around 0.2MiB, although rustc_driver is still around 10MB or smaller than it had been previously.

This raises some questions, such as "does that mean 1/3 of ripgrep was made of division by zero complaints?" I'm not sure, that may be the case. The output of strings path/to/rg is ~2MB smaller, so it seems like a lot of it was. Allowing these to be merged presumably also allow functions that contain them to be merged (if the addresses had semantic meaning, then it stands).

I intend to do some more analysis here, but I got this up as soon as I realized that this attribute was only missing for internal const strings, and all other ones already get it.

Edit: The wins are much more marginal, but there's some argument to do this for the sake of consistency.

@thomcc thomcc added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-heavy Issue: Problems and improvements with respect to binary size of generated code. labels Nov 15, 2023
@rustbot
Copy link
Collaborator

rustbot commented Nov 15, 2023

r? @compiler-errors

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 15, 2023
@nikic
Copy link
Contributor

nikic commented Nov 15, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 15, 2023
@bors
Copy link
Collaborator

bors commented Nov 15, 2023

⌛ Trying commit 268f5c5 with merge 4f2e555...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 15, 2023
Ensure strings created with `const_str` get the `unnamed_addr` attribute

This function (`const_str`) is only used when we need to invent a string during codegen -- for example, for a panic message to pass when codegening some of the assert/panic/etc terminators (for stuff like divide by zero).

AFAICT all other consts, such as the user-defined ones from const eval, should already be getting this attribute (things that come from a ConstAllocation do, for example). Which means that the "unnamed" part is even more true than usual here, these aren't strings that even exist as far as the user can tell.

Setting this attribute allows LLVM to merge these constants, leading to significant binary size savings (much more than I would expect). On x86_64-unknown-linux-gnu, t takes a build of ripgrep (release without debug info) from 9.7MiB to 6.0MiB (a savings of over 30%!?), and a build of rustc_driver's shared object from ~123MiB to ~112MiB (less drastic, but still over 10% reduced).

The effect on ripgrep is substantially reduced on macOS for reasons beyond me (I may have fucked up the test), only saving around 0.2MiB, although rustc_driver is still around 10MB or smaller than it had been previously.

This raises some questions, such as "does that mean 1/3 of ripgrep was made of division by zero complaints?" I'm not sure, that may be the case. The output of `strings path/to/rg` is ~2MB smaller, so it seems like a lot of it was. Allowing these to be merged presumably also allow functions that contain them to be merged (if the addresses had semantic meaning, then it stands).

I intend to do some more analysis here, but I got this up as soon as I realized that this attribute was only missing for internal const strings, and all other ones already get it.
@bors
Copy link
Collaborator

bors commented Nov 15, 2023

☀️ Try build successful - checks-actions
Build commit: 4f2e555 (4f2e555b4dbf81546d145f9f1bc9e1e99da7acfd)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4f2e555): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.1% [0.5%, 1.7%] 3
Regressions ❌
(secondary)
2.5% [2.5%, 2.5%] 1
Improvements ✅
(primary)
-2.6% [-2.6%, -2.6%] 1
Improvements ✅
(secondary)
-4.2% [-4.2%, -4.2%] 1
All ❌✅ (primary) 0.1% [-2.6%, 1.7%] 4

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 674.6s -> 672.542s (-0.31%)
Artifact size: 311.10 MiB -> 311.08 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 15, 2023
@thomcc
Copy link
Member Author

thomcc commented Nov 15, 2023

@rustbot author

Somewhat confused by those results, I'll look into it (may not have time until the weekend). Obviously if this doesn't improve things then it's not worth doing.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 15, 2023
@nbdd0121
Copy link
Contributor

I think it's still worth doing regardless size saving, just to get more consistency with all the constants.

@thomcc
Copy link
Member Author

thomcc commented Nov 15, 2023

I have limited time for the rest of the week, so I can't investigate too deeply until the weekend. But this seems to only reduce the unstripped size. It looks like they're roughly the same size after striping.

No theories on my end, really, although in my estimation that still makes it useful (although I do feel somewhat dumb).

@nbdd0121
Copy link
Contributor

nbdd0121 commented Nov 15, 2023

The use of const_str is very limited, they are only used when codegen needs to generate a string (currently basically only when codegenning an assert terminator). For the majority of cases strings are CTFE allocs, which we already mark as unnamed when generating LLVM IR.

@thomcc
Copy link
Member Author

thomcc commented Nov 15, 2023

Yeah, I mentioned that. The confusing thing is what actually is being reduced and why it's so much (for unstripped).

@lqd
Copy link
Member

lqd commented Nov 15, 2023

why it's so much (for unstripped).

What numbers are you seeing in these cases, on which ripgrep revision, and are they different from https://perf.rust-lang.org/compare.html?start=698fcc8219e6dd690b148a23af10a0e5747621fe&end=4f2e555b4dbf81546d145f9f1bc9e1e99da7acfd&stat=size%3Alinked_artifact&name=ripgrep&showRawData=true&nonRelevant=true?

(Note that rustc-perf doesn't test the most recent revision -- but I can reproduce its "size:linked_artifact" results locally on its revision of ripgrep).

Locally on the latest ripgrep revision, I'm also not seeing a big difference myself (-15k in debug, +15k in release; with the default ripgrep settings, that is with debuginfo in release).

@thomcc
Copy link
Member Author

thomcc commented Nov 15, 2023

Ripgrep revision 7099e174acbcbd940f57e4ab4913fee4040c826e, built from source in the repo, after patching the Cargo.toml to disable profile.release.debug as follows:

diff --git a/Cargo.toml b/Cargo.toml
index 3a90556..48836d5 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -79,7 +79,7 @@ simd-accel = ["grep/simd-accel"]
 pcre2 = ["grep/pcre2"]

 [profile.release]
-debug = 1
+#debug = 1

 [package.metadata.deb]
 features = ["pcre2"]

For measuring the size of the rustc_driver library, I was comparing between the size of the versions in stage1 (build/x86_64-unknown-linux-gnu/stage1/lib/librustc_driver-94856bdf012de46b.so) and stage2 (build/x86_64-unknown-linux-gnu/stage2/lib/librustc_driver-39b917df1cd4fe20.so) versions in the build root (obviously an unscientific test given all the PGO and stuff we run on the real versions).

@lqd
Copy link
Member

lqd commented Nov 15, 2023

And your ripgrep numbers on that commit are from "9.7MiB to 6.0MiB", right? I'm locally seeing a <3KB difference with this PR 4f2e555b4dbf81546d145f9f1bc9e1e99da7acfd compared to its parent 698fcc8219e6dd690b148a23af10a0e5747621fe:

-rwxrwxr-x 1 lqd lqd 10147280 Nov 15 19:30 rg-release-defaultdebuginfo-master*
-rwxrwxr-x 1 lqd lqd 10144328 Nov 15 19:30 rg-release-defaultdebuginfo-pr117930*

@thomcc
Copy link
Member Author

thomcc commented Nov 15, 2023

Hmmm... my test methodology is just:

$ cargo clean --quiet && cargo build --release --quiet && stat -c 'old %n %s' target/release/rg
old target/release/rg 10149656

$ cargo clean --quiet && cargo +stage2 build --release --quiet && stat -c 'new %n %s' target/release/rg
new target/release/rg 6298320

I can check on a machine other than the cloud desktop over the weekend.

Edit: Here's the information for the nightly rustc I'm comparing against.

$ rustc --verbose --version
rustc 1.75.0-nightly (edf0b1db0 2023-11-10)
binary: rustc
commit-hash: edf0b1db0a7f29d71ee82cfc53bdc170fe74e501
commit-date: 2023-11-10
host: x86_64-unknown-linux-gnu
release: 1.75.0-nightly
LLVM version: 17.0.4

@lqd
Copy link
Member

lqd commented Nov 15, 2023

Yeah you're risking comparing different configs: you may also have std's debuginfo disabled in config.toml (which is around these 3MiB)? Simulating that with strip -g on this PR's results gets me close to your 6MiB: 6136880

@thomcc
Copy link
Member Author

thomcc commented Nov 15, 2023

Ahh, that probably explains it. I'll be more careful about that in the future. I didn't realize that you end up with std's debuginfo for a normal debug=0 release build, although I do see why that would happen.

Well, I suppose this PR is considerably less worthwhile then, although I'm somewhat glad there isn't such a low hanging fruit still. That said, given that we do emit this attribute on every other constant, I think the PR is still probably worthwhile.

@thomcc
Copy link
Member Author

thomcc commented Nov 16, 2023

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 16, 2023
@nikic
Copy link
Contributor

nikic commented Nov 16, 2023

@bors r+

@bors
Copy link
Collaborator

bors commented Nov 16, 2023

📌 Commit 268f5c5 has been approved by nikic

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 16, 2023
@bors
Copy link
Collaborator

bors commented Nov 16, 2023

⌛ Testing commit 268f5c5 with merge 48d8100...

@bors
Copy link
Collaborator

bors commented Nov 16, 2023

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 48d8100 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 16, 2023
@bors bors merged commit 48d8100 into rust-lang:master Nov 16, 2023
@rustbot rustbot added this to the 1.76.0 milestone Nov 16, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (48d8100): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.7% [-1.1%, -0.4%] 5
Improvements ✅
(secondary)
-2.4% [-2.6%, -2.2%] 2
All ❌✅ (primary) -0.7% [-1.1%, -0.4%] 5

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 675.534s -> 675.809s (0.04%)
Artifact size: 313.63 MiB -> 313.60 MiB (-0.01%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-heavy Issue: Problems and improvements with respect to binary size of generated code. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants