Cut code size for feature hashing #118348

Merged: 1 commit merged into rust-lang:master on Nov 29, 2023

Conversation

Mark-Simulacrum
Member

This locally cuts ~32 kB of .text instructions.

This isn't really a clear win in terms of readability. IMO the code size benefits are worth it (even if they're not necessarily present in the x86_64 hyperoptimized build, I expect them to translate similarly to other platforms). Ultimately there's a lot of small-ish, low-hanging fruit like this that seems worth tackling to me and could translate into larger wins in aggregate.

@rustbot
Collaborator

rustbot commented Nov 27, 2023

r? @compiler-errors

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 27, 2023
/// Note that the total feature count is pretty small, so this is not a huge array.
#[inline]
pub fn all_features(&self) -> [u8; NUM_FEATURES] {
[$(self.$feature as u8),+]
Member Author

If we don't like this I might look into seeing how painful/annoying it would be to make the tail of the struct be a repr(C) struct that we can cast into &[u8] directly. Theoretically if we're not reordering fields that's basically already what should happen here. In any case, there's only ~250 bytes here so it's not a big copy anyway.
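A minimal sketch of the pattern in the snippet above, assuming a simplified declaration macro. The macro name `declare_features!` and the feature names `alpha`, `beta`, `gamma` are hypothetical and only illustrate how a per-field expansion can produce one `[u8; NUM_FEATURES]` array; the real macro in rustc carries far more information per feature.

```rust
// Hypothetical stand-in for the real feature-declaration macro.
macro_rules! declare_features {
    ($($feature:ident),+ $(,)?) => {
        /// Number of declared features, derived by counting the field names.
        pub const NUM_FEATURES: usize = [$(stringify!($feature)),+].len();

        /// One `bool` flag per declared feature.
        #[derive(Default)]
        pub struct Features {
            $(pub $feature: bool,)+
        }

        impl Features {
            /// Copies every flag into a single byte array so that a hasher
            /// can consume all of them in one call.
            #[inline]
            pub fn all_features(&self) -> [u8; NUM_FEATURES] {
                [$(self.$feature as u8),+]
            }
        }
    };
}

// Illustrative feature names, not real rustc features.
declare_features!(alpha, beta, gamma);

fn main() {
    let features = Features { beta: true, ..Default::default() };
    // Prints [0, 1, 0]
    println!("{:?}", features.all_features());
}
```

The array is built by value, so the cost is one small copy per call, which matches the "not a big copy" remark for a few hundred flags.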

@Mark-Simulacrum
Member Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 27, 2023
@bors
Collaborator

bors commented Nov 27, 2023

⌛ Trying commit 1487bd6 with merge 7946c2c...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 27, 2023
///
/// Note that the total feature count is pretty small, so this is not a huge array.
#[inline]
pub fn all_features(&self) -> [u8; NUM_FEATURES] {
Member

@compiler-errors compiler-errors Nov 27, 2023

Why is it better to turn it into a u8 from a bool? Better HashStable impl?

Member Author

Yeah. Maybe we should add a specialization there as well, but [u8] is hashed in blocks while [bool] is hashed one element at a time. In practice, since this now compiles to a loop instead of being fully unrolled (what we were doing before this PR, afaict), it's not that big a difference, but it seems like an easy win. It also discourages using this array for anything but hashing.
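To make the block-versus-element distinction concrete, here is a rough sketch using the standard library's `std::hash::Hasher` trait rather than rustc's own stable hasher. `CountingHasher` and both helper functions are hypothetical names invented for this illustration; the hasher only counts `write` calls, which is a proxy for per-call overhead, not a measurement of the real implementation.

```rust
use std::hash::Hasher;

/// Toy hasher (an assumption for illustration) that records how many
/// times `write` is called and how many bytes it receives in total.
#[derive(Default)]
struct CountingHasher {
    calls: usize,
    bytes: usize,
}

impl Hasher for CountingHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.calls += 1;
        self.bytes += bytes.len();
    }

    fn finish(&self) -> u64 {
        self.calls as u64
    }
}

/// Feeds each flag separately: one hasher call per element.
fn hash_bools_one_by_one(flags: &[bool], hasher: &mut CountingHasher) {
    for &flag in flags {
        hasher.write_u8(flag as u8);
    }
}

/// Feeds the whole byte slice at once: a single hasher call.
fn hash_bytes_as_block(flags: &[u8], hasher: &mut CountingHasher) {
    hasher.write(flags);
}

fn main() {
    let mut per_element = CountingHasher::default();
    hash_bools_one_by_one(&[true, false, true, true], &mut per_element);

    let mut block = CountingHasher::default();
    hash_bytes_as_block(&[1, 0, 1, 1], &mut block);

    println!("per element: {} calls, {} bytes", per_element.calls, per_element.bytes);
    println!("as block:    {} calls, {} bytes", block.calls, block.bytes);
}
```

Both paths feed the same bytes; the difference is only in how many times the hasher is entered.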

@bors
Collaborator

bors commented Nov 27, 2023

☀️ Try build successful - checks-actions
Build commit: 7946c2c (7946c2c01125449cad5847e5ca23f73f72424b56)

@rust-timer
Collaborator

Finished benchmarking commit (7946c2c): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.1% | [-2.1%, -2.1%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 5 |
| All ❌✅ (primary) | - | - | 0 |

Bootstrap: 675.005s -> 674.338s (-0.10%)
Artifact size: 313.35 MiB -> 313.36 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 27, 2023
@Mark-Simulacrum
Member Author

Hm, interesting. Seems like this has zero effect (negative effect?) on the resulting artifacts here... I'll poke at that, it's an interesting result. I wonder if there's something in BOLT or PGO that teaches LLVM to do what I've done manually here automatically perhaps? Seems pretty unlikely...

@Mark-Simulacrum
Member Author

It looks like the effect on the HashStable impl is actually similarly positive in the end artifact, and the increases are presumably to other (hopefully unrelated, but we don't have the tooling to make amazing comparisons) functions.

# new
1176 bytes - 83507978 <rustc_feature::unstable::Features as rustc_data_structures::stable_hasher::HashStable<rustc_query_system::ich::hcx::StableHashingContext>>::hash_stable

# old
23771 bytes - 84054528 <rustc_feature::unstable::Features>::walk_feature_fields::<<rustc_feature::unstable::Features as rustc_data_structures::stable_hasher::HashStable<rustc_query_system::ich::hcx::StableHashingContext>>::hash_stable::{closure#0}>
229 bytes - 84054282 <rustc_feature::unstable::Features as rustc_data_structures::stable_hasher::HashStable<rustc_query_system::ich::hcx::StableHashingContext>>::hash_stable

So I think this is ready to go. I thought some more about the additional specialization for [bool] arrays but ultimately I think it's probably not worth adding; any truly big bool array should be a bitset and this is (at least for now) an edge case. We can revisit that decision in the future easily too.
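As an illustration of the "big bool array should be a bitset" remark, the following is a small sketch of packing flags into `u64` words. `pack_bits` and `get_bit` are hypothetical helpers written for this example, not rustc's bitset types; the point is only that 64 flags fit in one word, so hashing or copying them touches far fewer bytes.

```rust
/// Packs a slice of flags into 64-bit words, lowest index in the lowest bit.
/// Hypothetical helper for illustration only.
fn pack_bits(flags: &[bool]) -> Vec<u64> {
    // Round up so a partial final word is still allocated.
    let mut words = vec![0u64; (flags.len() + 63) / 64];
    for (i, &flag) in flags.iter().enumerate() {
        if flag {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Reads flag `i` back out of the packed representation.
fn get_bit(words: &[u64], i: usize) -> bool {
    (words[i / 64] >> (i % 64)) & 1 == 1
}

fn main() {
    let mut flags = vec![false; 70];
    flags[3] = true;
    flags[65] = true;

    let words = pack_bits(&flags);
    // 70 flags occupy 2 words (16 bytes) instead of 70 bytes.
    println!("{} words, bit 65 set: {}", words.len(), get_bit(&words, 65));
}
```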

The gain is that we're cutting around 20 kB of instructions for the hashing: going by the symbol sizes above, 23771 + 229 = 24000 bytes before versus 1176 bytes after. As I said in the PR description, I think the cumulative potential is possibly quite large, so I'm personally in favor of moving ahead. If we want to refactor further before landing I'm OK with that, but I think this is already pretty clean.

r? @compiler-errors

@rustbot
Collaborator

rustbot commented Nov 29, 2023

Could not assign reviewer from: compiler-errors.
User(s) compiler-errors are either the PR author, already assigned, or on vacation, and there are no other candidates.
Use r? to specify someone else to assign.

@compiler-errors
Member

lgtm

@bors r+

@bors
Collaborator

bors commented Nov 29, 2023

📌 Commit 1487bd6 has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 29, 2023
@bors
Collaborator

bors commented Nov 29, 2023

⌛ Testing commit 1487bd6 with merge f440b5f...

@bors
Collaborator

bors commented Nov 29, 2023

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing f440b5f to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 29, 2023
@bors bors merged commit f440b5f into rust-lang:master Nov 29, 2023
@rustbot rustbot added this to the 1.76.0 milestone Nov 29, 2023
@rust-timer
Collaborator

Finished benchmarking commit (f440b5f): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.6% | [-1.2%, -0.1%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.6% | [3.3%, 3.9%] | 3 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.3% | [-1.3%, -1.3%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 673.535s -> 674.352s (0.12%)
Artifact size: 313.39 MiB -> 313.40 MiB (0.00%)

Labels
A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.