Cut code size for feature hashing #118348
Conversation
This locally cuts ~32 kB of .text instructions.

This isn't really a clear win in terms of readability. IMO the code size benefits are worth it (even if they're not necessarily present in the x86_64 hyperoptimized build, I expect them to translate similarly to other platforms). Ultimately there's lots of "small-ish" low-hanging fruit like this that I'm seeing that seems worth tackling, and it could translate into larger wins in aggregate.
(rustbot has picked a reviewer for you, use r? to override)
```rust
/// Note that the total feature count is pretty small, so this is not a huge array.
#[inline]
pub fn all_features(&self) -> [u8; NUM_FEATURES] {
    [$(self.$feature as u8),+]
}
```
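For context, the `$(...)` repetition means this method is generated inside the macro that declares all the features, with one `bool` field per feature. The point of returning a fixed-size `[u8; NUM_FEATURES]` array is that the caller can hash the whole thing as one contiguous run of bytes rather than hashing each `bool` field individually. A minimal, self-contained sketch of that idea, with a hand-written stand-in for the macro-generated struct (the field names, the `NUM_FEATURES` value, and the use of std's `DefaultHasher` are illustrative, not the actual rustc code):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative stand-in for the macro-generated struct; the real one has a few
// hundred feature flags, one `bool` field each.
const NUM_FEATURES: usize = 3;

struct Features {
    async_closures: bool,
    box_patterns: bool,
    never_type: bool,
}

impl Features {
    /// Mirrors the macro-expanded method above: each `bool` field becomes one byte.
    #[inline]
    fn all_features(&self) -> [u8; NUM_FEATURES] {
        [
            self.async_closures as u8,
            self.box_patterns as u8,
            self.never_type as u8,
        ]
    }
}

fn main() {
    let features = Features { async_closures: true, box_patterns: false, never_type: true };
    let mut hasher = DefaultHasher::new();
    // The whole array is hashed as one contiguous byte slice instead of field by field.
    features.all_features().hash(&mut hasher);
    println!("feature hash: {:x}", hasher.finish());
}
```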
If we don't like this I might look into seeing how painful/annoying it would be to make the tail of the struct be a repr(C) struct that we can cast into &[u8] directly. Theoretically, if we're not reordering fields, that's basically already what should happen here. In any case, there's only ~250 bytes here so it's not a big copy anyway.
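For reference, a rough sketch of what that alternative could look like, assuming the feature flags were kept in a dedicated `#[repr(C)]` struct whose fields are all `u8` (every name and the `as_bytes` helper here are hypothetical; this is not what the PR does):

```rust
// Hypothetical layout sketched in the comment above: a #[repr(C)] struct of u8
// fields has no padding and a fixed field order, so it can be viewed as a byte
// slice without copying. None of these names exist in rustc.
#[repr(C)]
struct FeatureFlags {
    async_closures: u8,
    box_patterns: u8,
    never_type: u8,
}

impl FeatureFlags {
    /// View the flags as a contiguous byte slice, e.g. for hashing.
    fn as_bytes(&self) -> &[u8] {
        // Safety: `FeatureFlags` is repr(C) and contains only u8 fields, so it
        // has no padding and every byte of the struct is initialized.
        unsafe {
            std::slice::from_raw_parts(
                (self as *const FeatureFlags).cast::<u8>(),
                std::mem::size_of::<FeatureFlags>(),
            )
        }
    }
}
```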
@bors try

@rust-timer queue
…=<try> Cut code size for feature hashing

This locally cuts ~32 kB of .text instructions. This isn't really a clear win in terms of readability. IMO the code size benefits are worth it (even if they're not necessarily present in the x86_64 hyperoptimized build, I expect them to translate similarly to other platforms). Ultimately there's lots of "small-ish" low-hanging fruit like this that I'm seeing that seems worth tackling to me, and could translate into larger wins in aggregate.
```rust
///
/// Note that the total feature count is pretty small, so this is not a huge array.
#[inline]
pub fn all_features(&self) -> [u8; NUM_FEATURES] {
    [$(self.$feature as u8),+]
}
```
Why is it better to turn it into a u8 from a bool? Better HashStable impl?
Yeah, maybe we should add a specialization there as well, but `[u8]` hashing operates in blocks while `[bool]` hashes one element at a time. In practice, since this compiles to a loop vs. fully unrolling (what we were doing before this PR, afaict), it's not that big a difference, but it seems like an easy win. It also discourages usage of this array for anything but hashing.
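To make the blocks-vs-one-by-one distinction concrete, here is a small illustration using std's `Hash`/`Hasher` traits as a stand-in for rustc's `HashStable`/`StableHasher` machinery (the behavior is analogous for this purpose): `bool` hashes via one `write_u8` call per element, while `[u8]` has a specialized `hash_slice` that hands the whole slice to the hasher in a single `write` call.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn main() {
    let as_bools = [true, false, true, true];
    let as_bytes = [1u8, 0, 1, 1];

    // Hashing bools: Hash for bool is one write_u8 call per element,
    // so the hasher sees the data one byte at a time.
    let mut h1 = DefaultHasher::new();
    as_bools.hash(&mut h1);

    // Hashing bytes: Hash::hash_slice is specialized for u8, so the whole
    // slice reaches the hasher in a single write() call, which block-oriented
    // hashers can consume in bulk.
    let mut h2 = DefaultHasher::new();
    as_bytes.hash(&mut h2);

    println!("{:x} {:x}", h1.finish(), h2.finish());
}
```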
☀️ Try build successful - checks-actions
Finished benchmarking commit (7946c2c): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never

Instruction count: This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This benchmark run did not return any relevant results for this metric.

Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 675.005s -> 674.338s (-0.10%)
Hm, interesting. Seems like this has zero effect (negative effect?) on the resulting artifacts here... I'll poke at that, it's an interesting result. I wonder if there's something in BOLT or PGO that teaches LLVM to do what I've done manually here automatically perhaps? Seems pretty unlikely...
It looks like the effect on the HashStable impl is actually similarly positive in the end artifact, and the increases are presumably in other (hopefully unrelated, but we don't have the tooling to make great comparisons) functions.
So I think this is ready to go. I thought some more about the additional specialization mentioned above; the gain here is that we're cutting around 20 kB of instructions for the hashing. As I said in the PR description, I think the cumulative potential here is possibly quite large, so I'm personally in favor of moving ahead. If we want to refactor further before landing I'm OK with that, but I think this is already pretty clean.
Could not assign reviewer from:
lgtm @bors r+
☀️ Test successful - checks-actions
Finished benchmarking commit (f440b5f): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: This benchmark run did not return any relevant results for this metric.

Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 673.535s -> 674.352s (0.12%)