Add a cache for maybe_lint_level_root_bounded
#113609
It's annoying that these wrap in a 100-char terminal window.
From `TyCtxt` to the MIR `Builder`. This will allow us to add a cache to `Builder` and use it from `maybe_lint_level_root_bounded`.
@bors try @rust-timer queue
⌛ Trying commit 812a5ee0f64a6566cb35fd71e27fe01d8139bd27 with merge 14a7775ca133dba23c6852ebdc8638048b5a57da...
☀️ Try build successful - checks-actions
Finished benchmarking commit (14a7775ca133dba23c6852ebdc8638048b5a57da): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 656.821s -> 659.49s (0.41%)
Great results!
IIRC, we always have `orig_id.owner` equal to `self.hir_id.owner`, so the cache could be limited to storing `hir::ItemLocalId`.
Do you have an idea of the sparsity of the cache? I wonder if we could get even better with a bitset instead of a hashset.
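A minimal sketch of the bitset idea, not the PR's code: if every id passed in shares the builder's owner, only the dense item-local index needs to be stored, so a growable bitset can stand in for the hash set. `LocalIdSet` and its methods are hypothetical names for illustration, and a plain `u32` stands in for `hir::ItemLocalId`.

```rust
/// Hypothetical growable bitset keyed by a dense `u32` index.
/// (Illustrative only; a `u32` stands in for `hir::ItemLocalId`.)
struct LocalIdSet {
    words: Vec<u64>,
}

impl LocalIdSet {
    fn new() -> Self {
        LocalIdSet { words: Vec::new() }
    }

    /// Inserts `idx`, growing the backing storage if needed.
    /// Returns `true` if `idx` was not already present.
    fn insert(&mut self, idx: u32) -> bool {
        let word = (idx / 64) as usize;
        let mask = 1u64 << (idx % 64);
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        let was_absent = (self.words[word] & mask) == 0;
        self.words[word] |= mask;
        was_absent
    }

    /// Returns `true` if `idx` has been inserted.
    /// Indices beyond the current storage are simply absent.
    fn contains(&self, idx: u32) -> bool {
        let word = (idx / 64) as usize;
        let mask = 1u64 << (idx % 64);
        self.words.get(word).map_or(false, |w| (w & mask) != 0)
    }
}
```

A membership test here is one index, one shift, and one mask, with no hashing. Memory is one bit per index up to the largest index inserted, so the trade-off favours the bitset when the stored ids are reasonably dense.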
@@ -725,6 +733,7 @@ impl<'a, 'tcx> Builder<'a, 'tcx> {
var_indices: Default::default(),
unit_temp: None,
var_debug_info: vec![],
lint_level_roots_cache: FxHashSet::default(),
Should we initialize with `self.hir_id`?
No. Due to the `parent_id == self.hir_id` test on the second call, `self.hir_id` is never passed to `maybe_lint_level_root_bounded` :)
if parent_id == self.hir_id {
    parent_id // this is very common
} else {
Should this be made a fast path inside `maybe_lint_level_root_bounded`?
`id == self.hir_id` is already the first check within `maybe_lint_level_root_bounded`. I added this pre-check here because (a) it's useful documentation, and (b) it gave a 0.4% instruction count win for `deep-vector`, due to avoiding the function call overhead.
if hir.attrs(id).iter().any(|attr| Level::from_attr(attr).is_some()) {
    // This is a rare case. It's for a node path that doesn't reach the root due to an
    // intervening lint level attribute. This result doesn't get cached.
    return id;
Should we eventually cache this too, for the case where a whole HIR subtree hits the same lint root, one different from `self.hir_id`?
I originally did cache these values, using an `FxHashMap<HirId, HirId>`. Then I realized that 99% of the values stored were `self.hir_id`, which seemed wasteful. So I tried changing it to `FxHashSet<HirId>` and caching only those 99%, and the instruction count dropped very slightly, while also using less memory. And if we want to use a bitset (like you suggested above) then we'll need to keep this design.
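The set-only design described above can be sketched on a toy tree. This is an illustration under stated assumptions, not the PR's implementation: `Tree`, `Walker`, `lint_level_root`, and `example_tree` are hypothetical names, `usize` stands in for `HirId`, a standard `HashSet` stands in for `FxHashSet`, and every queried node is assumed to be a descendant of the root.

```rust
use std::collections::HashSet;

/// Toy node id; a hypothetical stand-in for `HirId`.
type NodeId = usize;

/// Toy tree. `parent[i]` is the parent of node `i` (the root is its own
/// parent); `has_lint_attr[i]` marks nodes with a lint-level attribute.
struct Tree {
    parent: Vec<NodeId>,
    has_lint_attr: Vec<bool>,
}

struct Walker<'a> {
    tree: &'a Tree,
    root: NodeId,
    /// Nodes already known to trace up to `root`. Because the answer for
    /// these nodes is always `root`, a set suffices; no map is needed.
    reaches_root: HashSet<NodeId>,
    /// Number of attribute lookups performed, to make the saving visible.
    attr_lookups: usize,
}

impl<'a> Walker<'a> {
    fn new(tree: &'a Tree, root: NodeId) -> Self {
        Walker { tree, root, reaches_root: HashSet::new(), attr_lookups: 0 }
    }

    /// Walks up from `orig` to the nearest lint-level root.
    /// Assumes `orig` is a descendant of `self.root`.
    fn lint_level_root(&mut self, orig: NodeId) -> NodeId {
        let mut id = orig;
        // Stop at the root, or at any node already known to reach it.
        while id != self.root && !self.reaches_root.contains(&id) {
            self.attr_lookups += 1;
            if self.tree.has_lint_attr[id] {
                // Rare case: an intervening attribute. Not cached.
                return id;
            }
            id = self.tree.parent[id];
        }
        // `orig` traced to the root; record that fact only.
        self.reaches_root.insert(orig);
        self.root
    }
}

/// Small example tree: 0 is the root, 5 carries a lint-level attribute.
fn example_tree() -> Tree {
    Tree {
        // node:     0  1  2  3  4  5  6
        parent: vec![0, 0, 1, 2, 2, 0, 5],
        has_lint_attr: vec![false, false, false, false, false, true, false],
    }
}
```

With `example_tree()` and root 0, querying node 2 and then its children 3 and 4 costs four attribute lookups in total rather than eight, because the walks from 3 and 4 stop at the cached node 2. Querying node 6 returns 5 and leaves the cache unchanged, matching the "rare case" that is deliberately not cached.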
Oh, cool! I wondered about a bitset but I thought it wasn't possible because of
It's a nice speed win.
812a5ee to 667d75e
The new version uses a bitset instead of a hashset.
@bors try @rust-timer queue
⌛ Trying commit 667d75e with merge 948981694e24fd6ad761c41a383843fbe8b5dad1...
☀️ Try build successful - checks-actions
Finished benchmarking commit (948981694e24fd6ad761c41a383843fbe8b5dad1): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 658.224s -> 659.878s (0.25%)
Yay, the new results with the bitset are clearly better: improvement in instruction counts, but even better, cycles and walltimes are seeing some genuine clear wins.
Thanks!
☀️ Test successful - checks-actions
Finished benchmarking commit (fe03b46): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 656.192s -> 658.307s (0.32%)
`maybe_lint_level_root_bounded` is called many times and traces node sub-paths many times. This PR adds a cache that lets many of these tracings be skipped, avoiding lots of calls to functions like `Map::attrs` and `Map::parent_id`.

r? @cjgillot