-
Notifications
You must be signed in to change notification settings - Fork 13.3k
TypeFoldable/TypeVisitable tidy up #117896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Rather than being concerned with internals of ExternalConstraintsData and PredefinedOpaquesData, we derive traversable impls for those types to which we then delegate.
UnevaluatedConst is not a type of any interest to folders or visitors, and therefore should not be "super visitable" at all.
Previously only enforced via comment, now via negative impls.
r? lcnr |
r=me after perf @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
TypeFoldable/TypeVisitable tidy up I'm breaking rust-lang#117726 into more manageable, logical chunks. This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow. r? types
☀️ Try build successful - checks-actions |
1 similar comment
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (a5f77ae): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 674.599s -> 674.341s (-0.04%) |
I've had a good play with cachegrind, and think this latest commit should resolve the perf regression. |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
TypeFoldable/TypeVisitable tidy up I'm breaking rust-lang#117726 into more manageable, logical chunks. This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow. r? types
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (b57fd59): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 677.592s -> 678.36s (0.11%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realized i have a pending review here 😅
do you feel happy with the perf changes and want me to review? I already looked at this before and was happy with the changes I think?
/// respectively folded (in definition order) using the `TypeFoldable` implementation for its | ||
/// type. However, if a field of a struct or of an enum variant is annotated with | ||
/// `#[type_foldable(identity)]` then that field will retain its incumbent value (and its type | ||
/// is not required to implement `TypeFoldable`). However use of this attribute is dangerous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// is not required to implement `TypeFoldable`). However use of this attribute is dangerous | |
/// is not required to implement `TypeFoldable`). Use of this attribute is dangerous |
/// Each field of the struct or enum variant will be visited (in definition order) using the | ||
/// `TypeVisitable` implementation for its type. However, if a field of a struct or of an enum | ||
/// variant is annotated with `#[type_visitable(ignore)]` then that field will not be visited | ||
/// (and its type is not required to implement `TypeVisitable`). However use of this attribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// (and its type is not required to implement `TypeVisitable`). However use of this attribute | |
/// (and its type is not required to implement `TypeVisitable`). Use of this attribute |
looking over it I would expect d0c9ba3 to be the cause of the regression 🤔 incredibly hot code so small changes can have a stupid impact here. I would expect that the additional |
That was my thought too, hence the latest commit tried to ensure the delegation is inlined so that the match can get optimised away. That's what I appeared to have achieved locally, at least - doesn't appear to have the same result in CI though. Sorry, hadn't spent much more time thinking about it yet - but still do intend to. |
I personally think it's fine to keep the existing |
Ping from triage: I'm closing this due to inactivity, Please reopen when you are ready to continue with this. @rustbot label: +S-inactive |
I finally managed to do some real investigation into this performance hit and am dumping here my interim findings. Yes, d0c9ba3 resulted in different inlining decisions, but the real impact appears to be from the loss of a tail call optimisation in before
after
Of immediate note however:
|
Okay, I got to a reduced test case. The problem appears to originate in the early return (via |
Having made that fix, I'm (locally, on aarch64-unknown-linux-gnu without PGO) broadly seeing quite encouraging results: but still a few benchmarks (primarily full builds of bitmaps and typenum) that are slightly regressed: Summary
AggregationsProfilecheck
debug
opt
Scenariosfull
incr-full
incr-unchanged
incr-patched
Categoryprimary
secondary
DetailsPrimary benchmarks
Secondary benchmarks
I'm now examining the cachegrind diff for the most regressed benchmark (bitmaps-3.1.0 check full). |
I managed to build locally with PGO (but not BOLT owing to llvm/llvm-project#89549) and, whilst different, I think the results are similarly encouraging: Summary
Primary benchmarks
Secondary benchmarks
Seeing the following in cachegrind diff for the most regressed primary benchmark:
Which I think again is down to inlining, but at least now narrowing down to a likely mono item... |
Prefer not to early return from derived visitables LLVM is not able to tail-call optimise the final early return, as there instructions that follow it. Making that call the returned expression resolves this. (As identified in rust-lang#117896 (comment)). r? wg-compiler-performance
I'm breaking #117726 into more manageable, logical chunks. This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow.
r? types