Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
CTFE interning: don't walk allocations that don't need it #97585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CTFE interning: don't walk allocations that don't need it #97585
Changes from all commits
97a0b2e
266bab2
18cbc19
c9772d7
6d03c8d
d634f14
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only possible source of the regression that I can see is this function call invoking
layout.field()
to get the slice element's layout, which should also happen in the regular walk. But we only do that field layout computation if we're going to take the fast path here, so there should be no duplicate work.Unsure how to debug this further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valgrind have finally fixed the dwarf bug and I was able to run the CI artifacts to reproduce the issue it sees, that oli posted.
I wonder if that's the case.
diesel
is unusual here in that 3% of the check time is spent intry_fold_with
and the likes, does the current PR change things related to folding at all @oli-obk ? (I would expect it not to).Edit: After spending the better part of the day looking at this, I've noticed gathering my statistics (tracing rustc) was done over a diesel clean build (w/ dependencies and so on included), while rustc-perf just built the leaf crate. Unless I'm mistaken again, Diesel only has one aggregate that is walked during the check build benchmark, and it's a ZST hitting the early return... Making this regression either more puzzling, or easier to discard as noise.
The number of
TyAndLayout::field()
calls have apparently lowered (edit: on a clean build): 329512 on master and 326641 here. Were those you were wondering about ? (in the check benchmark I'm not seeing any change here by this PR)I also wonder why this didn't show up in the previous run, if anything
size_and_align_of_mplace
is called <= as then. Just in case I'll try reverting back to the same commit and running the bot again to see what happens, to see if it's indeed noise.I'm seeing some tiny cachegrind differences locally as well, but they are completely unrelated (and it seems unlikely it's about PGO) but could be about allocation: cachegrind attributes more time to
./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
, the problem is that it does this here even for two different rustc worktrees for the same commit and same config.toml, making things though to analyze (I'll ignore this memcpy difference).The
visit_aggregate
function in particular is pretty short, and some variance in the machine makes the 4945 calls' wall-time noisy (and doesn't register in cachegrind at all, neither does const interning apparently) like so:Edit: the above are actually the clean build stats, not a leaf crate build like rustc-perf: IIUC they mostly come from dependencies (std/alloc, via proc-macros). The single call couldn't + early return make such a big regression.
I was about to post the actual flow of the 4945 aggregates to list the ZSTs, the early returns, unsized mplaces,
get_ptr_alloc
returning None and so on, but I don't think it's worth the time for what I believe is likely noise 😓 .edit one million: and now after a rebase over master, the allocator noise I was seeing has switched to the other direction. Incredible.