-
Notifications
You must be signed in to change notification settings - Fork 13.4k
interpret: do not force_allocate all return places #141406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
interpret: do not force_allocate all return places A while ago I cleaned up our `PlaceTy` a little, but as a side-effect of that, return places had to always be force-allocated. That turns out to cause quite a few extra allocations, and for a project we are doing where we marry Miri with a model checker, that means a lot of extra work -- local variables are just so much easier to reason about than allocations. So, this PR brings back the ability to have the return place be just a local of the caller. To make this work cleanly I had to rework stack pop handling a bit, which also changes the output of Miri in some cases as the span for errors occurring during a particular phase of stack pop changed. With these changes, a no-std binary with a function of functions that just take and return scalar types and that uses no pointers now does not move *any* local variables into memory. :) r? `@oli-obk`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
I ran the Miri benchmarks with and without this, and I am seeing improvements throughout the board:
slice-get-unchecked and backtraces got 20% faster! |
Finished benchmarking commit (165f1dd): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary 3.3%, secondary -3.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 0.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 776.4s -> 777.173s (0.10%) |
2480c47
to
6a9e189
Compare
@@ -22,5 +22,4 @@ fn race(local: i32) { | |||
thread::yield_now(); | |||
// Deallocating the local (when `main` returns) | |||
// races with the read in the other thread. | |||
// Make sure the error points at this function's end, not just the call site. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh 😆 obviously intended change, but should we try to do something here in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... well TBH I am no longer sure that that's really the behavior we want here.
Deallocating locals has to happen after the return value got copied, and when there's something wrong with copying the return value we want that to point at the caller (since it's likely caused by the caller using a different return type than the callee). We used to have some very special hacks here to delay the "UB during ret val copy" error so that if there's also UB during deallocation, that takes precedent... but is that really a good idea?
@bors r+ |
🌲 The tree is currently closed for pull requests below priority 10. This pull request will be tested once the tree is reopened. |
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 95a2212 (parent) -> b5eb989 (this PR) Test differencesNo test diffs found Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard b5eb9893f42a469d330046089539f908d4728384 --output-dir test-dashboard And then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
Finished benchmarking commit (b5eb989): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (secondary 2.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -2.9%, secondary -1.4%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -1.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 776.359s -> 775.728s (-0.08%) |
A while ago I cleaned up our
PlaceTy
a little, but as a side-effect of that, return places had to always be force-allocated. That turns out to cause quite a few extra allocations, and for a project we are doing where we marry Miri with a model checker, that means a lot of extra work -- local variables are just so much easier to reason about than allocations.So, this PR brings back the ability to have the return place be just a local of the caller. To make this work cleanly I had to rework stack pop handling a bit, which also changes the output of Miri in some cases as the span for errors occurring during a particular phase of stack pop changed.
With these changes, a no-std binary with a function of functions that just take and return scalar types and that uses no pointers now does not move any local variables into memory. :)
r? @oli-obk