interpret: do not force_allocate all return places #141406

RalfJung · 2025-05-22T18:19:06Z

A while ago I cleaned up our PlaceTy a little, but as a side-effect of that, return places had to always be force-allocated. That turns out to cause quite a few extra allocations, and for a project we are doing where we marry Miri with a model checker, that means a lot of extra work -- local variables are just so much easier to reason about than allocations.

So, this PR brings back the ability to have the return place be just a local of the caller. To make this work cleanly I had to rework stack pop handling a bit, which also changes the output of Miri in some cases as the span for errors occurring during a particular phase of stack pop changed.
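
As a rough mental model of the difference (a toy sketch, not the interpreter's actual types or API), a place can either name a local slot of a frame or point into memory, and forcing an allocation turns the former into the latter. The names `Place`, `Machine`, `force_allocate`, and `call_returning` below are hypothetical stand-ins chosen for illustration.

```rust
// Toy model only: these types are hypothetical and far simpler than the
// real interpreter's data structures.

/// Where a value lives: in a frame-local slot, or in a memory allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Place {
    Local(usize),
    Memory(usize),
}

struct Machine {
    /// Local slots of the caller; `None` once a local was moved to memory.
    locals: Vec<Option<i64>>,
    /// Each entry stands for one allocation.
    memory: Vec<i64>,
}

impl Machine {
    fn new(num_locals: usize) -> Self {
        Machine { locals: vec![Some(0); num_locals], memory: Vec::new() }
    }

    /// Move a local into memory, creating a new allocation.
    /// Places that already live in memory are returned unchanged.
    fn force_allocate(&mut self, place: Place) -> Place {
        match place {
            Place::Local(i) => {
                let value = self.locals[i].take().unwrap_or(0);
                self.memory.push(value);
                Place::Memory(self.memory.len() - 1)
            }
            Place::Memory(a) => Place::Memory(a),
        }
    }

    fn write(&mut self, place: Place, value: i64) {
        match place {
            Place::Local(i) => self.locals[i] = Some(value),
            Place::Memory(a) => self.memory[a] = value,
        }
    }

    fn allocations(&self) -> usize {
        self.memory.len()
    }
}

/// Simulate a call that writes `value` into the return place `ret`.
/// With `force == true` the return place is always moved to memory first.
fn call_returning(m: &mut Machine, ret: Place, value: i64, force: bool) -> Place {
    let ret = if force { m.force_allocate(ret) } else { ret };
    m.write(ret, value);
    ret
}

fn main() {
    let mut forced = Machine::new(1);
    call_returning(&mut forced, Place::Local(0), 42, true);

    let mut local = Machine::new(1);
    call_returning(&mut local, Place::Local(0), 42, false);

    println!(
        "allocations: forced = {}, local = {}",
        forced.allocations(),
        local.allocations()
    );
}
```

In this toy, the forced variant ends up with one allocation per call while the other variant creates none, which is the kind of saving the description is after.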

With these changes, a no-std binary with a bunch of functions that just take and return scalar types and that uses no pointers now does not move any local variables into memory. :)
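
For illustration, here is a minimal sketch of the shape of code that sentence describes: functions that only take and return scalar values and never take a reference or pointer to a local. The function names are made up, and this is an ordinary `std` program rather than an actual no-std binary, so it shows the style of code only, not the setup used for the PR.

```rust
// Hypothetical example: every argument and return value is a scalar, and no
// local ever has its address taken.

fn square(x: u32) -> u32 {
    x.wrapping_mul(x)
}

fn sum_of_squares(a: u32, b: u32) -> u32 {
    // Both call results are plain scalar values.
    square(a).wrapping_add(square(b))
}

fn is_even(n: u32) -> bool {
    n % 2 == 0
}

fn main() {
    let s = sum_of_squares(3, 4); // 9 + 16
    let even = is_even(s);
    // Use the results without printing: formatting macros take references to
    // their arguments, which would require those locals to live in memory.
    if s != 25 || even {
        std::process::exit(1);
    }
}
```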

r? @oli-obk

rustbot · 2025-05-22T18:19:11Z

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

RalfJung · 2025-05-22T18:24:16Z

@bors try @rust-timer queue

bors · 2025-05-22T18:25:27Z

⌛ Trying commit 2480c47 with merge 165f1dd...

bors · 2025-05-22T20:30:35Z

☀️ Try build successful - checks-actions
Build commit: 165f1dd (165f1dd00f970ba0e0dcfdf5d9c31a80464e7c42)

RalfJung · 2025-05-22T21:08:06Z

I ran the Miri benchmarks with and without this, and I am seeing improvements across the board:

Comparison with baseline (relative speed, lower is better for the new results):
  /home/r/src/rust/miri/bench-cargo-miri/unicode: 0.82 ± 0.01
  /home/r/src/rust/miri/bench-cargo-miri/zip-equal: 0.89 ± 0.01
  /home/r/src/rust/miri/bench-cargo-miri/mse: 0.82 ± 0.02
  /home/r/src/rust/miri/bench-cargo-miri/slice-chunked: 0.83 ± 0.03
  /home/r/src/rust/miri/bench-cargo-miri/serde1: 0.92 ± 0.01
  /home/r/src/rust/miri/bench-cargo-miri/slice-get-unchecked: 0.78 ± 0.02
  /home/r/src/rust/miri/bench-cargo-miri/range-iteration: 0.87 ± 0.01
  /home/r/src/rust/miri/bench-cargo-miri/backtraces: 0.79 ± 0.01
  /home/r/src/rust/miri/bench-cargo-miri/serde2: 0.92 ± 0.01
  /home/r/src/rust/miri/bench-cargo-miri/string-replace: 0.92 ± 0.03
  /home/r/src/rust/miri/bench-cargo-miri/big-allocs: 0.96 ± 0.08

slice-get-unchecked and backtraces got 20% faster!
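
To read the ratios above as percentages, a quick calculation helps. The sketch below assumes each number is the new run time divided by the baseline run time (which is what "lower is better" suggests); the helper name `time_reduction_percent` is made up for this example.

```rust
/// Percentage reduction in run time for a given new/baseline time ratio.
/// Assumes the ratio is new time divided by baseline time.
fn time_reduction_percent(ratio: f64) -> f64 {
    (1.0 - ratio) * 100.0
}

fn main() {
    // Ratios copied from the benchmark comparison above.
    let results = [
        ("unicode", 0.82),
        ("zip-equal", 0.89),
        ("mse", 0.82),
        ("slice-chunked", 0.83),
        ("serde1", 0.92),
        ("slice-get-unchecked", 0.78),
        ("range-iteration", 0.87),
        ("backtraces", 0.79),
        ("serde2", 0.92),
        ("string-replace", 0.92),
        ("big-allocs", 0.96),
    ];
    for (name, ratio) in results.iter() {
        println!("{}: {:.0}% less time", name, time_reduction_percent(*ratio));
    }
}
```

Under that reading, 0.78 and 0.79 correspond to roughly 22% and 21% less run time, and the big-allocs result (0.96 ± 0.08) is within its own error margin.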

rust-timer · 2025-05-22T22:01:02Z

Finished benchmarking commit (165f1dd): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.6%	[0.6%, 0.6%]	1
Improvements ✅ (primary)	-0.2%	[-0.2%, -0.2%]	1
Improvements ✅ (secondary)	-3.4%	[-4.6%, -0.3%]	7
All ❌✅ (primary)	-0.2%	[-0.2%, -0.2%]	1

Max RSS (memory usage)

Results (primary 3.3%, secondary -3.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.3%	[3.3%, 3.3%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.1%	[-3.1%, -3.1%]	1
All ❌✅ (primary)	3.3%	[3.3%, 3.3%]	1

Cycles

Results (secondary 0.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.7%	[0.7%, 0.7%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 776.4s -> 777.173s (0.10%)
Artifact size: 365.61 MiB -> 365.61 MiB (0.00%)

oli-obk · 2025-05-26T09:53:17Z

src/tools/miri/tests/fail/data_race/stack_pop_race.rs

@@ -22,5 +22,4 @@ fn race(local: i32) {
    thread::yield_now();
    // Deallocating the local (when `main` returns)
    // races with the read in the other thread.
-    // Make sure the error points at this function's end, not just the call site.


uh 😆 obviously intended change, but should we try to do something here in the future?

RalfJung · 2025-05-26

Yeah... well TBH I am no longer sure that that's really the behavior we want here.

Deallocating locals has to happen after the return value got copied, and when there's something wrong with copying the return value we want that to point at the caller (since it's likely caused by the caller using a different return type than the callee). We used to have some very special hacks here to delay the "UB during ret val copy" error so that if there's also UB during deallocation, that takes precedence... but is that really a good idea?
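
The two orderings discussed here can be contrasted with a toy model. This is not the interpreter's code: `Ub`, `pop_eager`, and `pop_delayed` are hypothetical names, the boolean flags merely stand in for "something is wrong in this phase", and whether `pop_eager` matches the post-PR behavior in every detail is an assumption.

```rust
/// Which phase of the stack pop reported undefined behavior.
#[derive(Debug, PartialEq)]
enum Ub {
    RetValCopy,
    Dealloc,
}

/// Straightforward ordering: copy the return value, then deallocate the
/// locals, and stop at the first error.
fn pop_eager(copy_is_ub: bool, dealloc_is_ub: bool) -> Result<(), Ub> {
    if copy_is_ub {
        return Err(Ub::RetValCopy);
    }
    if dealloc_is_ub {
        return Err(Ub::Dealloc);
    }
    Ok(())
}

/// The "delay" approach: remember an error from the return value copy,
/// still run deallocation, and let a deallocation error take precedence.
fn pop_delayed(copy_is_ub: bool, dealloc_is_ub: bool) -> Result<(), Ub> {
    let copy_result: Result<(), Ub> =
        if copy_is_ub { Err(Ub::RetValCopy) } else { Ok(()) };
    if dealloc_is_ub {
        return Err(Ub::Dealloc);
    }
    copy_result
}

fn main() {
    // The two approaches only disagree when both phases go wrong.
    println!("eager:   {:?}", pop_eager(true, true));
    println!("delayed: {:?}", pop_delayed(true, true));
}
```

The two functions agree whenever at most one phase fails, so the question raised above only concerns the case where both the copy and the deallocation are UB.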

oli-obk · 2025-05-26T09:54:15Z

@bors r+

bors · 2025-05-26T09:54:18Z

📌 Commit 6a9e189 has been approved by oli-obk

It is now in the queue for this repository.

bors · 2025-05-26T09:54:19Z

🌲 The tree is currently closed for pull requests below priority 10. This pull request will be tested once the tree is reopened.

bors · 2025-05-26T10:29:22Z

⌛ Testing commit 6a9e189 with merge b5eb989...

bors · 2025-05-26T13:41:25Z

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing b5eb989 to master...

github-actions · 2025-05-26T13:44:10Z

What is this?

This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 95a2212 (parent) -> b5eb989 (this PR)

Test differences

No test diffs found

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard b5eb9893f42a469d330046089539f908d4728384 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

aarch64-apple: 5398.1s -> 7237.0s (34.1%)
dist-aarch64-linux: 8233.3s -> 5536.8s (-32.8%)
x86_64-apple-2: 6631.5s -> 4591.8s (-30.8%)
dist-apple-various: 7801.0s -> 5486.3s (-29.7%)
dist-x86_64-apple: 8095.9s -> 9773.2s (20.7%)
dist-aarch64-apple: 6139.9s -> 5166.3s (-15.9%)
aarch64-gnu-debug: 4634.9s -> 3950.1s (-14.8%)
aarch64-gnu: 7905.5s -> 6769.0s (-14.4%)
x86_64-apple-1: 7727.7s -> 8466.3s (9.6%)
dist-ohos-x86_64: 4353.8s -> 4652.1s (6.9%)

How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

rust-timer · 2025-05-26T16:28:15Z

Finished benchmarking commit (b5eb989): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.6%	[-2.9%, -0.2%]	2
Improvements ✅ (secondary)	-3.5%	[-4.6%, -0.3%]	7
All ❌✅ (primary)	-1.6%	[-2.9%, -0.2%]	2

Max RSS (memory usage)

Results (secondary 2.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	5.4%	[3.3%, 7.5%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.7%	[-2.7%, -2.7%]	1
All ❌✅ (primary)	-	-	0

Cycles

Results (primary -2.9%, secondary -1.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.8%	[1.2%, 10.4%]	3
Improvements ✅ (primary)	-2.9%	[-2.9%, -2.9%]	1
Improvements ✅ (secondary)	-5.1%	[-5.8%, -4.3%]	5
All ❌✅ (primary)	-2.9%	[-2.9%, -2.9%]	1

Binary size

Results (primary -1.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.1%	[-1.1%, -1.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.1%	[-1.1%, -1.1%]	1

Bootstrap: 776.359s -> 775.728s (-0.08%)
Artifact size: 366.28 MiB -> 366.25 MiB (-0.01%)

rustbot assigned oli-obk May 22, 2025

rustbot added the S-waiting-on-review and T-compiler labels May 22, 2025

rustbot added the S-waiting-on-perf label May 22, 2025

interpret: do not force_allocate all return places

6a9e189

rustbot removed the S-waiting-on-perf label May 22, 2025

RalfJung force-pushed the less-force-allocate branch from 2480c47 to 6a9e189 May 23, 2025 05:55

oli-obk approved these changes May 26, 2025

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label May 26, 2025

bors added the merged-by-bors label May 26, 2025

bors merged commit b5eb989 into rust-lang:master May 26, 2025
7 checks passed

rustbot added this to the 1.89.0 milestone May 26, 2025

RalfJung deleted the less-force-allocate branch May 27, 2025 06:14
