CTFE interning: don't walk allocations that don't need it #97585
Conversation
Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri
r? @estebank (rust-highfive has picked a reviewer for you, use r? to override)
r? @ghost
Opening as draft as I still need to ensure test coverage is sufficient, and add some if necessary. It worked locally in my small-scale tests. @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 6c492abbbc94e85c545e996922b0275b4580265f with merge 0aa3f5e10899462a2f253e01eb637fa40eee7095...
☀️ Try build successful - checks-actions
Queued 0aa3f5e10899462a2f253e01eb637fa40eee7095 with parent 16a0d03, future comparison URL.
Finished benchmarking commit (0aa3f5e10899462a2f253e01eb637fa40eee7095): comparison url.
Results: Instruction count, Max RSS (memory usage), Cycles.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never
Force-pushed from f585e27 to 40172c5
cc @RalfJung this PR adds work-skipping to the interner that is mostly copied from validation
I've updated the PR's description with numbers and examples of the cases I looked at. I don't believe (and oli seems to agree) interning itself is easily testable beyond the existing CTFE tests, smoke-test crates, and bootstrapping. So I'll mark this as ready to review, and r? @oli-obk or @RalfJung.
r? @RalfJung as I worked with lqd directly on the impl
That sentence in the PR description just ends mid-way?
Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 2344e664aae22c0a3c29e4caa26fa51db4e1a6a4 with merge f983b52da2d21c5393097bc352068af1ebe07bab...
☀️ Try build successful - checks-actions
Queued f983b52da2d21c5393097bc352068af1ebe07bab with parent 94e9374, future comparison URL.
Finished benchmarking commit (f983b52da2d21c5393097bc352068af1ebe07bab): comparison url.
Results: Instruction count, Max RSS (memory usage), Cycles.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never
Alright. Apart from possibly some more wording changes you'd want to see, this PR looks ready to me.
There's still a .5% regression in "deeply-nested-multi", whatever that is about. But the speedups are very nice. :)
Comments look good to me. :)
Yeah, I saw that. That benchmark looks a bit noisy as of late: the regression wasn't present in the previous runs of this PR, and when that benchmark did appear, it was a -1% win, likely noise again. Coupled with the fact that this is a stress-test "secondary benchmark", the perfbot does not even count this as a noteworthy regression, so I won't either 🤡.
Credit to @oli-obk who masterminded the fix.
@RalfJung is there anything more you'd like me to do in this PR? Or is it good enough for an r+? :)
Sorry, I didn't realize I am the assigned reviewer.^^
📌 Commit d634f14 has been approved by
☀️ Test successful - checks-actions
Finished benchmarking commit (750d6f8): comparison url.
Results: Instruction count, Max RSS (memory usage), Cycles.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
The interning of const allocations visits the mplace, looking for references to intern. Walking big aggregates like large static arrays can be costly, so we only do it if the allocation we're interning contains references or interior mutability.
Walking ZSTs was avoided before, and this optimization is now applied to cases where there are no references/relocations either.
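To make the shape of that check concrete, here is a minimal, self-contained sketch using simplified, assumed types (`Allocation`, `TypeInfo`, and `needs_interning_walk` are illustrative stand-ins, not the actual rustc internals touched by this PR):

```rust
/// Illustrative stand-in for a const allocation: raw bytes plus the offsets
/// that hold pointers to other allocations ("relocations").
struct Allocation {
    bytes: Vec<u8>,
    relocation_offsets: Vec<usize>,
}

/// Illustrative type information: whether the type (transitively) contains
/// an `UnsafeCell`, i.e. is not `Freeze`.
struct TypeInfo {
    has_interior_mutability: bool,
}

/// The recursive walk is only needed when the allocation may reference other
/// allocations that must be interned too, or when interior mutability must be
/// detected; plain data like `[0u64; N]` can skip the O(size) traversal.
fn needs_interning_walk(alloc: &Allocation, ty: &TypeInfo) -> bool {
    !alloc.relocation_offsets.is_empty() || ty.has_interior_mutability
}

fn main() {
    // A large, flat `static ARRAY: [u64; LEN] = [0; LEN];` has no relocations
    // and no interior mutability, so it takes the fast path.
    let big_flat_array = Allocation { bytes: vec![0u8; 1 << 20], relocation_offsets: Vec::new() };
    let u64_array_ty = TypeInfo { has_interior_mutability: false };
    assert!(!needs_interning_walk(&big_flat_array, &u64_array_ty));
}
```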
While initially looking at this in the context of #93215, I've been testing with smaller allocations than the 16GB one in that issue, and with different init/uninit patterns (esp. via padding).
In that example, by default, `eval_to_allocation_raw` is the heaviest query, followed by `incr_comp_serialize_result_cache`. So I'll show numbers with incremental compilation disabled, to focus on the const allocations themselves (at 95% of the compilation time), at bigger array sizes, on minimal examples like `static ARRAY: [u64; LEN] = [0; LEN];` (a complete snippet is shown below).
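For reference, a full version of that minimal example might look like the following (the concrete `LEN` values and exact measurement files are assumptions here; 10^6 is used as a placeholder):

```rust
// Minimal CTFE-heavy example: a single large static array of u64s.
// LEN is varied in powers of ten for the measurements below.
const LEN: usize = 1_000_000;
static ARRAY: [u64; LEN] = [0; LEN];

fn main() {
    // Reference the static so the program actually uses it.
    println!("{}", ARRAY[0]);
}
```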
That is a close construction to parts of the `ctfe-stress-test-5` benchmark, which has const allocations in the megabytes, while most crates usually have way smaller ones. This PR will have the most impact in these situations, as the walk during the interning starts to dominate the runtime.

Unicode crates (some of which are present in our benchmarks) like `ucd`, `encoding_rs`, etc. come to mind as having bigger-than-usual allocations as well, because of big tables of code points (in the hundreds of KB, so still an order of magnitude or two less than the stress test).

In a check build, for a single static array as shown above, from 100 to 10^9 u64s (for lengths in powers of ten), the constant factors are lowered:
(log scales for easier comparisons)

(linear scale for absolute diff at higher Ns)

For one of the alternatives of that issue, we can see a similar reduction of around 3x (from 38s to 12s or so).

For the same size, the slowest case IIRC is when there are uninitialized bytes, e.g. via padding (a hypothetical example is sketched below); then interning/walking does not dominate anymore (which means there is likely still some interesting work left to do here). Compile times in this case rise quite a bit, and avoiding the interning walk has less impact: around 23%, from 730s on master to 568s with this PR.
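To illustrate the padding case, a hypothetical static of this shape (not necessarily the exact code measured) leaves uninitialized bytes in its const allocation: each `(u64, u8)` element occupies 16 bytes but only 9 of them carry data, so 7 padding bytes per element stay uninit.

```rust
// Hypothetical example of a const allocation with uninitialized padding
// bytes: `(u64, u8)` has size 16 and alignment 8, so each element leaves
// 7 bytes of uninitialized padding in the interned allocation.
const LEN: usize = 1_000_000;
static PADDED: [(u64, u8); LEN] = [(0, 0); LEN];

fn main() {
    println!("{}", PADDED[0].0);
}
```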