Skip to content

CTFE interning: don't walk allocations that don't need it #97585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 2, 2022

Conversation

lqd
Copy link
Member

@lqd lqd commented May 31, 2022

The interning of const allocations visits the mplace looking for references to intern. Walking big aggregates like big static arrays can be costly, so we only do it if the allocation we're interning contains references or interior mutability.

Walking ZSTs was avoided before, and this optimization is now applied to cases where there are no references/relocations either.


While initially looking at this in the context of #93215, I've been testing with smaller allocations than the 16GB one in that issue, and with different init/uninit patterns (esp. via padding).

In that example, by default, eval_to_allocation_raw is the heaviest query followed by incr_comp_serialize_result_cache. So I'll show numbers when incremental compilation is disabled, to focus on the const allocations themselves at 95% of the compilation time, at bigger array sizes on these minimal examples like static ARRAY: [u64; LEN] = [0; LEN];.

That is a close construction to parts of the ctfe-stress-test-5 benchmark, which has const allocations in the megabytes, while most crates usually have way smaller ones. This PR will have the most impact in these situations, as the walk during the interning starts to dominate the runtime.

Unicode crates (some of which are present in our benchmarks) like ucd, encoding_rs, etc come to mind as having bigger than usual allocations as well, because of big tables of code points (in the hundreds of KB, so still an order of magnitude or 2 less than the stress test).

In a check build, for a single static array shown above, from 100 to 10^9 u64s (for lengths in powers of ten), the constant factors are lowered:

(log scales for easier comparisons)
plot_log

(linear scale for absolute diff at higher Ns)
plot_linear

For one of the alternatives of that issue

const ROWS: usize = 100_000;
const COLS: usize = 10_000;

static TWODARRAY: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];

we can see a similar reduction of around 3x (from 38s to 12s or so).

For the same size, the slowest case IIRC is when there are uninitialized bytes e.g. via padding

const ROWS: usize = 100_000;
const COLS: usize = 10_000;

static TWODARRAY: [[(u64, u8); COLS]; ROWS] = [[(0, 0); COLS]; ROWS];

then interning/walking does not dominate anymore (but means there is likely still some interesting work left to do here).

Compile times in this case rise up quite a bit, and avoiding interning walks has less impact: around 23%, from 730s on master to 568s with this PR.

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label May 31, 2022
@rust-highfive
Copy link
Contributor

Some changes occured to the CTFE / Miri engine

cc @rust-lang/miri

@rust-highfive
Copy link
Contributor

r? @estebank

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 31, 2022
@lqd
Copy link
Member Author

lqd commented May 31, 2022

r? @ghost

@lqd
Copy link
Member Author

lqd commented May 31, 2022

Opening as draft as I still need to ensure test coverage is sufficient, and add some if necessary. It worked locally in my small scale tests.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 31, 2022
@bors
Copy link
Collaborator

bors commented May 31, 2022

⌛ Trying commit 6c492abbbc94e85c545e996922b0275b4580265f with merge 0aa3f5e10899462a2f253e01eb637fa40eee7095...

@bors
Copy link
Collaborator

bors commented May 31, 2022

☀️ Try build successful - checks-actions
Build commit: 0aa3f5e10899462a2f253e01eb637fa40eee7095 (0aa3f5e10899462a2f253e01eb637fa40eee7095)

@rust-timer
Copy link
Collaborator

Queued 0aa3f5e10899462a2f253e01eb637fa40eee7095 with parent 16a0d03, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (0aa3f5e10899462a2f253e01eb637fa40eee7095): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-2.8% -9.9% 26
All 😿🎉 (primary) N/A N/A 0

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
3.7% 3.7% 1
Improvements 🎉
(primary)
-2.3% -2.3% 1
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -2.3% -2.3% 1

Cycles

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
2.7% 2.7% 1
Regressions 😿
(secondary)
2.4% 2.7% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-5.4% -6.1% 4
All 😿🎉 (primary) 2.7% 2.7% 1

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 31, 2022
@lqd lqd force-pushed the const-alloc-intern branch 2 times, most recently from f585e27 to 40172c5 Compare May 31, 2022 21:58
@oli-obk
Copy link
Contributor

oli-obk commented Jun 1, 2022

cc @RalfJung this PR does a work-skipping in the interner that is mostly copied from validation

@lqd lqd changed the title [DO NOT MERGE] CTFE interning: don't walk allocations that don't need it CTFE interning: don't walk allocations that don't need it Jun 1, 2022
@lqd
Copy link
Member Author

lqd commented Jun 1, 2022

I've updated the PR's description with numbers and examples of the cases I looked at. I don't believe (and oli seems to agree) interning itself is easily testable beyond the existing tests of the CTFE, smoke tests crates, and bootstrapping.

So I'll mark this as ready to review, and r? @oli-obk or @RalfJung.

@lqd lqd marked this pull request as ready for review June 1, 2022 15:30
@oli-obk
Copy link
Contributor

oli-obk commented Jun 1, 2022

r? @RalfJung as I worked with lqd directly on the impl

@rust-highfive rust-highfive assigned RalfJung and unassigned oli-obk Jun 1, 2022
@RalfJung
Copy link
Member

RalfJung commented Jun 2, 2022

While initially looking at this in the context of #93215, I've been testing with smaller allocations than the 16GB one in that issue, and with .

That sentence in the PR description just ends mid-way?

@rustbot
Copy link
Collaborator

rustbot commented Jun 28, 2022

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@lqd
Copy link
Member Author

lqd commented Jun 28, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 28, 2022
@bors
Copy link
Collaborator

bors commented Jun 28, 2022

⌛ Trying commit 2344e664aae22c0a3c29e4caa26fa51db4e1a6a4 with merge f983b52da2d21c5393097bc352068af1ebe07bab...

@bors
Copy link
Collaborator

bors commented Jun 28, 2022

☀️ Try build successful - checks-actions
Build commit: f983b52da2d21c5393097bc352068af1ebe07bab (f983b52da2d21c5393097bc352068af1ebe07bab)

@rust-timer
Copy link
Collaborator

Queued f983b52da2d21c5393097bc352068af1ebe07bab with parent 94e9374, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (f983b52da2d21c5393097bc352068af1ebe07bab): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
0.4% 0.5% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-3.0% -8.9% 21
All 😿🎉 (primary) N/A N/A 0

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.5% 2.8% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-3.5% -3.6% 2
All 😿🎉 (primary) N/A N/A 0

Cycles

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.7% 2.7% 1
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-6.7% -7.9% 7
All 😿🎉 (primary) N/A N/A 0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Jun 28, 2022
@lqd lqd force-pushed the const-alloc-intern branch from 2344e66 to d634f14 Compare June 29, 2022 00:06
@lqd
Copy link
Member Author

lqd commented Jun 29, 2022

Alright, the diesel regression seems like it was noise indeed. I won't need to do other perf runs to tackle it.

Apart from possibly some more wording changes you'd want to see, this PR looks ready to me.

@RalfJung
Copy link
Member

RalfJung commented Jun 29, 2022

There's still a .5% regression in "deeply-nested-multi", whatever that is about. But the speedups are very nice. :)

@RalfJung
Copy link
Member

Comments look good to me. :)

@lqd
Copy link
Member Author

lqd commented Jun 29, 2022

There's still a .5% regression in "deeply-nested-multi", whatever that is about.

Yeah I saw that. That benchmark looks a bit noisy as of late:

image

The regression wasn't present in the previous runs of this PR, and when that benchmark did appear, it was a -1% win, likely noise again.

Coupled to the fact that this is a stress test "secondary benchmark", the perfbot does not even count this as a noteworthy regression, so I won't either 🤡.

But the speedups are very nice. :)

credit to @oli-obk who masterminded the fix.

@lqd
Copy link
Member Author

lqd commented Jul 2, 2022

@RalfJung is there anything more you'd like me to do in this PR ? Or is it good enough for a r+ ? :)

@RalfJung
Copy link
Member

RalfJung commented Jul 2, 2022

Sorry, I didn't realize I am the assigned reviewer.^^
Looks like we are happy with the perf, so
@bors r+

@bors
Copy link
Collaborator

bors commented Jul 2, 2022

📌 Commit d634f14 has been approved by RalfJung

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 2, 2022
@bors
Copy link
Collaborator

bors commented Jul 2, 2022

⌛ Testing commit d634f14 with merge 750d6f8...

@bors
Copy link
Collaborator

bors commented Jul 2, 2022

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing 750d6f8 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 2, 2022
@bors bors merged commit 750d6f8 into rust-lang:master Jul 2, 2022
@rustbot rustbot added this to the 1.64.0 milestone Jul 2, 2022
@lqd lqd deleted the const-alloc-intern branch July 2, 2022 21:18
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (750d6f8): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-1.0% -1.0% 2
Improvements 🎉
(secondary)
-2.5% -9.3% 32
All 😿🎉 (primary) -1.0% -1.0% 2

Max RSS (memory usage)

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: 🎉 relevant improvements found
mean1 max count2
Regressions 😿
(primary)
1.1% 1.1% 1
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-2.8% -2.8% 1
Improvements 🎉
(secondary)
-2.0% -2.4% 2
All 😿🎉 (primary) -0.9% -2.8% 2

Cycles

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
3.0% 4.1% 2
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-6.1% -7.6% 7
All 😿🎉 (primary) N/A N/A 0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants