Skip to content

Compile unicode-normalization faster #97936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

nnethercote
Copy link
Contributor

Various optimizations and cleanups aimed at improving compilation of unicode-normalization, which is notable for having several very large matches with many char ranges.

Best reviewed one commit at a time.

r? @oli-obk

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jun 10, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 10, 2022
@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 10, 2022
@bors
Copy link
Collaborator

bors commented Jun 10, 2022

⌛ Trying commit ca5e9908fdecd0a6f37dccb54fb0b3e8c416982b with merge dfb5ac9ecb0814d8c8d7f4ea4c0ee061170bdf97...

@bors
Copy link
Collaborator

bors commented Jun 10, 2022

☀️ Try build successful - checks-actions
Build commit: dfb5ac9ecb0814d8c8d7f4ea4c0ee061170bdf97 (dfb5ac9ecb0814d8c8d7f4ea4c0ee061170bdf97)

@rust-timer
Copy link
Collaborator

Queued dfb5ac9ecb0814d8c8d7f4ea4c0ee061170bdf97 with parent 420c970, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (dfb5ac9ecb0814d8c8d7f4ea4c0ee061170bdf97): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: no relevant changes found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-14.3% -20.0% 6
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -14.3% -20.0% 6

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
3.2% 3.2% 1
Improvements 🎉
(primary)
-3.7% -3.7% 1
Improvements 🎉
(secondary)
-3.0% -3.4% 2
All 😿🎉 (primary) -3.7% -3.7% 1

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
2.9% 2.9% 1
Regressions 😿
(secondary)
2.6% 3.1% 3
Improvements 🎉
(primary)
-10.4% -15.3% 6
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -8.5% -15.3% 7

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 10, 2022

// This code is hot when compiling `unicode-normalization` because it has a
// number of matches with many ranges such as '\u{037A}'..='\u{037F}'. So
// we special-case comparisons of chars in the relevant form for speed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike special casing chars just for one benchmark. I think we should generally improve the logic inside the eval_bits and related functions to have a fast path for already evaluated constants.

For something that is actionable in this PR, please just replace this specific type match with one that performs this fast path for everything but floats and signed ints (or possibly pull the large type match below into a separate inline-always function and call it whenever both constants are already evaluated, irrespective of the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a fast path like that is tricky, because there are so many layers to these types, and because of the size checking and layout computations that takes place.

The fast path I added to this function hard to integrate with the rest of the function, because the fast path comparison is on ScalarInt values, while the general case comparison is on u128 values. Extracting those u128 values in the fast case would require extra size checks and layout computations, eating into the speed wins. But I have generalized it to work with any scalar type other than Float or Int.

This is a performance win for `unicode-normalization`.

Also, I find the new formulation easier to read.
This is a performance win for `unicode-normalization`.

The commit also removes the closure, which isn't necessary. And
reformulates the comparison into a form I find easier to read.
Also, the `try_to_bits` always succeeds, so use `unwrap`.
It's never executed when running the entire test suite. I think it's
because of the early return at the top of the function if `a.ty() != ty`
succeeds.
A direct comparison has the same effect. This also avoids the need for a
type test within `compare_const_vals`.
The code is clearer and simpler without it. Note that the `a == b` early
return at the top of the function means the `a == b` test at the end of
the function could never succeed.
It's now only used in no-longer-interesting assertion.
Because these evaluations can never fail.
@nnethercote nnethercote force-pushed the compile-unicode_normalization-faster branch from ca5e990 to 32b011a Compare June 16, 2022 01:25
This commit removes the `a == b` early return, which isn't useful in
practice, and replaces it with one that helps matches with many ranges,
including char ranges.
@nnethercote nnethercote force-pushed the compile-unicode_normalization-faster branch from 32b011a to bdbf9b2 Compare June 16, 2022 01:26
@nnethercote
Copy link
Contributor Author

I have updated the final commit as requested.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 16, 2022

@bors r+

@bors
Copy link
Collaborator

bors commented Jun 16, 2022

📌 Commit bdbf9b2 has been approved by oli-obk

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 16, 2022
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jun 16, 2022
@bors
Copy link
Collaborator

bors commented Jun 16, 2022

⌛ Testing commit bdbf9b2 with merge cacc75c...

@bors
Copy link
Collaborator

bors commented Jun 16, 2022

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing cacc75c to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 16, 2022
@bors bors merged commit cacc75c into rust-lang:master Jun 16, 2022
@rustbot rustbot added this to the 1.63.0 milestone Jun 16, 2022
@nnethercote nnethercote deleted the compile-unicode_normalization-faster branch June 17, 2022 01:49
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (cacc75c): comparison url.

Instruction count

  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
1.0% 1.0% 1
Improvements 🎉
(primary)
-14.2% -19.4% 6
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -14.2% -19.4% 6

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regression found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
2.3% 2.3% 1
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) N/A N/A 0

Cycles

Results
  • Primary benchmarks: 🎉 relevant improvements found
  • Secondary benchmarks: 🎉 relevant improvement found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
-9.1% -14.4% 7
Improvements 🎉
(secondary)
-1.9% -1.9% 1
All 😿🎉 (primary) -9.1% -14.4% 7

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@nnethercote
Copy link
Contributor Author

@lqd and I were talking just today about how debug builds of coercions have become very noisy. The change there is not meaningful, and wasn't present in the earlier perf CI run.

@rustbot perf-regression-triaged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants