Add a short-circuiting path to slice comparison #113576

krtab · 2023-07-11T15:16:05Z

This adds a short-circuiting path to slice comparison using the fact that two slices of the same length and pointing at the same address are equal.

Ideally this would be implemented for both:

- comparisons where A and B can be compared byte-wise (done)
- ~~comparisons where A is B (cannot be done currently due to limitation in trait specialization)~~ edit: this is false, see float for example
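
For readers who want the idea in code, here is a minimal sketch of the short-circuit described above. It is not the code from this PR: `bytes_eq_short_circuit` is a hypothetical free function over `&[u8]`, chosen because bytes are the simplest case where byte-wise comparison is valid.

```rust
/// Hypothetical helper, not the PR's actual implementation: equality for
/// byte slices with an address-based fast path.
pub fn bytes_eq_short_circuit(a: &[u8], b: &[u8]) -> bool {
    // Slices of different lengths can never be equal.
    if a.len() != b.len() {
        return false;
    }
    // Same start address and same length: both slices view the same memory,
    // so the contents do not need to be read at all.
    if a.as_ptr() == b.as_ptr() {
        return true;
    }
    // Otherwise fall back to the ordinary element comparison.
    a == b
}
```

The shortcut relies on every value being equal to itself. That does not hold for all `PartialEq` types: a slice containing `f32::NAN` compares unequal to itself, which is the float case mentioned in the edit above.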

rustbot · 2023-07-11T15:16:14Z

r? @m-ou-se

(rustbot has picked a reviewer for you, use r? to override)

krtab · 2023-07-11T15:17:17Z

Given that this PR is made in the hope of improving performance, I guess performance benchmarks would be appreciated, but I don't think I can start those myself.

workingjubilee · 2023-07-11T15:32:05Z

@bors try
@rust-timer queue

bors · 2023-07-11T15:32:14Z

⌛ Trying commit 1616ffd67536e6201f8cda5fce4da6441fa750d6 with merge 8fcfeecab188ad3fc76fa0e22b6015e3007d8caa...

bors · 2023-07-11T16:48:20Z

☀️ Try build successful - checks-actions
Build commit: 8fcfeecab188ad3fc76fa0e22b6015e3007d8caa (8fcfeecab188ad3fc76fa0e22b6015e3007d8caa)

rust-timer · 2023-07-12T01:59:45Z

Finished benchmarking commit (8fcfeecab188ad3fc76fa0e22b6015e3007d8caa): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.8% | [0.4%, 1.1%] | 6 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.6% | [-0.6%, -0.6%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.6% | [-0.6%, 1.1%] | 7 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.8% | [3.8%, 3.8%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.0% | [-3.0%, -3.0%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.4% | [-3.0%, 3.8%] | 2 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.1% | [0.9%, 1.2%] | 4 |
| Regressions ❌ (secondary) | 2.3% | [1.0%, 3.5%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.1% | [0.9%, 1.2%] | 4 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.2%] | 16 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-0.1%, 0.2%] | 17 |

Bootstrap: 656.645s -> 656.898s (0.04%)

workingjubilee · 2023-07-12T03:15:49Z

It's counterintuitive as hell, but I can't say I'm that surprised.

"You're comparing something with itself, that's gonna be true" is an easy evaluation for a compiler, but harder to optimize out for the "you're not comparing something with itself" case (since then the problem is usually greater uncertainty overall), so it's dead instructions on all the paths that actually need to run the full memcmp.

krtab · 2023-07-12T14:35:55Z

Yeah, this is indeed not so surprising, as comparing the exact same slice is pretty rare.

I've force-pushed a new version that generates branch-less assembly using conditional moves, which may be better, but I have little hope.
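
The thread does not show the force-pushed code, so the following is only a sketch of one way a branch-free variant could be written. `bytes_eq_select_len` is a hypothetical name, and the approach (selecting a comparison length of zero when the addresses match) is an assumption, not necessarily what the commit does.

```rust
/// Hypothetical branch-free variant, written for illustration only:
/// when both slices start at the same address, compare zero bytes.
pub fn bytes_eq_select_len(a: &[u8], b: &[u8]) -> bool {
    // The length check is kept, as in an ordinary slice comparison.
    if a.len() != b.len() {
        return false;
    }
    let same_addr = a.as_ptr() == b.as_ptr();
    // A value select instead of an early return. An optimizer may lower
    // this to a conditional move, but that is not guaranteed.
    let n = if same_addr { 0 } else { a.len() };
    // Comparing two empty prefixes is trivially true.
    a[..n] == b[..n]
}
```

Whether this helps depends on the generated assembly and on how often identical slices are actually compared, so it needs to be measured rather than assumed.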

krtab · 2023-07-12T14:38:08Z

I'll try to see if it lets me do it myself; otherwise I'll need you to do it :)

@bors try
@rust-timer queue

edit: It did not.

bors · 2023-07-12T14:38:10Z

@krtab: 🔑 Insufficient privileges: not in try users

lqd · 2023-07-12T15:27:41Z

@bors try @rust-timer queue

bors · 2023-07-12T15:27:51Z

⌛ Trying commit b0ca5ff with merge ef9db8293d0b16a1e6a3d43fbee50ecbeb0b4290...

bors · 2023-07-12T16:44:11Z

☀️ Try build successful - checks-actions
Build commit: ef9db8293d0b16a1e6a3d43fbee50ecbeb0b4290 (ef9db8293d0b16a1e6a3d43fbee50ecbeb0b4290)

rust-timer · 2023-07-12T22:17:49Z

Finished benchmarking commit (ef9db8293d0b16a1e6a3d43fbee50ecbeb0b4290): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.4%, 0.6%] | 8 |
| Regressions ❌ (secondary) | 0.6% | [0.5%, 0.6%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [0.4%, 0.6%] | 8 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.2% | [4.2%, 4.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.9% | [0.9%, 0.9%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.9% | [0.9%, 0.9%] | 1 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.1%, 0.3%] | 16 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-0.1%, 0.3%] | 17 |

Bootstrap: 657.774s -> 658.13s (0.05%)

krtab · 2023-07-13T12:17:56Z

Well, I guess that's all I could try. I'm not sure how representative of "the typical workload" these benchmarks are, but the results are not that surprising either, so feel free to close this if you don't think it is mergeable as is, @workingjubilee. Thanks for the review.

workingjubilee · 2023-07-14T00:16:26Z

rustc's workload is heavily dominated by "branchy" code and it does exercise a lot of Vecs, but it doesn't do as much work on slice comparison I think. On the other hand it doesn't not: this is not like a floating-point benchmark where the compiler is completely useless. So while we may want to reexamine this question when we get a better perf suite which has more actual programs to test, instead of just the compiler's own benchmarks, it's a pretty bad sign without further motivation. So yeah, closing.

rustbot assigned m-ou-se Jul 11, 2023

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) and T-libs (Relevant to the library team, which will review and decide on the PR/issue) labels Jul 11, 2023

rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed) label Jul 11, 2023

rustbot added the perf-regression (Performance regression) label and removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed) label Jul 12, 2023

m-ou-se added the S-waiting-on-author (Status: This is awaiting some action, such as code changes or more information, from the author) label and removed the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) label Jul 12, 2023

krtab force-pushed the addr_slice_comp branch from 1616ffd to b0ca5ff on July 12, 2023 14:34

Add a short-circuiting path to slice comparison

b0ca5ff

rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed) label Jul 12, 2023

rustbot removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed) label Jul 12, 2023

krtab mentioned this pull request Jul 13, 2023

Non memcmp slice comparison optimization #113654

Closed

workingjubilee closed this Jul 14, 2023
