libcore: assume the input of next_code_point and next_code_point_reverse is UTF-8-like #89611

Merged: 1 commit, Nov 21, 2021

eduardosm
Contributor

The functions are now `unsafe` and they use `Option::unwrap_unchecked` instead of `unwrap_or_0`.

`unwrap_or_0` was added in 42357d7. I guess `unwrap_unchecked` was not available back then.
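
To make the difference concrete, here is a minimal sketch of the two styles. It is a simplified stand-in and not the actual libcore source: the function names `next_code_point_checked` and `next_code_point_unchecked`, and the small helpers, are assumptions chosen for illustration. The first variant substitutes 0 for a missing continuation byte; the second is `unsafe` and relies on the caller to pass UTF-8-like input, so a missing byte is treated as unreachable.

```rust
// Hypothetical helpers for illustration; names are assumptions, not the PR's code.
const CONT_MASK: u8 = 0b0011_1111;

/// Stand-in for the old helper: a missing byte is read as 0.
fn unwrap_or_0(opt: Option<&u8>) -> u8 {
    match opt {
        Some(&byte) => byte,
        None => 0,
    }
}

/// Payload bits of a leading byte for a sequence of the given width.
fn utf8_first_byte(byte: u8, width: u32) -> u32 {
    (byte & (0x7F >> width)) as u32
}

/// Shift the accumulator and append the 6 payload bits of a continuation byte.
fn utf8_acc_cont_byte(ch: u32, byte: u8) -> u32 {
    (ch << 6) | (byte & CONT_MASK) as u32
}

/// Safe variant: every continuation byte is checked for presence.
fn next_code_point_checked<'a, I: Iterator<Item = &'a u8>>(bytes: &mut I) -> Option<u32> {
    let x = *bytes.next()?;
    if x < 128 {
        return Some(x as u32);
    }
    let init = utf8_first_byte(x, 2);
    let y = unwrap_or_0(bytes.next());
    let mut ch = utf8_acc_cont_byte(init, y);
    if x >= 0xE0 {
        let z = unwrap_or_0(bytes.next());
        let y_z = utf8_acc_cont_byte((y & CONT_MASK) as u32, z);
        ch = init << 12 | y_z;
        if x >= 0xF0 {
            let w = unwrap_or_0(bytes.next());
            ch = (init & 7) << 18 | utf8_acc_cont_byte(y_z, w);
        }
    }
    Some(ch)
}

/// Unsafe variant: assumes the continuation bytes exist.
///
/// # Safety
/// `bytes` must yield a UTF-8-like sequence, so that every leading byte is
/// followed by the number of continuation bytes it announces.
unsafe fn next_code_point_unchecked<'a, I: Iterator<Item = &'a u8>>(bytes: &mut I) -> Option<u32> {
    let x = *bytes.next()?;
    if x < 128 {
        return Some(x as u32);
    }
    let init = utf8_first_byte(x, 2);
    // SAFETY (all three blocks below): guaranteed by the caller's contract.
    let y = unsafe { *bytes.next().unwrap_unchecked() };
    let mut ch = utf8_acc_cont_byte(init, y);
    if x >= 0xE0 {
        let z = unsafe { *bytes.next().unwrap_unchecked() };
        let y_z = utf8_acc_cont_byte((y & CONT_MASK) as u32, z);
        ch = init << 12 | y_z;
        if x >= 0xF0 {
            let w = unsafe { *bytes.next().unwrap_unchecked() };
            ch = (init & 7) << 18 | utf8_acc_cont_byte(y_z, w);
        }
    }
    Some(ch)
}

fn main() {
    let s = "aé€😀";
    let mut it = s.as_bytes().iter();
    // SAFETY: the bytes of a `&str` are valid UTF-8.
    while let Some(cp) = unsafe { next_code_point_unchecked(&mut it) } {
        println!("U+{:04X}", cp);
    }
}
```

In the unchecked variant the optimizer no longer has to keep a fallback path for an exhausted iterator, which is the kind of bounds check that disappears in the assembly comparison.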

Given this example:

```rust
pub fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}
```

Previously, the following assembly was produced:

```asm
_ZN7example10first_char17ha056ddea6bafad1cE:
	.cfi_startproc
	test	rsi, rsi
	je	.LBB0_1
	movzx	edx, byte ptr [rdi]
	test	dl, dl
	js	.LBB0_3
	mov	eax, edx
	ret
.LBB0_1:
	mov	eax, 1114112
	ret
.LBB0_3:
	lea	r8, [rdi + rsi]
	xor	eax, eax
	mov	r9, r8
	cmp	rsi, 1
	je	.LBB0_5
	movzx	eax, byte ptr [rdi + 1]
	add	rdi, 2
	and	eax, 63
	mov	r9, rdi
.LBB0_5:
	mov	ecx, edx
	and	ecx, 31
	cmp	dl, -33
	jbe	.LBB0_6
	cmp	r9, r8
	je	.LBB0_9
	movzx	esi, byte ptr [r9]
	add	r9, 1
	and	esi, 63
	shl	eax, 6
	or	eax, esi
	cmp	dl, -16
	jb	.LBB0_12
.LBB0_13:
	cmp	r9, r8
	je	.LBB0_14
	movzx	edx, byte ptr [r9]
	and	edx, 63
	jmp	.LBB0_16
.LBB0_6:
	shl	ecx, 6
	or	eax, ecx
	ret
.LBB0_9:
	xor	esi, esi
	mov	r9, r8
	shl	eax, 6
	or	eax, esi
	cmp	dl, -16
	jae	.LBB0_13
.LBB0_12:
	shl	ecx, 12
	or	eax, ecx
	ret
.LBB0_14:
	xor	edx, edx
.LBB0_16:
	and	ecx, 7
	shl	ecx, 18
	shl	eax, 6
	or	eax, ecx
	or	eax, edx
	ret
```

After this change, the assembly is reduced to:

```asm
_ZN7example10first_char17h4318683472f884ccE:
	.cfi_startproc
	test	rsi, rsi
	je	.LBB0_1
	movzx	ecx, byte ptr [rdi]
	test	cl, cl
	js	.LBB0_3
	mov	eax, ecx
	ret
.LBB0_1:
	mov	eax, 1114112
	ret
.LBB0_3:
	mov	eax, ecx
	and	eax, 31
	movzx	esi, byte ptr [rdi + 1]
	and	esi, 63
	cmp	cl, -33
	jbe	.LBB0_4
	movzx	edx, byte ptr [rdi + 2]
	shl	esi, 6
	and	edx, 63
	or	edx, esi
	cmp	cl, -16
	jb	.LBB0_7
	movzx	ecx, byte ptr [rdi + 3]
	and	eax, 7
	shl	eax, 18
	shl	edx, 6
	and	ecx, 63
	or	ecx, edx
	or	eax, ecx
	ret
.LBB0_4:
	shl	eax, 6
	or	eax, esi
	ret
.LBB0_7:
	shl	eax, 12
	or	eax, edx
	ret
```
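
For readers less used to the signed immediates in the listing, the following sketch spells out how the constants can be read. It is an interpretation of the assembly, not code from the PR: `seq_len_from_lead` and `EMPTY_RESULT` are hypothetical names. The value 1114112 is 0x110000, one past `char::MAX`, and it is what the empty-string branch returns, so it appears to be how `None` is encoded here. The comparisons against -33 and -16 are unsigned comparisons against the bytes 0xDF and 0xF0, which separate two-, three- and four-byte sequences.

```rust
/// Value returned by the empty-string branch (`mov eax, 1114112`).
const EMPTY_RESULT: u32 = 1_114_112;

/// Hypothetical helper mirroring the branch structure of the listing:
/// it classifies a leading byte by the length of the sequence it starts.
fn seq_len_from_lead(lead: u8) -> usize {
    if lead < 0x80 {
        1 // `test cl, cl` / `js` not taken: ASCII
    } else if lead <= (-33i8 as u8) {
        2 // `cmp cl, -33` / `jbe`: lead <= 0xDF
    } else if lead < (-16i8 as u8) {
        3 // `cmp cl, -16` / `jb`: lead < 0xF0
    } else {
        4
    }
}

fn main() {
    // 1114112 is one past the largest valid scalar value.
    assert_eq!(EMPTY_RESULT, char::MAX as u32 + 1);
    // The signed immediates are the bytes 0xDF and 0xF0.
    assert_eq!(-33i8 as u8, 0xDF);
    assert_eq!(-16i8 as u8, 0xF0);
    // The classification agrees with the standard library's encoding length.
    for c in ['a', 'é', '€', '😀'] {
        let mut buf = [0u8; 4];
        let lead = c.encode_utf8(&mut buf).as_bytes()[0];
        assert_eq!(seq_len_from_lead(lead), c.len_utf8());
    }
}
```

The masks 31, 63 and 7 in the listing then select the payload bits of a two-byte lead, a continuation byte, and a four-byte lead respectively.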

@rust-highfive
Contributor

r? @yaahc

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 6, 2021
@joshtriplett
Member

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 9, 2021
@bors
Collaborator

bors commented Oct 9, 2021

⌛ Trying commit 3b585ca3d460c8c097b37ff79f3858db7a24c779 with merge 8ed38a9b25c53b134180965d21731aa1cdfe8e25...

@bors
Collaborator

bors commented Oct 9, 2021

☀️ Try build successful - checks-actions
Build commit: 8ed38a9b25c53b134180965d21731aa1cdfe8e25

@rust-timer
Collaborator

Queued 8ed38a9b25c53b134180965d21731aa1cdfe8e25 with parent f875143, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (8ed38a9b25c53b134180965d21731aa1cdfe8e25): comparison url.

Summary: This change led to small relevant improvements 🎉 in compiler performance.

  • Small improvement in instruction counts (up to -0.8% on full builds of encoding)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 9, 2021
@apiraino apiraino added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 14, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 31, 2021
@Mark-Simulacrum
Member

r=me with the merge conflict resolved

@eduardosm
Contributor Author

Merge conflict fixed

@Mark-Simulacrum
Member

@bors r+ rollup=never

@bors
Collaborator

bors commented Nov 21, 2021

📌 Commit 23637e2 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 21, 2021
@bors
Collaborator

bors commented Nov 21, 2021

⌛ Testing commit 23637e2 with merge 65f3f8b...

@bors
Collaborator

bors commented Nov 21, 2021

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 65f3f8b to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 21, 2021
@bors bors merged commit 65f3f8b into rust-lang:master Nov 21, 2021
@rustbot rustbot added this to the 1.58.0 milestone Nov 21, 2021
@eduardosm eduardosm deleted the next_code_point branch November 21, 2021 21:52
@rust-timer
Collaborator

Finished benchmarking commit (65f3f8b): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@bluss
Member

bluss commented Nov 22, 2021

> `unwrap_or_0` was added in 42357d7. I guess `unwrap_unchecked` was not available back then.

Historical notes: I'm sorry about contributing with such botched author settings. And back then we were more conservative about adding unsafe code and none was considered to be needed. Nice code cleanup, for sure. 🙂
