Rust 1.25.0 regressed the performance of encoding_rs's UTF-8 validation on i686

When Firefox switched from Rust 1.24.0 to Rust 1.25.0, the win32 performance of encoding_rs's UTF-8 validation function [dropped 12.5%](https://bugzilla.mozilla.org/show_bug.cgi?id=1451703) on ASCII input. encoding_rs's UTF-8 validation function is a fork of the Rust standard library's validation function. The fork replaces the standard library's ASCII-acceleration ALU trick with explicit SIMD code that handles both the aligned and the unaligned cases. The ALU trick autovectorizes on x86_64 but not on i686, and it works only in the aligned case.
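The ALU trick reads one machine word at a time and uses a single mask to test the high bit of every byte. Below is a simplified, safe sketch of that general idea that uses only the standard library. The function name `ascii_prefix_len_alu` is hypothetical. This is not the standard library's actual code, which reads aligned words through raw pointers.

```rust
use std::convert::TryInto;

/// Hypothetical helper (illustration only): length of the leading ASCII
/// prefix of `bytes`, checked one machine word at a time with plain ALU ops.
/// Simplified sketch; not the standard library's actual implementation.
fn ascii_prefix_len_alu(bytes: &[u8]) -> usize {
    const WORD: usize = std::mem::size_of::<usize>();
    // 0x0101..01 * 0x80 = 0x8080..80: the high bit of every byte in a word.
    const HIGH_BITS: usize = usize::MAX / 0xFF * 0x80;

    let mut offset = 0;
    for chunk in bytes.chunks_exact(WORD) {
        // chunks_exact guarantees exactly WORD bytes, so this cannot fail.
        let word = usize::from_ne_bytes(chunk.try_into().unwrap());
        // One AND tests every byte of the word at once.
        if word & HIGH_BITS != 0 {
            break; // some byte in this word is non-ASCII
        }
        offset += WORD;
    }
    // Finish byte by byte: either the short tail or the word that
    // contained the first non-ASCII byte.
    offset + bytes[offset..].iter().take_while(|b| b.is_ascii()).count()
}
```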

When the input is all ASCII, the function should stay in one of two inner loops: one for the aligned case and one for the unaligned case. Each loop performs the following steps:

 1. It loads 16 bytes using `movdqa` (aligned) or `movdqu` (unaligned).
 2. It performs `pmovmskb` on the xmm register.
 3. It compares the result to zero and jumps back to the start of the loop if the result is zero.
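The intended inner loop can be sketched in Rust with the `std::arch` SSE2 intrinsics:

- `_mm_load_si128` and `_mm_loadu_si128` correspond to `movdqa` and `movdqu`.
- `_mm_movemask_epi8` corresponds to `pmovmskb`.

This is an illustrative sketch, not encoding_rs's actual code. The function name `ascii_valid_up_to` is hypothetical. A scalar fallback is included for targets without SSE2.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::{__m128i, _mm_load_si128, _mm_loadu_si128, _mm_movemask_epi8};
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_load_si128, _mm_loadu_si128, _mm_movemask_epi8};

/// Hypothetical sketch: index of the first non-ASCII byte, or `bytes.len()`.
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse2"))]
fn ascii_valid_up_to(bytes: &[u8]) -> usize {
    let len = bytes.len();
    let ptr = bytes.as_ptr();
    // Pick the aligned (movdqa) or unaligned (movdqu) load once, up front.
    let aligned = (ptr as usize) & 15 == 0;
    let mut offset = 0;
    while offset + 16 <= len {
        // SAFETY: offset + 16 <= len keeps the 16-byte read in bounds, and
        // the aligned load is used only when ptr + offset is 16-byte aligned.
        let mask = unsafe {
            let p = ptr.add(offset) as *const __m128i;
            let chunk = if aligned { _mm_load_si128(p) } else { _mm_loadu_si128(p) };
            // pmovmskb: bit i is set iff byte i has its high bit set.
            _mm_movemask_epi8(chunk)
        };
        if mask != 0 {
            // The lowest set bit marks the first non-ASCII byte.
            return offset + mask.trailing_zeros() as usize;
        }
        offset += 16;
    }
    // Scalar tail for the last < 16 bytes.
    offset + bytes[offset..].iter().take_while(|b| b.is_ascii()).count()
}

/// Scalar fallback so the sketch also compiles on non-SSE2 targets.
#[cfg(not(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse2")))]
fn ascii_valid_up_to(bytes: &[u8]) -> usize {
    bytes.iter().take_while(|b| b.is_ascii()).count()
}
```

The `trailing_zeros` call plays the role of `bsf`, which appears in the Rust 1.25.0 listing.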

When compiled for i686 Linux with Rust 1.24.0 at opt-level 2 (which Firefox uses), the result is exactly as expected.

Unaligned:
```asm
.LBB12_3:
	movdqu	(%edx,%eax), %xmm0
	pmovmskb	%xmm0, %ebp
	testl	%ebp, %ebp
	jne	.LBB12_9
	addl	$16, %eax
	cmpl	%ebx, %eax
	jbe	.LBB12_3
	jmp	.LBB12_5
	.p2align	4, 0x90
```

Aligned:
```asm
.LBB12_7:
	movdqa	(%edx,%eax), %xmm0
	pmovmskb	%xmm0, %ebp
	testl	%ebp, %ebp
	jne	.LBB12_9
	addl	$16, %eax
	cmpl	%ebx, %eax
	jbe	.LBB12_7
	.p2align	4, 0x90
```

(I couldn't inspect the Windows asm, because LLVM deemed the IR invalid when using `--emit asm`.)

When compiled with Rust 1.25.0, the result is more complicated:

 1. There are two instances of `movdqa` and two instances of `movdqu`. This suggests that the first trip through each loop has been peeled off into a separate copy that precedes the loop proper.
 2. In the actual loops, ALU instructions have been reordered. In each loop, one ALU instruction now sits between the SSE2 instructions: `movl` in the unaligned loop and `addl` in the aligned loop.

Both transformations look like plausible optimizations. However, given the Firefox CI performance results, they appear to have made performance worse.
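Point 1 describes loop peeling. The sketch below is a source-level analogue only: LLVM performs this transformation on IR, and this is not code it emitted. It shows a chunked scan loop and the same loop with its first trip peeled into a separate copy; both return the same result. The names `chunk_mask`, `scan_plain`, and `scan_peeled` are hypothetical. `chunk_mask` is a scalar stand-in for `pmovmskb`.

```rust
/// Hypothetical scalar stand-in for pmovmskb: bit i is set iff chunk[i] >= 0x80.
fn chunk_mask(chunk: &[u8]) -> u32 {
    chunk
        .iter()
        .enumerate()
        .fold(0, |m, (i, &b)| m | (((b >> 7) as u32) << i))
}

/// The loop as written: test one 16-byte chunk per iteration.
fn scan_plain(bytes: &[u8]) -> usize {
    let mut offset = 0;
    while offset + 16 <= bytes.len() {
        let mask = chunk_mask(&bytes[offset..offset + 16]);
        if mask != 0 {
            return offset + mask.trailing_zeros() as usize;
        }
        offset += 16;
    }
    offset + bytes[offset..].iter().take_while(|b| b.is_ascii()).count()
}

/// The same loop with its first trip peeled into a separate copy,
/// analogous to the duplicated movdqu/movdqa in the 1.25.0 output.
fn scan_peeled(bytes: &[u8]) -> usize {
    let mut offset = 0;
    if bytes.len() >= 16 {
        // Peeled first trip.
        let mask = chunk_mask(&bytes[..16]);
        if mask != 0 {
            return mask.trailing_zeros() as usize;
        }
        offset = 16;
        // Loop proper: the remaining trips.
        while offset + 16 <= bytes.len() {
            let mask = chunk_mask(&bytes[offset..offset + 16]);
            if mask != 0 {
                return offset + mask.trailing_zeros() as usize;
            }
            offset += 16;
        }
    }
    // Scalar tail for the last < 16 bytes.
    offset + bytes[offset..].iter().take_while(|b| b.is_ascii()).count()
}
```

Peeling does not change the result, only code layout and scheduling. Whether it pays off depends on how the surrounding code is arranged, which is the question this issue raises.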

```asm
.LBB16_1:
	movl	%edx, %ebp
	leal	(%ecx,%edi), %ebx
	movl	$0, %esi
	subl	%edi, %ebp
	cmpl	$16, %ebp
	jb	.LBB16_22
	leal	-16(%ebp), %eax
	testb	$15, %bl
	movl	%eax, 20(%esp)
	je	.LBB16_9
	movdqu	(%ebx), %xmm0
	movl	%edx, 12(%esp)
	xorl	%eax, %eax
	pmovmskb	%xmm0, %edx
	testl	%edx, %edx
	jne	.LBB16_7
	movl	24(%esp), %eax
	xorl	%esi, %esi
	leal	(%eax,%edi), %ecx
	.p2align	4, 0x90
.LBB16_5:
	leal	16(%esi), %eax
	cmpl	20(%esp), %eax
	ja	.LBB16_20
	movdqu	(%ecx,%esi), %xmm0
	movl	%eax, %esi
	pmovmskb	%xmm0, %edx
	testl	%edx, %edx
	je	.LBB16_5
.LBB16_7:
	testl	%edx, %edx
	je	.LBB16_12
	bsfl	%edx, %esi
	jmp	.LBB16_13
.LBB16_9:
	movdqa	(%ebx), %xmm0
	xorl	%ecx, %ecx
	pmovmskb	%xmm0, %eax
	testl	%eax, %eax
	je	.LBB16_15
	testl	%eax, %eax
	je	.LBB16_19
.LBB16_11:
	bsfl	%eax, %esi
	addl	%ecx, %esi
	jmp	.LBB16_14
.LBB16_12:
	movl	$32, %esi
.LBB16_13:
	movl	12(%esp), %edx
	addl	%eax, %esi
.LBB16_14:
	movb	(%ebx,%esi), %al
	jmp	.LBB16_24
.LBB16_15:
	movl	$16, %esi
	.p2align	4, 0x90
.LBB16_16:
	cmpl	20(%esp), %esi
	ja	.LBB16_22
	movdqa	(%ebx,%esi), %xmm0
	addl	$16, %esi
	pmovmskb	%xmm0, %eax
	testl	%eax, %eax
	je	.LBB16_16
	addl	$-16, %esi
	movl	%esi, %ecx
	testl	%eax, %eax
	jne	.LBB16_11
```

The asm was obtained by compiling [encoding_rs](https://github.com/hsivonen/encoding_rs) (Firefox uses 0.7.2) using `RUSTC_BOOTSTRAP=1 RUSTFLAGS='-C opt-level=2 --emit asm' cargo build --target i686-unknown-linux-gnu --release --features simd-accel` and searching for `utf8_valid_up_to` in the `.s` file.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rust 1.25.0 regressed the performance of encoding_rs's UTF-8 validation on i686 #49873

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Rust 1.25.0 regressed the performance of encoding_rs's UTF-8 validation on i686 #49873

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions