Implementation of Ord for integers is suboptimal

The current implementation of `Ord::cmp` for integral types results in less than optimal code. Currently the implementation looks something like this:

```rust                
fn cmp(&self, other: &$t) -> Ordering {
    if *self == *other { Equal }
    else if *self < *other { Less }
    else { Greater }
}
```

This results in the following IR:

```llvm
  %2 = icmp eq i64 %0, %1
  %3 = icmp ult i64 %0, %1
  %..i = select i1 %3, i8 -1, i8 1
  %_0.0.i = select i1 %2, i8 0, i8 %..i
  ret i8 %_0.0.i
```

which, on x86, becomes

```asm
	cmpq	%rsi, %rdi
	setae	%al
	cmpq	%rsi, %rdi
	je	.LBB0_1
	addb	%al, %al
	addb	$-1, %al
        ; exit (retq)
    .LBB0_1:
	xorl	%eax, %eax
        ; exit (retq)
```

where the critical path looks like this (courtesy of `llvm-mca -mcpu=broadwell`):

```
              Instruction                                 Dependency Information
 +----< 0.    cmpq	%rsi, %rdi
 +----> 1.    setae	%al                               ## REGISTER dependency:  %flags
 |      2.    cmpq	%rsi, %rdi
 |      3.    je	.LBB0_1
 +----> 4.    addb	%al, %al                          ## REGISTER dependency:  %al
 +----> 5.    addb	$-1, %al                          ## REGISTER dependency:  %al
 |      6.    ; exit
 |      7.    xorl	%eax, %eax
 |      8.    ; exit
 |
 |    < loop carried > 
 |
 +----> 2.    cmpq	%rsi, %rdi                        ## RESOURCE interference:  BWPort1 [ probability: 25% ]
```

---

if the implementation instead was:

```rust
    if a < b {
        std::cmp::Ordering::Less
    } else if a > b {
        std::cmp::Ordering::Greater
    } else {
        std::cmp::Ordering::Equal
    }
```

the IR would become

```llvm
  %0 = icmp ult i64 %a, %b
  %1 = icmp ugt i64 %a, %b
  %. = zext i1 %1 to i8
  %_0.0 = select i1 %0, i8 -1, i8 %.
  ret i8 %_0.0
```

which in turn would compile down to

```asm
	movb	$-1, %al
	cmpq	%rsi, %rdi
	seta	%cl
	jb	.LBB1_2
	movl	%ecx, %eax
    .LBB1_2:
        ; exit (retq)
```

for which the critical path (as expected) only becomes visible on a 2nd iteration:

```
              Instruction                                 Dependency Information
 +----< 2.    seta	%cl
 |
 |    < loop carried > 
 |
 |      0.    movb	$-1, %al
 +----> 1.    cmpq	%rsi, %rdi                        ## RESOURCE interference:  BWPort6 [ probability: 97% ]
 +----> 2.    seta	%cl                               ## REGISTER dependency:  %flags
 |      3.    jb	.LBB1_2
 +----> 4.    movl	%ecx, %eax                        ## REGISTER dependency:  %cl
```

llvm-mca reports that the reciprocal throughput for the improved version is 1.5 or lower (getting as low as 1.3 on znver1; lower is better for reciprocal throughput) for various x86 architectures whereas the old code always exceeds 2.0, reaching 3.0 on `core2`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementation of Ord for integers is suboptimal #63758

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Implementation of Ord for integers is suboptimal #63758

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions