Skip to content

Implementation of Ord for integers is suboptimal #63758

Closed
@nagisa

Description

@nagisa

The current implementation of Ord::cmp for integral types results in less than optimal code. Currently the implementation looks something like this:

fn cmp(&self, other: &$t) -> Ordering {
    if *self == *other { Equal }
    else if *self < *other { Less }
    else { Greater }
}

This results in the following IR:

  %2 = icmp eq i64 %0, %1
  %3 = icmp ult i64 %0, %1
  %..i = select i1 %3, i8 -1, i8 1
  %_0.0.i = select i1 %2, i8 0, i8 %..i
  ret i8 %_0.0.i

which, on x86, becomes

	cmpq	%rsi, %rdi
	setae	%al
	cmpq	%rsi, %rdi
	je	.LBB0_1
	addb	%al, %al
	addb	$-1, %al
        ; exit (retq)
    .LBB0_1:
	xorl	%eax, %eax
        ; exit (retq)

where the critical path looks like this (courtesy of llvm-mca -mcpu=broadwell):

              Instruction                                 Dependency Information
 +----< 0.    cmpq	%rsi, %rdi
 +----> 1.    setae	%al                               ## REGISTER dependency:  %flags
 |      2.    cmpq	%rsi, %rdi
 |      3.    je	.LBB0_1
 +----> 4.    addb	%al, %al                          ## REGISTER dependency:  %al
 +----> 5.    addb	$-1, %al                          ## REGISTER dependency:  %al
 |      6.    ; exit
 |      7.    xorl	%eax, %eax
 |      8.    ; exit
 |
 |    < loop carried > 
 |
 +----> 2.    cmpq	%rsi, %rdi                        ## RESOURCE interference:  BWPort1 [ probability: 25% ]

if the implementation instead was:

    if a < b {
        std::cmp::Ordering::Less
    } else if a > b {
        std::cmp::Ordering::Greater
    } else {
        std::cmp::Ordering::Equal
    }

the IR would become

  %0 = icmp ult i64 %a, %b
  %1 = icmp ugt i64 %a, %b
  %. = zext i1 %1 to i8
  %_0.0 = select i1 %0, i8 -1, i8 %.
  ret i8 %_0.0

which in turn would compile down to

	movb	$-1, %al
	cmpq	%rsi, %rdi
	seta	%cl
	jb	.LBB1_2
	movl	%ecx, %eax
    .LBB1_2:
        ; exit (retq)

for which the critical path (as expected) only becomes visible on a 2nd iteration:

              Instruction                                 Dependency Information
 +----< 2.    seta	%cl
 |
 |    < loop carried > 
 |
 |      0.    movb	$-1, %al
 +----> 1.    cmpq	%rsi, %rdi                        ## RESOURCE interference:  BWPort6 [ probability: 97% ]
 +----> 2.    seta	%cl                               ## REGISTER dependency:  %flags
 |      3.    jb	.LBB1_2
 +----> 4.    movl	%ecx, %eax                        ## REGISTER dependency:  %cl

llvm-mca reports that the reciprocal throughput for the improved version is 1.5 or lower (getting as low as 1.3 on znver1; lower is better for reciprocal throughput) for various x86 architectures whereas the old code always exceeds 2.0, reaching 3.0 on core2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-enhancementCategory: An issue proposing an enhancement or a PR with one.E-easyCall for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.E-help-wantedCall for participation: Help is requested to fix this issue.E-mentorCall for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.I-slowIssue: Problems and improvements with respect to performance of generated code.T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions