
[InstCombine] Performance degradation caused by more branch after commit b5a9361 and #66740 #99835

Open
@cyyself


Statement

Commit b5a9361 and PR #66740 (commit a7f962c) cause more likely-taken branches to be generated. This results in a severe end-to-end performance degradation of 37% or more on both AMD Zen 4 and Intel Raptor Lake when running Verilator-generated C++ code for XiangShan, as can be seen by comparing these two results: cyyself@a2489e3#commitcomment-144413753 and cyyself@40529f8#commitcomment-144413665.

Reduced reproducer

https://godbolt.org/z/TYsnxGf5a

Look at the following code:

struct a_struct {
    bool b_1, b_2, b_3;
    unsigned int cv_1, cv_2, cv_3;
    unsigned int cav_1, cav_2, cav_3;
    unsigned int d;
};

void some_func(a_struct &a) {
    if (a.b_1 & (8U == (0x1f & a.cv_1))) [[unlikely]] {
        a.d = a.cav_1;
    }
    else if (a.b_2 & (7U == (0x1f & a.cv_2))) [[unlikely]] {
        a.d = a.cav_2;
    }
    else if (a.b_3 & (6U == (0x1f & a.cv_3))) [[unlikely]] {
        a.d = a.cav_3;
    }
}

Compile: clang++ -O3 -c test.cpp -S

We will get this code on the x86-64 target on LLVM main (at least for commit da0c8b2):

_Z9some_funcR8a_struct:                 # @_Z9some_funcR8a_struct
	.cfi_startproc
# %bb.0:
	movl	4(%rdi), %eax
	andl	$31, %eax
	cmpl	$8, %eax
	jne	.LBB0_3
# %bb.1:
	cmpb	$0, (%rdi)
	jne	.LBB0_2
.LBB0_3:
	movl	8(%rdi), %eax
	andl	$31, %eax
	cmpl	$7, %eax
	jne	.LBB0_6
...

As we can see, LLVM now emits two conditional jumps per condition, even though we have an unlikely hint: one on the a.cv_1 comparison, which jumps to the next if when the comparison fails, and one on a.b_1. Even worse, these jumps to the next if are likely to be taken, which forces the CPU to flush the pipeline whenever the branch predictor mispredicts them, for example because it is too small to track them all.

But if we revert commit b5a9361 and PR #66740 (commit a7f962c), or directly use an LLVM release version < 16, we get code like this:

_Z9some_funcR8a_struct:                 # @_Z9some_funcR8a_struct
	.cfi_startproc
# %bb.0:
	movl	4(%rdi), %eax
	andl	$31, %eax
	cmpl	$8, %eax
	sete	%al
	testb	%al, (%rdi)
	jne	.LBB0_1
# %bb.2:
	movl	8(%rdi), %eax
	andl	$31, %eax
...

This halves the number of branches and is very friendly to CPU branch predictors, since none of the remaining branches is likely to be taken.
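To make the difference between the two listings concrete, here is a minimal source-level sketch of the two control-flow shapes. The helper names (hit_two_branches, hit_one_branch, some_func_with, count_mismatches, check_shapes_agree) are hypothetical and only for illustration. The sketch models the shapes and checks that they compute the same result; it does not guarantee what any particular LLVM version will emit for either form.

```cpp
#include <cassert>

// Same layout as the struct in the reproducer above.
struct a_struct {
    bool b_1, b_2, b_3;
    unsigned int cv_1, cv_2, cv_3;
    unsigned int cav_1, cav_2, cav_3;
    unsigned int d;
};

// Shape of the newer output: compare first, then test the flag,
// i.e. two conditional branches per condition.
inline bool hit_two_branches(bool b, unsigned int cv, unsigned int want) {
    if ((cv & 0x1f) != want) return false;  // likely taken: go to the next if
    if (!b) return false;                   // second branch on the flag
    return true;
}

// Shape of the older output: fold both tests into one value
// (sete + testb in the listing), then branch once on the result.
inline bool hit_one_branch(bool b, unsigned int cv, unsigned int want) {
    unsigned int eq = static_cast<unsigned int>((cv & 0x1f) == want);
    return (static_cast<unsigned int>(b) & eq) != 0;
}

using hit_fn = bool (*)(bool, unsigned int, unsigned int);

// The if/else-if chain from the reproducer, parameterized on the
// (hypothetical) predicate so both shapes can be compared.
inline void some_func_with(a_struct &a, hit_fn hit) {
    if (hit(a.b_1, a.cv_1, 8U)) {
        a.d = a.cav_1;
    } else if (hit(a.b_2, a.cv_2, 7U)) {
        a.d = a.cav_2;
    } else if (hit(a.b_3, a.cv_3, 6U)) {
        a.d = a.cav_3;
    }
}

// Run both shapes over a small grid of inputs and count disagreements.
inline int count_mismatches() {
    // Values whose low five bits are 5, 6, 7, 8, with and without high bits set.
    const unsigned int cvs[] = {5U, 6U, 7U, 8U, 38U, 39U, 40U};
    int mismatches = 0;
    for (unsigned int flags = 0; flags < 8; ++flags) {
        for (unsigned int c1 : cvs) {
            for (unsigned int c2 : cvs) {
                for (unsigned int c3 : cvs) {
                    a_struct base{};
                    base.b_1 = (flags & 1U) != 0;
                    base.b_2 = (flags & 2U) != 0;
                    base.b_3 = (flags & 4U) != 0;
                    base.cv_1 = c1;
                    base.cv_2 = c2;
                    base.cv_3 = c3;
                    base.cav_1 = 1U;
                    base.cav_2 = 2U;
                    base.cav_3 = 3U;
                    base.d = 0U;  // stays 0 when no condition hits
                    a_struct two = base;
                    a_struct one = base;
                    some_func_with(two, hit_two_branches);
                    some_func_with(one, hit_one_branch);
                    if (two.d != one.d) ++mismatches;
                }
            }
        }
    }
    return mismatches;
}

// Convenience wrapper: aborts if the two shapes ever disagree.
inline void check_shapes_agree() {
    assert(count_mismatches() == 0);
}
```

Calling check_shapes_agree() from a main function exercises both shapes. Since they are semantically equivalent, the choice between them is purely a code-generation decision, which is why the unlikely hint matters here.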
