[X86] const << (x&7) doesn't use shlx when BMI2 is available #141347

Open
@dzaima

These functions:

#include <stdint.h>

void shl_u8(uint8_t* dst, uint64_t c) {
    *dst = 1 << (c&7);
}
void shr_u8(uint8_t* dst, uint64_t c) {
    *dst = 0xaa >> (c&7);
}

compiled with -O3 -march=haswell produce:

shl_u8:
        mov     rcx, rsi
        and     cl, 7
        mov     al, 1
        shl     al, cl
        mov     byte ptr [rdi], al
        ret

shr_u8:
        mov     rcx, rsi
        and     cl, 7
        mov     al, -86
        shr     al, cl
        mov     byte ptr [rdi], al
        ret

but they could use shlx & shrx as gcc does, e.g.:

shl_u8:
        and     esi, 7
        mov     eax, 1
        shlx    esi, eax, esi
        mov     BYTE PTR [rdi], sil
        ret

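This matters even more in a loop: shl/shr with a cl count overwrite their destination, so clang's version has to reload the constant every iteration, whereas shlx/shrx write to a separate destination register and can reuse a constant loaded once outside the loop. On Haswell, this ends up with clang taking 4 uops per iteration vs. 1 uop for gcc.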
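For illustration, a loop of roughly this shape is where the constant reload would show up. This is only a sketch: the function names `shl_u8_loop` and `shr_u8_loop` and their signatures are hypothetical and not taken from the godbolt link, and depending on the compiler and flags the loop may be auto-vectorized instead, in which case the scalar shift discussed here never appears.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop forms of shl_u8/shr_u8 from the report above.
 * The shifted values (1 and 0xaa) are loop-invariant; with a
 * non-destructive shlx/shrx they could stay in one register for
 * the whole loop instead of being reloaded each iteration. */
void shl_u8_loop(uint8_t *dst, const uint64_t *counts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(1u << (counts[i] & 7));
    }
}

void shr_u8_loop(uint8_t *dst, const uint64_t *counts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(0xaau >> (counts[i] & 7));
    }
}
```

Compiling this with the same -O3 -march=haswell flags and comparing the loop bodies should show whether the constant is rematerialized per iteration.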

https://godbolt.org/z/Yc57PsWKE
