
Ineffectual por with constant emitted for pshufb operand #106256

Closed
@cvijdea-bd

Description


Clang example: https://godbolt.org/z/ec4P4j78b (flags: -O3 -march=x86-64-v2). This is not Clang-specific: Rust nightly shows the same behaviour.

#include <immintrin.h>

extern "C" __m128i shuffle_or(__m128i bytes, __m128i idxs) {
    return _mm_shuffle_epi8(bytes, _mm_or_si128(idxs, _mm_set1_epi8(112)));
}

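The por of xmm1 with 112 (0b0111_0000) is a no-op and should be optimized out. For each mask byte, pshufb only reads bits 0-3 (the source byte index) and bit 7 (which zeroes the output byte), so bits 4-6, exactly the bits that 112 sets, are ignored: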

.LCPI0_0:
        .zero   16,112
shuffle_or:
        por     xmm1, xmmword ptr [rip + .LCPI0_0]
        pshufb  xmm0, xmm1
        ret

Writing _mm_shuffle_epi8(bytes, _mm_set1_epi8(127)) in the source emits a pshufb whose constant mask is 15, so LLVM evidently knows that bits 4-6 of the mask are ignored when the mask is a constant, but it fails to use that knowledge to remove the por when a constant is or'd into a variable mask.
