Ineffectual bitwise or with constant emitted for mask operand of vperm(b|w|d|q|ps|pd) #106413

Closed
@cvijdea-bd

Description

Same issue as #106256, but it also happens for the (AVX2/AVX-512) permute[x]var intrinsics, while PR #106377 seems to fix it only for (v)pshufb specifically.

Godbolt examples: https://godbolt.org/z/MsTcx7qYc

The vector permute intrinsics ignore all bits except the ones that match the required index size, e.g.:

  • vpermb only uses the low 4, 5, or 6 bits of each mask byte element for 128-, 256-, and 512-bit vectors respectively
  • vpermw only uses the low 3, 4, or 5 bits of each 16-bit mask element for the same vector sizes
  • etc.

OR operations that only set bits outside the index field should be optimized out.
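To make the claim concrete, here is a minimal scalar sketch in portable C (no intrinsics), assuming the index-bit counts listed above. The names `index_bits`, `model_vpermb` and `or_is_noop` are hypothetical helpers written for this illustration; they are not LLVM or intrinsic APIs. The model reads only the low log2(n) bits of each index byte, so OR-ing a constant into the higher bits cannot change the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of index bits a single-source permute reads
   per mask element, i.e. log2(number of elements in the vector). */
static unsigned index_bits(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    unsigned lanes = vector_bits / element_bits;
    unsigned bits = 0;
    while ((1u << bits) < lanes)
        ++bits;
    return bits;
}

/* Hypothetical scalar model of vpermb on a vector of n bytes
   (n = 16, 32 or 64): only the low log2(n) index bits are used. */
static void model_vpermb(uint8_t *dst, const uint8_t *src,
                         const uint8_t *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i] & (n - 1)];
}

/* Returns 1 if OR-ing `junk` into every index byte leaves the permute
   result unchanged for a sample input, 0 otherwise. */
static int or_is_noop(uint8_t junk, size_t n)
{
    uint8_t src[64], idx[64], idx_or[64], a[64], b[64];
    assert(n <= 64);
    for (size_t i = 0; i < n; ++i) {
        src[i] = (uint8_t)(i * 7 + 1);        /* distinct sample data */
        idx[i] = (uint8_t)(n - 1 - i);        /* reverse the vector   */
        idx_or[i] = (uint8_t)(idx[i] | junk); /* the redundant OR     */
    }
    model_vpermb(a, src, idx, n);
    model_vpermb(b, src, idx_or, n);
    for (size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

In this model, `or_is_noop(0xC0, 64)` holds because bits 6 and 7 lie outside the 6-bit index of a 512-bit vpermb, whereas `or_is_noop(0x01, 64)` does not, since bit 0 is part of the index.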

This probably applies to vpermt2 (e.g. _mm512_permutex2var_epi16) as well, with 1 more bit used since those instructions select from two concatenated vectors.
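A similar scalar sketch for the two-source case, assuming the 512-bit, 16-bit-element form (32 lanes per source): the low 5 bits pick the lane and bit 5 picks the source, so 6 bits are read in total. `model_permutex2var_epi16` and `or_is_noop_2src` are hypothetical names for this illustration only, not the real intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 32 /* 512-bit vector of 16-bit elements */

/* Hypothetical scalar model of a two-source word permute:
   bit 5 of each index selects the source, bits 4:0 select the lane. */
static void model_permutex2var_epi16(uint16_t *dst, const uint16_t *a,
                                     const uint16_t *idx, const uint16_t *b)
{
    for (size_t i = 0; i < LANES; ++i) {
        uint16_t sel = idx[i] & (2 * LANES - 1); /* keep low 6 bits */
        uint16_t lane = sel & (LANES - 1);
        dst[i] = (sel & LANES) ? b[lane] : a[lane];
    }
}

/* Returns 1 if OR-ing `junk` into every index leaves the result
   unchanged for a sample input, 0 otherwise. */
static int or_is_noop_2src(uint16_t junk)
{
    uint16_t a[LANES], b[LANES], idx[LANES], idx_or[LANES];
    uint16_t r0[LANES], r1[LANES];
    for (size_t i = 0; i < LANES; ++i) {
        a[i] = (uint16_t)i;                    /* sample data, source a */
        b[i] = (uint16_t)(1000 + i);           /* sample data, source b */
        idx[i] = (uint16_t)i;                  /* identity pick from a  */
        idx_or[i] = (uint16_t)(idx[i] | junk); /* the redundant OR      */
    }
    model_permutex2var_epi16(r0, a, idx, b);
    model_permutex2var_epi16(r1, a, idx_or, b);
    for (size_t i = 0; i < LANES; ++i)
        if (r0[i] != r1[i])
            return 0;
    return 1;
}
```

Here `or_is_noop_2src(0xFFC0)` holds, while `or_is_noop_2src(0x0020)` does not, because bit 5 is the source selector and an OR touching it must be preserved.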

cc @RKSimon
