x86 missing optimization for variable shift left (without avx512) #140418

Open
@oscardssmith

Description

Given the following code:

define <16 x i16> @mulbyconst(<16 x i16> %"a") #0 {
top:
  %0 = mul <16 x i16> %"a", <i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4>
  ret <16 x i16> %0
}

LLVM compiles this to a single vpsllvw instruction with AVX512, but in the absence of AVX512 it instead compiles to two vpsllw instructions and a vpblendw (as shown in https://godbolt.org/z/PMehWerEd).

The issue is that although AVX2 CPUs are missing the vpsllvw instruction (because AVX2 is a bit of a mess), they do have the vpmullw instruction, so this could have compiled to a single vpmullw by a constant vector alternating 8 and 4 (the same constants as in the original mul). This missed optimization is especially annoying because LLVM went through a bunch of work to canonicalize the variable multiplication by powers of 2 into a variable shift left, even though just leaving it as a multiply would have been more efficient.
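To make the claimed equivalence concrete, here is a rough scalar C model of the two lowerings. It does not use the actual intrinsics or reproduce LLVM's output; the function names (`mul_lanes`, `shift_blend_lanes`, `count_mismatches`) are made up for illustration, and the lane order (8 in even lanes, 4 in odd lanes) is taken from the IR above.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16

/* Models vpmullw: per-lane 16-bit multiply, keeping the low 16 bits. */
void mul_lanes(const uint16_t *a, const uint16_t *m, uint16_t *out) {
    for (size_t i = 0; i < LANES; i++)
        out[i] = (uint16_t)((uint32_t)a[i] * m[i]);
}

/* Models the current AVX2 lowering: two uniform shifts (vpsllw by 3 and
   by 2) followed by a per-lane select between them (vpblendw). */
void shift_blend_lanes(const uint16_t *a, uint16_t *out) {
    for (size_t i = 0; i < LANES; i++) {
        uint16_t by3 = (uint16_t)(a[i] << 3); /* a * 8, truncated to 16 bits */
        uint16_t by2 = (uint16_t)(a[i] << 2); /* a * 4, truncated to 16 bits */
        out[i] = (i % 2 == 0) ? by3 : by2;
    }
}

/* Feeds every 16-bit value through every lane position and counts the
   lanes where the two models disagree. */
unsigned long count_mismatches(void) {
    uint16_t m[LANES], a[LANES], x[LANES], y[LANES];
    unsigned long bad = 0;

    /* Multiplier vector from the IR: 8 in even lanes, 4 in odd lanes. */
    for (size_t i = 0; i < LANES; i++)
        m[i] = (i % 2 == 0) ? 8 : 4;

    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        for (size_t i = 0; i < LANES; i++)
            a[i] = (uint16_t)(v + i); /* wraps modulo 2^16 */
        mul_lanes(a, m, x);
        shift_blend_lanes(a, y);
        for (size_t i = 0; i < LANES; i++)
            bad += (x[i] != y[i]);
    }
    return bad;
}
```

In this model the single multiply and the shift/shift/blend sequence agree on every lane, including inputs whose product overflows 16 bits, which is why one vpmullw would be a valid replacement for the three-instruction sequence.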
