possibly missed fold, unnecessary splat before interleaving merging two scalars #86076

Closed
@zhengyang92

Description

https://godbolt.org/z/ah1vGWTqK
https://alive2.llvm.org/ce/z/oe-Gbc

define <16 x float> @src(float %0, float %1, i8 %2) {
entry:
  %3 = insertelement <8 x float> poison, float %0, i64 0
  %4 = shufflevector <8 x float> %3, <8 x float> poison, <8 x i32> zeroinitializer
  %5 = insertelement <8 x float> poison, float %1, i64 0
  %6 = shufflevector <8 x float> %5, <8 x float> poison, <8 x i32> zeroinitializer
  %7 = shufflevector <8 x float> %4, <8 x float> %6, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
  ret <16 x float> %7
}

define <16 x float> @tgt(float %0, float %1, i8 %2) {
entry:
  %3 = insertelement <8 x float> poison, float %0, i64 0
  %4 = insertelement <8 x float> poison, float %1, i64 0
  %sv = shufflevector <8 x float> %3, <8 x float> %4, <16 x i32> <i32 0, i32 8, i32 0, i32 8, i32 0, i32 8, i32 0, i32 8, i32 0, i32 8, i32 0, i32 8, i32 0, i32 8, i32 0, i32 8>
  ret <16 x float> %sv
}

Is the src in some kind of canonicalized form? From the generated x86 code, tgt seems to have one fewer instruction, but it introduces a new constant.
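For readers who want to check the lane arithmetic by hand, here is a minimal Python sketch that models `insertelement` and `shufflevector` on plain lists. It is only an illustration, not a substitute for the Alive2 proof linked above: the `POISON` marker and the helper functions are hypothetical stand-ins for the IR semantics, and the unused `i8 %2` argument is left out.

```python
POISON = "poison"  # hypothetical stand-in marker for an LLVM poison lane


def insertelement(vec, value, index):
    """Return a copy of vec with lane `index` replaced by `value`."""
    out = list(vec)
    out[index] = value
    return out


def shufflevector(a, b, mask):
    """Pick lanes from the concatenation of a and b, as the IR mask does."""
    joined = list(a) + list(b)
    return [joined[i] for i in mask]


def src(x, y):
    # Splat each scalar to 8 lanes, then interleave the two splats.
    v3 = insertelement([POISON] * 8, x, 0)
    v4 = shufflevector(v3, [POISON] * 8, [0] * 8)  # zeroinitializer mask
    v5 = insertelement([POISON] * 8, y, 0)
    v6 = shufflevector(v5, [POISON] * 8, [0] * 8)
    interleave = [i // 2 + 8 * (i % 2) for i in range(16)]  # 0, 8, 1, 9, ...
    return shufflevector(v4, v6, interleave)


def tgt(x, y):
    # Skip the splats: read lane 0 of each input directly, alternating.
    v3 = insertelement([POISON] * 8, x, 0)
    v4 = insertelement([POISON] * 8, y, 0)
    return shufflevector(v3, v4, [0, 8] * 8)


if __name__ == "__main__":
    print(src(1.0, 2.0) == tgt(1.0, 2.0))
```

In this model both functions produce the two scalars alternating across all 16 lanes, and no poison lane reaches the result, because every mask index in tgt refers to lane 0 of one of the two inputs.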

@nikic @dtcxzyw @regehr @topperc
