
Auto-vectorization via masked.load blocks constprop #134513

Closed
@scottmcm


I was writing some Rust code and ended up with the following IR. Everything in it is a constant, so it should simply become `ret i64 165`, but the masked loads introduced by auto-vectorization with `-Ctarget-cpu=x86-64-v3` kept that from happening:

```llvm
define noundef i64 @test() unnamed_addr #0 {
bb3.preheader:
  %iter = alloca [64 x i8], align 8
  call void @llvm.lifetime.start.p0(i64 64, ptr nonnull %iter)
  %_3.sroa.5.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 16
  store <4 x i64> <i64 23, i64 16, i64 54, i64 3>, ptr %_3.sroa.5.0.iter.sroa_idx, align 8
  %_3.sroa.9.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 48
  store i64 60, ptr %_3.sroa.9.0.iter.sroa_idx, align 8
  %_3.sroa.10.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 56
  store i64 9, ptr %_3.sroa.10.0.iter.sroa_idx, align 8
  %unmaskedload = load <4 x i64>, ptr %_3.sroa.5.0.iter.sroa_idx, align 8, !alias.scope !2
  %0 = getelementptr inbounds nuw i8, ptr %iter, i64 48
  %wide.masked.load.1 = call <4 x i64> @llvm.masked.load.v4i64.p0(ptr nonnull %0, i32 8, <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i64> poison), !alias.scope !2
  %1 = add <4 x i64> %wide.masked.load.1, %unmaskedload
  %2 = shufflevector <4 x i64> %1, <4 x i64> %unmaskedload, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
  %3 = tail call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %2)
  call void @llvm.lifetime.end.p0(i64 64, ptr nonnull %iter)
  ret i64 %3
}
```
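To double-check that 165 is the right constant, here is a small Rust model of what the IR computes, lane by lane. It is only an illustration under simplifying assumptions, not LLVM's own semantics: poison lanes are modelled as `None`, and the helper names (`masked_load`, `add`, `shuffle`, `reduce_add`, `eval_test`) are made up for this sketch.

```rust
/// Lane values of a <4 x i64>; `None` stands in for a poison lane (assumption of this model).
type V4 = [Option<i64>; 4];

/// Model of @llvm.masked.load: enabled lanes read memory, disabled lanes
/// take the passthru value and never touch memory.
fn masked_load(mem: &[i64], mask: [bool; 4], passthru: V4) -> V4 {
    let mut out = passthru;
    for i in 0..4 {
        if mask[i] {
            out[i] = Some(mem[i]);
        }
    }
    out
}

/// `add <4 x i64>`: a poison lane in either operand yields a poison lane.
fn add(a: V4, b: V4) -> V4 {
    let mut out = [None; 4];
    for i in 0..4 {
        out[i] = match (a[i], b[i]) {
            (Some(x), Some(y)) => Some(x.wrapping_add(y)),
            _ => None,
        };
    }
    out
}

/// `shufflevector a, b, idx`: indices 0..4 pick from `a`, 4..8 pick from `b`.
fn shuffle(a: V4, b: V4, idx: [usize; 4]) -> V4 {
    let mut out = [None; 4];
    for (i, &j) in idx.iter().enumerate() {
        out[i] = if j < 4 { a[j] } else { b[j - 4] };
    }
    out
}

/// `llvm.vector.reduce.add`; returns None if any lane is still poison.
fn reduce_add(v: V4) -> Option<i64> {
    v.iter()
        .try_fold(0i64, |acc, &lane| lane.map(|x| acc.wrapping_add(x)))
}

fn eval_test() -> Option<i64> {
    // %unmaskedload: the <4 x i64> stored at %iter + 16.
    let unmasked: V4 = [Some(23), Some(16), Some(54), Some(3)];
    // The two i64 stores at %iter + 48 and %iter + 56. Lanes 2 and 3 of the
    // masked load would lie past the 64-byte alloca, so they are masked off
    // and the slice below is never indexed for them.
    let tail = [60, 9];
    let masked = masked_load(&tail, [true, true, false, false], [None; 4]);
    let sum = add(masked, unmasked); // %1 = <83, 25, poison, poison>
    let picked = shuffle(sum, unmasked, [0, 1, 6, 7]); // %2 = <83, 25, 54, 3>
    reduce_add(picked) // %3
}

fn main() {
    println!("{:?}", eval_test()); // Some(165)
}
```

The shuffle is what makes the poison lanes harmless: it discards lanes 2 and 3 of `%1` and takes lanes 2 and 3 of `%unmaskedload` instead, so every lane reaching the reduction is a known constant.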

It looks like LLVM trunk can't optimize this to a constant either: https://llvm.godbolt.org/z/z6MKz6cz1

(Trunk at least avoids the store and reload of the vector constant, but it still doesn't constant-propagate the stored values through the `masked.load`.)
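Roughly, the missing fold is store-to-load forwarding that understands the mask: every *enabled* lane must be covered by a known store, while disabled lanes just take the passthru value and never need to be looked up. The sketch below models that rule over a map of known 8-byte stores keyed by byte offset into `%iter`. It is a hypothetical illustration; `KnownStores`, `fold_masked_load` and `iter_stores` are invented names, and this is not how any LLVM pass actually implements it.

```rust
use std::collections::HashMap;

/// Hypothetical model: known i64 stores into the alloca, keyed by byte offset.
type KnownStores = HashMap<u64, i64>;

/// Try to fold a `masked.load <4 x i64>` at byte offset `base` with a constant
/// mask. Returns None if any enabled lane is not covered by a known store;
/// disabled lanes become `None` (the poison passthru) and are never looked up.
fn fold_masked_load(stores: &KnownStores, base: u64, mask: [bool; 4]) -> Option<[Option<i64>; 4]> {
    let mut out = [None; 4];
    for (lane, &enabled) in mask.iter().enumerate() {
        if enabled {
            let offset = base + 8 * lane as u64;
            out[lane] = Some(*stores.get(&offset)?);
        }
    }
    Some(out)
}

/// Hypothetical helper: the stores from the IR above, with the vector store
/// at offset 16 split into its four 8-byte lanes.
fn iter_stores() -> KnownStores {
    let mut stores = KnownStores::new();
    for (i, &v) in [23i64, 16, 54, 3].iter().enumerate() {
        stores.insert(16 + 8 * i as u64, v);
    }
    stores.insert(48, 60);
    stores.insert(56, 9);
    stores
}

fn main() {
    let stores = iter_stores();
    // %wide.masked.load.1: base offset 48, mask <1, 1, 0, 0>.
    println!("{:?}", fold_masked_load(&stores, 48, [true, true, false, false]));
    // With all lanes enabled the fold fails: offsets 64 and 72 lie past the
    // 64-byte alloca, so no store is known there.
    println!("{:?}", fold_masked_load(&stores, 48, [true; 4]));
}
```

The second call shows why the mask matters: only because lanes 2 and 3 are disabled can the load be resolved entirely from the two scalar stores at offsets 48 and 56.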
