Code generation for `std::intrinsics::simd::simd_reduce_add_unordered` emits an extra floating-point add of +0.0 to the reduction result: https://godbolt.org/z/Y496nxv3E
```rust
#![feature(portable_simd, core_intrinsics)]
use std::simd::*;

pub unsafe fn reduce_add_unordered(v: f32x4) -> f32 {
    std::intrinsics::simd::simd_reduce_add_unordered(v)
}
```
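For context, a small caller (illustrative only, assuming it is appended to the snippet above; the bug shows up in the generated code, not the computed value, since 1+2+3+4 is exactly 10.0 in f32 under any association order):

```rust
fn main() {
    let v = f32x4::from_array([1.0, 2.0, 3.0, 4.0]);
    // SAFETY: the intrinsic has no preconditions for float vectors.
    let sum = unsafe { reduce_add_unordered(v) };
    assert_eq!(sum, 10.0);
}
```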
The problem appears to be that the compiler uses +0.0 rather than -0.0 as the start value of `@llvm.vector.reduce.fadd.*`. Under IEEE 754, -0.0 is the additive identity (`x + -0.0 == x` for every `x`, including `x == -0.0`), while +0.0 is not (`-0.0 + 0.0 == +0.0`), so without the `nsz` fast-math flag LLVM can only fold the trailing scalar add away when the start value is -0.0. Comparing LLVM code generation for the two start values, we get the more efficient version with -0.0: https://godbolt.org/z/fhaz7ced6
```llvm
define float @reduce_fadd_positive_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> %v)
  ret float %result
}

define float @reduce_fadd_negative_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> %v)
  ret float %result
}

declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)
```
This generates the following assembly for AArch64:
```asm
reduce_fadd_positive_zero:          // @reduce_fadd_positive_zero
        ldr     q1, [x0]
        movi    d0, #0000000000000000
        faddp   v1.4s, v1.4s, v1.4s
        faddp   s1, v1.2s
        fadd    s0, s1, s0
        ret

reduce_fadd_negative_zero:          // @reduce_fadd_negative_zero
        ldr     q0, [x0]
        faddp   v0.4s, v0.4s, v0.4s
        faddp   s0, v0.2s
        ret
```
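The extra `movi`/`fadd s0, s1, s0` pair in the first version is LLVM preserving the sign of zero, per the identity argument above; a quick check in plain Rust:

```rust
fn main() {
    // -0.0 is the additive identity: adding it never changes a value,
    // not even for x == -0.0, so a -0.0 accumulator can be deleted.
    assert_eq!((-0.0f32 + -0.0f32).to_bits(), (-0.0f32).to_bits());
    // +0.0 is not: -0.0 + 0.0 flips the sign of zero, so the +0.0
    // accumulator must survive as the trailing `fadd` above.
    assert_eq!((-0.0f32 + 0.0f32).to_bits(), (0.0f32).to_bits());
}
```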
To me, this behaviour looks like it is caused by the compiler using +0.0 instead of -0.0 as the accumulator here:

`compiler/rustc_codegen_llvm/src/intrinsic.rs`, lines 2095 to 2101 at a3af208
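If so, the fix should be a one-constant change: seed the `@llvm.vector.reduce.fadd` call with -0.0 rather than +0.0. A self-contained model of the lowering (the `reduce_fadd` helper below is hypothetical and stands in for the generated code, not for any rustc API) shows why only the -0.0 seed is foldable:

```rust
// Model of the lowered reduction: fold the vector lanes into a seed.
fn reduce_fadd(seed: f32, v: [f32; 4]) -> f32 {
    v.iter().fold(seed, |acc, &x| acc + x)
}

fn main() {
    // An all-(-0.0) vector is the case where the seed is observable.
    let v = [-0.0f32; 4];
    // A -0.0 seed reproduces the lane-only result, so the backend may
    // delete the seed, and with it the extra scalar `fadd`.
    assert_eq!(reduce_fadd(-0.0, v).to_bits(), (-0.0f32).to_bits());
    // A +0.0 seed flips the sign of the zero result, so LLVM has to
    // keep the `movi`/`fadd` pair seen in the assembly above.
    assert_eq!(reduce_fadd(0.0, v).to_bits(), (0.0f32).to_bits());
}
```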