
std::intrinsics::simd::simd_reduce_add_unordered generates inefficient code for floating-point numbers #130028

Closed
@Il-Capitano

Description


The code generated for std::intrinsics::simd::simd_reduce_add_unordered on floating-point vectors contains an extra floating-point add that adds +0.0 to the result: https://godbolt.org/z/Y496nxv3E

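// Nightly-only: both the portable SIMD types and the raw intrinsic are unstable.
#![feature(portable_simd, core_intrinsics)]

use std::simd::*;

unsafe fn reduce_add_unordered(v: f32x4) -> f32 {
    std::intrinsics::simd::simd_reduce_add_unordered(v)
}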

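The cause appears to be that the compiler passes +0.0 as the start value of @llvm.vector.reduce.fadd.* instead of -0.0. Only -0.0 is the additive identity for IEEE 754 floats: (-0.0) + (+0.0) evaluates to +0.0, so LLVM cannot drop an add of +0.0 without changing the sign of a negative-zero result. Comparing LLVM's code generation for the two start values shows that -0.0 produces the more efficient version: https://godbolt.org/z/fhaz7ced6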

define float @reduce_fadd_positive_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> %v)
  ret float %result
}

define float @reduce_fadd_negative_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> %v)
  ret float %result
}

declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)

This generates the following assembly for AArch64:

reduce_fadd_positive_zero:              // @reduce_fadd_positive_zero
        ldr     q1, [x0]
        movi    d0, #0000000000000000
        faddp   v1.4s, v1.4s, v1.4s
        faddp   s1, v1.2s
        fadd    s0, s1, s0
        ret
reduce_fadd_negative_zero:              // @reduce_fadd_negative_zero
        ldr     q0, [x0]
        faddp   v0.4s, v0.4s, v0.4s
        faddp   s0, v0.2s
        ret
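
To illustrate why the trailing fadd survives only in the +0.0 case, here is a minimal scalar sketch in stable Rust. The function name reduce_fadd_model is hypothetical, and the pairwise order is an assumption modelled on the two faddp instructions above; it is not the compiler's or LLVM's actual implementation.

```rust
/// Scalar model of the reduction (hypothetical helper, not a real API):
/// pairwise lane adds, mirroring the two `faddp` instructions, followed by
/// one more add of the start value.
fn reduce_fadd_model(start: f32, v: [f32; 4]) -> f32 {
    let pairwise = (v[0] + v[1]) + (v[2] + v[3]);
    pairwise + start
}

fn main() {
    // For ordinary inputs the sign of the zero start value does not matter.
    let v = [1.0_f32, 2.0, 3.0, 4.0];
    assert_eq!(reduce_fadd_model(0.0, v), reduce_fadd_model(-0.0, v));

    // When every lane is -0.0, the lane sum is -0.0.
    let z = [-0.0_f32; 4];
    // A -0.0 start value leaves that result untouched, so the add is a no-op.
    assert!(reduce_fadd_model(-0.0, z).is_sign_negative());
    // A +0.0 start value flips the result to +0.0, so the add is observable.
    assert!(reduce_fadd_model(0.0, z).is_sign_positive());
}
```

Because the final add of +0.0 can change the sign bit of the result, an optimizer that preserves signed zeros has to keep it, whereas an add of -0.0 can always be removed.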

I believe this behaviour is caused by the use of 0.0 (that is, +0.0) instead of -0.0 in this macro invocation in the compiler:

arith_red!(
    simd_reduce_add_unordered: vector_reduce_add,
    vector_reduce_fadd_reassoc,
    false,
    add,
    0.0
);

Metadata

    Labels

    A-SIMD: Area: SIMD (Single Instruction Multiple Data)
    A-codegen: Area: Code generation
    C-bug: Category: This is a bug.
    O-AArch64: Armv8-A or later processors in AArch64 mode
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
