significant mul_add perf regression since nightly-2025-03-06 #140452

Closed
@sarah-quinones

Description

since nightly-2025-03-06, rust uses the fma from compiler_builtins instead of the one from libc (nightly-2025-03-05 is fine)

my suspicion is linkage changes somewhere but i haven't been following the changes in compiler_builtins

code

i ran this code under perf on linux x86_64 (perf record -g cargo run)

the code loops forever so i interrupt it manually after a couple seconds

use core::hint::black_box as bb;

fn main() {
    loop {
        bb(f64::mul_add(1.0, 1.0, bb(1.0)));
    }
}

on nightly-2025-03-05, the profile (perf report -g) shows that most of the time is spent in __fma_fma3. on nightly-2025-03-06, most of the time is instead spent in compiler_builtins::math::libm::fma::fma, which is a lot slower because it emulates fma in software with the limited sse instructions instead of using the hardware vfmadd231sd instruction
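
to illustrate why a software fma is expensive, here is a minimal sketch (not taken from the issue; the helper names `unfused`, `fused` and `cpu_has_fma` are made up for illustration). `f64::mul_add` must round only once, so a fallback without the hardware instruction has to reproduce the exact product in software. the sketch also shows one way to check at runtime whether the cpu advertises fma, assuming an x86_64 target:

```rust
/// Multiply then add with two roundings: `a * b` is rounded to f64 first,
/// and the sum is rounded again.
pub fn unfused(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Fused multiply-add: `a * b + c` is computed as if with infinite precision
/// and rounded once. This holds whether the call is lowered to a hardware
/// instruction or to a software routine.
pub fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

/// Reports whether the running CPU advertises the FMA extension.
/// In this sketch, non-x86_64 targets simply return `false`.
pub fn cpu_has_fma() -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        std::arch::is_x86_feature_detected!("fma")
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        false
    }
}
```

with `x = 1.0 + f64::EPSILON` and `c = -(1.0 + 2.0 * f64::EPSILON)`, `unfused(x, x, c)` loses the `EPSILON * EPSILON` term when the product is rounded, while `fused(x, x, c)` keeps it. a software routine has to recover that term without `vfmadd231sd`, which is where the extra cost comes from.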

version it worked on

nightly-2025-03-05

version with regression

nightly-2025-03-06

@rustbot modify labels: +regression-from-stable-to-nightly -regression-untriaged

Metadata

    Labels

    C-bug: Category: This is a bug.
    P-high: High priority
    T-libs: Relevant to the library team, which will review and decide on the PR/issue.
    regression-from-stable-to-nightly: Performance or correctness regression from stable to nightly.