`mul_add` documentation is inaccurate

According to the [documentation of `mul_add`](https://doc.rust-lang.org/std/primitive.f64.html#method.mul_add):
> This produces a more accurate result with better performance than a separate multiplication operation followed by an add.
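
The accuracy half of that sentence is easy to check. Below is a minimal sketch, assuming `f64` inputs chosen so that the single rounding of `mul_add` is visible. `0.1_f64 * 10.0` is exactly `1 + 2^-54`, which rounds to `1.0` before the add. The helper names `fused` and `unfused` are hypothetical, for illustration only.

```rust
use std::hint::black_box;

/// Hypothetical helper: computes a * b + c with a single rounding step.
fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

/// Hypothetical helper: rounds after the multiply, then again after the add.
fn unfused(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

fn main() {
    // black_box keeps the compiler from constant-folding the inputs away.
    let (a, b, c) = (black_box(0.1), black_box(10.0), black_box(-1.0));
    // Fused keeps the low bits of the product: the result is exactly 2^-54.
    println!("fused:   {:e}", fused(a, b, c));
    // Unfused loses them: 0.1 * 10.0 rounds to 1.0, so the result is 0.0.
    println!("unfused: {:e}", unfused(a, b, c));
}
```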

This does not appear to be true in general, at least not for the performance claim. On my machine, `a.mul_add(b, c)` is slower than `a * b + c` when compiling without `-C target-cpu=native`.
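A crude way to reproduce the comparison is sketched below. This is a rough timing loop, not a rigorous benchmark (a harness such as `criterion` would be more reliable), and the function names and sizes are hypothetical. When the target does not enable the `fma` feature (e.g. the default `x86_64` target without `-C target-cpu=native` or `-C target-feature=+fma`), `mul_add` may compile to a call into the C library's `fma` function rather than a single instruction, which would be consistent with the slowdown described above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical kernel: accumulates x * a using mul_add (one rounding per step).
fn sum_fused(xs: &[f64], a: f64) -> f64 {
    xs.iter().fold(0.0, |acc, &x| x.mul_add(a, acc))
}

/// Hypothetical kernel: same computation with a separate multiply and add.
fn sum_unfused(xs: &[f64], a: f64) -> f64 {
    xs.iter().fold(0.0, |acc, &x| x * a + acc)
}

/// Runs `f` a fixed number of times and returns the last result and total time.
fn time_it<F: Fn() -> f64>(f: F) -> (f64, Duration) {
    let start = Instant::now();
    let mut result = 0.0;
    for _ in 0..100 {
        // black_box discourages the optimizer from eliding repeated calls.
        result = black_box(f());
    }
    (result, start.elapsed())
}

fn main() {
    let xs: Vec<f64> = (0..1_000_000).map(|i| i as f64 * 1e-6).collect();
    let a = black_box(1.5);

    let (r_fused, t_fused) = time_it(|| sum_fused(black_box(xs.as_slice()), a));
    let (r_unfused, t_unfused) = time_it(|| sum_unfused(black_box(xs.as_slice()), a));

    // Timings vary by CPU and by compile flags; compare builds with and
    // without `-C target-cpu=native`.
    println!("mul_add: {:?} (result {})", t_fused, r_fused);
    println!("a*b + c: {:?} (result {})", t_unfused, r_unfused);
}
```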