Closed
Description
According to the documentation of mul_add:
This produces a more accurate result with better performance than a separate multiplication operation followed by an add.
The performance claim does not seem to hold in general. On my machine, a.mul_add(b, c) is slower than a * b + c when compiling without target-cpu=native.
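A minimal sketch of how the comparison could be reproduced. The helper names (fused, unfused, time_it) and the iteration count are illustrative assumptions, not taken from the report, and a simple wall-clock loop like this is only a rough indicator compared to a proper benchmark harness. A likely explanation for the slowdown is that a target without a hardware FMA instruction enabled typically has to lower mul_add to a software routine, but that is an assumption to check against the generated assembly.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Fused multiply-add: computes a * b + c with a single rounding step.
fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

/// Separate multiplication followed by an addition: two rounding steps.
fn unfused(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Applies `f` in a dependent chain `iters` times.
/// Returns (elapsed seconds, final accumulator value).
fn time_it(f: fn(f64, f64, f64) -> f64, iters: u32) -> (f64, f64) {
    let start = Instant::now();
    let mut acc = 0.0_f64;
    for _ in 0..iters {
        // black_box keeps the compiler from constant-folding the inputs.
        acc = f(black_box(acc), black_box(0.5), black_box(1.0));
    }
    (start.elapsed().as_secs_f64(), acc)
}

fn main() {
    let iters = 10_000_000; // hypothetical workload size
    let (t_fused, r_fused) = time_it(fused, iters);
    let (t_unfused, r_unfused) = time_it(unfused, iters);
    println!("mul_add:   {t_fused:.4} s (result {r_fused})");
    println!("a * b + c: {t_unfused:.4} s (result {r_unfused})");
}
```

Building this once with plain `--release` and once with `RUSTFLAGS="-C target-cpu=native"` should show whether the gap depends on the enabled target features; the timings themselves will vary by machine.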