Closed
Description
With -freciprocal-math (and -funsafe-math-optimizations), the compiler can try harder to avoid dependent FSQRT and FDIV operations. For example:
double res, res2, tmp;

void
foo (double a, double b, int c, int d)
{
  tmp = 1.0 / __builtin_sqrt (a);
  res = tmp * tmp;
  if (d)
    res2 = a * tmp;
}
With -Ofast, LLVM for AArch64 generates:
foo(double, double, int, int):          // @foo(double, double, int, int)
        fsqrt   d1, d0
        fmov    d2, #1.00000000
        adrp    x8, tmp
        fdiv    d2, d2, d1
        str     d2, [x8, :lo12:tmp]
        fmul    d2, d2, d2
        adrp    x8, res
        str     d2, [x8, :lo12:res]
        cbz     w1, .LBB0_2
        fdiv    d0, d0, d1
        adrp    x8, res2
        str     d0, [x8, :lo12:res2]
.LBB0_2:
        ret
GCC at -Ofast instead generates:
foo(double, double, int, int):
        fmov    d1, 1.0e+0
        adrp    x0, .LANCHOR0
        fsqrt   d2, d0
        add     x2, x0, :lo12:.LANCHOR0
        fdiv    d0, d1, d0
        fmul    d1, d2, d0
        str     d0, [x2, 8]
        str     d1, [x0, #:lo12:.LANCHOR0]
        cbz     w1, .L1
        str     d2, [x2, 16]
.L1:
        ret
https://godbolt.org/z/jb8a14K16
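Notice how the expensive FSQRT and FDIV are now independent and can execute in parallel: GCC computes sqrt(a) and 1.0 / a separately, recovers tmp with a single FMUL as sqrt(a) * (1.0 / a), and stores 1.0 / a and sqrt(a) directly as res and res2. In the LLVM output, both FDIVs have to wait for the FSQRT result.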
A write-up of the transformation can be found in the GCC commit:
http://gcc.gnu.org/g:24c49431499bcb462aeee41e027a3dac25e934b3