Given this code (https://godbolt.org/z/3WxTM4Yao):
#include <altivec.h>

vector float old(vector float a, vector float b, vector float c) {
    return vec_nmsub(a, b, c);
}

vector float new(vector float a, vector float b, vector float c) {
    return vec_neg(vec_madd(a, b, vec_neg(c)));
}
On newer PowerPC CPUs (with VSX), both functions generate exactly the same assembly, as expected:
xvnmsubasp 36, 34, 35
vmr 2, 4
blr
However, for older CPUs (AltiVec only), the composed version (new) is not folded into a single vnmsubfp and each negation is emitted as a separate subtraction:
old:
vnmsubfp 2, 2, 3, 4
blr
new:
vspltisb 5, -1
vslw 5, 5, 5
vsubfp 4, 5, 4
vmaddfp 2, 2, 3, 4
vsubfp 2, 5, 2
blr
This came up in rust-lang/stdarch#1734.