Description
I'm having a lot of trouble with the Arm (32- and 64-bit) backends de-optimizing code that uses broadcasted constants. There are several issues:
- LLVM attempts to observe constants through memory, and propagate them.
- LLVM moves broadcasts into loops.
- LLVM spills broadcasts by redoing the broadcast, rather than spilling and reloading a vector.
Here's an example that demonstrates several issues: https://godbolt.org/z/chjx4d4vh
If the compiler compiled the code as written, there would be no register spills, because the constants would occupy half as many registers. I included a commented-out call to `make_opaque`, one attempted workaround that tricks the compiler into not treating these values as constants (at the expense of a function call...). It does work for that, but the compiler still moves the broadcasts (`dup` instructions) out of the loop and spills some of the registers.
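
For readers who don't want to open the Godbolt link, here is a minimal sketch of the kind of loop in question. It is not the code from the link: `madd2` and its parameters are hypothetical names made up for illustration. The point is only that `vmla_lane_f32` selects a constant by lane, so two constants can share one 64-bit register instead of each needing its own broadcast register. A scalar tail (and a full scalar fallback when NEON is unavailable) is included so the sketch builds on any target.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: y[i] = (y[i] + x0[i] * c[0]) + x1[i] * c[1].
 * c points to two constants that are loaded once, before the loop. */
void madd2(float *y, const float *x0, const float *x1,
           const float *c, size_t n) {
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Both constants live in a single 64-bit register, one per lane. */
    float32x2_t k = vld1_f32(c);
    for (; i + 2 <= n; i += 2) {
        float32x2_t acc = vld1_f32(y + i);
        /* Multiply-accumulate by a lane of k: no separate dup is written. */
        acc = vmla_lane_f32(acc, vld1_f32(x0 + i), k, 0);
        acc = vmla_lane_f32(acc, vld1_f32(x1 + i), k, 1);
        vst1_f32(y + i, acc);
    }
#endif
    /* Scalar tail; handles every element when NEON is not available. */
    for (; i < n; ++i) {
        y[i] = (y[i] + x0[i] * c[0]) + x1[i] * c[1];
    }
}
```

As written, this needs one register for the constants. The complaint above is that the generated code does not necessarily keep that shape once the broadcasts are materialized as separate `dup` instructions.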
I run into this issue very frequently. Any suggested workarounds (e.g. an annotation to force the compiler to keep a broadcast outside of the loop) or possible fixes to LLVM would be very welcome. As it stands, I find the `vmla_lane_X` intrinsics to be almost useless because of this issue.