ARM/AArch64 backend aggressively pessimizes code with broadcasted constants #102195

Open
@dsharlet

I'm having a lot of trouble with the ARM (32- and 64-bit) backends de-optimizing code that uses broadcasted constants. There are several issues:

  • LLVM looks through loads from memory to discover constants, and propagates them.
  • LLVM moves broadcasts into loops.
  • LLVM spills broadcasts by redoing the broadcast, rather than spilling and reloading a vector.

Here's an example that demonstrates several issues: https://godbolt.org/z/chjx4d4vh

If the compiler compiled the code as written, there would be no register spills, because the constants would occupy half as many registers. I included a commented-out call to make_opaque, one attempted workaround that keeps the compiler from seeing these values as constants (at the expense of a function call...). It does achieve that, but the compiler still moves the broadcasts (dup instructions) out of the loop and spills some of the registers.
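For readers without the godbolt link open, here is a minimal sketch of the kind of pattern involved. It is not the code from the link: the names mla2_lane and opaque_s16x4 and the constants 3 and 5 are made up for illustration. It packs two constants into one 64-bit vector, consumes them with vmla_lane_s16, and hides the vector from the optimizer with an empty inline asm statement instead of a function call. That the "+w" constraint is accepted for NEON registers assumes GCC or Clang, and whether this changes where the dup instructions end up or avoids spills is something to check in the generated assembly, not something this sketch guarantees.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Hypothetical helper (not the make_opaque from the linked example):
   an empty asm statement that the optimizer cannot see through, so the
   value in v is no longer a known constant. "+w" requests a NEON/FP
   register with GCC or Clang on ARM and AArch64. */
static inline int16x4_t opaque_s16x4(int16x4_t v) {
    __asm__ volatile("" : "+w"(v));
    return v;
}
#endif

enum { K0 = 3, K1 = 5 }; /* illustrative constants */

/* Computes acc[i] += K0 * x[i] + K1 * y[i] for i in [0, n). */
void mla2_lane(int16_t *acc, const int16_t *x, const int16_t *y, size_t n) {
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Both constants share one 64-bit register; each vmla_lane_s16
       selects its lane directly instead of needing a separate dup. */
    const int16_t k_data[4] = {K0, K1, 0, 0};
    const int16x4_t k = opaque_s16x4(vld1_s16(k_data));
    for (; i + 4 <= n; i += 4) {
        int16x4_t a = vld1_s16(acc + i);
        a = vmla_lane_s16(a, vld1_s16(x + i), k, 0);
        a = vmla_lane_s16(a, vld1_s16(y + i), k, 1);
        vst1_s16(acc + i, a);
    }
#endif
    /* Scalar tail, and the whole loop on targets without NEON. */
    for (; i < n; ++i) {
        acc[i] = (int16_t)(acc[i] + K0 * x[i] + K1 * y[i]);
    }
}
```

The scalar loop is only there so the sketch builds and gives the same results on any target; the NEON loop is the part relevant to this issue.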

I run into this issue very frequently. Any suggested workarounds, e.g. some annotation to force the compiler to keep a broadcast outside of the loop, or possible fixes to LLVM, would be very welcome. As it stands, I find vmla_lane_X intrinsics to be almost useless because of this issue.
