[RISCV] Add CompressPat for all cases in isCopyInstrImpl #136875

asb · 2025-04-23T14:36:59Z

We may end up with a non-canonical move (copy) instruction after a pass like MachineCopyPropagation that doesn't get compressed because we don't have an explicit pattern for it. This patch addresses that by adding additional CompressPat. This means that the generated code will be slightly different with/without the compressed extension. This isn't expected to be problematic as the instructions in question are all simple arithmetic ops.

This removes 214 static instances of such instructions in an llvm-test-suite rva22 compile including SPEC. I don't claim this is any great win, but particularly as these do show up in benchmarks that people spend a lot of time analysing my preference is to remove codegen oddities like this where it's reasonable to do so, so as to avoid distraction/wasted time when people are analysing said benchmarks.

Question for reviewers: are you happy with the approach of adding as a CompressPat? The alternative would be to try to do this independent of whether C is enabled or not through either a canonicalisation pass, or adding a TargetInstrInfo hook like canonicalizeCopyInstr that MachineCopyPropagation calls after mutating something that was isCopyInstr.

If we're happy with this implementation approach I will add the necessary MC tests and mark for review.

We may end up with a non-canonical move (copy) instruction after a pass like MachineCopyPropagation that doesn't get compressed because we don't have an explicit pattern for it. This patch addressses that by adding additional CompressPat. This means that the generated code will be slightly different with/without the compressed extension. This isn't expected to be problematic as the instructions in question are all simple arithmetic ops.

topperc · 2025-04-23T15:35:05Z

llvm/lib/Target/RISCV/RISCVInstrInfoC.td

@@ -1053,3 +1057,21 @@ let Predicates = [HasStdExtCOrZca, IsRV64] in {
 def : CompressPat<(SD GPR:$rs2, SPMem:$rs1, uimm9_lsb000:$imm),
                  (C_SDSP GPR:$rs2, SPMem:$rs1, uimm9_lsb000:$imm)>;
 } // Predicates = [HasStdExtCOrZca, IsRV64]
+
+let isCompressOnly = true, Predicates = [HasStdExtCOrZca, HasStdExtZba] in {
+def : CompressPat<(SH1ADD GPRNoX0:$rs1, X0, GPRNoX0:$rs2),


SHXADD is available on a single ALU on SiFive x280, but ADD/ADDI are available on both ALUs. x280 supports C so this patch would compress, but I could envision other CPUs that support SHXADD with lower performance and don't support C. Should we canonicalize a copy SHXADD to ADD earlier?

This is the kind trade-off I wondered about. I oversimplified a bit in my summary.

On the one hand: Although canonicalising earlier is theoretically better for this reason, it also seem defensible to optimise for existing real-world and likely future hardware where unless something big changes, it's hard to imagine it doesn't have RVC support. Especially as this is more of a micro-optimisation that we're mainly motivated to do to keep output clean and non-surprising for anyone doing perf analysis.

On the other hand: there's an attraction and probably minor engineering advantage to keeping differences between compressed/non-compressed as minimal as possible. And if non-C hardware did appear, our future selves would be thankful.

Overall I'm still torn...

preames · 2025-04-23T20:46:06Z

I'm mildly of the opinion that we should canonicalize the copy instead. Not a strongly held opinion or anything.

Maybe instead of a specific canonicalizeCopyInst, we could introduce a generic optimizeInstruction which is called after mutation? It could canonicalize the copy, but could also e.g. fold a sh2add x2, x1, x0 to slli x2, x1, 2.

Alternatively, if we wanted something localized, we could extend the late optimization pass we just added for branches to handle non-branch instructions.

asb · 2025-04-24T15:00:58Z

I'm mildly of the opinion that we should canonicalize the copy instead. Not a strongly held opinion or anything.

Maybe instead of a specific canonicalizeCopyInst, we could introduce a generic optimizeInstruction which is called after mutation? It could canonicalize the copy, but could also e.g. fold a sh2add x2, x1, x0 to slli x2, x1, 2.

Good point. I'm seeing only a single static instance of that specific example (sh?add that can be a shift), but there are a few other potentials.

Alternatively, if we wanted something localized, we could extend the late optimization pass we just added for branches to handle non-branch instructions.

Before that I had it in mind that the pseudo instruction expansion pass might become a canonicalisation + pseudo instruction expansion pass. But yes, there are a few possible places for this.

Just to give a holding pattern update: given the fact we suspect this isn't necessarily the best way of handling it I'm going through the other suboptimal or improperly canonicalised instructions to make sure I can follow up with a proposal that considers all the ones we care about. I suspect this looks like the optimizeInstruction idea, but will report back once I've checked.

…s after MachineCopyPropagation PR llvm#136875 was posted as a draft PR that handled a subset of these cases, using the CompressPat mechanism. The consensus from that discussion (and a conclusion I agree with) is that it would be beneficial doing this optimisation earlier on, and in a way that isn't limited just to cases that can be handled by instruction compression. The most common source for instructions that can be optimized/canonicalized in this way is through tail duplication followed by machine copy propagation. For RISC-V, choosing a more canonical instruction allows it to be compressed when it couldn't be before. There is the potential that it would make other MI-level optimisations easier. This modifies ~910 instructions across an llvm-test-suite build including SPEC2017, targeting rva22u64. Coverage of instructions is based on observations from a script written to find redundant or improperly canonicalized instructions (though I aim to support all instructions in a 'group' at once, e.g. MUL* even if I only saw some variants of MUL in practice).

asb · 2025-04-30T14:36:44Z

I've found that all cases I saw were ultimately the result of MachineCopyPropagation and can be handled cleanly via the suggested optimizeInstruction route. #137973 implements this approach, and I'll close this draft PR in favour of that one.

…ns after MachineCopyPropagation (#137973) PR #136875 was posted as a draft PR that handled a subset of these cases, using the CompressPat mechanism. The consensus from that discussion (and a conclusion I agree with) is that it would be beneficial doing this optimisation earlier on, and in a way that isn't limited just to cases that can be handled by instruction compression. The most common source for instructions that can be optimized/canonicalized in this way is through tail duplication in MachineBlockPlacement followed by machine copy propagation. For RISC-V, choosing a more canonical instruction allows it to be compressed when it couldn't be before. There is the potential that it would make other MI-level optimisations easier. This modifies ~910 instructions across an llvm-test-suite build including SPEC2017, targeting rva22u64. Looking at the diff, it seems there's room for eliminating instructions or further propagating after this. Coverage of instructions is based on observations from a script written to find redundant or improperly canonicalized instructions (though I aim to support all instructions in a 'group' at once, e.g. MUL* even if I only saw some variants of MUL in practice).

asb requested review from lenary, preames and topperc April 23, 2025 14:36

asb force-pushed the 2025q2-riscv-additional-compress-pats branch from 6a6a224 to 2698e03 Compare April 23, 2025 14:38

topperc reviewed Apr 23, 2025

View reviewed changes

asb mentioned this pull request Apr 30, 2025

[RISCV][TII] Add and use new hook to optimize/canonicalize instructions after MachineCopyPropagation #137973

Merged

asb closed this Apr 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RISCV] Add CompressPat for all cases in isCopyInstrImpl #136875

[RISCV] Add CompressPat for all cases in isCopyInstrImpl #136875

Uh oh!

asb commented Apr 23, 2025

Uh oh!

topperc Apr 23, 2025

Uh oh!

asb Apr 23, 2025

Uh oh!

preames commented Apr 23, 2025

Uh oh!

asb commented Apr 24, 2025

Uh oh!

asb commented Apr 30, 2025

Uh oh!

Uh oh!

[RISCV] Add CompressPat for all cases in isCopyInstrImpl #136875

[RISCV] Add CompressPat for all cases in isCopyInstrImpl #136875

Uh oh!

Conversation

asb commented Apr 23, 2025

Uh oh!

topperc Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

asb Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

preames commented Apr 23, 2025

Uh oh!

asb commented Apr 24, 2025

Uh oh!

asb commented Apr 30, 2025

Uh oh!

Uh oh!