Skip to content

RISC-V Machine Outliner should reach parity with other platforms #89822

Open
@ilovepi

Description

@ilovepi

Machine Outliner can lead to significant size savings, even for 32-bit embedded targets, see https://www.linaro.org/blog/reducing-code-size-with-llvm-machine-outliner-on-32-bit-arm-targets/

Both Arm and AArch64 backends support variety of outlining strategies:

https://github.com/llvm/llvm-project/blob/89c95effe82c09b9a42408f4823409331f8fa266/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp#L5678-5778

/// Constants defining how certain sequences should be outlined.
/// This encompasses how an outlined function should be called, and what kind of
/// frame should be emitted for that outlined function.
///
/// \p MachineOutlinerDefault implies that the function should be called with
/// a save and restore of LR to the stack.
///
/// That is,
///
/// I1 Save LR OUTLINED_FUNCTION:
/// I2 --> BL OUTLINED_FUNCTION I1
/// I3 Restore LR I2
/// I3
/// RET
///
/// * Call construction overhead: 3 (save + BL + restore)
/// * Frame construction overhead: 1 (ret)
/// * Requires stack fixups? Yes
///
/// \p MachineOutlinerTailCall implies that the function is being created from
/// a sequence of instructions ending in a return.
///
/// That is,
///
/// I1 OUTLINED_FUNCTION:
/// I2 --> B OUTLINED_FUNCTION I1
/// RET I2
/// RET
///
/// * Call construction overhead: 1 (B)
/// * Frame construction overhead: 0 (Return included in sequence)
/// * Requires stack fixups? No
///
/// \p MachineOutlinerNoLRSave implies that the function should be called using
/// a BL instruction, but doesn't require LR to be saved and restored. This
/// happens when LR is known to be dead.
///
/// That is,
///
/// I1 OUTLINED_FUNCTION:
/// I2 --> BL OUTLINED_FUNCTION I1
/// I3 I2
/// I3
/// RET
///
/// * Call construction overhead: 1 (BL)
/// * Frame construction overhead: 1 (RET)
/// * Requires stack fixups? No
///
/// \p MachineOutlinerThunk implies that the function is being created from
/// a sequence of instructions ending in a call. The outlined function is
/// called with a BL instruction, and the outlined function tail-calls the
/// original call destination.
///
/// That is,
///
/// I1 OUTLINED_FUNCTION:
/// I2 --> BL OUTLINED_FUNCTION I1
/// BL f I2
/// B f
/// * Call construction overhead: 1 (BL)
/// * Frame construction overhead: 0
/// * Requires stack fixups? No
///
/// \p MachineOutlinerRegSave implies that the function should be called with a
/// save and restore of LR to an available register. This allows us to avoid
/// stack fixups. Note that this outlining variant is compatible with the
/// NoLRSave case.
///
/// That is,
///
/// I1 Save LR OUTLINED_FUNCTION:
/// I2 --> BL OUTLINED_FUNCTION I1
/// I3 Restore LR I2
/// I3
/// RET
///
/// * Call construction overhead: 3 (save + BL + restore)
/// * Frame construction overhead: 1 (ret)
/// * Requires stack fixups? No

While the RISC-V backend has some outlining support, it's fairly limited compared to Arm and AArch64 backends, see
std::optional<outliner::OutlinedFunction>
RISCVInstrInfo::getOutliningCandidateInfo(
std::vector<outliner::Candidate> &RepeatedSequenceLocs) const {
// First we need to filter out candidates where the X5 register (IE t0) can't
// be used to setup the function call.
auto CannotInsertCall = [](outliner::Candidate &C) {
const TargetRegisterInfo *TRI = C.getMF()->getSubtarget().getRegisterInfo();
return !C.isAvailableAcrossAndOutOfSeq(RISCV::X5, *TRI);
};
llvm::erase_if(RepeatedSequenceLocs, CannotInsertCall);
// If the sequence doesn't have enough candidates left, then we're done.
if (RepeatedSequenceLocs.size() < 2)
return std::nullopt;
unsigned SequenceSize = 0;
for (auto &MI : RepeatedSequenceLocs[0])
SequenceSize += getInstSizeInBytes(MI);
// call t0, function = 8 bytes.
unsigned CallOverhead = 8;
for (auto &C : RepeatedSequenceLocs)
C.setCallInfo(MachineOutlinerDefault, CallOverhead);
// jr t0 = 4 bytes, 2 bytes if compressed instructions are enabled.
unsigned FrameOverhead = 4;
if (RepeatedSequenceLocs[0]
.getMF()
->getSubtarget<RISCVSubtarget>()
.hasStdExtCOrZca())
FrameOverhead = 2;
return outliner::OutlinedFunction(RepeatedSequenceLocs, SequenceSize,
FrameOverhead, MachineOutlinerDefault);
}

We should improve the RISC-V Machine Outliner to be on par with Arm and AArch64.

Off the top of my head we should take the following steps:

  1. Perform Gap analysis between the ARM and RISC-V machine outliners
  2. Discuss the findings with RISC-V backend maintainers: @topperc @preames @MaskRay @asb @jrtc27
  3. Create a design implementation plan to improve support
  4. Implement improved support upstream

CC: @petrhosek

Metadata

Metadata

Assignees

No one assigned

    Labels

    backend:RISC-VenhancementImproving things as opposed to bug fixing, e.g. new or missing featurellvm:codegenllvm:codesizeCode size issuesmetabugIssue to collect references to a group of similar or related issues.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions