Commit 614a8cb

[MC][NFC] Allow MCInstrAnalysis to store state (#65479)
Currently, all the analysis functions provided by `MCInstrAnalysis` work on a single instruction. On some targets, this limits the kind of instructions that can be successfully analyzed, as common constructs may need multiple instructions. For example, a typical call sequence on RISC-V uses an auipc+jalr pair. In order to analyse the jalr inside `evaluateBranch`, information about the corresponding auipc is needed. Similarly, AArch64 uses adrp+ldr pairs to access globals.

This patch proposes to add state to `MCInstrAnalysis` to support these use cases. Two new virtual methods are added:

- `updateState`: takes an instruction and its address. This method should be called by clients on every instruction and allows targets to store whatever information they need to analyse future instructions.
- `resetState`: clears the state whenever it becomes irrelevant. Clients could call this, for example, when starting to disassemble a new function.

Note that the default implementations do nothing, so this patch is NFC. No actual state is stored inside `MCInstrAnalysis`; deciding the structure of the state is left to the targets.

This patch also modifies llvm-objdump to use the new interface.

This patch is an alternative to [D116677](https://reviews.llvm.org/D116677), and the idea of storing state in `MCInstrAnalysis` was first discussed there.
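To illustrate the target side of this interface, here is a minimal, self-contained sketch of how a target might remember an auipc in order to resolve a following jalr. It does not use the real LLVM classes: `Inst`, `Opcode`, `InstrAnalysis`, and `PairAnalysis` are hypothetical stand-ins invented for this example, and the immediate handling is simplified. Only the method names `resetState`, `updateState`, and `evaluateBranch` come from the patch and the existing `MCInstrAnalysis` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-ins for MCInst and MCInstrAnalysis.
enum class Opcode { AUIPC, JALR, Other };

struct Inst {
  Opcode Op;
  unsigned Rd;  // destination register
  unsigned Rs1; // base register (used by JALR only)
  int64_t Imm;  // AUIPC: upper immediate (unshifted); JALR: byte offset
};

class InstrAnalysis {
public:
  virtual ~InstrAnalysis() = default;
  // Defaults do nothing, mirroring the NFC defaults described in the patch.
  virtual void resetState() {}
  virtual void updateState(const Inst & /*I*/, uint64_t /*Addr*/) {}
  virtual bool evaluateBranch(const Inst & /*I*/, uint64_t /*Addr*/,
                              uint64_t /*Size*/, uint64_t & /*Target*/) const {
    return false;
  }
};

// Assumed target-specific subclass: remembers the most recent AUIPC result.
class PairAnalysis : public InstrAnalysis {
  struct Pending {
    unsigned Reg;
    uint64_t Value;
  };
  std::optional<Pending> LastAuipc;

public:
  void resetState() override { LastAuipc.reset(); }

  void updateState(const Inst &I, uint64_t Addr) override {
    if (I.Op == Opcode::AUIPC)
      LastAuipc = Pending{I.Rd, Addr + (static_cast<uint64_t>(I.Imm) << 12)};
    else
      LastAuipc.reset(); // conservative: only track adjacent pairs
  }

  bool evaluateBranch(const Inst &I, uint64_t /*Addr*/, uint64_t /*Size*/,
                      uint64_t &Target) const override {
    // A JALR is resolvable only if its base register was just set by AUIPC.
    if (I.Op != Opcode::JALR || !LastAuipc || LastAuipc->Reg != I.Rs1)
      return false;
    Target = LastAuipc->Value + static_cast<uint64_t>(I.Imm);
    return true;
  }
};
```

In this sketch the jalr can only be evaluated after `updateState` has seen the matching auipc, and `resetState` makes it unresolvable again. A real target would likely choose a different state layout; the patch deliberately leaves that decision to the targets.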
1 parent b2d3c7b commit 614a8cb

2 files changed

+31
-5
lines changed

llvm/include/llvm/MC/MCInstrAnalysis.h

Lines changed: 15 additions & 0 deletions
@@ -37,6 +37,21 @@ class MCInstrAnalysis {
   MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}
   virtual ~MCInstrAnalysis() = default;

+  /// Clear the internal state. See updateState for more information.
+  virtual void resetState() {}
+
+  /// Update internal state with \p Inst at \p Addr.
+  ///
+  /// For some types of analyses, inspecting a single instruction is not
+  /// sufficient. Some examples are auipc/jalr pairs on RISC-V or adrp/ldr pairs
+  /// on AArch64. To support inspecting multiple instructions, targets may keep
+  /// track of an internal state while analysing instructions. Clients should
+  /// call updateState for every instruction which allows later calls to one of
+  /// the analysis functions to take previous instructions into account.
+  /// Whenever state becomes irrelevant (e.g., when starting to disassemble a
+  /// new function), clients should call resetState to clear it.
+  virtual void updateState(const MCInst &Inst, uint64_t Addr) {}
+
   virtual bool isBranch(const MCInst &Inst) const {
     return Info->get(Inst.getOpcode()).isBranch();
   }
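The calling convention described in the `updateState` comment can be sketched from the client side as follows. This is an illustrative example under stated assumptions, not LLVM code: `Inst`, `Analysis`, `Function`, `canResolve`, and `analyse` are hypothetical names, and the "state" is reduced to a single flag so the effect of `resetState` is easy to see.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MCInst: one flag is all this sketch needs.
struct Inst {
  bool SetsBase; // true for the first half of a pair (think auipc or adrp)
};

// Hypothetical stand-in for a stateful MCInstrAnalysis subclass.
class Analysis {
  bool SawBase = false;

public:
  void resetState() { SawBase = false; }
  void updateState(const Inst &I, uint64_t /*Addr*/) { SawBase = I.SetsBase; }
  // Succeeds only when the previous instruction supplied a base.
  bool canResolve(const Inst &I) const { return !I.SetsBase && SawBase; }
};

using Function = std::vector<Inst>;

// Walks every function and returns how many instructions could be resolved.
// MIA may be null, so every use is guarded.
unsigned analyse(const std::vector<Function> &Funcs, Analysis *MIA) {
  unsigned Resolved = 0;
  uint64_t Addr = 0;
  for (const Function &F : Funcs) {
    if (MIA)
      MIA->resetState(); // state from the previous function is irrelevant
    for (const Inst &I : F) {
      if (MIA) {
        if (MIA->canResolve(I)) // analyse first, using earlier instructions
          ++Resolved;
        MIA->updateState(I, Addr); // then record this instruction
      }
      Addr += 4; // assumed fixed 4-byte instruction size
    }
  }
  return Resolved;
}
```

The ordering matters in this sketch: the analysis query runs before `updateState`, so it sees only earlier instructions, and `resetState` at each function boundary keeps a base set at the end of one function from being paired with the first instruction of the next.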

llvm/tools/llvm-objdump/llvm-objdump.cpp

Lines changed: 16 additions & 5 deletions
@@ -860,7 +860,7 @@ class DisassemblerTarget {
   std::unique_ptr<const MCSubtargetInfo> SubtargetInfo;
   std::shared_ptr<MCContext> Context;
   std::unique_ptr<MCDisassembler> DisAsm;
-  std::shared_ptr<const MCInstrAnalysis> InstrAnalysis;
+  std::shared_ptr<MCInstrAnalysis> InstrAnalysis;
   std::shared_ptr<MCInstPrinter> InstPrinter;
   PrettyPrinter *Printer;

@@ -1283,14 +1283,19 @@ collectBBAddrMapLabels(const std::unordered_map<uint64_t, BBAddrMap> &AddrToBBAd
   }
 }

-static void collectLocalBranchTargets(
-    ArrayRef<uint8_t> Bytes, const MCInstrAnalysis *MIA, MCDisassembler *DisAsm,
-    MCInstPrinter *IP, const MCSubtargetInfo *STI, uint64_t SectionAddr,
-    uint64_t Start, uint64_t End, std::unordered_map<uint64_t, std::string> &Labels) {
+static void
+collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
+                          MCDisassembler *DisAsm, MCInstPrinter *IP,
+                          const MCSubtargetInfo *STI, uint64_t SectionAddr,
+                          uint64_t Start, uint64_t End,
+                          std::unordered_map<uint64_t, std::string> &Labels) {
   // So far only supports PowerPC and X86.
   if (!STI->getTargetTriple().isPPC() && !STI->getTargetTriple().isX86())
     return;

+  if (MIA)
+    MIA->resetState();
+
   Labels.clear();
   unsigned LabelCount = 0;
   Start += SectionAddr;
@@ -1316,6 +1321,7 @@ static void collectLocalBranchTargets(
           !Labels.count(Target) &&
           !(STI->getTargetTriple().isPPC() && Target == Index))
         Labels[Target] = ("L" + Twine(LabelCount++)).str();
+      MIA->updateState(Inst, Index);
     }
     Index += Size;
   }
@@ -1967,6 +1973,9 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
                                BBAddrMapLabels);
       }

+      if (DT->InstrAnalysis)
+        DT->InstrAnalysis->resetState();
+
       while (Index < End) {
         // ARM and AArch64 ELF binaries can interleave data and text in the
         // same section. We rely on the markers introduced to understand what
@@ -2183,6 +2192,8 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
               if (TargetOS == &CommentStream)
                 *TargetOS << "\n";
             }
+
+            DT->InstrAnalysis->updateState(Inst, SectionAddr + Index);
           }
         }
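The llvm-objdump changes drop `const` from the `MCInstrAnalysis` pointers because `resetState` and `updateState` are non-const member functions, which cannot be called through a pointer to const. The following self-contained sketch demonstrates that constraint with a compile-time check. `Analysis`, `CanUpdate`, and `updateIfPresent` are hypothetical names for this example and are not part of LLVM; the sketch assumes C++17.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for MCInstrAnalysis with one mutating method.
struct Analysis {
  unsigned Updates = 0;
  void updateState() { ++Updates; }       // non-const: mutates state
  bool isBranch() const { return false; } // const: pure query
};

// Detects at compile time whether P->updateState() is well-formed.
template <typename P, typename = void>
struct CanUpdate : std::false_type {};
template <typename P>
struct CanUpdate<P,
                 std::void_t<decltype(std::declval<P &>()->updateState())>>
    : std::true_type {};

static_assert(!CanUpdate<std::shared_ptr<const Analysis>>::value,
              "pointer to const cannot call a non-const method");
static_assert(CanUpdate<std::shared_ptr<Analysis>>::value,
              "pointer to non-const can call a non-const method");

// Null-guarded call, similar in spirit to the `if (MIA)` check in the diff.
unsigned updateIfPresent(const std::shared_ptr<Analysis> &A) {
  if (A)
    A->updateState();
  return A ? A->Updates : 0;
}
```

Const queries such as `isBranch` remain callable through either pointer type, so widening the pointer to non-const only adds the ability to mutate state.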
