
[MC][X86] Avoid copying MCInst in emitInstrEnd #94947

Merged
merged 1 commit into llvm:main from perf/mc-no-mcinst-copy on Jun 11, 2024

Conversation

aengelke
Contributor

Copying an MCInst isn't cheap (copies all operands) and the whole
instruction is only used for the Intel erratum mitigation, which is off
by default. In all other cases, the opcode alone suffices.

This slightly pessimizes code that uses moves to segment registers --
but that's uncommon and not performance-sensitive anyway.

As a related change, also call canPadInst() only when the result is
actually used, which is typically only the case in emitInstrEnd.

This gives a minor performance improvement.
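
To make the change concrete without reading the full diff below, here is a minimal standalone sketch of the pattern the patch applies; the Inst and Backend types are stand-ins, not LLVM's MCInst/X86AsmBackend:

// Minimal sketch with stand-in types: cache only the previous instruction's
// opcode, and copy the full instruction only on the rare branch-padding path
// where it is actually needed.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Inst {                          // stand-in for MCInst
  unsigned Opcode = 0;
  std::vector<uint64_t> Operands;      // copying these is what the patch avoids
  unsigned getOpcode() const { return Opcode; }
};

struct Backend {                       // stand-in for X86AsmBackend
  unsigned PrevInstOpcode = 0;         // cheap; enough for prefix/delay-slot checks
  Inst PrevInst;                       // expensive to update; only needed for macro fusion
  bool BranchPaddingEnabled = false;   // the erratum mitigation, off by default

  void emitInstructionEnd(const Inst &I) {
    PrevInstOpcode = I.getOpcode();    // always updated, O(1)
    if (!BranchPaddingEnabled)
      return;                          // common case: no MCInst copy at all
    PrevInst = I;                      // rare case: full copy of all operands
  }
};

int main() {
  Backend B;
  Inst I{42, {1, 2, 3}};
  B.emitInstructionEnd(I);
  std::printf("prev opcode: %u\n", B.PrevInstOpcode);
}

The second half of the patch follows the same idea: canPadInst() is only evaluated where its result is actually consumed, i.e. for relaxable fragments in emitInstructionEnd and on the branch-padding path.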

@llvmbot
Member

llvmbot commented Jun 10, 2024

@llvm/pr-subscribers-backend-x86

Author: None (aengelke)

Full diff: https://github.com/llvm/llvm-project/pull/94947.diff

1 Files Affected:

  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+34-19)
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
index 472f34a4efdb4..5dba1607ad14a 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
@@ -125,9 +125,10 @@ class X86AsmBackend : public MCAsmBackend {
   unsigned TargetPrefixMax = 0;
 
   MCInst PrevInst;
+  unsigned PrevInstOpcode = 0;
   MCBoundaryAlignFragment *PendingBA = nullptr;
   std::pair<MCFragment *, size_t> PrevInstPosition;
-  bool CanPadInst = false;
+  bool IsRightAfterData = false;
 
   uint8_t determinePaddingPrefix(const MCInst &Inst) const;
   bool isMacroFused(const MCInst &Cmp, const MCInst &Jcc) const;
@@ -267,8 +268,8 @@ static bool isRIPRelative(const MCInst &MI, const MCInstrInfo &MCII) {
 }
 
 /// Check if the instruction is a prefix.
-static bool isPrefix(const MCInst &MI, const MCInstrInfo &MCII) {
-  return X86II::isPrefix(MCII.get(MI.getOpcode()).TSFlags);
+static bool isPrefix(unsigned Opcode, const MCInstrInfo &MCII) {
+  return X86II::isPrefix(MCII.get(Opcode).TSFlags);
 }
 
 /// Check if the instruction is valid as the first instruction in macro fusion.
@@ -382,9 +383,9 @@ bool X86AsmBackend::allowEnhancedRelaxation() const {
 
 /// X86 has certain instructions which enable interrupts exactly one
 /// instruction *after* the instruction which stores to SS.  Return true if the
-/// given instruction has such an interrupt delay slot.
-static bool hasInterruptDelaySlot(const MCInst &Inst) {
-  switch (Inst.getOpcode()) {
+/// given instruction may have such an interrupt delay slot.
+static bool mayHaveInterruptDelaySlot(unsigned InstOpcode) {
+  switch (InstOpcode) {
   case X86::POPSS16:
   case X86::POPSS32:
   case X86::STI:
@@ -394,9 +395,9 @@ static bool hasInterruptDelaySlot(const MCInst &Inst) {
   case X86::MOV32sr:
   case X86::MOV64sr:
   case X86::MOV16sm:
-    if (Inst.getOperand(0).getReg() == X86::SS)
-      return true;
-    break;
+    // In fact, this is only the case if the first operand is SS. However, as
+    // segment moves occur extremely rarely, this is just a minor pessimization.
+    return true;
   }
   return false;
 }
@@ -455,22 +456,22 @@ bool X86AsmBackend::canPadInst(const MCInst &Inst, MCObjectStreamer &OS) const {
     // TLSCALL).
     return false;
 
-  if (hasInterruptDelaySlot(PrevInst))
+  if (mayHaveInterruptDelaySlot(PrevInstOpcode))
     // If this instruction follows an interrupt enabling instruction with a one
     // instruction delay, inserting a nop would change behavior.
     return false;
 
-  if (isPrefix(PrevInst, *MCII))
+  if (isPrefix(PrevInstOpcode, *MCII))
     // If this instruction follows a prefix, inserting a nop/prefix would change
     // semantic.
     return false;
 
-  if (isPrefix(Inst, *MCII))
+  if (isPrefix(Inst.getOpcode(), *MCII))
     // If this instruction is a prefix, inserting a prefix would change
     // semantic.
     return false;
 
-  if (isRightAfterData(OS.getCurrentFragment(), PrevInstPosition))
+  if (IsRightAfterData)
     // If this instruction follows any data, there is no clear
     // instruction boundary, inserting a nop/prefix would change semantic.
     return false;
@@ -514,16 +515,24 @@ bool X86AsmBackend::needAlign(const MCInst &Inst) const {
 /// Insert BoundaryAlignFragment before instructions to align branches.
 void X86AsmBackend::emitInstructionBegin(MCObjectStreamer &OS,
                                          const MCInst &Inst, const MCSubtargetInfo &STI) {
-  CanPadInst = canPadInst(Inst, OS);
+  // Used by canPadInst. Done here, because in emitInstructionEnd, the current
+  // fragment will have changed.
+  IsRightAfterData =
+      isRightAfterData(OS.getCurrentFragment(), PrevInstPosition);
 
   if (!canPadBranches(OS))
     return;
 
+  // NB: PrevInst only valid if canPadBranches is true.
   if (!isMacroFused(PrevInst, Inst))
     // Macro fusion doesn't happen indeed, clear the pending.
     PendingBA = nullptr;
 
-  if (!CanPadInst)
+  // When branch padding is enabled (basically the skx102 erratum => unlikely),
+  // we call canPadInst (not cheap) twice. However, in the common case, we can
+  // avoid unnecessary calls to that, as this is otherwise only used for
+  // relaxable fragments.
+  if (!canPadInst(Inst, OS))
     return;
 
   if (PendingBA && OS.getCurrentFragment()->getPrevNode() == PendingBA) {
@@ -557,16 +566,22 @@ void X86AsmBackend::emitInstructionBegin(MCObjectStreamer &OS,
 }
 
 /// Set the last fragment to be aligned for the BoundaryAlignFragment.
-void X86AsmBackend::emitInstructionEnd(MCObjectStreamer &OS, const MCInst &Inst) {
-  PrevInst = Inst;
+void X86AsmBackend::emitInstructionEnd(MCObjectStreamer &OS,
+                                       const MCInst &Inst) {
   MCFragment *CF = OS.getCurrentFragment();
-  PrevInstPosition = std::make_pair(CF, getSizeForInstFragment(CF));
   if (auto *F = dyn_cast_or_null<MCRelaxableFragment>(CF))
-    F->setAllowAutoPadding(CanPadInst);
+    F->setAllowAutoPadding(canPadInst(Inst, OS));
+
+  // Update PrevInstOpcode here, canPadInst() reads that.
+  PrevInstOpcode = Inst.getOpcode();
+  PrevInstPosition = std::make_pair(CF, getSizeForInstFragment(CF));
 
   if (!canPadBranches(OS))
     return;
 
+  // PrevInst is only needed if canPadBranches. Copying an MCInst isn't cheap.
+  PrevInst = Inst;
+
   if (!needAlign(Inst) || !PendingBA)
     return;
 

@@ -267,8 +268,8 @@ static bool isRIPRelative(const MCInst &MI, const MCInstrInfo &MCII) {
 }
 
 /// Check if the instruction is a prefix.
-static bool isPrefix(const MCInst &MI, const MCInstrInfo &MCII) {
-  return X86II::isPrefix(MCII.get(MI.getOpcode()).TSFlags);
+static bool isPrefix(unsigned Opcode, const MCInstrInfo &MCII) {
Contributor

@KanRobert KanRobert Jun 11, 2024

I don't understand why this is better. It's a reference here, not a copy.

Collaborator

I think it's for this line in X86AsmBackend::canPadInst:

if (isPrefix(PrevInstOpcode, *MCII))
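
That is the crux: after this patch only PrevInstOpcode (an unsigned) is cached between instructions, so the checks in canPadInst() have to work from an opcode rather than the full previous MCInst. Condensed from the hunks above (an excerpt, not standalone code):

// Inside X86AsmBackend::canPadInst() after this patch: only the previous
// instruction's opcode is available, hence the opcode-taking helpers.
if (mayHaveInterruptDelaySlot(PrevInstOpcode))
  return false;   // a nop here would break the one-instruction interrupt delay slot
if (isPrefix(PrevInstOpcode, *MCII))
  return false;   // a nop between a prefix and its instruction changes semantics
if (isPrefix(Inst.getOpcode(), *MCII))
  return false;   // the current instruction is itself a prefix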

@MaskRay
Member

MaskRay commented Jun 11, 2024

Avoiding the PrevInst copy gives a noticeable compile-time improvement, more pronounced at -O0, where assembler time dominates.

% hyperfine --warmup 1 --min-runs 10 '/tmp/out/custom2/bin/clang -c sqlite3.i -w' # no 94947
Benchmark 1: /tmp/out/custom2/bin/clang -c sqlite3.i -w
  Time (mean ± σ):     514.6 ms ±   2.5 ms    [User: 497.8 ms, System: 16.7 ms]
  Range (min … max):   510.5 ms … 517.5 ms    10 runs

% hyperfine --warmup 1 --min-runs 10 '/tmp/out/custom2/bin/clang -c sqlite3.i -w' # 94947
Benchmark 1: /tmp/out/custom2/bin/clang -c sqlite3.i -w
  Time (mean ± σ):     507.5 ms ±   1.9 ms    [User: 485.6 ms, System: 21.8 ms]
  Range (min … max):   504.1 ms … 509.9 ms    10 runs
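
(For scale, 514.6 ms → 507.5 ms is roughly a 1.4% reduction in wall-clock time on this input.)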

@aengelke aengelke merged commit fc6e97c into llvm:main Jun 11, 2024
9 checks passed
@aengelke aengelke deleted the perf/mc-no-mcinst-copy branch June 11, 2024 11:10
Contributor

@KanRobert KanRobert left a comment

LGTM

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Jun 12, 2024
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024