
[MC][X86] Avoid copying MCInst in emitInstrEnd #94947

Merged
merged 1 commit into llvm:main from perf/mc-no-mcinst-copy on Jun 11, 2024

Conversation

aengelke
Contributor

Copying an MCInst isn't cheap (copies all operands) and the whole
instruction is only used for the Intel erratum mitigation, which is off
by default. In all other cases, the opcode alone suffices.

This slightly pessimizes code that uses moves to segment registers --
but that's uncommon and not performance-sensitive anyway.

As a related change, also call canPadInst() only when the result is
actually used, which is typically only the case in emitInstrEnd.

This gives a minor performance improvement.
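
To make the change concrete without reading the full diff below, here is a minimal standalone sketch of the pattern the patch applies; the Inst and Backend types are stand-ins, not LLVM's MCInst/X86AsmBackend:

// Minimal sketch with stand-in types: cache only the previous instruction's
// opcode, and copy the full instruction only on the rare branch-padding path
// where it is actually needed.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Inst {                          // stand-in for MCInst
  unsigned Opcode = 0;
  std::vector<uint64_t> Operands;      // copying these is what the patch avoids
  unsigned getOpcode() const { return Opcode; }
};

struct Backend {                       // stand-in for X86AsmBackend
  unsigned PrevInstOpcode = 0;         // cheap; enough for prefix/delay-slot checks
  Inst PrevInst;                       // expensive to update; only needed for macro fusion
  bool BranchPaddingEnabled = false;   // the erratum mitigation, off by default

  void emitInstructionEnd(const Inst &I) {
    PrevInstOpcode = I.getOpcode();    // always updated, O(1)
    if (!BranchPaddingEnabled)
      return;                          // common case: no MCInst copy at all
    PrevInst = I;                      // rare case: full copy of all operands
  }
};

int main() {
  Backend B;
  Inst I{42, {1, 2, 3}};
  B.emitInstructionEnd(I);
  std::printf("prev opcode: %u\n", B.PrevInstOpcode);
}

The second half of the patch follows the same idea: canPadInst() is only evaluated where its result is actually consumed, i.e. for relaxable fragments in emitInstructionEnd and on the branch-padding path.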

@llvmbot
Member

llvmbot commented Jun 10, 2024

@llvm/pr-subscribers-backend-x86

Author: None (aengelke)

Full diff: https://github.com/llvm/llvm-project/pull/94947.diff

1 Files Affected:

  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+34-19)
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
index 472f34a4efdb4..5dba1607ad14a 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
@@ -125,9 +125,10 @@ class X86AsmBackend : public MCAsmBackend {
   unsigned TargetPrefixMax = 0;
 
   MCInst PrevInst;
+  unsigned PrevInstOpcode = 0;
   MCBoundaryAlignFragment *PendingBA = nullptr;
   std::pair<MCFragment *, size_t> PrevInstPosition;
-  bool CanPadInst = false;
+  bool IsRightAfterData = false;
 
   uint8_t determinePaddingPrefix(const MCInst &Inst) const;
   bool isMacroFused(const MCInst &Cmp, const MCInst &Jcc) const;
@@ -267,8 +268,8 @@ static bool isRIPRelative(const MCInst &MI, const MCInstrInfo &MCII) {
 }
 
 /// Check if the instruction is a prefix.
-static bool isPrefix(const MCInst &MI, const MCInstrInfo &MCII) {
-  return X86II::isPrefix(MCII.get(MI.getOpcode()).TSFlags);
+static bool isPrefix(unsigned Opcode, const MCInstrInfo &MCII) {
+  return X86II::isPrefix(MCII.get(Opcode).TSFlags);
 }
 
 /// Check if the instruction is valid as the first instruction in macro fusion.
@@ -382,9 +383,9 @@ bool X86AsmBackend::allowEnhancedRelaxation() const {
 
 /// X86 has certain instructions which enable interrupts exactly one
 /// instruction *after* the instruction which stores to SS.  Return true if the
-/// given instruction has such an interrupt delay slot.
-static bool hasInterruptDelaySlot(const MCInst &Inst) {
-  switch (Inst.getOpcode()) {
+/// given instruction may have such an interrupt delay slot.
+static bool mayHaveInterruptDelaySlot(unsigned InstOpcode) {
+  switch (InstOpcode) {
   case X86::POPSS16:
   case X86::POPSS32:
   case X86::STI:
@@ -394,9 +395,9 @@ static bool hasInterruptDelaySlot(const MCInst &Inst) {
   case X86::MOV32sr:
   case X86::MOV64sr:
   case X86::MOV16sm:
-    if (Inst.getOperand(0).getReg() == X86::SS)
-      return true;
-    break;
+    // In fact, this is only the case if the first operand is SS. However, as
+    // segment moves occur extremely rarely, this is just a minor pessimization.
+    return true;
   }
   return false;
 }
@@ -455,22 +456,22 @@ bool X86AsmBackend::canPadInst(const MCInst &Inst, MCObjectStreamer &OS) const {
     // TLSCALL).
     return false;
 
-  if (hasInterruptDelaySlot(PrevInst))
+  if (mayHaveInterruptDelaySlot(PrevInstOpcode))
     // If this instruction follows an interrupt enabling instruction with a one
     // instruction delay, inserting a nop would change behavior.
     return false;
 
-  if (isPrefix(PrevInst, *MCII))
+  if (isPrefix(PrevInstOpcode, *MCII))
     // If this instruction follows a prefix, inserting a nop/prefix would change
     // semantic.
     return false;
 
-  if (isPrefix(Inst, *MCII))
+  if (isPrefix(Inst.getOpcode(), *MCII))
     // If this instruction is a prefix, inserting a prefix would change
     // semantic.
     return false;
 
-  if (isRightAfterData(OS.getCurrentFragment(), PrevInstPosition))
+  if (IsRightAfterData)
     // If this instruction follows any data, there is no clear
     // instruction boundary, inserting a nop/prefix would change semantic.
     return false;
@@ -514,16 +515,24 @@ bool X86AsmBackend::needAlign(const MCInst &Inst) const {
 /// Insert BoundaryAlignFragment before instructions to align branches.
 void X86AsmBackend::emitInstructionBegin(MCObjectStreamer &OS,
                                          const MCInst &Inst, const MCSubtargetInfo &STI) {
-  CanPadInst = canPadInst(Inst, OS);
+  // Used by canPadInst. Done here, because in emitInstructionEnd, the current
+  // fragment will have changed.
+  IsRightAfterData =
+      isRightAfterData(OS.getCurrentFragment(), PrevInstPosition);
 
   if (!canPadBranches(OS))
     return;
 
+  // NB: PrevInst only valid if canPadBranches is true.
   if (!isMacroFused(PrevInst, Inst))
     // Macro fusion doesn't happen indeed, clear the pending.
     PendingBA = nullptr;
 
-  if (!CanPadInst)
+  // When branch padding is enabled (basically the skx102 erratum => unlikely),
+  // we call canPadInst (not cheap) twice. However, in the common case, we can
+  // avoid unnecessary calls to that, as this is otherwise only used for
+  // relaxable fragments.
+  if (!canPadInst(Inst, OS))
     return;
 
   if (PendingBA && OS.getCurrentFragment()->getPrevNode() == PendingBA) {
@@ -557,16 +566,22 @@ void X86AsmBackend::emitInstructionBegin(MCObjectStreamer &OS,
 }
 
 /// Set the last fragment to be aligned for the BoundaryAlignFragment.
-void X86AsmBackend::emitInstructionEnd(MCObjectStreamer &OS, const MCInst &Inst) {
-  PrevInst = Inst;
+void X86AsmBackend::emitInstructionEnd(MCObjectStreamer &OS,
+                                       const MCInst &Inst) {
   MCFragment *CF = OS.getCurrentFragment();
-  PrevInstPosition = std::make_pair(CF, getSizeForInstFragment(CF));
   if (auto *F = dyn_cast_or_null<MCRelaxableFragment>(CF))
-    F->setAllowAutoPadding(CanPadInst);
+    F->setAllowAutoPadding(canPadInst(Inst, OS));
+
+  // Update PrevInstOpcode here, canPadInst() reads that.
+  PrevInstOpcode = Inst.getOpcode();
+  PrevInstPosition = std::make_pair(CF, getSizeForInstFragment(CF));
 
   if (!canPadBranches(OS))
     return;
 
+  // PrevInst is only needed if canPadBranches. Copying an MCInst isn't cheap.
+  PrevInst = Inst;
+
   if (!needAlign(Inst) || !PendingBA)
     return;
 

@@ -267,8 +268,8 @@ static bool isRIPRelative(const MCInst &MI, const MCInstrInfo &MCII) {
 }
 
 /// Check if the instruction is a prefix.
-static bool isPrefix(const MCInst &MI, const MCInstrInfo &MCII) {
-  return X86II::isPrefix(MCII.get(MI.getOpcode()).TSFlags);
+static bool isPrefix(unsigned Opcode, const MCInstrInfo &MCII) {
Contributor

@KanRobert KanRobert Jun 11, 2024

I don't understand why this is better. It's a reference here, not a copy.

Collaborator

I think it's for this line in X86AsmBackend::canPadInst:

if (isPrefix(PrevInstOpcode, *MCII))
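
That is the crux: after this patch only PrevInstOpcode (an unsigned) is cached between instructions, so the checks in canPadInst() have to work from an opcode rather than the full previous MCInst. Condensed from the hunks above (an excerpt, not standalone code):

// Inside X86AsmBackend::canPadInst() after this patch: only the previous
// instruction's opcode is available, hence the opcode-taking helpers.
if (mayHaveInterruptDelaySlot(PrevInstOpcode))
  return false;   // a nop here would break the one-instruction interrupt delay slot
if (isPrefix(PrevInstOpcode, *MCII))
  return false;   // a nop between a prefix and its instruction changes semantics
if (isPrefix(Inst.getOpcode(), *MCII))
  return false;   // the current instruction is itself a prefix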

@MaskRay
Member

MaskRay commented Jun 11, 2024

Avoiding the PrevInst copy gives a noticeable compile-time improvement, more pronounced at -O0, where assembler time dominates.

% hyperfine --warmup 1 --min-runs 10 '/tmp/out/custom2/bin/clang -c sqlite3.i -w' # no 94947
Benchmark 1: /tmp/out/custom2/bin/clang -c sqlite3.i -w
  Time (mean ± σ):     514.6 ms ±   2.5 ms    [User: 497.8 ms, System: 16.7 ms]
  Range (min … max):   510.5 ms … 517.5 ms    10 runs

% hyperfine --warmup 1 --min-runs 10 '/tmp/out/custom2/bin/clang -c sqlite3.i -w' # 94947
Benchmark 1: /tmp/out/custom2/bin/clang -c sqlite3.i -w
  Time (mean ± σ):     507.5 ms ±   1.9 ms    [User: 485.6 ms, System: 21.8 ms]
  Range (min … max):   504.1 ms … 509.9 ms    10 runs
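
(For scale, 514.6 ms → 507.5 ms is roughly a 1.4% reduction in wall-clock time on this input.)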

@aengelke aengelke merged commit fc6e97c into llvm:main Jun 11, 2024
9 checks passed
@aengelke aengelke deleted the perf/mc-no-mcinst-copy branch June 11, 2024 11:10
Contributor

@KanRobert KanRobert left a comment

LGTM

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Jun 12, 2024
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024