[LLVM][TableGen] Parameterize NumToSkip in DecoderEmitter #136456

jurahul · 2025-04-19T19:58:14Z

Add command line option num-to-skip-size to parameterize the size of NumToSkip bytes in the decoder table. Default value will be 2, and targets that need larger size can use 3.
Keep all existing targets, except AArch64, to use size 2, and change AArch64 to use size 3 since it run into the "disassembler decoding table too large" error with size 2.
Additional fixes on top of earlier revert: mark decodeNumToSkip as static (not necessary anymore as the generated code is now in anonymous namespace, but doing it for consistency) and incorporate Bazel build changes from Port a new flag for AArch64 tablegen to Bazel rules #136212
Following is a rough reduction in size for the decoder tables by switching to size 2.

Target         Old Size   New Size   % Reduction
================================================
AArch64           153254     153254        0.00
AMDGPU            471566     412805       12.46
ARC                 5724       5061       11.58
ARM                84936      73831       13.07
AVR                 1497       1306       12.76
BPF                 2172       1927       11.28
CSKY               10064       8692       13.63
Hexagon            47967      41965       12.51
Lanai               1108        982       11.37
LoongArch          24446      21621       11.56
MSP430              4200       3716       11.52
Mips               36330      31415       13.53
PPC                31897      28098       11.91
RISCV              37979      32790       13.66
Sparc               8331       7252       12.95
SystemZ            36722      32248       12.18
VE                 48296      42873       11.23
XCore               2590       2316       10.58
Xtensa              3827       3316       13.35

…mitter" (llvm#136017)" (llvm#136068) This reverts commit 6d8bf3c.

llvmbot · 2025-04-19T23:58:35Z

@llvm/pr-subscribers-tablegen

Author: Rahul Joshi (jurahul)

Changes

Add command line option num-to-skip-size to parameterize the size of NumToSkip bytes in the decoder table. Default value will be 2, and targets that need larger size can use 3.
Keep all existing targets, except AArch64, to use size 2, and change AArch64 to use size 3 since it run into the "disassembler decoding table too large" error with size 2.
Additional fixes on top of earlier revert: mark decodeNumToSkip as static (not necessary anymore as the generated code is now in anonymous namespace, but doing it for consistency) and incorporate Bazel build changes from Port a new flag for AArch64 tablegen to Bazel rules #136212
Following is a rough reduction in size for the decoder tables by switching to size 2.

Target         Old Size   New Size   % Reduction
================================================
AArch64           153254     153254        0.00
AMDGPU            471566     412805       12.46
ARC                 5724       5061       11.58
ARM                84936      73831       13.07
AVR                 1497       1306       12.76
BPF                 2172       1927       11.28
CSKY               10064       8692       13.63
Hexagon            47967      41965       12.51
Lanai               1108        982       11.37
LoongArch          24446      21621       11.56
MSP430              4200       3716       11.52
Mips               36330      31415       13.53
PPC                31897      28098       11.91
RISCV              37979      32790       13.66
Sparc               8331       7252       12.95
SystemZ            36722      32248       12.18
VE                 48296      42873       11.23
XCore               2590       2316       10.58
Xtensa              3827       3316       13.35

Full diff: https://github.com/llvm/llvm-project/pull/136456.diff

8 Files Affected:

(modified) llvm/lib/Target/AArch64/CMakeLists.txt (+1-1)
(modified) llvm/test/TableGen/VarLenDecoder.td (+2-2)
(modified) llvm/test/TableGen/trydecode-emission.td (+5-5)
(modified) llvm/test/TableGen/trydecode-emission2.td (+8-8)
(modified) llvm/test/TableGen/trydecode-emission3.td (+1-1)
(modified) llvm/test/TableGen/trydecode-emission4.td (+1-1)
(modified) llvm/utils/TableGen/DecoderEmitter.cpp (+65-50)
(modified) utils/bazel/llvm-project-overlay/llvm/BUILD.bazel (+1-1)

diff --git a/llvm/lib/Target/AArch64/CMakeLists.txt b/llvm/lib/Target/AArch64/CMakeLists.txt
index 2300e479bc110..ba1d1605ec104 100644
--- a/llvm/lib/Target/AArch64/CMakeLists.txt
+++ b/llvm/lib/Target/AArch64/CMakeLists.txt
@@ -7,7 +7,7 @@ tablegen(LLVM AArch64GenAsmWriter.inc -gen-asm-writer)
 tablegen(LLVM AArch64GenAsmWriter1.inc -gen-asm-writer -asmwriternum=1)
 tablegen(LLVM AArch64GenCallingConv.inc -gen-callingconv)
 tablegen(LLVM AArch64GenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM AArch64GenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM AArch64GenDisassemblerTables.inc -gen-disassembler --num-to-skip-size=3)
 tablegen(LLVM AArch64GenFastISel.inc -gen-fast-isel)
 tablegen(LLVM AArch64GenGlobalISel.inc -gen-global-isel)
 tablegen(LLVM AArch64GenO0PreLegalizeGICombiner.inc -gen-global-isel-combiner
diff --git a/llvm/test/TableGen/VarLenDecoder.td b/llvm/test/TableGen/VarLenDecoder.td
index 5cf0bf8911859..b77702ff7c5c1 100644
--- a/llvm/test/TableGen/VarLenDecoder.td
+++ b/llvm/test/TableGen/VarLenDecoder.td
@@ -47,9 +47,9 @@ def FOO32 : MyVarInst<MemOp32> {
 }
 
 // CHECK:      MCD::OPC_ExtractField, 3, 5,  // Inst{7-3} ...
-// CHECK-NEXT: MCD::OPC_FilterValue, 8, 4, 0, 0, // Skip to: 12
+// CHECK-NEXT: MCD::OPC_FilterValue, 8, 4, 0, // Skip to: 11
 // CHECK-NEXT: MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 0, // Opcode: FOO16
-// CHECK-NEXT: MCD::OPC_FilterValue, 9, 4, 0, 0, // Skip to: 21
+// CHECK-NEXT: MCD::OPC_FilterValue, 9, 4, 0, // Skip to: 19
 // CHECK-NEXT: MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 1, // Opcode: FOO32
 // CHECK-NEXT: MCD::OPC_Fail,
 
diff --git a/llvm/test/TableGen/trydecode-emission.td b/llvm/test/TableGen/trydecode-emission.td
index 20d2446eeac7f..2b4239f4fbe65 100644
--- a/llvm/test/TableGen/trydecode-emission.td
+++ b/llvm/test/TableGen/trydecode-emission.td
@@ -34,10 +34,10 @@ def InstB : TestInstruction {
 }
 
 // CHECK:      /* 0 */       MCD::OPC_ExtractField, 4, 4,  // Inst{7-4} ...
-// CHECK-NEXT: /* 3 */       MCD::OPC_FilterValue, 0, 18, 0, 0, // Skip to: 26
-// CHECK-NEXT: /* 8 */       MCD::OPC_CheckField, 2, 2, 0, 7, 0, 0, // Skip to: 22
-// CHECK-NEXT: /* 15 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 0, 0, 0, 0, // Opcode: InstB, skip to: 22
-// CHECK-NEXT: /* 22 */      MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 1, // Opcode: InstA
-// CHECK-NEXT: /* 26 */      MCD::OPC_Fail,
+// CHECK-NEXT: /* 3 */       MCD::OPC_FilterValue, 0, 16, 0, // Skip to: 23
+// CHECK-NEXT: /* 7 */       MCD::OPC_CheckField, 2, 2, 0, 6, 0, // Skip to: 19
+// CHECK-NEXT: /* 13 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 0, 0, 0, // Opcode: InstB, skip to: 19
+// CHECK-NEXT: /* 19 */      MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 1, // Opcode: InstA
+// CHECK-NEXT: /* 23 */      MCD::OPC_Fail,
 
 // CHECK: if (!Check(S, DecodeInstB(MI, insn, Address, Decoder))) { DecodeComplete = false; return MCDisassembler::Fail; }
diff --git a/llvm/test/TableGen/trydecode-emission2.td b/llvm/test/TableGen/trydecode-emission2.td
index 0584034e41233..7d30474058f73 100644
--- a/llvm/test/TableGen/trydecode-emission2.td
+++ b/llvm/test/TableGen/trydecode-emission2.td
@@ -31,14 +31,14 @@ def InstB : TestInstruction {
 }
 
 // CHECK:      /* 0 */       MCD::OPC_ExtractField, 2, 1,  // Inst{2} ...
-// CHECK-NEXT: /* 3 */       MCD::OPC_FilterValue, 0, 36, 0, 0, // Skip to: 44
-// CHECK-NEXT: /* 8 */       MCD::OPC_ExtractField, 5, 3,  // Inst{7-5} ...
-// CHECK-NEXT: /* 11 */      MCD::OPC_FilterValue, 0, 28, 0, 0, // Skip to: 44
-// CHECK-NEXT: /* 16 */      MCD::OPC_CheckField, 0, 2, 3, 7, 0, 0, // Skip to: 30
-// CHECK-NEXT: /* 23 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 0, 0, 0, 0, // Opcode: InstB, skip to: 30
-// CHECK-NEXT: /* 30 */      MCD::OPC_CheckField, 3, 2, 0, 7, 0, 0, // Skip to: 44
-// CHECK-NEXT: /* 37 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 1, 0, 0, 0, // Opcode: InstA, skip to: 44
-// CHECK-NEXT: /* 44 */      MCD::OPC_Fail,
+// CHECK-NEXT: /* 3 */       MCD::OPC_FilterValue, 0, 31, 0, // Skip to: 38
+// CHECK-NEXT: /* 7 */       MCD::OPC_ExtractField, 5, 3,  // Inst{7-5} ...
+// CHECK-NEXT: /* 10 */      MCD::OPC_FilterValue, 0, 24, 0, // Skip to: 38
+// CHECK-NEXT: /* 14 */      MCD::OPC_CheckField, 0, 2, 3, 6, 0, // Skip to: 26
+// CHECK-NEXT: /* 20 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 0, 0, 0, // Opcode: InstB, skip to: 26
+// CHECK-NEXT: /* 26 */      MCD::OPC_CheckField, 3, 2, 0, 6, 0, // Skip to: 38
+// CHECK-NEXT: /* 32 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 1, 0, 0, // Opcode: InstA, skip to: 38
+// CHECK-NEXT: /* 38 */      MCD::OPC_Fail,
 
 // CHECK: if (!Check(S, DecodeInstB(MI, insn, Address, Decoder))) { DecodeComplete = false; return MCDisassembler::Fail; }
 // CHECK: if (!Check(S, DecodeInstA(MI, insn, Address, Decoder))) { DecodeComplete = false; return MCDisassembler::Fail; }
diff --git a/llvm/test/TableGen/trydecode-emission3.td b/llvm/test/TableGen/trydecode-emission3.td
index 4c5be7e1af229..0abbe62fe337e 100644
--- a/llvm/test/TableGen/trydecode-emission3.td
+++ b/llvm/test/TableGen/trydecode-emission3.td
@@ -1,4 +1,4 @@
-// RUN: llvm-tblgen -gen-disassembler -I %p/../../include %s | FileCheck %s
+ // RUN: llvm-tblgen -gen-disassembler --num-to-skip-size=3 -I %p/../../include %s  | FileCheck %s
 
 include "llvm/Target/Target.td"
 
diff --git a/llvm/test/TableGen/trydecode-emission4.td b/llvm/test/TableGen/trydecode-emission4.td
index 1e51ba5e40768..413e4a0d1275a 100644
--- a/llvm/test/TableGen/trydecode-emission4.td
+++ b/llvm/test/TableGen/trydecode-emission4.td
@@ -1,4 +1,4 @@
-// RUN: llvm-tblgen -gen-disassembler -I %p/../../include %s | FileCheck %s
+// RUN: llvm-tblgen -gen-disassembler --num-to-skip-size=3 -I %p/../../include %s | FileCheck %s
 
 // Test for OPC_ExtractField/OPC_CheckField with start bit > 255.
 // These large start values may arise for architectures with long instruction
diff --git a/llvm/utils/TableGen/DecoderEmitter.cpp b/llvm/utils/TableGen/DecoderEmitter.cpp
index 7d63126f65ac4..478ea881aa237 100644
--- a/llvm/utils/TableGen/DecoderEmitter.cpp
+++ b/llvm/utils/TableGen/DecoderEmitter.cpp
@@ -32,8 +32,10 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/FormattedStream.h"
 #include "llvm/Support/LEB128.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TableGen/Error.h"
 #include "llvm/TableGen/Record.h"
@@ -76,6 +78,12 @@ static cl::opt<SuppressLevel> DecoderEmitterSuppressDuplicates(
             "significantly reducing Table Duplications")),
     cl::init(SUPPRESSION_DISABLE), cl::cat(DisassemblerEmitterCat));
 
+static cl::opt<uint32_t>
+    NumToSkipSizeInBytes("num-to-skip-size",
+                         cl::desc("number of bytes to use for num-to-skip "
+                                  "entries in the decoder table (2 or 3)"),
+                         cl::init(2), cl::cat(DisassemblerEmitterCat));
+
 STATISTIC(NumEncodings, "Number of encodings considered");
 STATISTIC(NumEncodingsLackingDisasm,
           "Number of encodings without disassembler info");
@@ -130,10 +138,29 @@ struct DecoderTable : public std::vector<uint8_t> {
   // in the table for patching.
   size_t insertNumToSkip() {
     size_t Size = size();
-    insert(end(), 3, 0);
+    insert(end(), NumToSkipSizeInBytes, 0);
     return Size;
   }
+
+  void patchNumToSkip(size_t FixupIdx, uint32_t DestIdx) {
+    // Calculate the distance from the byte following the fixup entry byte
+    // to the destination. The Target is calculated from after the
+    // `NumToSkipSizeInBytes`-byte NumToSkip entry itself, so subtract
+    // `NumToSkipSizeInBytes` from the displacement here to account for that.
+    assert(DestIdx >= FixupIdx + NumToSkipSizeInBytes &&
+           "Expecting a forward jump in the decoding table");
+    uint32_t Delta = DestIdx - FixupIdx - NumToSkipSizeInBytes;
+    if (!isUIntN(8 * NumToSkipSizeInBytes, Delta))
+      PrintFatalError(
+          "disassembler decoding table too large, try --num-to-skip-size=3");
+
+    (*this)[FixupIdx] = static_cast<uint8_t>(Delta);
+    (*this)[FixupIdx + 1] = static_cast<uint8_t>(Delta >> 8);
+    if (NumToSkipSizeInBytes == 3)
+      (*this)[FixupIdx + 2] = static_cast<uint8_t>(Delta >> 16);
+  }
 };
+
 struct DecoderTableInfo {
   DecoderTable Table;
   FixupScopeList FixupStack;
@@ -690,19 +717,8 @@ static void resolveTableFixups(DecoderTable &Table, const FixupList &Fixups,
                                uint32_t DestIdx) {
   // Any NumToSkip fixups in the current scope can resolve to the
   // current location.
-  for (uint32_t FixupIdx : reverse(Fixups)) {
-    // Calculate the distance from the byte following the fixup entry byte
-    // to the destination. The Target is calculated from after the 24-bit
-    // NumToSkip entry itself, so subtract three from the displacement here
-    // to account for that.
-    uint32_t Delta = DestIdx - FixupIdx - 3;
-    // Our NumToSkip entries are 24-bits. Make sure our table isn't too
-    // big.
-    assert(isUInt<24>(Delta));
-    Table[FixupIdx] = (uint8_t)Delta;
-    Table[FixupIdx + 1] = (uint8_t)(Delta >> 8);
-    Table[FixupIdx + 2] = (uint8_t)(Delta >> 16);
-  }
+  for (uint32_t FixupIdx : Fixups)
+    Table.patchNumToSkip(FixupIdx, DestIdx);
 }
 
 // Emit table entries to decode instructions given a segment or segments
@@ -759,15 +775,9 @@ void Filter::emitTableEntry(DecoderTableInfo &TableInfo) const {
     Delegate->emitTableEntries(TableInfo);
 
     // Now that we've emitted the body of the handler, update the NumToSkip
-    // of the filter itself to be able to skip forward when false. Subtract
-    // three as to account for the width of the NumToSkip field itself.
-    if (PrevFilter) {
-      uint32_t NumToSkip = Table.size() - PrevFilter - 3;
-      assert(isUInt<24>(NumToSkip) && "disassembler decoding table too large!");
-      Table[PrevFilter] = (uint8_t)NumToSkip;
-      Table[PrevFilter + 1] = (uint8_t)(NumToSkip >> 8);
-      Table[PrevFilter + 2] = (uint8_t)(NumToSkip >> 16);
-    }
+    // of the filter itself to be able to skip forward when false.
+    if (PrevFilter)
+      Table.patchNumToSkip(PrevFilter, Table.size());
   }
 
   // If there is no fallthrough, then the final filter should get fixed
@@ -814,7 +824,8 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
     OS << (unsigned)*I++ << ", ";
   };
 
-  // Emit 24-bit numtoskip value to OS, returning the NumToSkip value.
+  // Emit `NumToSkipSizeInBytes`-byte numtoskip value to OS, returning the
+  // NumToSkip value.
   auto emitNumToSkip = [](DecoderTable::const_iterator &I,
                           formatted_raw_ostream &OS) {
     uint8_t Byte = *I++;
@@ -823,9 +834,11 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
     Byte = *I++;
     OS << (unsigned)Byte << ", ";
     NumToSkip |= Byte << 8;
-    Byte = *I++;
-    OS << (unsigned)(Byte) << ", ";
-    NumToSkip |= Byte << 16;
+    if (NumToSkipSizeInBytes == 3) {
+      Byte = *I++;
+      OS << (unsigned)(Byte) << ", ";
+      NumToSkip |= Byte << 16;
+    }
     return NumToSkip;
   };
 
@@ -866,7 +879,7 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
       // The filter value is ULEB128 encoded.
       emitULEB128(I, OS);
 
-      // 24-bit numtoskip value.
+      // numtoskip value.
       uint32_t NumToSkip = emitNumToSkip(I, OS);
       OS << "// Skip to: " << ((I - Table.begin()) + NumToSkip) << "\n";
       break;
@@ -881,7 +894,7 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
       // ULEB128 encoded field value.
       emitULEB128(I, OS);
 
-      // 24-bit numtoskip value.
+      // numtoskip value.
       uint32_t NumToSkip = emitNumToSkip(I, OS);
       OS << "// Skip to: " << ((I - Table.begin()) + NumToSkip) << "\n";
       break;
@@ -890,7 +903,7 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
       OS << Indent << "MCD::OPC_CheckPredicate, ";
       emitULEB128(I, OS);
 
-      // 24-bit numtoskip value.
+      // numtoskip value.
       uint32_t NumToSkip = emitNumToSkip(I, OS);
       OS << "// Skip to: " << ((I - Table.begin()) + NumToSkip) << "\n";
       break;
@@ -920,7 +933,7 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
 
       // Fallthrough for OPC_TryDecode.
 
-      // 24-bit numtoskip value.
+      // numtoskip value.
       uint32_t NumToSkip = emitNumToSkip(I, OS);
 
       OS << "// Opcode: " << NumberedEncodings[EncodingID]
@@ -1392,9 +1405,9 @@ void FilterChooser::emitSingletonTableEntry(DecoderTableInfo &TableInfo,
     TableInfo.Table.push_back(NumBits);
     TableInfo.Table.insertULEB128(Ilnd.FieldVal);
 
-    // The fixup is always 24-bits, so go ahead and allocate the space
-    // in the table so all our relative position calculations work OK even
-    // before we fully resolve the real value here.
+    // Allocate space in the table for fixup (NumToSkipSizeInBytes) so all
+    // our relative position calculations work OK even before we fully
+    // resolve the real value here.
 
     // Push location for NumToSkip backpatching.
     TableInfo.FixupStack.back().push_back(TableInfo.Table.insertNumToSkip());
@@ -2138,7 +2151,18 @@ insertBits(InsnType &field, uint64_t bits, unsigned startBit, unsigned numBits)
 // decodeInstruction().
 static void emitDecodeInstruction(formatted_raw_ostream &OS,
                                   bool IsVarLenInst) {
+  OS << formatv("\nconstexpr unsigned NumToSkipSizeInBytes = {};\n",
+                NumToSkipSizeInBytes);
+
   OS << R"(
+static unsigned decodeNumToSkip(const uint8_t *&Ptr) {
+  unsigned NumToSkip = *Ptr++;
+  NumToSkip |= (*Ptr++) << 8;
+  if constexpr (NumToSkipSizeInBytes == 3)
+    NumToSkip |= (*Ptr++) << 16;
+  return NumToSkip;
+}
+
 template <typename InsnType>
 static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
                                       InsnType insn, uint64_t Address,
@@ -2176,10 +2200,7 @@ static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
       // Decode the field value.
       uint64_t Val = decodeULEB128AndIncUnsafe(Ptr);
       bool Failed = Val != CurFieldValue;
-      // NumToSkip is a plain 24-bit integer.
-      unsigned NumToSkip = *Ptr++;
-      NumToSkip |= (*Ptr++) << 8;
-      NumToSkip |= (*Ptr++) << 16;
+      unsigned NumToSkip = decodeNumToSkip(Ptr);
 
       // Perform the filter operation.
       if (Failed)
@@ -2203,10 +2224,7 @@ static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
       uint64_t ExpectedValue = decodeULEB128(++Ptr, &PtrLen);
       Ptr += PtrLen;
       bool Failed = ExpectedValue != FieldValue;
-      // NumToSkip is a plain 24-bit integer.
-      unsigned NumToSkip = *Ptr++;
-      NumToSkip |= (*Ptr++) << 8;
-      NumToSkip |= (*Ptr++) << 16;
+      unsigned NumToSkip = decodeNumToSkip(Ptr);
 
       // If the actual and expected values don't match, skip.
       if (Failed)
@@ -2221,10 +2239,7 @@ static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
     case MCD::OPC_CheckPredicate: {
       // Decode the Predicate Index value.
       unsigned PIdx = decodeULEB128AndIncUnsafe(Ptr);
-      // NumToSkip is a plain 24-bit integer.
-      unsigned NumToSkip = *Ptr++;
-      NumToSkip |= (*Ptr++) << 8;
-      NumToSkip |= (*Ptr++) << 16;
+      unsigned NumToSkip = decodeNumToSkip(Ptr);
       // Check the predicate.
       bool Failed = !checkDecoderPredicate(PIdx, Bits);
       if (Failed)
@@ -2259,10 +2274,7 @@ static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
       // Decode the Opcode value.
       unsigned Opc = decodeULEB128AndIncUnsafe(Ptr);
       unsigned DecodeIdx = decodeULEB128AndIncUnsafe(Ptr);
-      // NumToSkip is a plain 24-bit integer.
-      unsigned NumToSkip = *Ptr++;
-      NumToSkip |= (*Ptr++) << 8;
-      NumToSkip |= (*Ptr++) << 16;
+      unsigned NumToSkip = decodeNumToSkip(Ptr);
 
       // Perform the decode operation.
       MCInst TmpMI;
@@ -2387,6 +2399,9 @@ handleHwModesUnrelatedEncodings(const CodeGenInstruction *Instr,
 
 // Emits disassembler code for instruction decoding.
 void DecoderEmitter::run(raw_ostream &o) {
+  if (NumToSkipSizeInBytes != 2 && NumToSkipSizeInBytes != 3)
+    PrintFatalError("Invalid value for num-to-skip-size, must be 2 or 3");
+
   formatted_raw_ostream OS(o);
   OS << R"(
 #include "llvm/MC/MCInst.h"
diff --git a/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel b/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
index b77ddf634eec6..1bfb07eec6f3a 100644
--- a/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+++ b/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
@@ -2148,7 +2148,7 @@ llvm_target_lib_list = [lib for lib in [
                 "lib/Target/AArch64/AArch64GenSubtargetInfo.inc",
             ),
             (
-                ["-gen-disassembler"],
+                ["-gen-disassembler", "--num-to-skip-size=3"],
                 "lib/Target/AArch64/AArch64GenDisassemblerTables.inc",
             ),
             (

llvmbot · 2025-04-19T23:58:35Z

@llvm/pr-subscribers-backend-aarch64

Author: Rahul Joshi (jurahul)

Changes

Add command line option num-to-skip-size to parameterize the size of NumToSkip bytes in the decoder table. Default value will be 2, and targets that need larger size can use 3.
Keep all existing targets, except AArch64, to use size 2, and change AArch64 to use size 3 since it run into the "disassembler decoding table too large" error with size 2.
Additional fixes on top of earlier revert: mark decodeNumToSkip as static (not necessary anymore as the generated code is now in anonymous namespace, but doing it for consistency) and incorporate Bazel build changes from Port a new flag for AArch64 tablegen to Bazel rules #136212
Following is a rough reduction in size for the decoder tables by switching to size 2.

Target         Old Size   New Size   % Reduction
================================================
AArch64           153254     153254        0.00
AMDGPU            471566     412805       12.46
ARC                 5724       5061       11.58
ARM                84936      73831       13.07
AVR                 1497       1306       12.76
BPF                 2172       1927       11.28
CSKY               10064       8692       13.63
Hexagon            47967      41965       12.51
Lanai               1108        982       11.37
LoongArch          24446      21621       11.56
MSP430              4200       3716       11.52
Mips               36330      31415       13.53
PPC                31897      28098       11.91
RISCV              37979      32790       13.66
Sparc               8331       7252       12.95
SystemZ            36722      32248       12.18
VE                 48296      42873       11.23
XCore               2590       2316       10.58
Xtensa              3827       3316       13.35

Full diff: https://github.com/llvm/llvm-project/pull/136456.diff

8 Files Affected:

(modified) llvm/lib/Target/AArch64/CMakeLists.txt (+1-1)
(modified) llvm/test/TableGen/VarLenDecoder.td (+2-2)
(modified) llvm/test/TableGen/trydecode-emission.td (+5-5)
(modified) llvm/test/TableGen/trydecode-emission2.td (+8-8)
(modified) llvm/test/TableGen/trydecode-emission3.td (+1-1)
(modified) llvm/test/TableGen/trydecode-emission4.td (+1-1)
(modified) llvm/utils/TableGen/DecoderEmitter.cpp (+65-50)
(modified) utils/bazel/llvm-project-overlay/llvm/BUILD.bazel (+1-1)

diff --git a/llvm/lib/Target/AArch64/CMakeLists.txt b/llvm/lib/Target/AArch64/CMakeLists.txt
index 2300e479bc110..ba1d1605ec104 100644
--- a/llvm/lib/Target/AArch64/CMakeLists.txt
+++ b/llvm/lib/Target/AArch64/CMakeLists.txt
@@ -7,7 +7,7 @@ tablegen(LLVM AArch64GenAsmWriter.inc -gen-asm-writer)
 tablegen(LLVM AArch64GenAsmWriter1.inc -gen-asm-writer -asmwriternum=1)
 tablegen(LLVM AArch64GenCallingConv.inc -gen-callingconv)
 tablegen(LLVM AArch64GenDAGISel.inc -gen-dag-isel)
-tablegen(LLVM AArch64GenDisassemblerTables.inc -gen-disassembler)
+tablegen(LLVM AArch64GenDisassemblerTables.inc -gen-disassembler --num-to-skip-size=3)
 tablegen(LLVM AArch64GenFastISel.inc -gen-fast-isel)
 tablegen(LLVM AArch64GenGlobalISel.inc -gen-global-isel)
 tablegen(LLVM AArch64GenO0PreLegalizeGICombiner.inc -gen-global-isel-combiner
diff --git a/llvm/test/TableGen/VarLenDecoder.td b/llvm/test/TableGen/VarLenDecoder.td
index 5cf0bf8911859..b77702ff7c5c1 100644
--- a/llvm/test/TableGen/VarLenDecoder.td
+++ b/llvm/test/TableGen/VarLenDecoder.td
@@ -47,9 +47,9 @@ def FOO32 : MyVarInst<MemOp32> {
 }
 
 // CHECK:      MCD::OPC_ExtractField, 3, 5,  // Inst{7-3} ...
-// CHECK-NEXT: MCD::OPC_FilterValue, 8, 4, 0, 0, // Skip to: 12
+// CHECK-NEXT: MCD::OPC_FilterValue, 8, 4, 0, // Skip to: 11
 // CHECK-NEXT: MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 0, // Opcode: FOO16
-// CHECK-NEXT: MCD::OPC_FilterValue, 9, 4, 0, 0, // Skip to: 21
+// CHECK-NEXT: MCD::OPC_FilterValue, 9, 4, 0, // Skip to: 19
 // CHECK-NEXT: MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 1, // Opcode: FOO32
 // CHECK-NEXT: MCD::OPC_Fail,
 
diff --git a/llvm/test/TableGen/trydecode-emission.td b/llvm/test/TableGen/trydecode-emission.td
index 20d2446eeac7f..2b4239f4fbe65 100644
--- a/llvm/test/TableGen/trydecode-emission.td
+++ b/llvm/test/TableGen/trydecode-emission.td
@@ -34,10 +34,10 @@ def InstB : TestInstruction {
 }
 
 // CHECK:      /* 0 */       MCD::OPC_ExtractField, 4, 4,  // Inst{7-4} ...
-// CHECK-NEXT: /* 3 */       MCD::OPC_FilterValue, 0, 18, 0, 0, // Skip to: 26
-// CHECK-NEXT: /* 8 */       MCD::OPC_CheckField, 2, 2, 0, 7, 0, 0, // Skip to: 22
-// CHECK-NEXT: /* 15 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 0, 0, 0, 0, // Opcode: InstB, skip to: 22
-// CHECK-NEXT: /* 22 */      MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 1, // Opcode: InstA
-// CHECK-NEXT: /* 26 */      MCD::OPC_Fail,
+// CHECK-NEXT: /* 3 */       MCD::OPC_FilterValue, 0, 16, 0, // Skip to: 23
+// CHECK-NEXT: /* 7 */       MCD::OPC_CheckField, 2, 2, 0, 6, 0, // Skip to: 19
+// CHECK-NEXT: /* 13 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 0, 0, 0, // Opcode: InstB, skip to: 19
+// CHECK-NEXT: /* 19 */      MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 1, // Opcode: InstA
+// CHECK-NEXT: /* 23 */      MCD::OPC_Fail,
 
 // CHECK: if (!Check(S, DecodeInstB(MI, insn, Address, Decoder))) { DecodeComplete = false; return MCDisassembler::Fail; }
diff --git a/llvm/test/TableGen/trydecode-emission2.td b/llvm/test/TableGen/trydecode-emission2.td
index 0584034e41233..7d30474058f73 100644
--- a/llvm/test/TableGen/trydecode-emission2.td
+++ b/llvm/test/TableGen/trydecode-emission2.td
@@ -31,14 +31,14 @@ def InstB : TestInstruction {
 }
 
 // CHECK:      /* 0 */       MCD::OPC_ExtractField, 2, 1,  // Inst{2} ...
-// CHECK-NEXT: /* 3 */       MCD::OPC_FilterValue, 0, 36, 0, 0, // Skip to: 44
-// CHECK-NEXT: /* 8 */       MCD::OPC_ExtractField, 5, 3,  // Inst{7-5} ...
-// CHECK-NEXT: /* 11 */      MCD::OPC_FilterValue, 0, 28, 0, 0, // Skip to: 44
-// CHECK-NEXT: /* 16 */      MCD::OPC_CheckField, 0, 2, 3, 7, 0, 0, // Skip to: 30
-// CHECK-NEXT: /* 23 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 0, 0, 0, 0, // Opcode: InstB, skip to: 30
-// CHECK-NEXT: /* 30 */      MCD::OPC_CheckField, 3, 2, 0, 7, 0, 0, // Skip to: 44
-// CHECK-NEXT: /* 37 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 1, 0, 0, 0, // Opcode: InstA, skip to: 44
-// CHECK-NEXT: /* 44 */      MCD::OPC_Fail,
+// CHECK-NEXT: /* 3 */       MCD::OPC_FilterValue, 0, 31, 0, // Skip to: 38
+// CHECK-NEXT: /* 7 */       MCD::OPC_ExtractField, 5, 3,  // Inst{7-5} ...
+// CHECK-NEXT: /* 10 */      MCD::OPC_FilterValue, 0, 24, 0, // Skip to: 38
+// CHECK-NEXT: /* 14 */      MCD::OPC_CheckField, 0, 2, 3, 6, 0, // Skip to: 26
+// CHECK-NEXT: /* 20 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 0, 0, 0, // Opcode: InstB, skip to: 26
+// CHECK-NEXT: /* 26 */      MCD::OPC_CheckField, 3, 2, 0, 6, 0, // Skip to: 38
+// CHECK-NEXT: /* 32 */      MCD::OPC_TryDecode, {{[0-9]+}}, {{[0-9]+}}, 1, 0, 0, // Opcode: InstA, skip to: 38
+// CHECK-NEXT: /* 38 */      MCD::OPC_Fail,
 
 // CHECK: if (!Check(S, DecodeInstB(MI, insn, Address, Decoder))) { DecodeComplete = false; return MCDisassembler::Fail; }
 // CHECK: if (!Check(S, DecodeInstA(MI, insn, Address, Decoder))) { DecodeComplete = false; return MCDisassembler::Fail; }
diff --git a/llvm/test/TableGen/trydecode-emission3.td b/llvm/test/TableGen/trydecode-emission3.td
index 4c5be7e1af229..0abbe62fe337e 100644
--- a/llvm/test/TableGen/trydecode-emission3.td
+++ b/llvm/test/TableGen/trydecode-emission3.td
@@ -1,4 +1,4 @@
-// RUN: llvm-tblgen -gen-disassembler -I %p/../../include %s | FileCheck %s
+ // RUN: llvm-tblgen -gen-disassembler --num-to-skip-size=3 -I %p/../../include %s  | FileCheck %s
 
 include "llvm/Target/Target.td"
 
diff --git a/llvm/test/TableGen/trydecode-emission4.td b/llvm/test/TableGen/trydecode-emission4.td
index 1e51ba5e40768..413e4a0d1275a 100644
--- a/llvm/test/TableGen/trydecode-emission4.td
+++ b/llvm/test/TableGen/trydecode-emission4.td
@@ -1,4 +1,4 @@
-// RUN: llvm-tblgen -gen-disassembler -I %p/../../include %s | FileCheck %s
+// RUN: llvm-tblgen -gen-disassembler --num-to-skip-size=3 -I %p/../../include %s | FileCheck %s
 
 // Test for OPC_ExtractField/OPC_CheckField with start bit > 255.
 // These large start values may arise for architectures with long instruction
diff --git a/llvm/utils/TableGen/DecoderEmitter.cpp b/llvm/utils/TableGen/DecoderEmitter.cpp
index 7d63126f65ac4..478ea881aa237 100644
--- a/llvm/utils/TableGen/DecoderEmitter.cpp
+++ b/llvm/utils/TableGen/DecoderEmitter.cpp
@@ -32,8 +32,10 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/FormattedStream.h"
 #include "llvm/Support/LEB128.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TableGen/Error.h"
 #include "llvm/TableGen/Record.h"
@@ -76,6 +78,12 @@ static cl::opt<SuppressLevel> DecoderEmitterSuppressDuplicates(
             "significantly reducing Table Duplications")),
     cl::init(SUPPRESSION_DISABLE), cl::cat(DisassemblerEmitterCat));
 
+static cl::opt<uint32_t>
+    NumToSkipSizeInBytes("num-to-skip-size",
+                         cl::desc("number of bytes to use for num-to-skip "
+                                  "entries in the decoder table (2 or 3)"),
+                         cl::init(2), cl::cat(DisassemblerEmitterCat));
+
 STATISTIC(NumEncodings, "Number of encodings considered");
 STATISTIC(NumEncodingsLackingDisasm,
           "Number of encodings without disassembler info");
@@ -130,10 +138,29 @@ struct DecoderTable : public std::vector<uint8_t> {
   // in the table for patching.
   size_t insertNumToSkip() {
     size_t Size = size();
-    insert(end(), 3, 0);
+    insert(end(), NumToSkipSizeInBytes, 0);
     return Size;
   }
+
+  void patchNumToSkip(size_t FixupIdx, uint32_t DestIdx) {
+    // Calculate the distance from the byte following the fixup entry byte
+    // to the destination. The Target is calculated from after the
+    // `NumToSkipSizeInBytes`-byte NumToSkip entry itself, so subtract
+    // `NumToSkipSizeInBytes` from the displacement here to account for that.
+    assert(DestIdx >= FixupIdx + NumToSkipSizeInBytes &&
+           "Expecting a forward jump in the decoding table");
+    uint32_t Delta = DestIdx - FixupIdx - NumToSkipSizeInBytes;
+    if (!isUIntN(8 * NumToSkipSizeInBytes, Delta))
+      PrintFatalError(
+          "disassembler decoding table too large, try --num-to-skip-size=3");
+
+    (*this)[FixupIdx] = static_cast<uint8_t>(Delta);
+    (*this)[FixupIdx + 1] = static_cast<uint8_t>(Delta >> 8);
+    if (NumToSkipSizeInBytes == 3)
+      (*this)[FixupIdx + 2] = static_cast<uint8_t>(Delta >> 16);
+  }
 };
+
 struct DecoderTableInfo {
   DecoderTable Table;
   FixupScopeList FixupStack;
@@ -690,19 +717,8 @@ static void resolveTableFixups(DecoderTable &Table, const FixupList &Fixups,
                                uint32_t DestIdx) {
   // Any NumToSkip fixups in the current scope can resolve to the
   // current location.
-  for (uint32_t FixupIdx : reverse(Fixups)) {
-    // Calculate the distance from the byte following the fixup entry byte
-    // to the destination. The Target is calculated from after the 24-bit
-    // NumToSkip entry itself, so subtract three from the displacement here
-    // to account for that.
-    uint32_t Delta = DestIdx - FixupIdx - 3;
-    // Our NumToSkip entries are 24-bits. Make sure our table isn't too
-    // big.
-    assert(isUInt<24>(Delta));
-    Table[FixupIdx] = (uint8_t)Delta;
-    Table[FixupIdx + 1] = (uint8_t)(Delta >> 8);
-    Table[FixupIdx + 2] = (uint8_t)(Delta >> 16);
-  }
+  for (uint32_t FixupIdx : Fixups)
+    Table.patchNumToSkip(FixupIdx, DestIdx);
 }
 
 // Emit table entries to decode instructions given a segment or segments
@@ -759,15 +775,9 @@ void Filter::emitTableEntry(DecoderTableInfo &TableInfo) const {
     Delegate->emitTableEntries(TableInfo);
 
     // Now that we've emitted the body of the handler, update the NumToSkip
-    // of the filter itself to be able to skip forward when false. Subtract
-    // three as to account for the width of the NumToSkip field itself.
-    if (PrevFilter) {
-      uint32_t NumToSkip = Table.size() - PrevFilter - 3;
-      assert(isUInt<24>(NumToSkip) && "disassembler decoding table too large!");
-      Table[PrevFilter] = (uint8_t)NumToSkip;
-      Table[PrevFilter + 1] = (uint8_t)(NumToSkip >> 8);
-      Table[PrevFilter + 2] = (uint8_t)(NumToSkip >> 16);
-    }
+    // of the filter itself to be able to skip forward when false.
+    if (PrevFilter)
+      Table.patchNumToSkip(PrevFilter, Table.size());
   }
 
   // If there is no fallthrough, then the final filter should get fixed
@@ -814,7 +824,8 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
     OS << (unsigned)*I++ << ", ";
   };
 
-  // Emit 24-bit numtoskip value to OS, returning the NumToSkip value.
+  // Emit `NumToSkipSizeInBytes`-byte numtoskip value to OS, returning the
+  // NumToSkip value.
   auto emitNumToSkip = [](DecoderTable::const_iterator &I,
                           formatted_raw_ostream &OS) {
     uint8_t Byte = *I++;
@@ -823,9 +834,11 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
     Byte = *I++;
     OS << (unsigned)Byte << ", ";
     NumToSkip |= Byte << 8;
-    Byte = *I++;
-    OS << (unsigned)(Byte) << ", ";
-    NumToSkip |= Byte << 16;
+    if (NumToSkipSizeInBytes == 3) {
+      Byte = *I++;
+      OS << (unsigned)(Byte) << ", ";
+      NumToSkip |= Byte << 16;
+    }
     return NumToSkip;
   };
 
@@ -866,7 +879,7 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
       // The filter value is ULEB128 encoded.
       emitULEB128(I, OS);
 
-      // 24-bit numtoskip value.
+      // numtoskip value.
       uint32_t NumToSkip = emitNumToSkip(I, OS);
       OS << "// Skip to: " << ((I - Table.begin()) + NumToSkip) << "\n";
       break;
@@ -881,7 +894,7 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
       // ULEB128 encoded field value.
       emitULEB128(I, OS);
 
-      // 24-bit numtoskip value.
+      // numtoskip value.
       uint32_t NumToSkip = emitNumToSkip(I, OS);
       OS << "// Skip to: " << ((I - Table.begin()) + NumToSkip) << "\n";
       break;
@@ -890,7 +903,7 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
       OS << Indent << "MCD::OPC_CheckPredicate, ";
       emitULEB128(I, OS);
 
-      // 24-bit numtoskip value.
+      // numtoskip value.
       uint32_t NumToSkip = emitNumToSkip(I, OS);
       OS << "// Skip to: " << ((I - Table.begin()) + NumToSkip) << "\n";
       break;
@@ -920,7 +933,7 @@ void DecoderEmitter::emitTable(formatted_raw_ostream &OS, DecoderTable &Table,
 
       // Fallthrough for OPC_TryDecode.
 
-      // 24-bit numtoskip value.
+      // numtoskip value.
       uint32_t NumToSkip = emitNumToSkip(I, OS);
 
       OS << "// Opcode: " << NumberedEncodings[EncodingID]
@@ -1392,9 +1405,9 @@ void FilterChooser::emitSingletonTableEntry(DecoderTableInfo &TableInfo,
     TableInfo.Table.push_back(NumBits);
     TableInfo.Table.insertULEB128(Ilnd.FieldVal);
 
-    // The fixup is always 24-bits, so go ahead and allocate the space
-    // in the table so all our relative position calculations work OK even
-    // before we fully resolve the real value here.
+    // Allocate space in the table for fixup (NumToSkipSizeInBytes) so all
+    // our relative position calculations work OK even before we fully
+    // resolve the real value here.
 
     // Push location for NumToSkip backpatching.
     TableInfo.FixupStack.back().push_back(TableInfo.Table.insertNumToSkip());
@@ -2138,7 +2151,18 @@ insertBits(InsnType &field, uint64_t bits, unsigned startBit, unsigned numBits)
 // decodeInstruction().
 static void emitDecodeInstruction(formatted_raw_ostream &OS,
                                   bool IsVarLenInst) {
+  OS << formatv("\nconstexpr unsigned NumToSkipSizeInBytes = {};\n",
+                NumToSkipSizeInBytes);
+
   OS << R"(
+static unsigned decodeNumToSkip(const uint8_t *&Ptr) {
+  unsigned NumToSkip = *Ptr++;
+  NumToSkip |= (*Ptr++) << 8;
+  if constexpr (NumToSkipSizeInBytes == 3)
+    NumToSkip |= (*Ptr++) << 16;
+  return NumToSkip;
+}
+
 template <typename InsnType>
 static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
                                       InsnType insn, uint64_t Address,
@@ -2176,10 +2200,7 @@ static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
       // Decode the field value.
       uint64_t Val = decodeULEB128AndIncUnsafe(Ptr);
       bool Failed = Val != CurFieldValue;
-      // NumToSkip is a plain 24-bit integer.
-      unsigned NumToSkip = *Ptr++;
-      NumToSkip |= (*Ptr++) << 8;
-      NumToSkip |= (*Ptr++) << 16;
+      unsigned NumToSkip = decodeNumToSkip(Ptr);
 
       // Perform the filter operation.
       if (Failed)
@@ -2203,10 +2224,7 @@ static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
       uint64_t ExpectedValue = decodeULEB128(++Ptr, &PtrLen);
       Ptr += PtrLen;
       bool Failed = ExpectedValue != FieldValue;
-      // NumToSkip is a plain 24-bit integer.
-      unsigned NumToSkip = *Ptr++;
-      NumToSkip |= (*Ptr++) << 8;
-      NumToSkip |= (*Ptr++) << 16;
+      unsigned NumToSkip = decodeNumToSkip(Ptr);
 
       // If the actual and expected values don't match, skip.
       if (Failed)
@@ -2221,10 +2239,7 @@ static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
     case MCD::OPC_CheckPredicate: {
       // Decode the Predicate Index value.
       unsigned PIdx = decodeULEB128AndIncUnsafe(Ptr);
-      // NumToSkip is a plain 24-bit integer.
-      unsigned NumToSkip = *Ptr++;
-      NumToSkip |= (*Ptr++) << 8;
-      NumToSkip |= (*Ptr++) << 16;
+      unsigned NumToSkip = decodeNumToSkip(Ptr);
       // Check the predicate.
       bool Failed = !checkDecoderPredicate(PIdx, Bits);
       if (Failed)
@@ -2259,10 +2274,7 @@ static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
       // Decode the Opcode value.
       unsigned Opc = decodeULEB128AndIncUnsafe(Ptr);
       unsigned DecodeIdx = decodeULEB128AndIncUnsafe(Ptr);
-      // NumToSkip is a plain 24-bit integer.
-      unsigned NumToSkip = *Ptr++;
-      NumToSkip |= (*Ptr++) << 8;
-      NumToSkip |= (*Ptr++) << 16;
+      unsigned NumToSkip = decodeNumToSkip(Ptr);
 
       // Perform the decode operation.
       MCInst TmpMI;
@@ -2387,6 +2399,9 @@ handleHwModesUnrelatedEncodings(const CodeGenInstruction *Instr,
 
 // Emits disassembler code for instruction decoding.
 void DecoderEmitter::run(raw_ostream &o) {
+  if (NumToSkipSizeInBytes != 2 && NumToSkipSizeInBytes != 3)
+    PrintFatalError("Invalid value for num-to-skip-size, must be 2 or 3");
+
   formatted_raw_ostream OS(o);
   OS << R"(
 #include "llvm/MC/MCInst.h"
diff --git a/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel b/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
index b77ddf634eec6..1bfb07eec6f3a 100644
--- a/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+++ b/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
@@ -2148,7 +2148,7 @@ llvm_target_lib_list = [lib for lib in [
                 "lib/Target/AArch64/AArch64GenSubtargetInfo.inc",
             ),
             (
-                ["-gen-disassembler"],
+                ["-gen-disassembler", "--num-to-skip-size=3"],
                 "lib/Target/AArch64/AArch64GenDisassemblerTables.inc",
             ),
             (

llvm/utils/TableGen/DecoderEmitter.cpp

llvm/test/TableGen/trydecode-emission3.td

s-barannikov

LGTM

llvm-ci · 2025-04-21T16:07:49Z

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm,utils at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/14805

Here is the relevant piece of the build log for the reference

Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: tools/lldb-dap/io/TestDAP_io.py (1174 of 2956)
PASS: lldb-api :: tools/lldb-dap/locations/TestDAP_locations.py (1175 of 2956)
PASS: lldb-api :: tools/lldb-dap/memory/TestDAP_memory.py (1176 of 2956)
PASS: lldb-api :: tools/lldb-dap/optimized/TestDAP_optimized.py (1177 of 2956)
PASS: lldb-api :: tools/lldb-dap/output/TestDAP_output.py (1178 of 2956)
PASS: lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py (1179 of 2956)
PASS: lldb-api :: tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py (1180 of 2956)
PASS: lldb-api :: tools/lldb-dap/evaluate/TestDAP_evaluate.py (1181 of 2956)
UNSUPPORTED: lldb-api :: tools/lldb-dap/restart/TestDAP_restart_runInTerminal.py (1182 of 2956)
UNRESOLVED: lldb-api :: tools/lldb-dap/module/TestDAP_module.py (1183 of 2956)
******************** TEST 'lldb-api :: tools/lldb-dap/module/TestDAP_module.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/tools/lldb-dap/module -p TestDAP_module.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision e1bb7f6ddec37567230d3e46719aee5bcd268d5a)
  clang revision e1bb7f6ddec37567230d3e46719aee5bcd268d5a
  llvm revision e1bb7f6ddec37567230d3e46719aee5bcd268d5a
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_compile_units (TestDAP_module.TestDAP_module)
========= DEBUG ADAPTER PROTOCOL LOGS =========
1745251473.284274101 --> (stdin/stdout) {"command":"initialize","type":"request","arguments":{"adapterID":"lldb-native","clientID":"vscode","columnsStartAt1":true,"linesStartAt1":true,"locale":"en-us","pathFormat":"path","supportsRunInTerminalRequest":true,"supportsVariablePaging":true,"supportsVariableType":true,"supportsStartDebuggingRequest":true,"supportsProgressReporting":true,"$__lldb_sourceInitFile":false},"seq":1}
1745251473.288561821 <-- (stdin/stdout) {"body":{"$__lldb_version":"lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision e1bb7f6ddec37567230d3e46719aee5bcd268d5a)\n  clang revision e1bb7f6ddec37567230d3e46719aee5bcd268d5a\n  llvm revision e1bb7f6ddec37567230d3e46719aee5bcd268d5a","completionTriggerCharacters":["."," ","\t"],"exceptionBreakpointFilters":[{"default":false,"filter":"cpp_catch","label":"C++ Catch"},{"default":false,"filter":"cpp_throw","label":"C++ Throw"},{"default":false,"filter":"objc_catch","label":"Objective-C Catch"},{"default":false,"filter":"objc_throw","label":"Objective-C Throw"}],"supportTerminateDebuggee":true,"supportsBreakpointLocationsRequest":true,"supportsCancelRequest":true,"supportsCompletionsRequest":true,"supportsConditionalBreakpoints":true,"supportsConfigurationDoneRequest":true,"supportsDataBreakpoints":true,"supportsDelayedStackTraceLoading":true,"supportsDisassembleRequest":true,"supportsEvaluateForHovers":true,"supportsExceptionInfoRequest":true,"supportsExceptionOptions":true,"supportsFunctionBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsInstructionBreakpoints":true,"supportsLogPoints":true,"supportsModulesRequest":true,"supportsReadMemoryRequest":true,"supportsRestartRequest":true,"supportsSetVariable":true,"supportsStepInTargetsRequest":true,"supportsSteppingGranularity":true,"supportsValueFormattingOptions":true},"command":"initialize","request_seq":1,"seq":0,"success":true,"type":"response"}
1745251473.289146900 --> (stdin/stdout) {"command":"launch","type":"request","arguments":{"program":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module/TestDAP_module.test_compile_units/a.out","initCommands":["settings clear -all","settings set symbols.enable-external-lookup false","settings set target.inherit-tcc true","settings set target.disable-aslr false","settings set target.detach-on-error false","settings set target.auto-apply-fixits false","settings set plugin.process.gdb-remote.packet-timeout 60","settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"","settings set use-color false","settings set show-statusline false"],"disableASLR":false,"enableAutoVariableSummaries":false,"enableSyntheticChildDebugging":false,"displayExtendedBacktrace":false,"commandEscapePrefix":null},"seq":2}
1745251473.289742231 <-- (stdin/stdout) {"body":{"category":"console","output":"Running initCommands:\n"},"event":"output","seq":0,"type":"event"}
1745251473.289798975 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings clear -all\n"},"event":"output","seq":0,"type":"event"}
1745251473.289813757 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set symbols.enable-external-lookup false\n"},"event":"output","seq":0,"type":"event"}
1745251473.289827347 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.inherit-tcc true\n"},"event":"output","seq":0,"type":"event"}
1745251473.289841413 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.disable-aslr false\n"},"event":"output","seq":0,"type":"event"}
1745251473.289854765 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.detach-on-error false\n"},"event":"output","seq":0,"type":"event"}
1745251473.289866924 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.auto-apply-fixits false\n"},"event":"output","seq":0,"type":"event"}
1745251473.289880276 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set plugin.process.gdb-remote.packet-timeout 60\n"},"event":"output","seq":0,"type":"event"}
1745251473.289917707 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"\n"},"event":"output","seq":0,"type":"event"}
1745251473.289930582 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set use-color false\n"},"event":"output","seq":0,"type":"event"}
1745251473.289942741 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set show-statusline false\n"},"event":"output","seq":0,"type":"event"}
1745251473.509528875 <-- (stdin/stdout) {"command":"launch","request_seq":2,"seq":0,"success":true,"type":"response"}
1745251473.509677410 <-- (stdin/stdout) {"body":{"isLocalProcess":true,"name":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module/TestDAP_module.test_compile_units/a.out","startMethod":"launch","systemProcessId":3907194},"event":"process","seq":0,"type":"event"}
1745251473.509703875 <-- (stdin/stdout) {"event":"initialized","seq":0,"type":"event"}
1745251473.510438204 --> (stdin/stdout) {"command":"setBreakpoints","type":"request","arguments":{"source":{"name":"main.cpp","path":"main.cpp"},"sourceModified":false,"lines":[5],"breakpoints":[{"line":5}]},"seq":3}
1745251473.521255732 <-- (stdin/stdout) {"body":{"breakpoints":[{"column":3,"id":1,"instructionReference":"0x4A073C","line":5,"source":{"name":"main.cpp","path":"main.cpp"},"verified":true}]},"command":"setBreakpoints","request_seq":3,"seq":0,"success":true,"type":"response"}
1745251473.521397352 <-- (stdin/stdout) {"body":{"breakpoint":{"column":3,"id":1,"instructionReference":"0x4A073C","line":5,"verified":true},"reason":"changed"},"event":"breakpoint","seq":0,"type":"event"}

evodius96 · 2025-04-21T21:42:35Z

Note: this is failing for our downstream toolchain due to a hard-coded field, specfically due to '178' being used, whereas our downstream target ends up with '179'.

llvm-project/llvm/test/TableGen/VarLenDecoder.td:59:22: error: CHECK-LARGE-NEXT: expected string not found in input
// CHECK-LARGE-NEXT: /* 8 */ MCD::OPC_Decode, 178, 2, 0, // Opcode: FOO16
                     ^
<stdin>:78:1: note: possible intended match here
/* 8 */ MCD::OPC_Decode, 179, 2, 0, // Opcode: FOO16

Is it necessary to hard code the values? I observe other checks in these tests use regular expressions to avoid having to deal with hard-coded fields and opcodes.

topperc · 2025-04-21T21:47:12Z

Note: this is failing for our downstream toolchain due to a hard-coded field, specfically due to '178' being used, whereas our downstream target ends up with '179'.
llvm-project/llvm/test/TableGen/VarLenDecoder.td:59:22: error: CHECK-LARGE-NEXT: expected string not found in input
// CHECK-LARGE-NEXT: /* 8 */ MCD::OPC_Decode, 178, 2, 0, // Opcode: FOO16
                     ^
<stdin>:78:1: note: possible intended match here
/* 8 */ MCD::OPC_Decode, 179, 2, 0, // Opcode: FOO16
Is it necessary to hard code the values? I observe other checks in these tests use regular expressions to avoid having to deal with hard-coded fields and opcodes.

I agree this is fragile. This value is going to change anytime a new StandardPseudoInstruction is added to Target.td or a new GISel instruction is added to GenericOpcodes.td

jurahul · 2025-04-21T23:04:13Z

Note: this is failing for our downstream toolchain due to a hard-coded field, specfically due to '178' being used, whereas our downstream target ends up with '179'.
llvm-project/llvm/test/TableGen/VarLenDecoder.td:59:22: error: CHECK-LARGE-NEXT: expected string not found in input
// CHECK-LARGE-NEXT: /* 8 */ MCD::OPC_Decode, 178, 2, 0, // Opcode: FOO16
                     ^
<stdin>:78:1: note: possible intended match here
/* 8 */ MCD::OPC_Decode, 179, 2, 0, // Opcode: FOO16
Is it necessary to hard code the values? I observe other checks in these tests use regular expressions to avoid having to deal with hard-coded fields and opcodes.
I agree this is fragile. This value is going to change anytime a new StandardPseudoInstruction is added to Target.td or a new GISel instruction is added to GenericOpcodes.td

I didn't realize that the test used a RE instead of hard-coded values in the earlier form. Will fix the new test to use the same.

llvm-ci · 2025-04-21T23:10:31Z

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm,utils at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2759

Here is the relevant piece of the build log for the reference

Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/87/95' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-15572-87-95.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=95 GTEST_SHARD_INDEX=87 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************

jurahul · 2025-04-21T23:13:47Z

#136632

evodius96 · 2025-04-22T15:41:16Z

Thank you! I gave one example, but all of your modified tests had this issue.

jurahul · 2025-04-22T15:43:46Z

I see. Will push a new fix for all other tests sometime today

…

On Tue, Apr 22, 2025 at 8:41 AM Alan Phipps ***@***.***> wrote: Thank you! I gave one example, but all of your modified tests had this issue. — Reply to this email directly, view it on GitHub <#136456 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/APRMUB3W2C4EOCLHMIJLDC322ZPLNAVCNFSM6AAAAAB3O4KXSSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQMRRG4ZTQNBYGM> . You are receiving this because you modified the open/close state.Message ID: ***@***.***> *evodius96* left a comment (llvm/llvm-project#136456) <#136456 (comment)> Thank you! I gave one example, but all of your modified tests had this issue. — Reply to this email directly, view it on GitHub <#136456 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/APRMUB3W2C4EOCLHMIJLDC322ZPLNAVCNFSM6AAAAAB3O4KXSSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQMRRG4ZTQNBYGM> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

- Add command line option `num-to-skip-size` to parameterize the size of `NumToSkip` bytes in the decoder table. Default value will be 2, and targets that need larger size can use 3. - Keep all existing targets, except AArch64, to use size 2, and change AArch64 to use size 3 since it run into the "disassembler decoding table too large" error with size 2. - Additional fixes on top of earlier revert: mark `decodeNumToSkip` as static (not necessary anymore as the generated code is now in anonymous namespace, but doing it for consistency) and incorporate Bazel build changes from llvm#136212 - Following is a rough reduction in size for the decoder tables by switching to size 2. ``` Target Old Size New Size % Reduction ================================================ AArch64 153254 153254 0.00 AMDGPU 471566 412805 12.46 ARC 5724 5061 11.58 ARM 84936 73831 13.07 AVR 1497 1306 12.76 BPF 2172 1927 11.28 CSKY 10064 8692 13.63 Hexagon 47967 41965 12.51 Lanai 1108 982 11.37 LoongArch 24446 21621 11.56 MSP430 4200 3716 11.52 Mips 36330 31415 13.53 PPC 31897 28098 11.91 RISCV 37979 32790 13.66 Sparc 8331 7252 12.95 SystemZ 36722 32248 12.18 VE 48296 42873 11.23 XCore 2590 2316 10.58 Xtensa 3827 3316 13.35 ```

Reapply "Reapply "[LLVM][TableGen] Parameterize NumToSkip in DecoderE…

dda3726

…mitter" (llvm#136017)" (llvm#136068) This reverts commit 6d8bf3c.

jurahul changed the title ~~Decoder emitter num skip param~~ [LLVM][TableGen] Parameterize NumToSkip in DecoderEmitter Apr 19, 2025

jurahul mentioned this pull request Apr 19, 2025

Port a new flag for AArch64 tablegen to Bazel rules #136212

Closed

jurahul marked this pull request as ready for review April 19, 2025 23:58

jurahul requested review from rupprecht, keith and aaronmondal as code owners April 19, 2025 23:58

llvmbot added backend:AArch64 tablegen bazel "Peripheral" support tier build system: utils/bazel labels Apr 19, 2025

jurahul requested review from topperc and s-barannikov and removed request for keith, rupprecht and aaronmondal April 19, 2025 23:58

s-barannikov reviewed Apr 20, 2025

View reviewed changes

Fixes

a82cfbd

jurahul force-pushed the decoder_emitter_num_skip_param branch from 13adcf1 to a82cfbd Compare April 20, 2025 13:12

jurahul requested a review from s-barannikov April 20, 2025 16:34

s-barannikov approved these changes Apr 21, 2025

View reviewed changes

jurahul merged commit e1bb7f6 into llvm:main Apr 21, 2025
11 checks passed

jurahul deleted the decoder_emitter_num_skip_param branch April 21, 2025 15:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LLVM][TableGen] Parameterize NumToSkip in DecoderEmitter #136456

[LLVM][TableGen] Parameterize NumToSkip in DecoderEmitter #136456

Uh oh!

jurahul commented Apr 19, 2025 •

edited

Loading

Uh oh!

llvmbot commented Apr 19, 2025

Uh oh!

llvmbot commented Apr 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

s-barannikov left a comment

Uh oh!

Uh oh!

llvm-ci commented Apr 21, 2025

Uh oh!

evodius96 commented Apr 21, 2025

Uh oh!

topperc commented Apr 21, 2025

Uh oh!

jurahul commented Apr 21, 2025

Uh oh!

llvm-ci commented Apr 21, 2025

Uh oh!

jurahul commented Apr 21, 2025

Uh oh!

evodius96 commented Apr 22, 2025

Uh oh!

jurahul commented Apr 22, 2025 via email

Uh oh!

Uh oh!

[LLVM][TableGen] Parameterize NumToSkip in DecoderEmitter #136456

[LLVM][TableGen] Parameterize NumToSkip in DecoderEmitter #136456

Uh oh!

Conversation

jurahul commented Apr 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Apr 19, 2025

Uh oh!

llvmbot commented Apr 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

s-barannikov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Apr 21, 2025

Uh oh!

evodius96 commented Apr 21, 2025

Uh oh!

topperc commented Apr 21, 2025

Uh oh!

jurahul commented Apr 21, 2025

Uh oh!

llvm-ci commented Apr 21, 2025

Uh oh!

jurahul commented Apr 21, 2025

Uh oh!

evodius96 commented Apr 22, 2025

Uh oh!

jurahul commented Apr 22, 2025 via email

Uh oh!

Uh oh!

jurahul commented Apr 19, 2025 •

edited

Loading