Skip to content

[X86][MC] Report error when the instruction length exceeds 15 bytes #83708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 4, 2024

Conversation

KanRobert
Copy link
Contributor

@KanRobert KanRobert commented Mar 3, 2024

The instruction-size limit of 15 bytes still applies to APX
instructions.

Note that it is possible for an EVEX-encoded legacy instruction to reach
the 15-byte instruction length limit: 4
bytes of EVEX prefix + 1 byte of opcode + 1 byte of ModRM + 1 byte of
SIB + 4 bytes of displacement + 4 bytes of immediate = 15 bytes in total, e.g.

addq    $184, -96, %rax   # encoding:
[0x62,0xf4,0xfc,0x18,0x81,0x04,0x25,0xa0,0xff,0xff,0xff,0xb8,0x00,0x00,0x00]

If we added a segment prefix like fs, the length would be 16.

In such a case, no additional (ASIZE or segment override) prefix can be
used.

To help users find this issue earlier, especially for assembler users, we change
the internal compiler error to error in this patch.

Diagnostic is aligned with GAS https://sourceware.org/bugzilla/show_bug.cgi?id=31323

The instruction-size limit of 15 bytes still applies to APX
instructions.

Note that it is possible for an EVEX-encoded legacy instruction to reach
the 15-byte instruction length limit: 4
bytes of EVEX prefix + 1 byte of opcode + 1 byte of ModRM + 1 byte of
SIB + 4 bytes of displacement + 4 bytes of immediate = 15 bytes in total, e.g.

```
addq    $184, -96, %rax   # encoding:
[0x62,0xf4,0xfc,0x18,0x81,0x04,0x25,0xa0,0xff,0xff,0xff,0xb8,0x00,0x00,0x00]
```

If we added a segment prefix like fs, the length would be 16.

In such a case, no additional (ASIZE or segment override) prefix can be
used.

To help users find this issue earlier, we change the internal compiler
error to error in this patch.
@llvmbot llvmbot added backend:X86 mc Machine (object) code labels Mar 3, 2024
@llvmbot
Copy link
Member

llvmbot commented Mar 3, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-x86

Author: Shengchen Kan (KanRobert)

Changes

The instruction-size limit of 15 bytes still applies to APX
instructions.

Note that it is possible for an EVEX-encoded legacy instruction to reach
the 15-byte instruction length limit: 4
bytes of EVEX prefix + 1 byte of opcode + 1 byte of ModRM + 1 byte of
SIB + 4 bytes of displacement + 4 bytes of immediate = 15 bytes in total, e.g.

addq    $184, -96, %rax   # encoding:
[0x62,0xf4,0xfc,0x18,0x81,0x04,0x25,0xa0,0xff,0xff,0xff,0xb8,0x00,0x00,0x00]

If we added a segment prefix like fs, the length would be 16.

In such a case, no additional (ASIZE or segment override) prefix can be
used.

To help users find this issue earlier, we change the internal compiler
error to error in this patch.


Full diff: https://github.com/llvm/llvm-project/pull/83708.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp (+3-2)
  • (added) llvm/test/MC/X86/apx/long-instruction-err.s (+25)
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
index fdb11d1a408bb6..fd96a173af27e0 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
@@ -1924,8 +1924,9 @@ void X86MCCodeEmitter::encodeInstruction(const MCInst &MI,
   if ((TSFlags & X86II::OpMapMask) == X86II::ThreeDNow)
     emitByte(X86II::getBaseOpcodeFor(TSFlags), CB);
 
-  assert(CB.size() - StartByte <= 15 &&
-         "The size of instruction must be no longer than 15.");
+  if (CB.size() - StartByte > 15)
+    Ctx.reportError(MI.getLoc(),
+                    "The size of instruction must be no longer than 15.");
 #ifndef NDEBUG
   // FIXME: Verify.
   if (/*!Desc.isVariadic() &&*/ CurOp != NumOps) {
diff --git a/llvm/test/MC/X86/apx/long-instruction-err.s b/llvm/test/MC/X86/apx/long-instruction-err.s
new file mode 100644
index 00000000000000..a2210ecad81709
--- /dev/null
+++ b/llvm/test/MC/X86/apx/long-instruction-err.s
@@ -0,0 +1,25 @@
+# RUN: not llvm-mc -triple x86_64 -show-encoding %s 2>&1 | FileCheck %s
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: addq    $1234, %cs:-96, %rax
+addq    $1234, %cs:-96, %rax
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: subq    $1234, %fs:257(%rbx, %rcx), %rax
+subq    $1234, %fs:257(%rbx, %rcx), %rax
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: orq     $1234, 257(%ebx, %ecx), %rax
+orq     $1234, 257(%ebx, %ecx), %rax
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: xorq    $1234, %gs:257(%ebx), %rax
+xorq    $1234, %gs:257(%ebx), %rax
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: {nf} andq    $1234, %cs:-96
+{nf} andq    $1234, %cs:-96
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: {evex} adcq    $1234, %cs:-96
+{evex} adcq    $1234, %cs:-96

@e-kud
Copy link
Contributor

e-kud commented Mar 4, 2024

I haven't noticed that it was addressed somewhere. What about inline asm? It seems that now it bypasses all checks and fails only during encoding. Is it expected? Maybe we need to add additional checks?

P.S. This is almost irrelevant to the PR but is about To help users find this issue earlier

@KanRobert
Copy link
Contributor Author

I haven't noticed that it was addressed somewhere. What about inline asm? It seems that now it bypasses all checks and fails only during encoding. Is it expected? Maybe we need to add additional checks?

P.S. This is almost irrelevant to the PR but is about To help users find this issue earlier

It's expected.

Short answer:
It' hard to implement a generic API to determine the size of instruction before encoding in machine IR. We do not have special check for inline asm at machine IR stage.

Long answer:
We have some instruction size checks for machine IR #83578 though it's implemented specifically for OPmi_ND.
I proposed to implement TargetInstrInfo::getInstSizeInBytes() for X86 in https://discourse.llvm.org/t/rfc-support-long-instruction-fixup-for-x86/76539 to report an error when size_of(MachineInstr) > 15. But after investigation, I found it's hard b/c the complexity of

  1. X86_MC::needsAddressSizeOverride
  2. REX -> REX2, VEX2 -> VEX3 promotion
  3. The place where the register/immediate operand is encoded

So, I think for X86, checking the instruction size during encoding is good enough.

@phoebewang
Copy link
Contributor

phoebewang commented Mar 4, 2024

I haven't noticed that it was addressed somewhere. What about inline asm? It seems that now it bypasses all checks and fails only during encoding. Is it expected? Maybe we need to add additional checks?
P.S. This is almost irrelevant to the PR but is about To help users find this issue earlier

It's expected.

Short answer: It' hard to implement a generic API to determine the size of instruction before encoding in machine IR. We do not have special check for inline asm at machine IR stage.

Long answer: We have some instruction size checks for machine IR #83578 though it's implemented specifically for OPmi_ND. I proposed to implement TargetInstrInfo::getInstSizeInBytes() for X86 in https://discourse.llvm.org/t/rfc-support-long-instruction-fixup-for-x86/76539 to report an error when size_of(MachineInstr) > 15. But after investigation, I found it's hard b/c the complexity of

  1. X86_MC::needsAddressSizeOverride
  2. REX -> REX2, VEX2 -> VEX3 promotion
  3. The place where the register/immediate operand is encoded

So, I think for X86, checking the instruction size during encoding is good enough.

Does it crash too if you compile it to object file? https://godbolt.org/z/3aqd497xG

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@KanRobert
Copy link
Contributor Author

Does it crash too if you compile it to object file? https://godbolt.org/z/3aqd497xG
@phoebewang Yes, it crashes. @e-kud is asking if we could report the error earlier (before encoding)

Copy link
Contributor

@XinWang10 XinWang10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@phoebewang
Copy link
Contributor

Does it crash too if you compile it to object file? https://godbolt.org/z/3aqd497xG
@phoebewang Yes, it crashes. @e-kud is asking if we could report the error earlier (before encoding)

We usually don't verify the inline asemble in codegen, because we assume user should be responsible for their mistakes in inline asemble.
But the thing is a bit complicated here. If user uses "m" constrain, they cannot forsee compiler will generate a SIB addressing or using x87 mode for it. Compiler need to take this responsibility to emit error or never generate SIB addressing in this case.

@KanRobert
Copy link
Contributor Author

Compiler need to take this responsibility to emit error or never generate SIB addressing in this case.

I don't think compilers should be so smart for inline assemble... Error when encoding sounds acceptable to me.

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test the file/line information, otherwise LGTM

@KanRobert KanRobert merged commit 96b2c3b into llvm:main Mar 4, 2024
@e-kud
Copy link
Contributor

e-kud commented Mar 4, 2024

@KanRobert Thank you for the explanation!

We usually don't verify the inline asemble in codegen, because we assume user should be responsible for their mistakes in inline asemble.

@phoebewang I agree but we still check the correctness of opcodes and operands. I wondered if we can check the size as well. Apparently it's not so straightforward, especially including your example with compiler generated SIB for the "m" constrain.

@phoebewang
Copy link
Contributor

@KanRobert Thank you for the explanation!

We usually don't verify the inline asemble in codegen, because we assume user should be responsible for their mistakes in inline asemble.

@phoebewang I agree but we still check the correctness of opcodes and operands. I wondered if we can check the size as well. Apparently it's not so straightforward, especially including your example with compiler generated SIB for the "m" constrain.

Yes. SIB is the most headache, because when checking inline asm, we don't know the memory operands. When we can get the precise memory operands, we won't check inline asm again. Given we will emit error during encoding, it's not much important to check it in codegen.

My point is from user's perspective, because the generation of SIB is out of their control. So they may condemn it's compiler's fault to turn a legal inline asm into illegal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants