-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[X86][MC] Report error when the instruction length exceeds 15 bytes #83708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The instruction-size limit of 15 bytes still applies to APX instructions. Note that it is possible for an EVEX-encoded legacy instruction to reach the 15-byte instruction length limit: 4 bytes of EVEX prefix + 1 byte of opcode + 1 byte of ModRM + 1 byte of SIB + 4 bytes of displacement + 4 bytes of immediate = 15 bytes in total, e.g. ``` addq $184, -96, %rax # encoding: [0x62,0xf4,0xfc,0x18,0x81,0x04,0x25,0xa0,0xff,0xff,0xff,0xb8,0x00,0x00,0x00] ``` If we added a segment prefix like fs, the length would be 16. In such a case, no additional (ASIZE or segment override) prefix can be used. To help users find this issue earlier, we change the internal compiler error to error in this patch.
@llvm/pr-subscribers-mc @llvm/pr-subscribers-backend-x86 Author: Shengchen Kan (KanRobert) ChangesThe instruction-size limit of 15 bytes still applies to APX Note that it is possible for an EVEX-encoded legacy instruction to reach
If we added a segment prefix like fs, the length would be 16. In such a case, no additional (ASIZE or segment override) prefix can be To help users find this issue earlier, we change the internal compiler Full diff: https://github.com/llvm/llvm-project/pull/83708.diff 2 Files Affected:
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
index fdb11d1a408bb6..fd96a173af27e0 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
@@ -1924,8 +1924,9 @@ void X86MCCodeEmitter::encodeInstruction(const MCInst &MI,
if ((TSFlags & X86II::OpMapMask) == X86II::ThreeDNow)
emitByte(X86II::getBaseOpcodeFor(TSFlags), CB);
- assert(CB.size() - StartByte <= 15 &&
- "The size of instruction must be no longer than 15.");
+ if (CB.size() - StartByte > 15)
+ Ctx.reportError(MI.getLoc(),
+ "The size of instruction must be no longer than 15.");
#ifndef NDEBUG
// FIXME: Verify.
if (/*!Desc.isVariadic() &&*/ CurOp != NumOps) {
diff --git a/llvm/test/MC/X86/apx/long-instruction-err.s b/llvm/test/MC/X86/apx/long-instruction-err.s
new file mode 100644
index 00000000000000..a2210ecad81709
--- /dev/null
+++ b/llvm/test/MC/X86/apx/long-instruction-err.s
@@ -0,0 +1,25 @@
+# RUN: not llvm-mc -triple x86_64 -show-encoding %s 2>&1 | FileCheck %s
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: addq $1234, %cs:-96, %rax
+addq $1234, %cs:-96, %rax
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: subq $1234, %fs:257(%rbx, %rcx), %rax
+subq $1234, %fs:257(%rbx, %rcx), %rax
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: orq $1234, 257(%ebx, %ecx), %rax
+orq $1234, 257(%ebx, %ecx), %rax
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: xorq $1234, %gs:257(%ebx), %rax
+xorq $1234, %gs:257(%ebx), %rax
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: {nf} andq $1234, %cs:-96
+{nf} andq $1234, %cs:-96
+
+# CHECK: error: The size of instruction must be no longer than 15.
+# CHECK: {evex} adcq $1234, %cs:-96
+{evex} adcq $1234, %cs:-96
|
I haven't noticed that it was addressed somewhere. What about inline asm? It seems that now it bypasses all checks and fails only during encoding. Is it expected? Maybe we need to add additional checks? P.S. This is almost irrelevant to the PR but is about To help users find this issue earlier |
It's expected. Short answer: Long answer:
So, I think for X86, checking the instruction size during encoding is good enough. |
Does it crash too if you compile it to object file? https://godbolt.org/z/3aqd497xG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We usually don't verify the inline asemble in codegen, because we assume user should be responsible for their mistakes in inline asemble. |
I don't think compilers should be so smart for inline assemble... Error when encoding sounds acceptable to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test the file/line information, otherwise LGTM
@KanRobert Thank you for the explanation!
@phoebewang I agree but we still check the correctness of opcodes and operands. I wondered if we can check the size as well. Apparently it's not so straightforward, especially including your example with compiler generated SIB for the "m" constrain. |
Yes. SIB is the most headache, because when checking inline asm, we don't know the memory operands. When we can get the precise memory operands, we won't check inline asm again. Given we will emit error during encoding, it's not much important to check it in codegen. My point is from user's perspective, because the generation of SIB is out of their control. So they may condemn it's compiler's fault to turn a legal inline asm into illegal. |
The instruction-size limit of 15 bytes still applies to APX
instructions.
Note that it is possible for an EVEX-encoded legacy instruction to reach
the 15-byte instruction length limit: 4
bytes of EVEX prefix + 1 byte of opcode + 1 byte of ModRM + 1 byte of
SIB + 4 bytes of displacement + 4 bytes of immediate = 15 bytes in total, e.g.
If we added a segment prefix like fs, the length would be 16.
In such a case, no additional (ASIZE or segment override) prefix can be
used.
To help users find this issue earlier, especially for assembler users, we change
the internal compiler error to error in this patch.
Diagnostic is aligned with GAS https://sourceware.org/bugzilla/show_bug.cgi?id=31323