-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[RISCV][MC] Support Assembling 48- and 64-bit Instructions #110022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This adds `.insn` support for assembling instructions of 48- and 64-bits. Disassembly already knows to bunch up the instruction bits for these instructions, but will print the disassebly as `<unknown>`. This changes some error messages so they are a little clearer. Co-authored-by: Sudharsan Veeravalli <[email protected]>
@llvm/pr-subscribers-mc @llvm/pr-subscribers-backend-risc-v Author: Sam Elliott (lenary) ChangesThis adds This changes some error messages so they are a little clearer. I didn't go beyond 64-bit as that requires significant surgery to the backend's instruction encoding mechanism, whereas we can support up to 64-bit with these minimal changes here. Full diff: https://github.com/llvm/llvm-project/pull/110022.diff 7 Files Affected:
diff --git a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
index 5e29a92f0bacd6..f4e6cccd1b2878 100644
--- a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+++ b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
@@ -707,6 +707,8 @@ struct RISCVOperand final : public MCParsedAsmOperand {
bool isUImm16() const { return IsUImm<16>(); }
bool isUImm20() const { return IsUImm<20>(); }
bool isUImm32() const { return IsUImm<32>(); }
+ bool isUImm48() const { return IsUImm<48>(); }
+ bool isUImm64() const { return IsUImm<64>(); }
bool isUImm8GE32() const {
int64_t Imm;
@@ -3146,7 +3148,7 @@ bool RISCVAsmParser::parseDirectiveInsn(SMLoc L) {
StringRef Format;
SMLoc ErrorLoc = Parser.getTok().getLoc();
if (Parser.parseIdentifier(Format)) {
- // Try parsing .insn [length], value
+ // Try parsing .insn [ length , ] value
int64_t Length = 0;
int64_t Value = 0;
if (Parser.parseIntToken(
@@ -3158,23 +3160,45 @@ bool RISCVAsmParser::parseDirectiveInsn(SMLoc L) {
return true;
}
- // TODO: Add support for long instructions
- int64_t RealLength = (Value & 3) == 3 ? 4 : 2;
- if (!isUIntN(RealLength * 8, Value))
- return Error(ErrorLoc, "invalid operand for instruction");
- if (RealLength == 2 && !AllowC)
+ // TODO: Support Instructions > 64 bits.
+ if (Length > 8)
+ return Error(ErrorLoc,
+ "instruction lengths over 64 bits are not supported");
+
+ int64_t EncodingDerivedLength = 4;
+ unsigned Opcode = RISCV::Insn32;
+ if ((Value & 0b11) != 0b11) {
+ EncodingDerivedLength = 2;
+ Opcode = RISCV::Insn16;
+ } else
+ switch (Value & 0b111'1111) {
+ case 0b001'1111:
+ case 0b101'1111:
+ EncodingDerivedLength = 6;
+ Opcode = RISCV::Insn48;
+ break;
+ case 0b011'1111:
+ EncodingDerivedLength = 8;
+ Opcode = RISCV::Insn64;
+ break;
+ case 0b111'1111:
+ // TODO: Support Instructions > 64 bits.
+ return Error(ErrorLoc,
+ "instruction lengths over 64 bits are not supported");
+ }
+ if (Length != 0 && Length != EncodingDerivedLength)
+ return Error(ErrorLoc, "instruction length does not match the encoding");
+ if (!isUIntN(EncodingDerivedLength * 8, Value))
+ return Error(ErrorLoc, "encoding value does not fit into instruction");
+ if (!AllowC && (EncodingDerivedLength == 2))
return Error(ErrorLoc, "compressed instructions are not allowed");
- if (Length != 0 && Length != RealLength)
- return Error(ErrorLoc, "instruction length mismatch");
if (getParser().parseEOL("invalid operand for instruction")) {
getParser().eatToEndOfStatement();
return true;
}
- emitToStreamer(getStreamer(), MCInstBuilder(RealLength == 2 ? RISCV::Insn16
- : RISCV::Insn32)
- .addImm(Value));
+ emitToStreamer(getStreamer(), MCInstBuilder(Opcode).addImm(Value));
return false;
}
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
index cf3ea3e4ea2131..d82f78498418da 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
@@ -309,6 +309,8 @@ enum OperandType : unsigned {
OPERAND_UIMM12,
OPERAND_UIMM16,
OPERAND_UIMM32,
+ OPERAND_UIMM48,
+ OPERAND_UIMM64,
OPERAND_ZERO,
OPERAND_SIMM5,
OPERAND_SIMM5_PLUS1,
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
index eb21498d15e86c..66970ed37f2724 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
@@ -355,6 +355,21 @@ void RISCVMCCodeEmitter::encodeInstruction(const MCInst &MI,
support::endian::write(CB, Bits, llvm::endianness::little);
break;
}
+ case 6: {
+ uint64_t Bits = getBinaryCodeForInstr(MI, Fixups, STI) & 0xffff'ffff'ffffu;
+ SmallVector<char, 8> Encoding;
+ support::endian::write(Encoding, Bits, llvm::endianness::little);
+ assert(Encoding[6] == 0 && Encoding[7] == 0 &&
+ "Unexpected encoding for 48-bit instruction");
+ Encoding.truncate(6);
+ CB.append(Encoding);
+ break;
+ }
+ case 8: {
+ uint64_t Bits = getBinaryCodeForInstr(MI, Fixups, STI);
+ support::endian::write(CB, Bits, llvm::endianness::little);
+ break;
+ }
}
++MCNumEmitted; // Keep track of the # of mi's emitted.
diff --git a/llvm/lib/Target/RISCV/RISCVInstrFormats.td b/llvm/lib/Target/RISCV/RISCVInstrFormats.td
index fcea18f81b3901..013c26c72bfd55 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrFormats.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrFormats.td
@@ -266,6 +266,22 @@ class RVInst<dag outs, dag ins, string opcodestr, string argstr,
let Size = 4;
}
+class RVInst48<dag outs, dag ins, string opcodestr, string argstr,
+ list<dag> pattern, InstFormat format>
+ : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
+ field bits<48> Inst;
+ field bits<48> SoftFail = 0;
+ let Size = 6;
+}
+
+class RVInst64<dag outs, dag ins, string opcodestr, string argstr,
+ list<dag> pattern, InstFormat format>
+ : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
+ field bits<64> Inst;
+ field bits<64> SoftFail = 0;
+ let Size = 8;
+}
+
// Pseudo instructions
class Pseudo<dag outs, dag ins, list<dag> pattern, string opcodestr = "", string argstr = "">
: RVInst<outs, ins, opcodestr, argstr, pattern, InstFormatPseudo> {
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index fe5623e2920e22..34439880c3a31a 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -235,6 +235,8 @@ def uimm7 : RISCVUImmOp<7>;
def uimm8 : RISCVUImmOp<8>;
def uimm16 : RISCVUImmOp<16>;
def uimm32 : RISCVUImmOp<32>;
+def uimm48 : RISCVUImmOp<48>;
+def uimm64 : RISCVUImmOp<64>;
def simm12 : RISCVSImmLeafOp<12> {
let MCOperandPredicate = [{
int64_t Imm;
@@ -1135,6 +1137,16 @@ def Insn32 : RVInst<(outs), (ins uimm32:$value), "", "", [], InstFormatOther> {
let Inst{31-0} = value;
let AsmString = ".insn 0x4, $value";
}
+def Insn48 : RVInst48<(outs), (ins uimm48:$value), "", "", [], InstFormatOther> {
+ bits<48> value;
+ let Inst{47-0} = value;
+ let AsmString = ".insn 0x6, $value";
+}
+def Insn64 : RVInst64<(outs), (ins uimm64:$value), "", "", [], InstFormatOther> {
+ bits<64> value;
+ let Inst{63-0} = value;
+ let AsmString = ".insn 0x8, $value";
+}
}
// Use InstAliases to match these so that we can combine the insn and format
diff --git a/llvm/test/MC/RISCV/insn-invalid.s b/llvm/test/MC/RISCV/insn-invalid.s
index d6fabea4e17016..e32619976108ac 100644
--- a/llvm/test/MC/RISCV/insn-invalid.s
+++ b/llvm/test/MC/RISCV/insn-invalid.s
@@ -26,8 +26,35 @@
.insn . # CHECK: :[[@LINE]]:7: error: expected instruction format or an integer constant
.insn 0x2, # CHECK: :[[@LINE]]:12: error: expected an integer constant
-.insn 0x2, 0xffff # CHECK: :[[@LINE]]:7: error: instruction length mismatch
-.insn 0x2, 0xffffffff # CHECK: :[[@LINE]]:7: error: instruction length mismatch
-.insn 0xffffffffff # CHECK: :[[@LINE]]:7: error: invalid operand for instruction
-.insn 0x0010 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
+
.insn 0x4, 0x13, 0 # CHECK: :[[@LINE]]:16: error: invalid operand for instruction
+
+.insn 0x2, 0xffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0x2, 0xffffffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0xffffffffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+
+.insn 10, 0x000007f # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0x000007f # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+
+.insn 0x2, 0x03 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x2, 0x1f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x2, 0x3f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x4, 0x00000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x4, 0x0000001f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x4, 0x0000003f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x6, 0x000000000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x6, 0x000000000013 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x6, 0x00000000003f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x8, 0x0000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x8, 0x0000013 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x8, 0x000001f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x2, 0xffff0001 # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+.insn 0x4, 0xffff00000003 # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+.insn 0x6, 0xffff00000000001f # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+
+.insn 0x0010 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
+.insn 0x2, 0x0001 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
diff --git a/llvm/test/MC/RISCV/insn.s b/llvm/test/MC/RISCV/insn.s
index b95c3b87b442f2..ff09393a6b285c 100644
--- a/llvm/test/MC/RISCV/insn.s
+++ b/llvm/test/MC/RISCV/insn.s
@@ -164,3 +164,13 @@ target:
# CHECK-ASM: encoding: [0x13,0x00,0x00,0x00]
# CHECK-OBJ: addi zero, zero, 0x0
.insn 0x4, 0x13
+
+# CHECK-ASM: .insn 0x6, 31
+# CHECK-ASM: encoding: [0x1f,0x00,0x00,0x00,0x00,0x00]
+# CHECK-OBJ: <unknown>
+.insn 6, 0x1f
+
+# CHECK-ASM: .insn 0x8, 63
+# CHECK-ASM: encoding: [0x3f,0x00,0x00,0x00,0x00,0x00,0x00,0x00]
+# CHECK-OBJ: <unknown>
+.insn 8, 0x3f
|
As far as I know the encoding for >32-bit instructions is not ratified, nor even stable/frozen. Has that changed recently? |
Is that the only documentation for the fact they're not ratified? Note the preface says "Standard instruction-length encodings have been defined for 48-bit, 64-bit, and 64-bit This patch follows Table 70, and kicks the can down the road for much longer instructions, but maybe that still means there's no path to landing this and supporting hand-assembling wider instructions so they can be evaluated? Alternatively, is it maybe worth adding a feature that we can put behind |
That addition came out of a bunch of discussion in riscv/riscv-isa-manual#280 and resulted in riscv/riscv-isa-manual@1d8fbc8, which post-dates 2.0, the description of which is where the quote you took comes from. "Table 70. RISC-V base opcode map, inst[1:0]=11" would benefit from clarifying that the encodings are reserved but not ratified for that specific use. |
Ok, I will revisit this PR tomorrow, to put these behind an experimental extension. I'm much less clear on what should be done in the disassembler though, as it also uses this encoding information. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Looking back on the patch this morning, the only thing that is actually dependent on the length encodings themselves is actually the length detection logic, so I have updated the patch to never attempt to understand if an instruction might be longer than 4 bytes - instead you have to provide an explicit length for 6/8 byte encodings. I think this is a much better way forwards than a new feature, while the encodings are not yet frozen - and should still allow experimentation. |
I hope this is ready for a re-review, given I think I have adequately taken into account the feedback from @jrtc27 about the instability of longer encodings, and have hopefully found a way forwards to allow users to begin experimenting with longer instruction encodings without being beholden to something that is not yet frozen/ratified in the specification. I think once the specification does freeze/ratify some encodings for longer instructions, we can maybe start inferring lengths from the encoding again, which is to say I don't think this patch's behaviour has to be how I just added some docs to the review as well, to document how to use PS: I will be away for a week, so not able to respond to reviews. |
FYI binutils supports inferring the length from its encoding: https://github.com/bminor/binutils-gdb/blob/428f3561bc16dfc2944ee641201acdd166315aa2/gas/testsuite/gas/riscv/insn.s#L74-L81. But I agree that we can improve this after the spec is ratified :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments but this seems good to me. I think this is a good solution to the issue of the longer encodings not being ratified, so thanks for finding it!
@asb I think I've done all the requested changes. I'll wait until tomorrow to merge. |
This adds
.insn
support for assembling instructions of 48- and 64-bits. Disassembly already knows to bunch up the instruction bits for these instructions.This changes some error messages so they are a little clearer.
I didn't go beyond 64-bit as that requires significant surgery to the backend's instruction encoding mechanism, whereas we can support up to 64-bit with these minimal changes here.