[RISCV][MC] Support Assembling 48- and 64-bit Instructions #110022

lenary · 2024-09-25T18:26:41Z

This adds .insn support for assembling instructions of 48- and 64-bits. Disassembly already knows to bunch up the instruction bits for these instructions.

This changes some error messages so they are a little clearer.

I didn't go beyond 64-bit as that requires significant surgery to the backend's instruction encoding mechanism, whereas we can support up to 64-bit with these minimal changes here.

This adds `.insn` support for assembling instructions of 48- and 64-bits. Disassembly already knows to bunch up the instruction bits for these instructions, but will print the disassebly as `<unknown>`. This changes some error messages so they are a little clearer. Co-authored-by: Sudharsan Veeravalli <[email protected]>

llvmbot · 2024-09-25T18:27:03Z

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-risc-v

Author: Sam Elliott (lenary)

Changes

This adds .insn support for assembling instructions of 48- and 64-bits. Disassembly already knows to bunch up the instruction bits for these instructions.

This changes some error messages so they are a little clearer.

I didn't go beyond 64-bit as that requires significant surgery to the backend's instruction encoding mechanism, whereas we can support up to 64-bit with these minimal changes here.

Full diff: https://github.com/llvm/llvm-project/pull/110022.diff

7 Files Affected:

(modified) llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp (+35-11)
(modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h (+2)
(modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp (+15)
(modified) llvm/lib/Target/RISCV/RISCVInstrFormats.td (+16)
(modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+12)
(modified) llvm/test/MC/RISCV/insn-invalid.s (+31-4)
(modified) llvm/test/MC/RISCV/insn.s (+10)

diff --git a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
index 5e29a92f0bacd6..f4e6cccd1b2878 100644
--- a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+++ b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
@@ -707,6 +707,8 @@ struct RISCVOperand final : public MCParsedAsmOperand {
   bool isUImm16() const { return IsUImm<16>(); }
   bool isUImm20() const { return IsUImm<20>(); }
   bool isUImm32() const { return IsUImm<32>(); }
+  bool isUImm48() const { return IsUImm<48>(); }
+  bool isUImm64() const { return IsUImm<64>(); }
 
   bool isUImm8GE32() const {
     int64_t Imm;
@@ -3146,7 +3148,7 @@ bool RISCVAsmParser::parseDirectiveInsn(SMLoc L) {
   StringRef Format;
   SMLoc ErrorLoc = Parser.getTok().getLoc();
   if (Parser.parseIdentifier(Format)) {
-    // Try parsing .insn [length], value
+    // Try parsing .insn [ length , ] value
     int64_t Length = 0;
     int64_t Value = 0;
     if (Parser.parseIntToken(
@@ -3158,23 +3160,45 @@ bool RISCVAsmParser::parseDirectiveInsn(SMLoc L) {
         return true;
     }
 
-    // TODO: Add support for long instructions
-    int64_t RealLength = (Value & 3) == 3 ? 4 : 2;
-    if (!isUIntN(RealLength * 8, Value))
-      return Error(ErrorLoc, "invalid operand for instruction");
-    if (RealLength == 2 && !AllowC)
+    // TODO: Support Instructions > 64 bits.
+    if (Length > 8)
+      return Error(ErrorLoc,
+                   "instruction lengths over 64 bits are not supported");
+
+    int64_t EncodingDerivedLength = 4;
+    unsigned Opcode = RISCV::Insn32;
+    if ((Value & 0b11) != 0b11) {
+      EncodingDerivedLength = 2;
+      Opcode = RISCV::Insn16;
+    } else
+      switch (Value & 0b111'1111) {
+      case 0b001'1111:
+      case 0b101'1111:
+        EncodingDerivedLength = 6;
+        Opcode = RISCV::Insn48;
+        break;
+      case 0b011'1111:
+        EncodingDerivedLength = 8;
+        Opcode = RISCV::Insn64;
+        break;
+      case 0b111'1111:
+        // TODO: Support Instructions > 64 bits.
+        return Error(ErrorLoc,
+                     "instruction lengths over 64 bits are not supported");
+      }
+    if (Length != 0 && Length != EncodingDerivedLength)
+      return Error(ErrorLoc, "instruction length does not match the encoding");
+    if (!isUIntN(EncodingDerivedLength * 8, Value))
+      return Error(ErrorLoc, "encoding value does not fit into instruction");
+    if (!AllowC && (EncodingDerivedLength == 2))
       return Error(ErrorLoc, "compressed instructions are not allowed");
-    if (Length != 0 && Length != RealLength)
-      return Error(ErrorLoc, "instruction length mismatch");
 
     if (getParser().parseEOL("invalid operand for instruction")) {
       getParser().eatToEndOfStatement();
       return true;
     }
 
-    emitToStreamer(getStreamer(), MCInstBuilder(RealLength == 2 ? RISCV::Insn16
-                                                                : RISCV::Insn32)
-                                      .addImm(Value));
+    emitToStreamer(getStreamer(), MCInstBuilder(Opcode).addImm(Value));
     return false;
   }
 
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
index cf3ea3e4ea2131..d82f78498418da 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
@@ -309,6 +309,8 @@ enum OperandType : unsigned {
   OPERAND_UIMM12,
   OPERAND_UIMM16,
   OPERAND_UIMM32,
+  OPERAND_UIMM48,
+  OPERAND_UIMM64,
   OPERAND_ZERO,
   OPERAND_SIMM5,
   OPERAND_SIMM5_PLUS1,
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
index eb21498d15e86c..66970ed37f2724 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
@@ -355,6 +355,21 @@ void RISCVMCCodeEmitter::encodeInstruction(const MCInst &MI,
     support::endian::write(CB, Bits, llvm::endianness::little);
     break;
   }
+  case 6: {
+    uint64_t Bits = getBinaryCodeForInstr(MI, Fixups, STI) & 0xffff'ffff'ffffu;
+    SmallVector<char, 8> Encoding;
+    support::endian::write(Encoding, Bits, llvm::endianness::little);
+    assert(Encoding[6] == 0 && Encoding[7] == 0 &&
+           "Unexpected encoding for 48-bit instruction");
+    Encoding.truncate(6);
+    CB.append(Encoding);
+    break;
+  }
+  case 8: {
+    uint64_t Bits = getBinaryCodeForInstr(MI, Fixups, STI);
+    support::endian::write(CB, Bits, llvm::endianness::little);
+    break;
+  }
   }
 
   ++MCNumEmitted; // Keep track of the # of mi's emitted.
diff --git a/llvm/lib/Target/RISCV/RISCVInstrFormats.td b/llvm/lib/Target/RISCV/RISCVInstrFormats.td
index fcea18f81b3901..013c26c72bfd55 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrFormats.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrFormats.td
@@ -266,6 +266,22 @@ class RVInst<dag outs, dag ins, string opcodestr, string argstr,
   let Size = 4;
 }
 
+class RVInst48<dag outs, dag ins, string opcodestr, string argstr,
+               list<dag> pattern, InstFormat format>
+    : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
+  field bits<48> Inst;
+  field bits<48> SoftFail = 0;
+  let Size = 6;
+}
+
+class RVInst64<dag outs, dag ins, string opcodestr, string argstr,
+               list<dag> pattern, InstFormat format>
+    : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
+  field bits<64> Inst;
+  field bits<64> SoftFail = 0;
+  let Size = 8;
+}
+
 // Pseudo instructions
 class Pseudo<dag outs, dag ins, list<dag> pattern, string opcodestr = "", string argstr = "">
     : RVInst<outs, ins, opcodestr, argstr, pattern, InstFormatPseudo> {
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index fe5623e2920e22..34439880c3a31a 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -235,6 +235,8 @@ def uimm7 : RISCVUImmOp<7>;
 def uimm8 : RISCVUImmOp<8>;
 def uimm16 : RISCVUImmOp<16>;
 def uimm32 : RISCVUImmOp<32>;
+def uimm48 : RISCVUImmOp<48>;
+def uimm64 : RISCVUImmOp<64>;
 def simm12 : RISCVSImmLeafOp<12> {
   let MCOperandPredicate = [{
     int64_t Imm;
@@ -1135,6 +1137,16 @@ def Insn32 : RVInst<(outs), (ins uimm32:$value), "", "", [], InstFormatOther> {
   let Inst{31-0} = value;
   let AsmString = ".insn 0x4, $value";
 }
+def Insn48 : RVInst48<(outs), (ins uimm48:$value), "", "", [], InstFormatOther> {
+  bits<48> value;
+  let Inst{47-0} = value;
+  let AsmString = ".insn 0x6, $value";
+}
+def Insn64 : RVInst64<(outs), (ins uimm64:$value), "", "", [], InstFormatOther> {
+  bits<64> value;
+  let Inst{63-0} = value;
+  let AsmString = ".insn 0x8, $value";
+}
 }
 
 // Use InstAliases to match these so that we can combine the insn and format
diff --git a/llvm/test/MC/RISCV/insn-invalid.s b/llvm/test/MC/RISCV/insn-invalid.s
index d6fabea4e17016..e32619976108ac 100644
--- a/llvm/test/MC/RISCV/insn-invalid.s
+++ b/llvm/test/MC/RISCV/insn-invalid.s
@@ -26,8 +26,35 @@
 
 .insn . # CHECK: :[[@LINE]]:7: error: expected instruction format or an integer constant
 .insn 0x2, # CHECK: :[[@LINE]]:12: error: expected an integer constant
-.insn 0x2, 0xffff # CHECK: :[[@LINE]]:7: error: instruction length mismatch
-.insn 0x2, 0xffffffff # CHECK: :[[@LINE]]:7: error: instruction length mismatch
-.insn 0xffffffffff # CHECK: :[[@LINE]]:7: error: invalid operand for instruction
-.insn 0x0010 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
+
 .insn 0x4, 0x13, 0 # CHECK: :[[@LINE]]:16: error: invalid operand for instruction
+
+.insn 0x2, 0xffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0x2, 0xffffffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0xffffffffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+
+.insn 10, 0x000007f # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0x000007f # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+
+.insn 0x2, 0x03 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x2, 0x1f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x2, 0x3f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x4, 0x00000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x4, 0x0000001f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x4, 0x0000003f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x6, 0x000000000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x6, 0x000000000013 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x6, 0x00000000003f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x8, 0x0000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x8, 0x0000013 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x8, 0x000001f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x2, 0xffff0001 # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+.insn 0x4, 0xffff00000003 # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+.insn 0x6, 0xffff00000000001f # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+
+.insn 0x0010 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
+.insn 0x2, 0x0001 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
diff --git a/llvm/test/MC/RISCV/insn.s b/llvm/test/MC/RISCV/insn.s
index b95c3b87b442f2..ff09393a6b285c 100644
--- a/llvm/test/MC/RISCV/insn.s
+++ b/llvm/test/MC/RISCV/insn.s
@@ -164,3 +164,13 @@ target:
 # CHECK-ASM: encoding: [0x13,0x00,0x00,0x00]
 # CHECK-OBJ: addi zero, zero, 0x0
 .insn 0x4, 0x13
+
+# CHECK-ASM: .insn 0x6, 31
+# CHECK-ASM: encoding: [0x1f,0x00,0x00,0x00,0x00,0x00]
+# CHECK-OBJ: <unknown>
+.insn 6, 0x1f
+
+# CHECK-ASM: .insn 0x8, 63
+# CHECK-ASM: encoding: [0x3f,0x00,0x00,0x00,0x00,0x00,0x00,0x00]
+# CHECK-OBJ: <unknown>
+.insn 8, 0x3f

jrtc27 · 2024-09-25T18:31:40Z

As far as I know the encoding for >32-bit instructions is not ratified, nor even stable/frozen. Has that changed recently?

jrtc27 · 2024-09-25T18:33:43Z

https://github.com/riscv/riscv-isa-manual/blob/7023c601443adf21850772e02e3d920bc68b2237/src/intro.adoc?plain=1#L496 seems to still be there

lenary · 2024-09-25T18:43:25Z

Is that the only documentation for the fact they're not ratified? Note the preface says "Standard instruction-length encodings have been defined for 48-bit, 64-bit, and 64-bit
instructions.", and Table 70 (which may be deemed not to be the canonical encoding information) also lacks that note.

This patch follows Table 70, and kicks the can down the road for much longer instructions, but maybe that still means there's no path to landing this and supporting hand-assembling wider instructions so they can be evaluated?

Alternatively, is it maybe worth adding a feature that we can put behind -menable-experimental-extensions so LLVM can support this without the wider encodings needing to be ratified?

jrtc27 · 2024-09-25T19:12:25Z

That addition came out of a bunch of discussion in riscv/riscv-isa-manual#280 and resulted in riscv/riscv-isa-manual@1d8fbc8, which post-dates 2.0, the description of which is where the quote you took comes from. "Table 70. RISC-V base opcode map, inst[1:0]=11" would benefit from clarifying that the encodings are reserved but not ratified for that specific use.

lenary · 2024-09-25T20:43:46Z

Ok, I will revisit this PR tomorrow, to put these behind an experimental extension. I'm much less clear on what should be done in the disassembler though, as it also uses this encoding information.

github-actions · 2024-09-26T13:54:42Z

✅ With the latest revision this PR passed the C/C++ code formatter.

lenary · 2024-09-26T13:56:18Z

Looking back on the patch this morning, the only thing that is actually dependent on the length encodings themselves is actually the length detection logic, so I have updated the patch to never attempt to understand if an instruction might be longer than 4 bytes - instead you have to provide an explicit length for 6/8 byte encodings. I think this is a much better way forwards than a new feature, while the encodings are not yet frozen - and should still allow experimentation.

lenary · 2024-09-27T13:56:33Z

I hope this is ready for a re-review, given I think I have adequately taken into account the feedback from @jrtc27 about the instability of longer encodings, and have hopefully found a way forwards to allow users to begin experimenting with longer instruction encodings without being beholden to something that is not yet frozen/ratified in the specification.

I think once the specification does freeze/ratify some encodings for longer instructions, we can maybe start inferring lengths from the encoding again, which is to say I don't think this patch's behaviour has to be how .insn works indefinitely into the future.

I just added some docs to the review as well, to document how to use .insn for longer instructions, given the behaviour is no longer straightforward.

PS: I will be away for a week, so not able to respond to reviews.

dtcxzyw · 2024-09-27T14:17:58Z

I hope this is ready for a re-review, given I think I have adequately taken into account the feedback from @jrtc27 about the instability of longer encodings, and have hopefully found a way forwards to allow users to begin experimenting with longer instruction encodings without being beholden to something that is not yet frozen/ratified in the specification.

I think once the specification does freeze/ratify some encodings for longer instructions, we can maybe start inferring lengths from the encoding again, which is to say I don't think this patch's behaviour has to be how .insn works indefinitely into the future.

I just added some docs to the review as well, to document how to use .insn for longer instructions, given the behaviour is no longer straightforward.

PS: I will be away for a week, so not able to respond to reviews.

FYI binutils supports inferring the length from its encoding: https://github.com/bminor/binutils-gdb/blob/428f3561bc16dfc2944ee641201acdd166315aa2/gas/testsuite/gas/riscv/insn.s#L74-L81. But I agree that we can improve this after the spec is ratified :)

topperc

LGTM

asb

Left some minor comments but this seems good to me. I think this is a good solution to the issue of the longer encodings not being ratified, so thanks for finding it!

llvm/docs/RISCVUsage.rst

llvm/test/MC/RISCV/insn-invalid.s

lenary · 2024-10-07T16:12:42Z

@asb I think I've done all the requested changes. I'll wait until tomorrow to merge.

lenary added the backend:RISC-V label Sep 25, 2024

lenary requested review from asb, dtcxzyw, topperc and svs-quic September 25, 2024 18:26

llvmbot added the mc Machine (object) code label Sep 25, 2024

fixup! [RISCV][MC] Support Assembling 48- and 64-bit Instructions

90f096c

fixup! [RISCV][MC] Support Assembling 48- and 64-bit Instructions

306c1bd

lenary added 3 commits September 26, 2024 06:56

fixup! [RISCV][MC] Support Assembling 48- and 64-bit Instructions

e1ed7ed

fixup! Correctly deref optional

1e87d8d

fixup! [RISCV][MC] Support Assembling 48- and 64-bit Instructions

b3d98bc

dtcxzyw requested a review from kito-cheng September 27, 2024 13:45

Add Docs for RISC-V .insn directive

1b64d8c

topperc approved these changes Sep 27, 2024

View reviewed changes

asb approved these changes Oct 1, 2024

View reviewed changes

llvm/docs/RISCVUsage.rst Outdated Show resolved Hide resolved

llvm/docs/RISCVUsage.rst Outdated Show resolved Hide resolved

llvm/test/MC/RISCV/insn-invalid.s Show resolved Hide resolved

fixup! [RISCV][MC] Support Assembling 48- and 64-bit Instructions

ce5f18c

lenary merged commit f93f925 into llvm:main Oct 8, 2024
8 of 10 checks passed

lenary deleted the pr/riscv-insn-longer branch October 8, 2024 13:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RISCV][MC] Support Assembling 48- and 64-bit Instructions #110022

[RISCV][MC] Support Assembling 48- and 64-bit Instructions #110022

Uh oh!

lenary commented Sep 25, 2024

Uh oh!

llvmbot commented Sep 25, 2024 •

edited

Loading

Uh oh!

jrtc27 commented Sep 25, 2024

Uh oh!

jrtc27 commented Sep 25, 2024

Uh oh!

lenary commented Sep 25, 2024

Uh oh!

jrtc27 commented Sep 25, 2024

Uh oh!

lenary commented Sep 25, 2024

Uh oh!

github-actions bot commented Sep 26, 2024 •

edited

Loading

Uh oh!

lenary commented Sep 26, 2024

Uh oh!

lenary commented Sep 27, 2024 •

edited

Loading

Uh oh!

dtcxzyw commented Sep 27, 2024

Uh oh!

topperc left a comment

Uh oh!

asb left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lenary commented Oct 7, 2024

Uh oh!

Uh oh!

Uh oh!

[RISCV][MC] Support Assembling 48- and 64-bit Instructions #110022

[RISCV][MC] Support Assembling 48- and 64-bit Instructions #110022

Uh oh!

Conversation

lenary commented Sep 25, 2024

Uh oh!

llvmbot commented Sep 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrtc27 commented Sep 25, 2024

Uh oh!

jrtc27 commented Sep 25, 2024

Uh oh!

lenary commented Sep 25, 2024

Uh oh!

jrtc27 commented Sep 25, 2024

Uh oh!

lenary commented Sep 25, 2024

Uh oh!

github-actions bot commented Sep 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lenary commented Sep 26, 2024

Uh oh!

lenary commented Sep 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dtcxzyw commented Sep 27, 2024

Uh oh!

topperc left a comment

Choose a reason for hiding this comment

Uh oh!

asb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lenary commented Oct 7, 2024

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Sep 25, 2024 •

edited

Loading

github-actions bot commented Sep 26, 2024 •

edited

Loading

lenary commented Sep 27, 2024 •

edited

Loading