Skip to content

[RISCV][MC] Support Assembling 48- and 64-bit Instructions #110022

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 8, 2024

Conversation

lenary
Copy link
Member

@lenary lenary commented Sep 25, 2024

This adds .insn support for assembling instructions of 48- and 64-bits. Disassembly already knows to bunch up the instruction bits for these instructions.

This changes some error messages so they are a little clearer.


I didn't go beyond 64-bit as that requires significant surgery to the backend's instruction encoding mechanism, whereas we can support up to 64-bit with these minimal changes here.

This adds `.insn` support for assembling instructions of 48- and
64-bits. Disassembly already knows to bunch up the instruction bits for
these instructions, but will print the disassebly as `<unknown>`.

This changes some error messages so they are a little clearer.

Co-authored-by: Sudharsan Veeravalli <[email protected]>
@llvmbot llvmbot added the mc Machine (object) code label Sep 25, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 25, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-risc-v

Author: Sam Elliott (lenary)

Changes

This adds .insn support for assembling instructions of 48- and 64-bits. Disassembly already knows to bunch up the instruction bits for these instructions.

This changes some error messages so they are a little clearer.


I didn't go beyond 64-bit as that requires significant surgery to the backend's instruction encoding mechanism, whereas we can support up to 64-bit with these minimal changes here.


Full diff: https://github.com/llvm/llvm-project/pull/110022.diff

7 Files Affected:

  • (modified) llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp (+35-11)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h (+2)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp (+15)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrFormats.td (+16)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+12)
  • (modified) llvm/test/MC/RISCV/insn-invalid.s (+31-4)
  • (modified) llvm/test/MC/RISCV/insn.s (+10)
diff --git a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
index 5e29a92f0bacd6..f4e6cccd1b2878 100644
--- a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+++ b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
@@ -707,6 +707,8 @@ struct RISCVOperand final : public MCParsedAsmOperand {
   bool isUImm16() const { return IsUImm<16>(); }
   bool isUImm20() const { return IsUImm<20>(); }
   bool isUImm32() const { return IsUImm<32>(); }
+  bool isUImm48() const { return IsUImm<48>(); }
+  bool isUImm64() const { return IsUImm<64>(); }
 
   bool isUImm8GE32() const {
     int64_t Imm;
@@ -3146,7 +3148,7 @@ bool RISCVAsmParser::parseDirectiveInsn(SMLoc L) {
   StringRef Format;
   SMLoc ErrorLoc = Parser.getTok().getLoc();
   if (Parser.parseIdentifier(Format)) {
-    // Try parsing .insn [length], value
+    // Try parsing .insn [ length , ] value
     int64_t Length = 0;
     int64_t Value = 0;
     if (Parser.parseIntToken(
@@ -3158,23 +3160,45 @@ bool RISCVAsmParser::parseDirectiveInsn(SMLoc L) {
         return true;
     }
 
-    // TODO: Add support for long instructions
-    int64_t RealLength = (Value & 3) == 3 ? 4 : 2;
-    if (!isUIntN(RealLength * 8, Value))
-      return Error(ErrorLoc, "invalid operand for instruction");
-    if (RealLength == 2 && !AllowC)
+    // TODO: Support Instructions > 64 bits.
+    if (Length > 8)
+      return Error(ErrorLoc,
+                   "instruction lengths over 64 bits are not supported");
+
+    int64_t EncodingDerivedLength = 4;
+    unsigned Opcode = RISCV::Insn32;
+    if ((Value & 0b11) != 0b11) {
+      EncodingDerivedLength = 2;
+      Opcode = RISCV::Insn16;
+    } else
+      switch (Value & 0b111'1111) {
+      case 0b001'1111:
+      case 0b101'1111:
+        EncodingDerivedLength = 6;
+        Opcode = RISCV::Insn48;
+        break;
+      case 0b011'1111:
+        EncodingDerivedLength = 8;
+        Opcode = RISCV::Insn64;
+        break;
+      case 0b111'1111:
+        // TODO: Support Instructions > 64 bits.
+        return Error(ErrorLoc,
+                     "instruction lengths over 64 bits are not supported");
+      }
+    if (Length != 0 && Length != EncodingDerivedLength)
+      return Error(ErrorLoc, "instruction length does not match the encoding");
+    if (!isUIntN(EncodingDerivedLength * 8, Value))
+      return Error(ErrorLoc, "encoding value does not fit into instruction");
+    if (!AllowC && (EncodingDerivedLength == 2))
       return Error(ErrorLoc, "compressed instructions are not allowed");
-    if (Length != 0 && Length != RealLength)
-      return Error(ErrorLoc, "instruction length mismatch");
 
     if (getParser().parseEOL("invalid operand for instruction")) {
       getParser().eatToEndOfStatement();
       return true;
     }
 
-    emitToStreamer(getStreamer(), MCInstBuilder(RealLength == 2 ? RISCV::Insn16
-                                                                : RISCV::Insn32)
-                                      .addImm(Value));
+    emitToStreamer(getStreamer(), MCInstBuilder(Opcode).addImm(Value));
     return false;
   }
 
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
index cf3ea3e4ea2131..d82f78498418da 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
@@ -309,6 +309,8 @@ enum OperandType : unsigned {
   OPERAND_UIMM12,
   OPERAND_UIMM16,
   OPERAND_UIMM32,
+  OPERAND_UIMM48,
+  OPERAND_UIMM64,
   OPERAND_ZERO,
   OPERAND_SIMM5,
   OPERAND_SIMM5_PLUS1,
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
index eb21498d15e86c..66970ed37f2724 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
@@ -355,6 +355,21 @@ void RISCVMCCodeEmitter::encodeInstruction(const MCInst &MI,
     support::endian::write(CB, Bits, llvm::endianness::little);
     break;
   }
+  case 6: {
+    uint64_t Bits = getBinaryCodeForInstr(MI, Fixups, STI) & 0xffff'ffff'ffffu;
+    SmallVector<char, 8> Encoding;
+    support::endian::write(Encoding, Bits, llvm::endianness::little);
+    assert(Encoding[6] == 0 && Encoding[7] == 0 &&
+           "Unexpected encoding for 48-bit instruction");
+    Encoding.truncate(6);
+    CB.append(Encoding);
+    break;
+  }
+  case 8: {
+    uint64_t Bits = getBinaryCodeForInstr(MI, Fixups, STI);
+    support::endian::write(CB, Bits, llvm::endianness::little);
+    break;
+  }
   }
 
   ++MCNumEmitted; // Keep track of the # of mi's emitted.
diff --git a/llvm/lib/Target/RISCV/RISCVInstrFormats.td b/llvm/lib/Target/RISCV/RISCVInstrFormats.td
index fcea18f81b3901..013c26c72bfd55 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrFormats.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrFormats.td
@@ -266,6 +266,22 @@ class RVInst<dag outs, dag ins, string opcodestr, string argstr,
   let Size = 4;
 }
 
+class RVInst48<dag outs, dag ins, string opcodestr, string argstr,
+               list<dag> pattern, InstFormat format>
+    : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
+  field bits<48> Inst;
+  field bits<48> SoftFail = 0;
+  let Size = 6;
+}
+
+class RVInst64<dag outs, dag ins, string opcodestr, string argstr,
+               list<dag> pattern, InstFormat format>
+    : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
+  field bits<64> Inst;
+  field bits<64> SoftFail = 0;
+  let Size = 8;
+}
+
 // Pseudo instructions
 class Pseudo<dag outs, dag ins, list<dag> pattern, string opcodestr = "", string argstr = "">
     : RVInst<outs, ins, opcodestr, argstr, pattern, InstFormatPseudo> {
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index fe5623e2920e22..34439880c3a31a 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -235,6 +235,8 @@ def uimm7 : RISCVUImmOp<7>;
 def uimm8 : RISCVUImmOp<8>;
 def uimm16 : RISCVUImmOp<16>;
 def uimm32 : RISCVUImmOp<32>;
+def uimm48 : RISCVUImmOp<48>;
+def uimm64 : RISCVUImmOp<64>;
 def simm12 : RISCVSImmLeafOp<12> {
   let MCOperandPredicate = [{
     int64_t Imm;
@@ -1135,6 +1137,16 @@ def Insn32 : RVInst<(outs), (ins uimm32:$value), "", "", [], InstFormatOther> {
   let Inst{31-0} = value;
   let AsmString = ".insn 0x4, $value";
 }
+def Insn48 : RVInst48<(outs), (ins uimm48:$value), "", "", [], InstFormatOther> {
+  bits<48> value;
+  let Inst{47-0} = value;
+  let AsmString = ".insn 0x6, $value";
+}
+def Insn64 : RVInst64<(outs), (ins uimm64:$value), "", "", [], InstFormatOther> {
+  bits<64> value;
+  let Inst{63-0} = value;
+  let AsmString = ".insn 0x8, $value";
+}
 }
 
 // Use InstAliases to match these so that we can combine the insn and format
diff --git a/llvm/test/MC/RISCV/insn-invalid.s b/llvm/test/MC/RISCV/insn-invalid.s
index d6fabea4e17016..e32619976108ac 100644
--- a/llvm/test/MC/RISCV/insn-invalid.s
+++ b/llvm/test/MC/RISCV/insn-invalid.s
@@ -26,8 +26,35 @@
 
 .insn . # CHECK: :[[@LINE]]:7: error: expected instruction format or an integer constant
 .insn 0x2, # CHECK: :[[@LINE]]:12: error: expected an integer constant
-.insn 0x2, 0xffff # CHECK: :[[@LINE]]:7: error: instruction length mismatch
-.insn 0x2, 0xffffffff # CHECK: :[[@LINE]]:7: error: instruction length mismatch
-.insn 0xffffffffff # CHECK: :[[@LINE]]:7: error: invalid operand for instruction
-.insn 0x0010 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
+
 .insn 0x4, 0x13, 0 # CHECK: :[[@LINE]]:16: error: invalid operand for instruction
+
+.insn 0x2, 0xffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0x2, 0xffffffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0xffffffffff # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+
+.insn 10, 0x000007f # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+.insn 0x000007f # CHECK: :[[@LINE]]:7: error: instruction lengths over 64 bits are not supported
+
+.insn 0x2, 0x03 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x2, 0x1f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x2, 0x3f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x4, 0x00000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x4, 0x0000001f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x4, 0x0000003f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x6, 0x000000000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x6, 0x000000000013 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x6, 0x00000000003f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x8, 0x0000001 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x8, 0x0000013 # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+.insn 0x8, 0x000001f # CHECK: :[[@LINE]]:7: error: instruction length does not match the encoding
+
+.insn 0x2, 0xffff0001 # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+.insn 0x4, 0xffff00000003 # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+.insn 0x6, 0xffff00000000001f # CHECK: :[[@LINE]]:7: error: encoding value does not fit into instruction
+
+.insn 0x0010 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
+.insn 0x2, 0x0001 # CHECK: :[[@LINE]]:7: error: compressed instructions are not allowed
diff --git a/llvm/test/MC/RISCV/insn.s b/llvm/test/MC/RISCV/insn.s
index b95c3b87b442f2..ff09393a6b285c 100644
--- a/llvm/test/MC/RISCV/insn.s
+++ b/llvm/test/MC/RISCV/insn.s
@@ -164,3 +164,13 @@ target:
 # CHECK-ASM: encoding: [0x13,0x00,0x00,0x00]
 # CHECK-OBJ: addi zero, zero, 0x0
 .insn 0x4, 0x13
+
+# CHECK-ASM: .insn 0x6, 31
+# CHECK-ASM: encoding: [0x1f,0x00,0x00,0x00,0x00,0x00]
+# CHECK-OBJ: <unknown>
+.insn 6, 0x1f
+
+# CHECK-ASM: .insn 0x8, 63
+# CHECK-ASM: encoding: [0x3f,0x00,0x00,0x00,0x00,0x00,0x00,0x00]
+# CHECK-OBJ: <unknown>
+.insn 8, 0x3f

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 25, 2024

As far as I know the encoding for >32-bit instructions is not ratified, nor even stable/frozen. Has that changed recently?

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 25, 2024

@lenary
Copy link
Member Author

lenary commented Sep 25, 2024

Is that the only documentation for the fact they're not ratified? Note the preface says "Standard instruction-length encodings have been defined for 48-bit, 64-bit, and 64-bit
instructions.", and Table 70 (which may be deemed not to be the canonical encoding information) also lacks that note.

This patch follows Table 70, and kicks the can down the road for much longer instructions, but maybe that still means there's no path to landing this and supporting hand-assembling wider instructions so they can be evaluated?

Alternatively, is it maybe worth adding a feature that we can put behind -menable-experimental-extensions so LLVM can support this without the wider encodings needing to be ratified?

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 25, 2024

That addition came out of a bunch of discussion in riscv/riscv-isa-manual#280 and resulted in riscv/riscv-isa-manual@1d8fbc8, which post-dates 2.0, the description of which is where the quote you took comes from. "Table 70. RISC-V base opcode map, inst[1:0]=11" would benefit from clarifying that the encodings are reserved but not ratified for that specific use.

@lenary
Copy link
Member Author

lenary commented Sep 25, 2024

Ok, I will revisit this PR tomorrow, to put these behind an experimental extension. I'm much less clear on what should be done in the disassembler though, as it also uses this encoding information.

Copy link

github-actions bot commented Sep 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@lenary
Copy link
Member Author

lenary commented Sep 26, 2024

Looking back on the patch this morning, the only thing that is actually dependent on the length encodings themselves is actually the length detection logic, so I have updated the patch to never attempt to understand if an instruction might be longer than 4 bytes - instead you have to provide an explicit length for 6/8 byte encodings. I think this is a much better way forwards than a new feature, while the encodings are not yet frozen - and should still allow experimentation.

@dtcxzyw dtcxzyw requested a review from kito-cheng September 27, 2024 13:45
@lenary
Copy link
Member Author

lenary commented Sep 27, 2024

I hope this is ready for a re-review, given I think I have adequately taken into account the feedback from @jrtc27 about the instability of longer encodings, and have hopefully found a way forwards to allow users to begin experimenting with longer instruction encodings without being beholden to something that is not yet frozen/ratified in the specification.

I think once the specification does freeze/ratify some encodings for longer instructions, we can maybe start inferring lengths from the encoding again, which is to say I don't think this patch's behaviour has to be how .insn works indefinitely into the future.

I just added some docs to the review as well, to document how to use .insn for longer instructions, given the behaviour is no longer straightforward.

PS: I will be away for a week, so not able to respond to reviews.

@dtcxzyw
Copy link
Member

dtcxzyw commented Sep 27, 2024

I hope this is ready for a re-review, given I think I have adequately taken into account the feedback from @jrtc27 about the instability of longer encodings, and have hopefully found a way forwards to allow users to begin experimenting with longer instruction encodings without being beholden to something that is not yet frozen/ratified in the specification.

I think once the specification does freeze/ratify some encodings for longer instructions, we can maybe start inferring lengths from the encoding again, which is to say I don't think this patch's behaviour has to be how .insn works indefinitely into the future.

I just added some docs to the review as well, to document how to use .insn for longer instructions, given the behaviour is no longer straightforward.

PS: I will be away for a week, so not able to respond to reviews.

FYI binutils supports inferring the length from its encoding: https://github.com/bminor/binutils-gdb/blob/428f3561bc16dfc2944ee641201acdd166315aa2/gas/testsuite/gas/riscv/insn.s#L74-L81. But I agree that we can improve this after the spec is ratified :)

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@asb asb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments but this seems good to me. I think this is a good solution to the issue of the longer encodings not being ratified, so thanks for finding it!

@lenary
Copy link
Member Author

lenary commented Oct 7, 2024

@asb I think I've done all the requested changes. I'll wait until tomorrow to merge.

@lenary lenary merged commit f93f925 into llvm:main Oct 8, 2024
8 of 10 checks passed
@lenary lenary deleted the pr/riscv-insn-longer branch October 8, 2024 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:RISC-V mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants