[RISCV] Add 16 bit GPR sub-register for Zhinx. #107446

Merged: 18 commits, Sep 27, 2024

Conversation

topperc
Collaborator

@topperc topperc commented Sep 5, 2024

This patch adds a 16-bit register class for use with Zhinx instructions. This makes them more similar to Zfh instructions and allows us to spill only 16 bits.

I've added CodeGenOnly load/store instructions using GPRF16, as that gave better results than insert_subreg/extract_subreg. I'm using FSGNJ for GPRF16 copies with Zhinx, as that gave better results. Zhinxmin will use ADDI+subreg operations.

Function arguments use this new GPRF16 register class for f16 arguments with Zhinxmin, eliminating the need to use RISCVISD::FMV* nodes.

I plan to extend this idea to Zfinx next.

After that, I want to try to extend this to 32-bit integer W instructions. My thought is that we can arrange to have all writes to the 32-bit GPR guarantee sign extension, similar to how Mips64 is handled. Unfortunately, we are missing some W instructions in Zba and Zbs that would make this straightforward. I want to see if this is more viable than my previous attempt at making i32 a legal type, which kept it inside the existing 64-bit GPR class.
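The copy strategy described above can be sketched as a standalone example (this is purely illustrative, not the actual copyPhysReg code; the helper name and the `_h` string convention are invented for the sketch):

```cpp
#include <cassert>
#include <string>

// Standalone sketch (not LLVM code) of the GPRF16 copy strategy: with Zhinx,
// the copy stays in the 16-bit class as an fsgnj.h; with only Zhinxmin, it is
// performed on the matching full-width super-registers as addi (a plain mv).
std::string lowerGPRF16Copy(bool HasZhinx, const std::string &Dst,
                            const std::string &Src) {
  if (HasZhinx)
    return "fsgnj.h " + Dst + ", " + Src + ", " + Src;
  // Strip the illustrative "_h" suffix to name the parent GPR.
  std::string DstX = Dst.substr(0, Dst.find("_h"));
  std::string SrcX = Src.substr(0, Src.find("_h"));
  return "addi " + DstX + ", " + SrcX + ", 0";
}
```

The actual patch implements this choice in RISCVInstrInfo::copyPhysReg (see the diff below), using getMatchingSuperReg for the Zhinxmin path.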

@llvmbot
Member

llvmbot commented Sep 5, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes



Patch is 62.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107446.diff

21 Files Affected:

  • (modified) llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp (+10)
  • (modified) llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp (+13)
  • (modified) llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp (+7-2)
  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp (+4-1)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+55-2)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+23)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+4-4)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td (+18-9)
  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (+7-7)
  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.td (+79-33)
  • (modified) llvm/test/CodeGen/RISCV/fastcc-without-f-reg.ll (+70-70)
  • (modified) llvm/test/CodeGen/RISCV/half-arith.ll (+15-5)
  • (modified) llvm/test/CodeGen/RISCV/half-bitmanip-dagcombines.ll (+16-8)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+12)
  • (modified) llvm/test/CodeGen/RISCV/half-imm.ll (+6-2)
  • (modified) llvm/test/CodeGen/RISCV/half-intrinsics.ll (+16-11)
  • (modified) llvm/test/CodeGen/RISCV/half-maximum-minimum.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/half-mem.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/half-select-fcmp.ll (+15-15)
  • (modified) llvm/test/CodeGen/RISCV/half-select-icmp.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/kcfi-mir.ll (+2-2)
diff --git a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
index 6d33a1f64195d5..de9591d4cf72ac 100644
--- a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+++ b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
@@ -480,7 +480,13 @@ struct RISCVOperand final : public MCParsedAsmOperand {
            RISCVMCRegisterClasses[RISCV::GPRRegClassID].contains(Reg.RegNum);
   }
 
+  bool isGPRF16() const {
+    return Kind == KindTy::Register &&
+           RISCVMCRegisterClasses[RISCV::GPRF16RegClassID].contains(Reg.RegNum);
+  }
+
   bool isGPRAsFPR() const { return isGPR() && Reg.IsGPRAsFPR; }
+  bool isGPRAsFPR16() const { return isGPRF16() && Reg.IsGPRAsFPR; }
 
   bool isGPRPair() const {
     return Kind == KindTy::Register &&
@@ -1341,6 +1347,10 @@ unsigned RISCVAsmParser::validateTargetOperandClass(MCParsedAsmOperand &AsmOp,
     Op.Reg.RegNum = convertFPR64ToFPR16(Reg);
     return Match_Success;
   }
+  if (Kind == MCK_GPRAsFPR16 && Op.isGPRAsFPR()) {
+    Op.Reg.RegNum = Reg - RISCV::X0 + RISCV::X0_H;
+    return Match_Success;
+  }
   // As the parser couldn't differentiate an VRM2/VRM4/VRM8 from an VR, coerce
   // the register from VR to VRM2/VRM4/VRM8 if necessary.
   if (IsRegVR && (Kind == MCK_VRM2 || Kind == MCK_VRM4 || Kind == MCK_VRM8)) {
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index 23897e2d98f634..76ca7728ebdd3a 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -81,6 +81,19 @@ static DecodeStatus DecodeGPRRegisterClass(MCInst &Inst, uint32_t RegNo,
   return MCDisassembler::Success;
 }
 
+static DecodeStatus DecodeGPRF16RegisterClass(MCInst &Inst, uint32_t RegNo,
+                                              uint64_t Address,
+                                              const MCDisassembler *Decoder) {
+  bool IsRVE = Decoder->getSubtargetInfo().hasFeature(RISCV::FeatureStdExtE);
+
+  if (RegNo >= 32 || (IsRVE && RegNo >= 16))
+    return MCDisassembler::Fail;
+
+  MCRegister Reg = RISCV::X0_H + RegNo;
+  Inst.addOperand(MCOperand::createReg(Reg));
+  return MCDisassembler::Success;
+}
+
 static DecodeStatus DecodeGPRX1X5RegisterClass(MCInst &Inst, uint32_t RegNo,
                                                uint64_t Address,
                                                const MCDisassembler *Decoder) {
diff --git a/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp b/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
index cce0ffe16e5fe3..713c7a0661defe 100644
--- a/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
+++ b/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
@@ -93,14 +93,19 @@ bool RISCVDeadRegisterDefinitions::runOnMachineFunction(MachineFunction &MF) {
           continue;
         LLVM_DEBUG(dbgs() << "    Dead def operand #" << I << " in:\n      ";
                    MI.print(dbgs()));
+        Register X0Reg;
         const TargetRegisterClass *RC = TII->getRegClass(Desc, I, TRI, MF);
-        if (!(RC && RC->contains(RISCV::X0))) {
+        if (RC && RC->contains(RISCV::X0)) {
+          X0Reg = RISCV::X0;
+        } else if (RC && RC->contains(RISCV::X0_H)) {
+          X0Reg = RISCV::X0_H;
+        } else {
           LLVM_DEBUG(dbgs() << "    Ignoring, register is not a GPR.\n");
           continue;
         }
         assert(LIS.hasInterval(Reg));
         LIS.removeInterval(Reg);
-        MO.setReg(RISCV::X0);
+        MO.setReg(X0Reg);
         LLVM_DEBUG(dbgs() << "    Replacing with zero register. New:\n      ";
                    MI.print(dbgs()));
         ++NumDeadDefsReplaced;
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 4580f3191d1389..d8db2694213c17 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -959,7 +959,10 @@ void RISCVDAGToDAGISel::Select(SDNode *Node) {
     }
 
     SDNode *Res;
-    if (Opc == RISCV::FCVT_D_W_IN32X || Opc == RISCV::FCVT_D_W)
+    if (VT.SimpleTy == MVT::f16 && Opc == RISCV::COPY) {
+      Res =
+          CurDAG->getTargetExtractSubreg(RISCV::sub_16, DL, VT, Imm).getNode();
+    } else if (Opc == RISCV::FCVT_D_W_IN32X || Opc == RISCV::FCVT_D_W)
       Res = CurDAG->getMachineNode(
           Opc, DL, VT, Imm,
           CurDAG->getTargetConstant(RISCVFPRndMode::RNE, DL, XLenVT));
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 6b4219b4623847..eb957e24ccee86 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -19059,6 +19059,23 @@ ArrayRef<MCPhysReg> RISCV::getArgGPRs(const RISCVABI::ABI ABI) {
   return ArrayRef(ArgIGPRs);
 }
 
+static ArrayRef<MCPhysReg> getArgGPR16s(const RISCVABI::ABI ABI) {
+  // The GPRs used for passing arguments in the ILP32* and LP64* ABIs, except
+  // the ILP32E ABI.
+  static const MCPhysReg ArgIGPRs[] = {RISCV::X10_H, RISCV::X11_H, RISCV::X12_H,
+                                       RISCV::X13_H, RISCV::X14_H, RISCV::X15_H,
+                                       RISCV::X16_H, RISCV::X17_H};
+  // The GPRs used for passing arguments in the ILP32E/ILP64E ABI.
+  static const MCPhysReg ArgEGPRs[] = {RISCV::X10_H, RISCV::X11_H,
+                                       RISCV::X12_H, RISCV::X13_H,
+                                       RISCV::X14_H, RISCV::X15_H};
+
+  if (ABI == RISCVABI::ABI_ILP32E || ABI == RISCVABI::ABI_LP64E)
+    return ArrayRef(ArgEGPRs);
+
+  return ArrayRef(ArgIGPRs);
+}
+
 static ArrayRef<MCPhysReg> getFastCCArgGPRs(const RISCVABI::ABI ABI) {
   // The GPRs used for passing arguments in the FastCC, X5 and X6 might be used
   // for save-restore libcall, so we don't use them.
@@ -19077,6 +19094,26 @@ static ArrayRef<MCPhysReg> getFastCCArgGPRs(const RISCVABI::ABI ABI) {
   return ArrayRef(FastCCIGPRs);
 }
 
+static ArrayRef<MCPhysReg> getFastCCArgGPRF16s(const RISCVABI::ABI ABI) {
+  // The GPRs used for passing arguments in the FastCC, X5 and X6 might be used
+  // for save-restore libcall, so we don't use them.
+  // Don't use X7 for fastcc, since Zicfilp uses X7 as the label register.
+  static const MCPhysReg FastCCIGPRs[] = {
+      RISCV::X10_H, RISCV::X11_H, RISCV::X12_H, RISCV::X13_H,
+      RISCV::X14_H, RISCV::X15_H, RISCV::X16_H, RISCV::X17_H,
+      RISCV::X28_H, RISCV::X29_H, RISCV::X30_H, RISCV::X31_H};
+
+  // The GPRs used for passing arguments in the FastCC when using ILP32E/ILP64E.
+  static const MCPhysReg FastCCEGPRs[] = {RISCV::X10_H, RISCV::X11_H,
+                                          RISCV::X12_H, RISCV::X13_H,
+                                          RISCV::X14_H, RISCV::X15_H};
+
+  if (ABI == RISCVABI::ABI_ILP32E || ABI == RISCVABI::ABI_LP64E)
+    return ArrayRef(FastCCEGPRs);
+
+  return ArrayRef(FastCCIGPRs);
+}
+
 // Pass a 2*XLEN argument that has been split into two XLEN values through
 // registers or the stack as necessary.
 static bool CC_RISCVAssign2XLen(unsigned XLen, CCState &State, CCValAssign VA1,
@@ -19225,6 +19262,15 @@ bool RISCV::CC_RISCV(const DataLayout &DL, RISCVABI::ABI ABI, unsigned ValNo,
   // similar local variables rather than directly checking against the target
   // ABI.
 
+  const RISCVSubtarget &STI =
+      State.getMachineFunction().getSubtarget<RISCVSubtarget>();
+  if ((ValVT == MVT::f16 && STI.hasStdExtZhinxmin())) {
+    if (MCRegister Reg = State.AllocateReg(getArgGPR16s(ABI))) {
+      State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
+      return false;
+    }
+  }
+
   ArrayRef<MCPhysReg> ArgGPRs = RISCV::getArgGPRs(ABI);
 
   if (UseGPRForF16_F32 && (ValVT == MVT::f16 || ValVT == MVT::bf16 ||
@@ -19685,8 +19731,7 @@ bool RISCV::CC_RISCV_FastCC(const DataLayout &DL, RISCVABI::ABI ABI,
   }
 
   // Check if there is an available GPR before hitting the stack.
-  if ((LocVT == MVT::f16 && Subtarget.hasStdExtZhinxmin()) ||
-      (LocVT == MVT::f32 && Subtarget.hasStdExtZfinx()) ||
+  if ((LocVT == MVT::f32 && Subtarget.hasStdExtZfinx()) ||
       (LocVT == MVT::f64 && Subtarget.is64Bit() &&
        Subtarget.hasStdExtZdinx())) {
     if (MCRegister Reg = State.AllocateReg(getFastCCArgGPRs(ABI))) {
@@ -19703,6 +19748,14 @@ bool RISCV::CC_RISCV_FastCC(const DataLayout &DL, RISCVABI::ABI ABI,
     }
   }
 
+  // Check if there is an available GPRF16 before hitting the stack.
+  if ((LocVT == MVT::f16 && Subtarget.hasStdExtZhinxmin())) {
+    if (MCRegister Reg = State.AllocateReg(getFastCCArgGPRF16s(ABI))) {
+      State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
+      return false;
+    }
+  }
+
   if (LocVT == MVT::f16) {
     unsigned Offset2 = State.AllocateStack(2, Align(2));
     State.addLoc(CCValAssign::getMem(ValNo, ValVT, Offset2, LocVT, LocInfo));
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 0a64a8e1440084..cb1840a2c60130 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -452,6 +452,23 @@ void RISCVInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
     return;
   }
 
+  if (RISCV::GPRF16RegClass.contains(DstReg, SrcReg)) {
+    if (STI.hasStdExtZhinx()) {
+      BuildMI(MBB, MBBI, DL, get(RISCV::FSGNJ_H_INX), DstReg)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .addReg(SrcReg, getKillRegState(KillSrc));
+      return;
+    }
+    DstReg =
+        TRI->getMatchingSuperReg(DstReg, RISCV::sub_16, &RISCV::GPRRegClass);
+    SrcReg =
+        TRI->getMatchingSuperReg(SrcReg, RISCV::sub_16, &RISCV::GPRRegClass);
+    BuildMI(MBB, MBBI, DL, get(RISCV::ADDI), DstReg)
+        .addReg(SrcReg, getKillRegState(KillSrc))
+        .addImm(0);
+    return;
+  }
+
   if (RISCV::GPRPairRegClass.contains(DstReg, SrcReg)) {
     // Emit an ADDI for both parts of GPRPair.
     BuildMI(MBB, MBBI, DL, get(RISCV::ADDI),
@@ -573,6 +590,9 @@ void RISCVInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
     Opcode = TRI->getRegSizeInBits(RISCV::GPRRegClass) == 32 ?
              RISCV::SW : RISCV::SD;
     IsScalableVector = false;
+  } else if (RISCV::GPRF16RegClass.hasSubClassEq(RC)) {
+    Opcode = RISCV::SH_INX;
+    IsScalableVector = false;
   } else if (RISCV::GPRPairRegClass.hasSubClassEq(RC)) {
     Opcode = RISCV::PseudoRV32ZdinxSD;
     IsScalableVector = false;
@@ -656,6 +676,9 @@ void RISCVInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
     Opcode = TRI->getRegSizeInBits(RISCV::GPRRegClass) == 32 ?
              RISCV::LW : RISCV::LD;
     IsScalableVector = false;
+  } else if (RISCV::GPRF16RegClass.hasSubClassEq(RC)) {
+    Opcode = RISCV::LH_INX;
+    IsScalableVector = false;
   } else if (RISCV::GPRPairRegClass.hasSubClassEq(RC)) {
     Opcode = RISCV::PseudoRV32ZdinxLD;
     IsScalableVector = false;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 6d0952a42eda9f..deb7c8b8435b8b 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -514,8 +514,8 @@ class BranchCC_rri<bits<3> funct3, string opcodestr>
 }
 
 let hasSideEffects = 0, mayLoad = 1, mayStore = 0 in {
-class Load_ri<bits<3> funct3, string opcodestr>
-    : RVInstI<funct3, OPC_LOAD, (outs GPR:$rd), (ins GPRMem:$rs1, simm12:$imm12),
+class Load_ri<bits<3> funct3, string opcodestr, DAGOperand rty = GPR>
+    : RVInstI<funct3, OPC_LOAD, (outs rty:$rd), (ins GPRMem:$rs1, simm12:$imm12),
               opcodestr, "$rd, ${imm12}(${rs1})">;
 
 class HLoad_r<bits<7> funct7, bits<5> funct5, string opcodestr>
@@ -529,9 +529,9 @@ class HLoad_r<bits<7> funct7, bits<5> funct5, string opcodestr>
 // reflecting the order these fields are specified in the instruction
 // encoding.
 let hasSideEffects = 0, mayLoad = 0, mayStore = 1 in {
-class Store_rri<bits<3> funct3, string opcodestr>
+class Store_rri<bits<3> funct3, string opcodestr, DAGOperand rty = GPR>
     : RVInstS<funct3, OPC_STORE, (outs),
-              (ins GPR:$rs2, GPRMem:$rs1, simm12:$imm12),
+              (ins rty:$rs2, GPRMem:$rs1, simm12:$imm12),
               opcodestr, "$rs2, ${imm12}(${rs1})">;
 
 class HStore_rr<bits<7> funct7, string opcodestr>
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td b/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
index 792cb7fa6dbc2f..7f417d29fc6c2d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
@@ -33,9 +33,15 @@ def riscv_fmv_x_signexth
 
 // Zhinxmin and Zhinx
 
+def GPRAsFPR16 : AsmOperandClass {
+  let Name = "GPRAsFPR16";
+  let ParserMethod = "parseGPRAsFPR";
+  let RenderMethod = "addRegOperands";
+}
+
 def FPR16INX : RegisterOperand<GPRF16> {
-  let ParserMatchClass = GPRAsFPR;
-  let DecoderMethod = "DecodeGPRRegisterClass";
+  let ParserMatchClass = GPRAsFPR16;
+  let DecoderMethod = "DecodeGPRF16RegisterClass";
 }
 
 def ZfhExt     : ExtInfo<"", "", [HasStdExtZfh],
@@ -84,6 +90,12 @@ def FLH : FPLoad_r<0b001, "flh", FPR16, WriteFLD16>;
 def FSH : FPStore_r<0b001, "fsh", FPR16, WriteFST16>;
 } // Predicates = [HasHalfFPLoadStoreMove]
 
+let Predicates = [HasStdExtZhinxmin], isCodeGenOnly = 1 in {
+def LH_INX : Load_ri<0b001, "lh", GPRF16>, Sched<[WriteLDH, ReadMemBase]>;
+def SH_INX : Store_rri<0b001, "sh", GPRF16>,
+             Sched<[WriteSTH, ReadStoreData, ReadMemBase]>;
+}
+
 foreach Ext = ZfhExts in {
   let SchedRW = [WriteFMA16, ReadFMA16, ReadFMA16, ReadFMA16Addend] in {
     defm FMADD_H  : FPFMA_rrr_frm_m<OPC_MADD,  0b10, "fmadd.h",  Ext>;
@@ -426,13 +438,10 @@ let Predicates = [HasStdExtZhinxmin] in {
 defm Select_FPR16INX : SelectCC_GPR_rrirr<FPR16INX, f16>;
 
 /// Loads
-def : Pat<(f16 (load (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
-          (COPY_TO_REGCLASS (LH GPR:$rs1, simm12:$imm12), GPRF16)>;
+def : LdPat<load, LH_INX, f16>;
 
 /// Stores
-def : Pat<(store (f16 FPR16INX:$rs2),
-                 (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
-          (SH (COPY_TO_REGCLASS FPR16INX:$rs2, GPR), GPR:$rs1, simm12:$imm12)>;
+def : StPat<store, SH_INX, GPRF16, f16>;
 } // Predicates = [HasStdExtZhinxmin]
 
 let Predicates = [HasStdExtZfhmin] in {
@@ -458,8 +467,8 @@ def : Pat<(any_fpround FPR32INX:$rs1), (FCVT_H_S_INX FPR32INX:$rs1, FRM_DYN)>;
 def : Pat<(any_fpextend FPR16INX:$rs1), (FCVT_S_H_INX FPR16INX:$rs1, FRM_RNE)>;
 
 // Moves (no conversion)
-def : Pat<(f16 (riscv_fmv_h_x GPR:$src)), (COPY_TO_REGCLASS GPR:$src, GPR)>;
-def : Pat<(riscv_fmv_x_anyexth FPR16INX:$src), (COPY_TO_REGCLASS FPR16INX:$src, GPR)>;
+def : Pat<(f16 (riscv_fmv_h_x GPR:$src)), (EXTRACT_SUBREG GPR:$src, sub_16)>;
+def : Pat<(riscv_fmv_x_anyexth FPR16INX:$src), (INSERT_SUBREG (XLenVT (IMPLICIT_DEF)), FPR16INX:$src, sub_16)>;
 
 def : Pat<(fcopysign FPR32INX:$rs1, FPR16INX:$rs2), (FSGNJ_S_INX $rs1, (FCVT_S_H_INX $rs2, FRM_RNE))>;
 } // Predicates = [HasStdExtZhinxmin]
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index 701594c0fb05dc..2bd41386e2dfb9 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -115,11 +115,11 @@ BitVector RISCVRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   }
 
   // Use markSuperRegs to ensure any register aliases are also reserved
-  markSuperRegs(Reserved, RISCV::X2); // sp
-  markSuperRegs(Reserved, RISCV::X3); // gp
-  markSuperRegs(Reserved, RISCV::X4); // tp
+  markSuperRegs(Reserved, RISCV::X2_H); // sp
+  markSuperRegs(Reserved, RISCV::X3_H); // gp
+  markSuperRegs(Reserved, RISCV::X4_H); // tp
   if (TFI->hasFP(MF))
-    markSuperRegs(Reserved, RISCV::X8); // fp
+    markSuperRegs(Reserved, RISCV::X8_H); // fp
   // Reserve the base register if we need to realign the stack and allocate
   // variable-sized objects at runtime.
   if (TFI->hasBP(MF))
@@ -131,7 +131,7 @@ BitVector RISCVRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
 
   // There are only 16 GPRs for RVE.
   if (Subtarget.hasStdExtE())
-    for (MCPhysReg Reg = RISCV::X16; Reg <= RISCV::X31; Reg++)
+    for (MCPhysReg Reg = RISCV::X16_H; Reg <= RISCV::X31_H; Reg++)
       markSuperRegs(Reserved, Reg);
 
   // V registers for code generation. We handle them manually.
@@ -150,8 +150,8 @@ BitVector RISCVRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   if (MF.getFunction().getCallingConv() == CallingConv::GRAAL) {
     if (Subtarget.hasStdExtE())
       report_fatal_error("Graal reserved registers do not exist in RVE");
-    markSuperRegs(Reserved, RISCV::X23);
-    markSuperRegs(Reserved, RISCV::X27);
+    markSuperRegs(Reserved, RISCV::X23_H);
+    markSuperRegs(Reserved, RISCV::X27_H);
   }
 
   // Shadow stack pointer.
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index 5725d8eda88ced..37a1643ef5236f 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -83,41 +83,81 @@ def sub_gpr_odd  : SubRegIndex<32, 32> {
 
 let RegAltNameIndices = [ABIRegAltName] in {
   let isConstant = true in
-  def X0  : RISCVReg<0, "x0", ["zero"]>, DwarfRegNum<[0]>;
+  def X0_H  : RISCVReg<0, "x0", ["zero"]>;
   let CostPerUse = [0, 1] in {
-  def X1  : RISCVReg<1, "x1", ["ra"]>, DwarfRegNum<[1]>;
-  def X2  : RISCVReg<2, "x2", ["sp"]>, DwarfRegNum<[2]>;
-  def X3  : RISCVReg<3, "x3", ["gp"]>, DwarfRegNum<[3]>;
-  def X4  : RISCVReg<4, "x4", ["tp"]>, DwarfRegNum<[4]>;
-  def X5  : RISCVReg<5, "x5", ["t0"]>, DwarfRegNum<[5]>;
-  def X6  : RISCVReg<6, "x6", ["t1"]>, DwarfRegNum<[6]>;
-  def X7  : RISCVReg<7, "x7", ["t2"]>, DwarfRegNum<[7]>;
+  def X1_H  : RISCVReg<1, "x1", ["ra"]>;
+  def X2_H  : RISCVReg<2, "x2", ["sp"]>;
+  def X3_H  : RISCVReg<3, "x3", ["gp"]>;
+  def X4_H  : RISCVReg<4, "x4", ["tp"]>;
+  def X5_H  : RISCVReg<5, "x5", ["t0"]>;
+  def X6_H  : RISCVReg<6, "x6", ["t1"]>;
+  def X7_H  : RISCVReg<7, "x7", ["t2"]>;
   }
-  def X8  : RISCVReg<8, "x8", ["s0", "fp"]>, DwarfRegNum<[8]>;
-  def X9  : RISCVReg<9, "x9", ["s1"]>, DwarfRegNum<[9]>;
-  def X10 : RISCVReg<10,"x10", ["a0"]>, DwarfRegNum<[10]>;
-  def X11 : RISCVReg<11,"x11", ["a1"]>, DwarfRegNum<[11]>;
-  def X12 : RISCVReg<12,"x12", ["a2"]>, DwarfRegNum<[12]>;
-  def X13 : RISCVReg<13,"x13", ["a3"]>, DwarfRegNum<[13]>;
-  def X14 : RISCVReg<14,"x14", ["a4"]>, DwarfRegNum<[14]>;
-  def X15 : RISCVReg<15,"x15", ["a5"]>, DwarfRegNum<[15]>;
+  def X8_H  : RISCVReg<8, "x8", ["s0", "fp"]>;
+  def X9_H  : RISCVReg<9, "x9", ["s1"]>;
+  def X10_H : RISCVReg<10,"x10", ["a0"]>;
+  def X11_H : RISCVReg<11,"x11", ["a1"]>;
+  def X12_H : RISCVReg<12,"x12", ["a2"]>;
+  def X13_H : RISCVReg<13,"x13", ["a3"]>;
+  def X14_H : RISCVReg<14,"x14", ["a4"]>;
+  def X15_H : RISCVReg<15,"x15", ["a5"]>;
   let CostPerUse = [0, 1] in {
-  def X16 : RISCVReg<16,"x16", ["a6"]>, DwarfRegNum<[16]>;
-  def X17 : RISCVReg<17,"x17", ["a7"]>, DwarfRegNum<[17]>;
-  def X18 : RISCVReg<18,"x18", ["s2"]>, DwarfRegNum<[18]>;
-  def X19 : RISCVReg<19,"x19", ["s3"]>, DwarfRegNum<[19]>;
-  def X20 : RISCVReg<20,"x20", ["s4"]>, DwarfRegNum<[20]>;
-  def X21 : RISCVReg<21,"x21", ["s5"]>, DwarfRegNum<[21]>;
-  def X22 : RISCVReg<22,"x22", ["s6"]>, DwarfRegNum<[22]>;
-  def X23 : RISCVReg<23,"x23", ["s7"]>, DwarfRegNum<[23]>;
-  def X24 : RISCVReg<24,"x24", ["s8"]>, DwarfRegNum<[24]>;
-  def X25 : RISCVReg<25,"x25", ["s9"]>, DwarfRegNum<[25]>;
-  def X26 : RISCVReg<26,"x26", ["s10"]>, DwarfRegNum<[26]>;
-  def X27 : RISCVReg<27,"x27", ["s11"]>, DwarfRegNum<[27]>;
-  def X28 : RISCVReg<28,"x28", ["t3"]>, DwarfRegNum<[28]>;
-  def X29 : RISCVReg<29,"x29", ["t4"]>, DwarfRegNum<[29]>;
-  def X30 : RISCVReg<30,"x30", ["t5"]>, DwarfRegNum<[30]>;
-  def X31 : RISCVReg<31,"x31", ["t6"]>, DwarfRegNum<[31]>;
+  def X16_H : RISCVReg<16,"x16", ["a6"]>;
+  def X17_H : RISCVReg<17,"x17", ["a7"]>;
+  def X18_H : RISCVReg<18,"x18", ["s2"]>;
+  def X19_H : RISCVReg<19,"x19", ["s3"]>;
+  def X20_H : RISCVReg<20,"x20", ["s4"]>;
+  def X21_H : RISCVReg<21,"x21", ["s5"]>;
+  def X22_H : RISCVReg<22,"x22", ["s6"]>;
+  def X23_H : RISCVReg<23,"x23", ["s7"]>;
+  def X24_H : RISCVReg<24,"x24", ["s8"]>;
+  def X25_H : RISCVReg<25,"x25", ["s9"]>;
+  def X26_H : RISCVReg<26,"x26", ["s10"]>;
+  def X27_H : RISCVReg<27,"x27", ["s11"]>;
+  def X28_H ...
[truncated]

topperc added a commit to topperc/llvm-project that referenced this pull request Sep 5, 2024
With Zfinx/Zdinx, f32/f64 are legal types for a GPR, we don't need
a bitcast.

This avoids turning fneg/fabs into bitwise operations purely because
of these bitcasts. If the bitwise operations are faster for some
reason on a Zfinx CPU, then that seems like it should be done for
all fneg/fabs, not just ones near function arguments/returns.

I don't have much interest in Zfinx, this just makes the code more similar
to what I proposed for Zhinx in llvm#107446.
@preames
Collaborator

preames commented Sep 6, 2024

This goes beyond what I can meaningfully review, but I'll note that I see nothing to particularly object to here.

Naming-wise, _H is, I assume, for half. Since we'll probably use these for bf16, and possibly someday i16, maybe pick a different name suffix?

@jrtc27
Collaborator

jrtc27 commented Sep 6, 2024

FLH and LH both use half as the term, and the format of the float shouldn’t affect the load or store, just the interpretation for arithmetic purposes, so _H seems like the appropriate thing to use to me?

@preames
Collaborator

preames commented Sep 6, 2024

FLH and LH both use half as the term, and the format of the float shouldn’t affect the load or store, just the interpretation for arithmetic purposes, so _H seems like the appropriate thing to use to me?

Good point, naming suggestion withdrawn.

topperc added a commit that referenced this pull request Sep 6, 2024
…7464)

Contributor

@asb asb left a comment


LGTM in that the reasoning makes sense to me and I can't see problems from going through the code. That said, it's been a while since I added a new register class.

I think it would be nice to have a brief comment in RISCVRegisterInfo.td explaining why we're defining this register class and how it's used. Perhaps that makes more sense once you make similar changes for zfinx, but in general I find such in-tree documentation of intent really helpful.

You say that fsgnj gives better results vs mv - I'm assuming this is for real world code? Because for the modified in-tree tests, surely using mv where possible is better due to being potentially compressible? Happy to go with your judgement, but it probably deserves a TODO/FIXME somewhere like the tests to note that a semantically equivalent mv would be more compressible (and so someone sufficiently motivated might later pick it up).

EDIT to add:

After that, I want to try to extend this to 32 bit integer W instructions. My thought is that we can arrange to have all writes to the 32 bit GPR guarantee sign extension similar to how Mips64 is handled. Unfortunately, we are missing some W instructions in Zba and Zbs that would make this straightforward. I want to see if this is more viable than my previous attempt at making i32 a legal type which kept it inside the existing 64-bit GPR class.

This definitely sounds like an interesting direction. And if we can demonstrate adding those missing instructions makes life easier for the compiler, it's probably worth at least discussing in e.g. the scalar efficiency working group.

@topperc
Collaborator Author

topperc commented Sep 10, 2024

You say that fsgnj gives better results vs mv - I'm assuming this is for real world code? Because for the modified in-tree tests, surely using mv where possible is better due to being potentially compressible? Happy to go with your judgement, but it probably deserves a TODO/FIXME somewhere like the tests to note that a semantically equivalent mv would be more compressible (and so someone sufficiently motivated might later pick it up).

The subreg operations needed to use the existing ADDI seem to pessimize MachineCopyPropagation on some of the lit tests. I think this is because some implicit operands get added, and MachineCopyPropagation doesn't know how those implicit operands relate to the explicit operands, so it conservatively gives up. I have not done any testing of this on real-world code. I suspect a GPRF16 CodeGenOnly copy instruction may fix this issue, but I haven't checked yet.
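The conservatism described here can be modeled with a toy example (purely illustrative; MachineCopyPropagation's real logic is far more involved, and the type and field names below are invented):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the issue described above: a copy-forwarding pass can only
// rewrite a later use of Dst into Src when the copy carries no implicit
// operands it cannot reason about. Subreg-based copies pick up implicit
// super-register operands, so such a pass conservatively gives up on them.
struct ToyCopy {
  std::string Dst, Src;
  std::vector<std::string> ImplicitOps; // e.g. an implicit-def of the super-reg
};

bool canForward(const ToyCopy &C) { return C.ImplicitOps.empty(); }
```

A dedicated GPRF16 copy instruction would have no extra implicit operands, which is the fix suspected above.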

topperc added a commit to topperc/llvm-project that referenced this pull request Sep 12, 2024
This patches adds a 32 bit register class for use with Zfinx instructions.
This makes them more similar to F instructions and allows us to
only spill 32 bits.

I've added CodeGenOnly instructions for load/store using GPRF32 as that
gave better results than insert_subreg/extract_subreg. I'm using FSGNJ for
GPRF32 copy with Zfinx as that gave better results from MachineCopyPropagation.

Function arguments use this new GPRF32 register class for f32 arguments
with Zfinx. Eliminating the need to use RISCVISD::FMV* nodes.

This is similar to llvm#107446 which adds a 16 bit register class.
topperc added a commit to topperc/llvm-project that referenced this pull request Sep 15, 2024

github-actions bot commented Sep 15, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

topperc added a commit to topperc/llvm-project that referenced this pull request Sep 16, 2024
@topperc topperc requested a review from asb September 20, 2024 16:30

topperc commented Sep 20, 2024

@asb can you re-review this? I added compressed versions of the load/store CodeGenOnly instructions and a pseudo for mv, and updated MakeCompressible and MergeBaseOffset.


@asb asb left a comment


LGTM! Sorry for missing your request to re-review. I'm still missing the setup I had with Phabricator Herald :( But I'll try to improve my workflow.

static const MCPhysReg ArgIGPRs[] = {RISCV::X10_H, RISCV::X11_H, RISCV::X12_H,
                                     RISCV::X13_H, RISCV::X14_H, RISCV::X15_H,
                                     RISCV::X16_H, RISCV::X17_H};
// The GPRs used for passing arguments in the ILP32E/ILP64E ABI.

Typo: ILP64E => LP64E (I've just committed a fix to other similar instances in this file).
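As a rough illustration of how a register list like the one above is consumed, the sketch below assigns f16 arguments to the 16-bit subregisters X10_H..X17_H in order, falling back to the stack once the eight argument registers are exhausted. This is a hypothetical helper for illustration, not the real CC_RISCV calling-convention logic, which also handles alignment, variadics, and mixed argument types.

```cpp
#include <string>
#include <vector>

// Illustrative sketch of f16 argument assignment with Zhinxmin
// (hypothetical helper; not the actual CC_RISCV implementation).
static const char *ArgGPRsH[] = {"X10_H", "X11_H", "X12_H", "X13_H",
                                 "X14_H", "X15_H", "X16_H", "X17_H"};

// Assign each f16 argument the next 16-bit subregister, spilling to the
// stack once all eight argument registers are used.
std::vector<std::string> assignF16Args(unsigned NumArgs) {
  std::vector<std::string> Locs;
  for (unsigned I = 0; I < NumArgs; ++I)
    Locs.push_back(I < 8 ? std::string(ArgGPRsH[I]) : std::string("stack"));
  return Locs;
}
```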

@@ -1505,6 +1520,9 @@ unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
}

switch (Opcode) {
case RISCV::PseudoMV_FPR16INX:
// MV is always compressible.

This comment made me double check, so sharing that process: Although we'd expect c.li to be used instead, c.mv can't handle an X0 operand. But from what I can see, we expand to ADDI and we have the obvious CompressPat from ADDI with X0 operand to c.li.

Perhaps expand the comment to "MV is always compressible (to either c.mv or c.li rd, 0)."
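The compression outcome described in that comment can be sketched as follows. The helper name is invented for illustration; the real logic lives in the RISC-V CompressPat definitions. The point is that c.mv cannot encode x0 as its source, but the ADDI the pseudo expands to matches the existing CompressPat for c.li when the source is X0, so the move compresses to two bytes either way.

```cpp
#include <string>

// Sketch of the compression result for the ADDI that PseudoMV_FPR16INX
// expands to (illustrative helper; not actual LLVM code).
std::string compressMV(const std::string &Rd, const std::string &Rs) {
  if (Rs == "x0")
    return "c.li " + Rd + ", 0"; // ADDI Rd, x0, 0 -> c.li Rd, 0
  return "c.mv " + Rd + ", " + Rs; // plain register move
}
```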

@topperc topperc merged commit 8a7843c into llvm:main Sep 27, 2024
8 checks passed
@topperc topperc deleted the pr/zhinx-subreg branch September 27, 2024 05:56
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Sep 27, 2024
This patch adds a 16 bit register class for use with Zhinx
instructions. This makes them more similar to Zfh instructions and
allows us to only spill 16 bits.

I've added CodeGenOnly instructions for load/store using GPRF16 as that
gave better results than insert_subreg/extract_subreg. I'm using FSGNJ
for GPRF16 copy with Zhinx as that gave better results. Zhinxmin will
use ADDI+subreg operations.

Function arguments use this new GPRF16 register class for f16 arguments
with Zhinxmin, eliminating the need to use RISCVISD::FMV* nodes.

I plan to extend this idea to Zfinx next.
topperc added a commit to topperc/llvm-project that referenced this pull request Sep 28, 2024
topperc added a commit that referenced this pull request Oct 2, 2024
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Oct 3, 2024
6 participants