[RISCV] Add 16 bit GPR sub-register for Zhinx. #107446
Conversation
This patch adds a 16-bit register class for use with Zhinx instructions. This makes them more similar to Zfh instructions and allows us to spill only 16 bits.
I've added CodeGenOnly instructions for load/store using GPRF16 as that gave better results than insert_subreg/extract_subreg. I'm using FSGNJ for GPRF16 copies with Zhinx as that gave better results. Zhinxmin will use ADDI+subreg operations.
Function arguments use this new GPRF16 register class for f16 arguments with Zhinxmin, eliminating the need to use RISCVISD::FMV* nodes.
I plan to extend this idea to Zfinx next. After that, I want to try to extend this to 32-bit integer W instructions. My thought is that we can arrange to have all writes to the 32-bit GPR guarantee sign extension, similar to how Mips64 is handled. Unfortunately, we are missing some W instructions in Zba and Zbs that would make this straightforward.
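As a rough illustration of the schemes described above (hand-written sketches based on the instructions named in the patch, not compiler output; register choices and the stack offset are arbitrary):

  # GPRF16 copy with Zhinx: FSGNJ_H_INX with the source repeated, i.e. a register move
  fsgnj.h a0, a1, a1
  # GPRF16 copy with Zhinxmin: ADDI on the matching full-width super-registers
  addi    a0, a1, 0
  # Spill/reload of a GPRF16 value via the CodeGenOnly SH_INX/LH_INX, touching only 16 bits
  sh      a1, 8(sp)
  lh      a1, 8(sp)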
@llvm/pr-subscribers-backend-risc-v
Author: Craig Topper (topperc)
Changes: This patch adds a 16-bit register class for use with Zhinx instructions. This makes them more similar to Zfh instructions and allows us to spill only 16 bits. I've added CodeGenOnly instructions for load/store using GPRF16 as that gave better results than insert_subreg/extract_subreg. I'm using FSGNJ for GPRF16 copies with Zhinx as that gave better results. Zhinxmin will use ADDI+subreg operations. Function arguments use this new GPRF16 register class for f16 arguments with Zhinxmin, eliminating the need to use RISCVISD::FMV* nodes. I plan to extend this idea to Zfinx next. After that, I want to try to extend this to 32-bit integer W instructions. My thought is that we can arrange to have all writes to the 32-bit GPR guarantee sign extension, similar to how Mips64 is handled. Unfortunately, we are missing some W instructions in Zba and Zbs that would make this straightforward. I want to see if this is more viable than my previous attempt at making i32 a legal type, which kept it inside the existing 64-bit GPR class.
Patch is 62.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107446.diff
21 Files Affected:
diff --git a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
index 6d33a1f64195d5..de9591d4cf72ac 100644
--- a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
+++ b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
@@ -480,7 +480,13 @@ struct RISCVOperand final : public MCParsedAsmOperand {
RISCVMCRegisterClasses[RISCV::GPRRegClassID].contains(Reg.RegNum);
}
+ bool isGPRF16() const {
+ return Kind == KindTy::Register &&
+ RISCVMCRegisterClasses[RISCV::GPRF16RegClassID].contains(Reg.RegNum);
+ }
+
bool isGPRAsFPR() const { return isGPR() && Reg.IsGPRAsFPR; }
+ bool isGPRAsFPR16() const { return isGPRF16() && Reg.IsGPRAsFPR; }
bool isGPRPair() const {
return Kind == KindTy::Register &&
@@ -1341,6 +1347,10 @@ unsigned RISCVAsmParser::validateTargetOperandClass(MCParsedAsmOperand &AsmOp,
Op.Reg.RegNum = convertFPR64ToFPR16(Reg);
return Match_Success;
}
+ if (Kind == MCK_GPRAsFPR16 && Op.isGPRAsFPR()) {
+ Op.Reg.RegNum = Reg - RISCV::X0 + RISCV::X0_H;
+ return Match_Success;
+ }
// As the parser couldn't differentiate an VRM2/VRM4/VRM8 from an VR, coerce
// the register from VR to VRM2/VRM4/VRM8 if necessary.
if (IsRegVR && (Kind == MCK_VRM2 || Kind == MCK_VRM4 || Kind == MCK_VRM8)) {
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index 23897e2d98f634..76ca7728ebdd3a 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -81,6 +81,19 @@ static DecodeStatus DecodeGPRRegisterClass(MCInst &Inst, uint32_t RegNo,
return MCDisassembler::Success;
}
+static DecodeStatus DecodeGPRF16RegisterClass(MCInst &Inst, uint32_t RegNo,
+ uint64_t Address,
+ const MCDisassembler *Decoder) {
+ bool IsRVE = Decoder->getSubtargetInfo().hasFeature(RISCV::FeatureStdExtE);
+
+ if (RegNo >= 32 || (IsRVE && RegNo >= 16))
+ return MCDisassembler::Fail;
+
+ MCRegister Reg = RISCV::X0_H + RegNo;
+ Inst.addOperand(MCOperand::createReg(Reg));
+ return MCDisassembler::Success;
+}
+
static DecodeStatus DecodeGPRX1X5RegisterClass(MCInst &Inst, uint32_t RegNo,
uint64_t Address,
const MCDisassembler *Decoder) {
diff --git a/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp b/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
index cce0ffe16e5fe3..713c7a0661defe 100644
--- a/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
+++ b/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
@@ -93,14 +93,19 @@ bool RISCVDeadRegisterDefinitions::runOnMachineFunction(MachineFunction &MF) {
continue;
LLVM_DEBUG(dbgs() << " Dead def operand #" << I << " in:\n ";
MI.print(dbgs()));
+ Register X0Reg;
const TargetRegisterClass *RC = TII->getRegClass(Desc, I, TRI, MF);
- if (!(RC && RC->contains(RISCV::X0))) {
+ if (RC && RC->contains(RISCV::X0)) {
+ X0Reg = RISCV::X0;
+ } else if (RC && RC->contains(RISCV::X0_H)) {
+ X0Reg = RISCV::X0_H;
+ } else {
LLVM_DEBUG(dbgs() << " Ignoring, register is not a GPR.\n");
continue;
}
assert(LIS.hasInterval(Reg));
LIS.removeInterval(Reg);
- MO.setReg(RISCV::X0);
+ MO.setReg(X0Reg);
LLVM_DEBUG(dbgs() << " Replacing with zero register. New:\n ";
MI.print(dbgs()));
++NumDeadDefsReplaced;
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 4580f3191d1389..d8db2694213c17 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -959,7 +959,10 @@ void RISCVDAGToDAGISel::Select(SDNode *Node) {
}
SDNode *Res;
- if (Opc == RISCV::FCVT_D_W_IN32X || Opc == RISCV::FCVT_D_W)
+ if (VT.SimpleTy == MVT::f16 && Opc == RISCV::COPY) {
+ Res =
+ CurDAG->getTargetExtractSubreg(RISCV::sub_16, DL, VT, Imm).getNode();
+ } else if (Opc == RISCV::FCVT_D_W_IN32X || Opc == RISCV::FCVT_D_W)
Res = CurDAG->getMachineNode(
Opc, DL, VT, Imm,
CurDAG->getTargetConstant(RISCVFPRndMode::RNE, DL, XLenVT));
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 6b4219b4623847..eb957e24ccee86 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -19059,6 +19059,23 @@ ArrayRef<MCPhysReg> RISCV::getArgGPRs(const RISCVABI::ABI ABI) {
return ArrayRef(ArgIGPRs);
}
+static ArrayRef<MCPhysReg> getArgGPR16s(const RISCVABI::ABI ABI) {
+ // The GPRs used for passing arguments in the ILP32* and LP64* ABIs, except
+ // the ILP32E ABI.
+ static const MCPhysReg ArgIGPRs[] = {RISCV::X10_H, RISCV::X11_H, RISCV::X12_H,
+ RISCV::X13_H, RISCV::X14_H, RISCV::X15_H,
+ RISCV::X16_H, RISCV::X17_H};
+ // The GPRs used for passing arguments in the ILP32E/ILP64E ABI.
+ static const MCPhysReg ArgEGPRs[] = {RISCV::X10_H, RISCV::X11_H,
+ RISCV::X12_H, RISCV::X13_H,
+ RISCV::X14_H, RISCV::X15_H};
+
+ if (ABI == RISCVABI::ABI_ILP32E || ABI == RISCVABI::ABI_LP64E)
+ return ArrayRef(ArgEGPRs);
+
+ return ArrayRef(ArgIGPRs);
+}
+
static ArrayRef<MCPhysReg> getFastCCArgGPRs(const RISCVABI::ABI ABI) {
// The GPRs used for passing arguments in the FastCC, X5 and X6 might be used
// for save-restore libcall, so we don't use them.
@@ -19077,6 +19094,26 @@ static ArrayRef<MCPhysReg> getFastCCArgGPRs(const RISCVABI::ABI ABI) {
return ArrayRef(FastCCIGPRs);
}
+static ArrayRef<MCPhysReg> getFastCCArgGPRF16s(const RISCVABI::ABI ABI) {
+ // The GPRs used for passing arguments in the FastCC, X5 and X6 might be used
+ // for save-restore libcall, so we don't use them.
+ // Don't use X7 for fastcc, since Zicfilp uses X7 as the label register.
+ static const MCPhysReg FastCCIGPRs[] = {
+ RISCV::X10_H, RISCV::X11_H, RISCV::X12_H, RISCV::X13_H,
+ RISCV::X14_H, RISCV::X15_H, RISCV::X16_H, RISCV::X17_H,
+ RISCV::X28_H, RISCV::X29_H, RISCV::X30_H, RISCV::X31_H};
+
+ // The GPRs used for passing arguments in the FastCC when using ILP32E/ILP64E.
+ static const MCPhysReg FastCCEGPRs[] = {RISCV::X10_H, RISCV::X11_H,
+ RISCV::X12_H, RISCV::X13_H,
+ RISCV::X14_H, RISCV::X15_H};
+
+ if (ABI == RISCVABI::ABI_ILP32E || ABI == RISCVABI::ABI_LP64E)
+ return ArrayRef(FastCCEGPRs);
+
+ return ArrayRef(FastCCIGPRs);
+}
+
// Pass a 2*XLEN argument that has been split into two XLEN values through
// registers or the stack as necessary.
static bool CC_RISCVAssign2XLen(unsigned XLen, CCState &State, CCValAssign VA1,
@@ -19225,6 +19262,15 @@ bool RISCV::CC_RISCV(const DataLayout &DL, RISCVABI::ABI ABI, unsigned ValNo,
// similar local variables rather than directly checking against the target
// ABI.
+ const RISCVSubtarget &STI =
+ State.getMachineFunction().getSubtarget<RISCVSubtarget>();
+ if ((ValVT == MVT::f16 && STI.hasStdExtZhinxmin())) {
+ if (MCRegister Reg = State.AllocateReg(getArgGPR16s(ABI))) {
+ State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
+ return false;
+ }
+ }
+
ArrayRef<MCPhysReg> ArgGPRs = RISCV::getArgGPRs(ABI);
if (UseGPRForF16_F32 && (ValVT == MVT::f16 || ValVT == MVT::bf16 ||
@@ -19685,8 +19731,7 @@ bool RISCV::CC_RISCV_FastCC(const DataLayout &DL, RISCVABI::ABI ABI,
}
// Check if there is an available GPR before hitting the stack.
- if ((LocVT == MVT::f16 && Subtarget.hasStdExtZhinxmin()) ||
- (LocVT == MVT::f32 && Subtarget.hasStdExtZfinx()) ||
+ if ((LocVT == MVT::f32 && Subtarget.hasStdExtZfinx()) ||
(LocVT == MVT::f64 && Subtarget.is64Bit() &&
Subtarget.hasStdExtZdinx())) {
if (MCRegister Reg = State.AllocateReg(getFastCCArgGPRs(ABI))) {
@@ -19703,6 +19748,14 @@ bool RISCV::CC_RISCV_FastCC(const DataLayout &DL, RISCVABI::ABI ABI,
}
}
+ // Check if there is an available GPRF16 before hitting the stack.
+ if ((LocVT == MVT::f16 && Subtarget.hasStdExtZhinxmin())) {
+ if (MCRegister Reg = State.AllocateReg(getFastCCArgGPRF16s(ABI))) {
+ State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
+ return false;
+ }
+ }
+
if (LocVT == MVT::f16) {
unsigned Offset2 = State.AllocateStack(2, Align(2));
State.addLoc(CCValAssign::getMem(ValNo, ValVT, Offset2, LocVT, LocInfo));
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 0a64a8e1440084..cb1840a2c60130 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -452,6 +452,23 @@ void RISCVInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
return;
}
+ if (RISCV::GPRF16RegClass.contains(DstReg, SrcReg)) {
+ if (STI.hasStdExtZhinx()) {
+ BuildMI(MBB, MBBI, DL, get(RISCV::FSGNJ_H_INX), DstReg)
+ .addReg(SrcReg, getKillRegState(KillSrc))
+ .addReg(SrcReg, getKillRegState(KillSrc));
+ return;
+ }
+ DstReg =
+ TRI->getMatchingSuperReg(DstReg, RISCV::sub_16, &RISCV::GPRRegClass);
+ SrcReg =
+ TRI->getMatchingSuperReg(SrcReg, RISCV::sub_16, &RISCV::GPRRegClass);
+ BuildMI(MBB, MBBI, DL, get(RISCV::ADDI), DstReg)
+ .addReg(SrcReg, getKillRegState(KillSrc))
+ .addImm(0);
+ return;
+ }
+
if (RISCV::GPRPairRegClass.contains(DstReg, SrcReg)) {
// Emit an ADDI for both parts of GPRPair.
BuildMI(MBB, MBBI, DL, get(RISCV::ADDI),
@@ -573,6 +590,9 @@ void RISCVInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
Opcode = TRI->getRegSizeInBits(RISCV::GPRRegClass) == 32 ?
RISCV::SW : RISCV::SD;
IsScalableVector = false;
+ } else if (RISCV::GPRF16RegClass.hasSubClassEq(RC)) {
+ Opcode = RISCV::SH_INX;
+ IsScalableVector = false;
} else if (RISCV::GPRPairRegClass.hasSubClassEq(RC)) {
Opcode = RISCV::PseudoRV32ZdinxSD;
IsScalableVector = false;
@@ -656,6 +676,9 @@ void RISCVInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
Opcode = TRI->getRegSizeInBits(RISCV::GPRRegClass) == 32 ?
RISCV::LW : RISCV::LD;
IsScalableVector = false;
+ } else if (RISCV::GPRF16RegClass.hasSubClassEq(RC)) {
+ Opcode = RISCV::LH_INX;
+ IsScalableVector = false;
} else if (RISCV::GPRPairRegClass.hasSubClassEq(RC)) {
Opcode = RISCV::PseudoRV32ZdinxLD;
IsScalableVector = false;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 6d0952a42eda9f..deb7c8b8435b8b 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -514,8 +514,8 @@ class BranchCC_rri<bits<3> funct3, string opcodestr>
}
let hasSideEffects = 0, mayLoad = 1, mayStore = 0 in {
-class Load_ri<bits<3> funct3, string opcodestr>
- : RVInstI<funct3, OPC_LOAD, (outs GPR:$rd), (ins GPRMem:$rs1, simm12:$imm12),
+class Load_ri<bits<3> funct3, string opcodestr, DAGOperand rty = GPR>
+ : RVInstI<funct3, OPC_LOAD, (outs rty:$rd), (ins GPRMem:$rs1, simm12:$imm12),
opcodestr, "$rd, ${imm12}(${rs1})">;
class HLoad_r<bits<7> funct7, bits<5> funct5, string opcodestr>
@@ -529,9 +529,9 @@ class HLoad_r<bits<7> funct7, bits<5> funct5, string opcodestr>
// reflecting the order these fields are specified in the instruction
// encoding.
let hasSideEffects = 0, mayLoad = 0, mayStore = 1 in {
-class Store_rri<bits<3> funct3, string opcodestr>
+class Store_rri<bits<3> funct3, string opcodestr, DAGOperand rty = GPR>
: RVInstS<funct3, OPC_STORE, (outs),
- (ins GPR:$rs2, GPRMem:$rs1, simm12:$imm12),
+ (ins rty:$rs2, GPRMem:$rs1, simm12:$imm12),
opcodestr, "$rs2, ${imm12}(${rs1})">;
class HStore_rr<bits<7> funct7, string opcodestr>
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td b/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
index 792cb7fa6dbc2f..7f417d29fc6c2d 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
@@ -33,9 +33,15 @@ def riscv_fmv_x_signexth
// Zhinxmin and Zhinx
+def GPRAsFPR16 : AsmOperandClass {
+ let Name = "GPRAsFPR16";
+ let ParserMethod = "parseGPRAsFPR";
+ let RenderMethod = "addRegOperands";
+}
+
def FPR16INX : RegisterOperand<GPRF16> {
- let ParserMatchClass = GPRAsFPR;
- let DecoderMethod = "DecodeGPRRegisterClass";
+ let ParserMatchClass = GPRAsFPR16;
+ let DecoderMethod = "DecodeGPRF16RegisterClass";
}
def ZfhExt : ExtInfo<"", "", [HasStdExtZfh],
@@ -84,6 +90,12 @@ def FLH : FPLoad_r<0b001, "flh", FPR16, WriteFLD16>;
def FSH : FPStore_r<0b001, "fsh", FPR16, WriteFST16>;
} // Predicates = [HasHalfFPLoadStoreMove]
+let Predicates = [HasStdExtZhinxmin], isCodeGenOnly = 1 in {
+def LH_INX : Load_ri<0b001, "lh", GPRF16>, Sched<[WriteLDH, ReadMemBase]>;
+def SH_INX : Store_rri<0b001, "sh", GPRF16>,
+ Sched<[WriteSTH, ReadStoreData, ReadMemBase]>;
+}
+
foreach Ext = ZfhExts in {
let SchedRW = [WriteFMA16, ReadFMA16, ReadFMA16, ReadFMA16Addend] in {
defm FMADD_H : FPFMA_rrr_frm_m<OPC_MADD, 0b10, "fmadd.h", Ext>;
@@ -426,13 +438,10 @@ let Predicates = [HasStdExtZhinxmin] in {
defm Select_FPR16INX : SelectCC_GPR_rrirr<FPR16INX, f16>;
/// Loads
-def : Pat<(f16 (load (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
- (COPY_TO_REGCLASS (LH GPR:$rs1, simm12:$imm12), GPRF16)>;
+def : LdPat<load, LH_INX, f16>;
/// Stores
-def : Pat<(store (f16 FPR16INX:$rs2),
- (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
- (SH (COPY_TO_REGCLASS FPR16INX:$rs2, GPR), GPR:$rs1, simm12:$imm12)>;
+def : StPat<store, SH_INX, GPRF16, f16>;
} // Predicates = [HasStdExtZhinxmin]
let Predicates = [HasStdExtZfhmin] in {
@@ -458,8 +467,8 @@ def : Pat<(any_fpround FPR32INX:$rs1), (FCVT_H_S_INX FPR32INX:$rs1, FRM_DYN)>;
def : Pat<(any_fpextend FPR16INX:$rs1), (FCVT_S_H_INX FPR16INX:$rs1, FRM_RNE)>;
// Moves (no conversion)
-def : Pat<(f16 (riscv_fmv_h_x GPR:$src)), (COPY_TO_REGCLASS GPR:$src, GPR)>;
-def : Pat<(riscv_fmv_x_anyexth FPR16INX:$src), (COPY_TO_REGCLASS FPR16INX:$src, GPR)>;
+def : Pat<(f16 (riscv_fmv_h_x GPR:$src)), (EXTRACT_SUBREG GPR:$src, sub_16)>;
+def : Pat<(riscv_fmv_x_anyexth FPR16INX:$src), (INSERT_SUBREG (XLenVT (IMPLICIT_DEF)), FPR16INX:$src, sub_16)>;
def : Pat<(fcopysign FPR32INX:$rs1, FPR16INX:$rs2), (FSGNJ_S_INX $rs1, (FCVT_S_H_INX $rs2, FRM_RNE))>;
} // Predicates = [HasStdExtZhinxmin]
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index 701594c0fb05dc..2bd41386e2dfb9 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -115,11 +115,11 @@ BitVector RISCVRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
}
// Use markSuperRegs to ensure any register aliases are also reserved
- markSuperRegs(Reserved, RISCV::X2); // sp
- markSuperRegs(Reserved, RISCV::X3); // gp
- markSuperRegs(Reserved, RISCV::X4); // tp
+ markSuperRegs(Reserved, RISCV::X2_H); // sp
+ markSuperRegs(Reserved, RISCV::X3_H); // gp
+ markSuperRegs(Reserved, RISCV::X4_H); // tp
if (TFI->hasFP(MF))
- markSuperRegs(Reserved, RISCV::X8); // fp
+ markSuperRegs(Reserved, RISCV::X8_H); // fp
// Reserve the base register if we need to realign the stack and allocate
// variable-sized objects at runtime.
if (TFI->hasBP(MF))
@@ -131,7 +131,7 @@ BitVector RISCVRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
// There are only 16 GPRs for RVE.
if (Subtarget.hasStdExtE())
- for (MCPhysReg Reg = RISCV::X16; Reg <= RISCV::X31; Reg++)
+ for (MCPhysReg Reg = RISCV::X16_H; Reg <= RISCV::X31_H; Reg++)
markSuperRegs(Reserved, Reg);
// V registers for code generation. We handle them manually.
@@ -150,8 +150,8 @@ BitVector RISCVRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
if (MF.getFunction().getCallingConv() == CallingConv::GRAAL) {
if (Subtarget.hasStdExtE())
report_fatal_error("Graal reserved registers do not exist in RVE");
- markSuperRegs(Reserved, RISCV::X23);
- markSuperRegs(Reserved, RISCV::X27);
+ markSuperRegs(Reserved, RISCV::X23_H);
+ markSuperRegs(Reserved, RISCV::X27_H);
}
// Shadow stack pointer.
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index 5725d8eda88ced..37a1643ef5236f 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -83,41 +83,81 @@ def sub_gpr_odd : SubRegIndex<32, 32> {
let RegAltNameIndices = [ABIRegAltName] in {
let isConstant = true in
- def X0 : RISCVReg<0, "x0", ["zero"]>, DwarfRegNum<[0]>;
+ def X0_H : RISCVReg<0, "x0", ["zero"]>;
let CostPerUse = [0, 1] in {
- def X1 : RISCVReg<1, "x1", ["ra"]>, DwarfRegNum<[1]>;
- def X2 : RISCVReg<2, "x2", ["sp"]>, DwarfRegNum<[2]>;
- def X3 : RISCVReg<3, "x3", ["gp"]>, DwarfRegNum<[3]>;
- def X4 : RISCVReg<4, "x4", ["tp"]>, DwarfRegNum<[4]>;
- def X5 : RISCVReg<5, "x5", ["t0"]>, DwarfRegNum<[5]>;
- def X6 : RISCVReg<6, "x6", ["t1"]>, DwarfRegNum<[6]>;
- def X7 : RISCVReg<7, "x7", ["t2"]>, DwarfRegNum<[7]>;
+ def X1_H : RISCVReg<1, "x1", ["ra"]>;
+ def X2_H : RISCVReg<2, "x2", ["sp"]>;
+ def X3_H : RISCVReg<3, "x3", ["gp"]>;
+ def X4_H : RISCVReg<4, "x4", ["tp"]>;
+ def X5_H : RISCVReg<5, "x5", ["t0"]>;
+ def X6_H : RISCVReg<6, "x6", ["t1"]>;
+ def X7_H : RISCVReg<7, "x7", ["t2"]>;
}
- def X8 : RISCVReg<8, "x8", ["s0", "fp"]>, DwarfRegNum<[8]>;
- def X9 : RISCVReg<9, "x9", ["s1"]>, DwarfRegNum<[9]>;
- def X10 : RISCVReg<10,"x10", ["a0"]>, DwarfRegNum<[10]>;
- def X11 : RISCVReg<11,"x11", ["a1"]>, DwarfRegNum<[11]>;
- def X12 : RISCVReg<12,"x12", ["a2"]>, DwarfRegNum<[12]>;
- def X13 : RISCVReg<13,"x13", ["a3"]>, DwarfRegNum<[13]>;
- def X14 : RISCVReg<14,"x14", ["a4"]>, DwarfRegNum<[14]>;
- def X15 : RISCVReg<15,"x15", ["a5"]>, DwarfRegNum<[15]>;
+ def X8_H : RISCVReg<8, "x8", ["s0", "fp"]>;
+ def X9_H : RISCVReg<9, "x9", ["s1"]>;
+ def X10_H : RISCVReg<10,"x10", ["a0"]>;
+ def X11_H : RISCVReg<11,"x11", ["a1"]>;
+ def X12_H : RISCVReg<12,"x12", ["a2"]>;
+ def X13_H : RISCVReg<13,"x13", ["a3"]>;
+ def X14_H : RISCVReg<14,"x14", ["a4"]>;
+ def X15_H : RISCVReg<15,"x15", ["a5"]>;
let CostPerUse = [0, 1] in {
- def X16 : RISCVReg<16,"x16", ["a6"]>, DwarfRegNum<[16]>;
- def X17 : RISCVReg<17,"x17", ["a7"]>, DwarfRegNum<[17]>;
- def X18 : RISCVReg<18,"x18", ["s2"]>, DwarfRegNum<[18]>;
- def X19 : RISCVReg<19,"x19", ["s3"]>, DwarfRegNum<[19]>;
- def X20 : RISCVReg<20,"x20", ["s4"]>, DwarfRegNum<[20]>;
- def X21 : RISCVReg<21,"x21", ["s5"]>, DwarfRegNum<[21]>;
- def X22 : RISCVReg<22,"x22", ["s6"]>, DwarfRegNum<[22]>;
- def X23 : RISCVReg<23,"x23", ["s7"]>, DwarfRegNum<[23]>;
- def X24 : RISCVReg<24,"x24", ["s8"]>, DwarfRegNum<[24]>;
- def X25 : RISCVReg<25,"x25", ["s9"]>, DwarfRegNum<[25]>;
- def X26 : RISCVReg<26,"x26", ["s10"]>, DwarfRegNum<[26]>;
- def X27 : RISCVReg<27,"x27", ["s11"]>, DwarfRegNum<[27]>;
- def X28 : RISCVReg<28,"x28", ["t3"]>, DwarfRegNum<[28]>;
- def X29 : RISCVReg<29,"x29", ["t4"]>, DwarfRegNum<[29]>;
- def X30 : RISCVReg<30,"x30", ["t5"]>, DwarfRegNum<[30]>;
- def X31 : RISCVReg<31,"x31", ["t6"]>, DwarfRegNum<[31]>;
+ def X16_H : RISCVReg<16,"x16", ["a6"]>;
+ def X17_H : RISCVReg<17,"x17", ["a7"]>;
+ def X18_H : RISCVReg<18,"x18", ["s2"]>;
+ def X19_H : RISCVReg<19,"x19", ["s3"]>;
+ def X20_H : RISCVReg<20,"x20", ["s4"]>;
+ def X21_H : RISCVReg<21,"x21", ["s5"]>;
+ def X22_H : RISCVReg<22,"x22", ["s6"]>;
+ def X23_H : RISCVReg<23,"x23", ["s7"]>;
+ def X24_H : RISCVReg<24,"x24", ["s8"]>;
+ def X25_H : RISCVReg<25,"x25", ["s9"]>;
+ def X26_H : RISCVReg<26,"x26", ["s10"]>;
+ def X27_H : RISCVReg<27,"x27", ["s11"]>;
+ def X28_H ...
[truncated]
With Zfinx/Zdinx, f32/f64 are legal types for a GPR, so we don't need a bitcast. This avoids turning fneg/fabs into bitwise operations purely because of these bitcasts. If the bitwise operations are faster for some reason on a Zfinx CPU, then that seems like it should be done for all fneg/fabs, not just ones near function arguments/returns. I don't have much interest in Zfinx, this just makes the code more similar to what I proposed for Zhinx in llvm#107446.
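For a sense of what this avoids (an illustrative sketch, not output from this change; register choices are arbitrary): with f32 legal in a GPR under Zfinx, fneg can stay a sign-injection instruction rather than being rewritten as integer bit manipulation around a bitcast:

  # fneg.s under Zfinx operates directly on the GPR holding the f32 value
  fsgnjn.s a0, a0, a0
  # versus the bitwise lowering a bitcast would otherwise force
  lui      t0, 524288      # t0 = 0x80000000, the f32 sign-bit mask
  xor      a0, a0, t0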
This goes beyond what I can meaningfully review, but I'll note that I see nothing to particularly object to here. Naming-wise, _H is, I assume, for half. Since we'll probably use these for bf16, and possibly someday i16, maybe pick a different name suffix?
FLH and LH both use half as the term, and the format of the float shouldn't affect the load or store, just the interpretation for arithmetic purposes, so _H seems like the appropriate thing to use to me?
Good point, naming suggestion withdrawn.
LGTM in that the reasoning makes sense to me and I can't see problems from going through the code. That said, it's been a while since I added a new register class.
I think it would be nice to have a brief comment in RISCVRegisterInfo.td explaining why we're defining this register class and how it's used. Perhaps that makes more sense once you make similar changes for zfinx, but in general I find such in-tree documentation of intent really helpful.
You say that fsgnj gives better results vs mv - I'm assuming this is for real world code? Because for the modified in-tree tests, surely using mv where possible is better due to being potentially compressible? Happy to go with your judgement, but it probably deserves a TODO/FIXME somewhere like the tests to note that a semantically equivalent mv would be more compressible (and so someone sufficiently motivated might later pick it up).
EDIT to add:
After that, I want to try to extend this to 32 bit integer W instructions. My thought is that we can arrange to have all writes to the 32 bit GPR guarantee sign extension similar to how Mips64 is handled. Unfortunately, we are missing some W instructions in Zba and Zbs that would make this straightforward. I want to see if this is more viable than my previous attempt at making i32 a legal type which kept it inside the existing 64-bit GPR class.
This definitely sounds like an interesting direction. And if we can demonstrate adding those missing instructions makes life easier for the compiler, it's probably worth at least discussing in e.g. the scalar efficiency working group.
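(For readers unfamiliar with the guarantee being discussed, a quick illustration of standard RV64 behavior; this is context, not part of this patch:)

  # RV64 W instructions always write a sign-extended 32-bit result:
  addw   a0, a0, a1    # bits 63..32 of a0 become copies of bit 31
  # so an explicit sign extension afterwards would be redundant:
  sext.w a0, a0        # alias for addiw a0, a0, 0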
The subreg operations needed to use the existing ADDI seem to pessimize Machine Copy Propagation on some of the lit tests. I think this is because some implicit operands get added and MachineCopyPropagation doesn't know how those implicit operands relate to the explicit operands so it conservatively gives up. I have not done any testing of this on real world code. I suspect a GPRF16 CodeGenOnly copy instruction may fix this issue, but haven't checked yet.
This patch adds a 32-bit register class for use with Zfinx instructions. This makes them more similar to F instructions and allows us to spill only 32 bits. I've added CodeGenOnly instructions for load/store using GPRF32 as that gave better results than insert_subreg/extract_subreg. I'm using FSGNJ for GPRF32 copies with Zfinx as that gave better results from MachineCopyPropagation. Function arguments use this new GPRF32 register class for f32 arguments with Zfinx, eliminating the need to use RISCVISD::FMV* nodes. This is similar to llvm#107446, which adds a 16-bit register class.
✅ With the latest revision this PR passed the C/C++ code formatter.
@asb can you re-review this? I added compressed versions of the load/store CodeGenOnly instructions, a pseudo for mv, and updated MakeCompressible and MergeBaseOffset.
LGTM! Sorry for missing your request to re-review. I'm still missing the setup I had with Phabricator Herald :( But I'll try to improve my workflow.
static const MCPhysReg ArgIGPRs[] = {RISCV::X10_H, RISCV::X11_H, RISCV::X12_H,
                                     RISCV::X13_H, RISCV::X14_H, RISCV::X15_H,
                                     RISCV::X16_H, RISCV::X17_H};
// The GPRs used for passing arguments in the ILP32E/ILP64E ABI.
Typo: ILP64E => LP64E (I've just committed a fix to other similar instances in this file).
@@ -1505,6 +1520,9 @@ unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
  }

  switch (Opcode) {
  case RISCV::PseudoMV_FPR16INX:
    // MV is always compressible.
This comment made me double check, so sharing that process: although we'd expect c.li to be used instead, c.mv can't handle an X0 operand. But from what I can see, we expand to ADDI and we have the obvious CompressPat from ADDI with X0 operand to c.li.
Perhaps expand the comment to "MV is always compressible (to either c.mv or c.li rd, 0)."
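For concreteness (a hand-written illustration of the compression rules being discussed, not test output; register choices are arbitrary):

  # mv a0, a1 is an alias for addi a0, a1, 0 and compresses to c.mv a0, a1
  mv a0, a1
  # c.mv requires a nonzero source register, so materializing zero goes through
  # addi a0, x0, 0, which compresses to c.li a0, 0
  li a0, 0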