
Commit 067caaa

[AMDGPU][Scheduler] Refactor ArchVGPR rematerialization during scheduling (#125885)
The AMDGPU scheduler's `PreRARematStage` attempts to increase function occupancy w.r.t. ArchVGPR usage by rematerializing trivial ArchVGPR-defining instructions next to their single use. It first collects all eligible trivially rematerializable instructions in the function, then sinks them one by one, recomputing occupancy in all affected regions each time to determine if and when it has managed to increase overall occupancy. If it does, changes are committed to the scheduler's state; otherwise, modifications to the IR are reverted and the scheduling stage gives up. In both cases, this scheduling stage currently involves repeated queries for up-to-date occupancy estimates and some state copying to enable reversal of sinking decisions when occupancy turns out not to increase. The current implementation also does not accurately track register pressure changes in all regions affected by sinking decisions.

This commit refactors this scheduling stage, improving register pressure (RP) tracking and splitting the stage into two distinct steps to avoid repeated occupancy queries and IR/state rollbacks.

- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The number of ArchVGPRs to save to reduce spilling, or to increase function occupancy by 1 (when there is no spilling), is computed. Then, instructions eligible for rematerialization are collected, stopping as soon as enough have been identified to achieve our goal (according to slightly optimistic heuristics). If there are not enough such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the first step are rematerialized one by one. We can now update the scheduler's state directly, since we have already done the occupancy analysis and know we won't have to roll back any state. Register pressures for impacted regions are recomputed only once, as opposed to at every sinking decision.
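The two steps above can be illustrated on a toy model. This is a minimal sketch, not the LLVM implementation: `RematCandidate`, `collectRematCandidates` and `applyRemats` are hypothetical names, each candidate is assumed to lower ArchVGPR pressure by exactly one in every region its register is live across, and a plain greedy pass stands in for the commit's heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a trivially rematerializable ArchVGPR def whose
// register is live across some regions. Sinking it next to its single use
// is assumed to lower ArchVGPR pressure by one in each of those regions.
struct RematCandidate {
  std::vector<std::size_t> LiveRegions;
};

// Step 1 (analysis and collection): Excess[R] is the number of ArchVGPRs
// region R must save to reach the goal. Candidates are collected greedily and
// collection stops as soon as no region has positive excess left.
// Returns true if the goal is reachable; Selected receives candidate indices.
bool collectRematCandidates(std::vector<int> Excess,
                            const std::vector<RematCandidate> &Cands,
                            std::vector<std::size_t> &Selected) {
  auto GoalReached = [&Excess] {
    for (int E : Excess)
      if (E > 0)
        return false;
    return true;
  };
  for (std::size_t I = 0; I < Cands.size() && !GoalReached(); ++I) {
    bool Helps = false;
    for (std::size_t R : Cands[I].LiveRegions) {
      assert(R < Excess.size() && "region index out of range");
      Helps |= Excess[R] > 0;
    }
    if (!Helps)
      continue; // Sinking this def would not help any region that needs it.
    for (std::size_t R : Cands[I].LiveRegions)
      --Excess[R];
    Selected.push_back(I);
  }
  return GoalReached();
}

// Step 2 (rematerialization): apply every selected candidate in one pass.
// No per-decision occupancy query or state copy is needed because step 1
// already checked that the goal is reachable.
void applyRemats(std::vector<int> &Pressure,
                 const std::vector<RematCandidate> &Cands,
                 const std::vector<std::size_t> &Selected) {
  for (std::size_t I : Selected)
    for (std::size_t R : Cands[I].LiveRegions)
      --Pressure[R];
}
```

Deciding reachability before touching any state is what lets the second step skip the repeated occupancy queries and rollback bookkeeping of the previous design.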
When the stage attempted to increase occupancy, if neither the rematerializations alone nor rescheduling afterwards were able to improve occupancy, all rematerializations are rolled back.
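The rollback condition described in the paragraph above reduces to a small predicate. This sketch assumes "improve" means ending above the occupancy the stage started from; `shouldRollBackRemats` and its parameters are hypothetical and do not come from the LLVM sources.

```cpp
#include <cassert>

// Hypothetical predicate; names and parameters are illustrative only.
// Assumes "improve" means strictly exceeding the starting occupancy.
bool shouldRollBackRemats(bool TriedToIncreaseOccupancy, unsigned OccBefore,
                          unsigned OccAfterRemat, unsigned OccAfterResched) {
  assert(OccBefore >= 1 && "occupancy is assumed to be at least 1");
  if (!TriedToIncreaseOccupancy)
    return false; // The rule only applies when the stage targeted occupancy.
  bool RematHelped = OccAfterRemat > OccBefore;
  bool ReschedHelped = OccAfterResched > OccBefore;
  // Roll back only if neither rematerialization nor rescheduling helped.
  return !RematHelped && !ReschedHelped;
}
```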
1 parent c4f723a commit 067caaa

12 files changed: +4823, -676 lines

llvm/include/llvm/CodeGen/MachineRegisterInfo.h

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,7 @@
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineInstrBundle.h"
 #include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/RegisterBank.h"
@@ -585,6 +586,9 @@ class MachineRegisterInfo {
   /// multiple uses.
   bool hasOneNonDBGUser(Register RegNo) const;
 
+  /// If the register has a single non-Debug instruction using the specified
+  /// register, returns it; otherwise returns nullptr.
+  MachineInstr *getOneNonDBGUser(Register RegNo) const;
 
   /// hasAtMostUses - Return true if the given register has at most \p MaxUsers
   /// non-debug user instructions.

llvm/lib/CodeGen/MachineRegisterInfo.cpp

Lines changed: 5 additions & 0 deletions
@@ -432,6 +432,11 @@ bool MachineRegisterInfo::hasOneNonDBGUser(Register RegNo) const {
   return hasSingleElement(use_nodbg_instructions(RegNo));
 }
 
+MachineInstr *MachineRegisterInfo::getOneNonDBGUser(Register RegNo) const {
+  auto RegNoDbgUsers = use_nodbg_instructions(RegNo);
+  return hasSingleElement(RegNoDbgUsers) ? &*RegNoDbgUsers.begin() : nullptr;
+}
+
 bool MachineRegisterInfo::hasAtMostUserInstrs(Register Reg,
                                               unsigned MaxUsers) const {
   return hasNItemsOrLess(use_instr_nodbg_begin(Reg), use_instr_nodbg_end(),
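`getOneNonDBGUser` returns the user instruction only when the non-debug user range has exactly one element, and `nullptr` otherwise. The same "single element or null" pattern can be shown with standard containers; this is a standalone analogue rather than LLVM code, and `getSingleElementOrNull` is a hypothetical name.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Standalone analogue (not LLVM code) of the pattern used by
// getOneNonDBGUser: return a pointer to the only element of a range, or
// nullptr if the range is empty or holds more than one element.
template <typename Range>
auto getSingleElementOrNull(Range &R) -> decltype(&*std::begin(R)) {
  auto It = std::begin(R), End = std::end(R);
  if (It == End || std::next(It) != End)
    return nullptr; // Zero or several elements: no unique result.
  return &*It;
}
```

For instance, it yields `nullptr` for an empty `std::vector<int>` or for `{1, 2}`, and a pointer to `7` for `{7}`.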

llvm/lib/Target/AMDGPU/GCNRegPressure.h

Lines changed: 10 additions & 1 deletion
@@ -53,11 +53,20 @@ struct GCNRegPressure {
   /// UnifiedVGPRFile
   unsigned getVGPRNum(bool UnifiedVGPRFile) const {
     if (UnifiedVGPRFile) {
-      return Value[AGPR32] ? alignTo(Value[VGPR32], 4) + Value[AGPR32]
+      return Value[AGPR32] ? getUnifiedVGPRNum(Value[VGPR32], Value[AGPR32])
                            : Value[VGPR32] + Value[AGPR32];
     }
     return std::max(Value[VGPR32], Value[AGPR32]);
   }
+
+  /// Returns the aggregated VGPR pressure, assuming \p NumArchVGPRs ArchVGPRs
+  /// and \p NumAGPRs AGPRS, for a target with a unified VGPR file.
+  inline static unsigned getUnifiedVGPRNum(unsigned NumArchVGPRs,
+                                           unsigned NumAGPRs) {
+    return alignTo(NumArchVGPRs, AMDGPU::IsaInfo::getArchVGPRAllocGranule()) +
+           NumAGPRs;
+  }
+
   /// \returns the ArchVGPR32 pressure
   unsigned getArchVGPRNum() const { return Value[VGPR32]; }
   /// \returns the AccVGPR32 pressure
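`getUnifiedVGPRNum` rounds the ArchVGPR count up to the allocation granule before adding the AGPR count. A standalone sketch of that arithmetic follows; `alignToGranule` and `unifiedVGPRNum` are hypothetical names, and the default granule of 4 is taken from the constant the old code hard-coded rather than queried from `AMDGPU::IsaInfo::getArchVGPRAllocGranule()`.

```cpp
#include <cassert>

// Round Value up to the next multiple of Granule.
unsigned alignToGranule(unsigned Value, unsigned Granule) {
  assert(Granule != 0 && "allocation granule must be non-zero");
  return (Value + Granule - 1) / Granule * Granule;
}

// Aggregated VGPR count on a unified VGPR file: the ArchVGPR count is rounded
// up to the granule, then the AGPR count is added on top. The default granule
// of 4 mirrors the old hard-coded constant and is an assumption here.
unsigned unifiedVGPRNum(unsigned NumArchVGPRs, unsigned NumAGPRs,
                        unsigned Granule = 4) {
  return alignToGranule(NumArchVGPRs, Granule) + NumAGPRs;
}
```

With a granule of 4, 5 ArchVGPRs and 3 AGPRs give 8 + 3 = 11 registers. Factoring the formula into a helper lets callers evaluate hypothetical ArchVGPR/AGPR splits without building a full pressure object.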
