[AMDGPU] Add target hook to isGlobalMemoryObject #112781


Merged (2 commits) on Jan 11, 2025
4 changes: 4 additions & 0 deletions llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -136,6 +136,10 @@ class TargetInstrInfo : public MCInstrInfo {
const TargetRegisterInfo *TRI,
const MachineFunction &MF) const;

/// Returns true if MI is an instruction we are unable to reason about
/// (like a call or something with unmodeled side effects).
virtual bool isGlobalMemoryObject(const MachineInstr *MI) const;
Contributor:

This should follow the pattern that isCopyInstr uses: have a virtual, overridable Impl function, with the generic cases handled directly.
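
For context, the isCopyInstr layering referenced here looks roughly like this (a paraphrased sketch of the TargetInstrInfo pattern, not the verbatim header):

```cpp
// Sketch: a non-virtual public entry point handles the generic,
// target-independent case directly, then defers to a protected,
// virtual Impl hook that targets override.
class TargetInstrInfo : public MCInstrInfo {
public:
  std::optional<DestSourcePair> isCopyInstr(const MachineInstr &MI) const {
    // Generic COPY handled here, once, for every target.
    if (MI.isCopy())
      return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};
    // Anything target-specific goes through the overridable hook.
    return isCopyInstrImpl(MI);
  }

protected:
  // Default: no target-specific copy-like instructions.
  virtual std::optional<DestSourcePair>
  isCopyInstrImpl(const MachineInstr &MI) const {
    return std::nullopt;
  }
};
```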


/// Return true if the instruction is trivially rematerializable, meaning it
/// has no side effects and requires no operands that aren't always available.
/// This means the only allowed uses are constants and unallocatable physical
10 changes: 3 additions & 7 deletions llvm/lib/CodeGen/ScheduleDAGInstrs.cpp
@@ -35,6 +35,7 @@
#include "llvm/CodeGen/ScheduleDAG.h"
#include "llvm/CodeGen/ScheduleDFS.h"
#include "llvm/CodeGen/SlotIndexes.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/Config/llvm-config.h"
@@ -547,12 +548,6 @@ void ScheduleDAGInstrs::addVRegUseDeps(SUnit *SU, unsigned OperIdx) {
}
}

/// Returns true if MI is an instruction we are unable to reason about
/// (like a call or something with unmodeled side effects).
static inline bool isGlobalMemoryObject(MachineInstr *MI) {
Contributor:

What's the difference between this and TII::isSchedulingBoundary?

Also wondering if we could treat these like isPosition

Member Author (@kerbowa), Nov 10, 2024:

> What's the difference between this and TII::isSchedulingBoundary?

isSchedulingBoundary creates divisions between scheduling regions; it separates and defines independent ScheduleDAGs. isGlobalMemoryObject controls dependency edges within a single ScheduleDAG.

> Also wondering if we could treat these like isPosition

In what way? You mean treat these as debug info instead of instructions?

Contributor (@arsenm), Dec 3, 2024:

I mean make it an instruction property, rather than introducing a very specific override for this one case. Are you allowed to put other isLabel or isPosition instructions anywhere in the block?
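
(For reference, isPosition is a plain MachineInstr property; approximately, from MachineInstr.h:)

```cpp
// Position-like instructions are labels and CFI directives; the default
// TargetInstrInfo::isSchedulingBoundary returns true for these, which is
// the behavior the author's reply below refers to.
bool isPosition() const { return isLabel() || isCFIInstruction(); }
```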

Member Author:

I don't know if an existing instruction property would work; isPosition makes an instruction a scheduling boundary and would split the block as far as the scheduler is concerned.

return MI->isCall() || MI->hasUnmodeledSideEffects() ||
(MI->hasOrderedMemoryRef() && !MI->isDereferenceableInvariantLoad());
}
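
To make the distinction discussed above concrete, here is a simplified sketch (illustrative pseudocode, not the actual LLVM sources; closeRegionAndStartNewOne is a hypothetical helper) of where each hook acts:

```cpp
// isSchedulingBoundary is consulted while carving a block into independent
// scheduling regions; each region is scheduled as its own ScheduleDAG.
for (MachineInstr &MI : MBB)
  if (TII->isSchedulingBoundary(MI, &MBB, MF))
    closeRegionAndStartNewOne(MI); // MI belongs to no region

// isGlobalMemoryObject is consulted while building dependency edges *within*
// one region's DAG; the instruction stays in the region but becomes a
// barrier node that orders memory operations around it.
if (TII->isGlobalMemoryObject(&MI)) {
  if (BarrierChain)
    BarrierChain->addPredBarrier(SU); // chain the previous barrier to this one
  BarrierChain = SU;                  // later memory ops will depend on SU
}
```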

void ScheduleDAGInstrs::addChainDependency (SUnit *SUa, SUnit *SUb,
unsigned Latency) {
@@ -899,8 +894,9 @@ void ScheduleDAGInstrs::buildSchedGraph(AAResults *AA,
// isLoadFromStackSLot are not usable after stack slots are lowered to
// actual addresses).

const TargetInstrInfo *TII = ST.getInstrInfo();
// This is a barrier event that acts as a pivotal node in the DAG.
if (isGlobalMemoryObject(&MI)) {
if (TII->isGlobalMemoryObject(&MI)) {

// Become the barrier chain.
if (BarrierChain)
5 changes: 5 additions & 0 deletions llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -1917,3 +1917,8 @@ bool TargetInstrInfo::isMBBSafeToOutlineFrom(MachineBasicBlock &MBB,
}
return true;
}

bool TargetInstrInfo::isGlobalMemoryObject(const MachineInstr *MI) const {
return MI->isCall() || MI->hasUnmodeledSideEffects() ||
(MI->hasOrderedMemoryRef() && !MI->isDereferenceableInvariantLoad());
}
21 changes: 0 additions & 21 deletions llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
@@ -239,23 +239,6 @@ class SchedGroup {
}
};

// Remove all existing edges from a SCHED_BARRIER or SCHED_GROUP_BARRIER.
static void resetEdges(SUnit &SU, ScheduleDAGInstrs *DAG) {
assert(SU.getInstr()->getOpcode() == AMDGPU::SCHED_BARRIER ||
SU.getInstr()->getOpcode() == AMDGPU::SCHED_GROUP_BARRIER ||
SU.getInstr()->getOpcode() == AMDGPU::IGLP_OPT);

while (!SU.Preds.empty())
for (auto &P : SU.Preds)
SU.removePred(P);

while (!SU.Succs.empty())
for (auto &S : SU.Succs)
for (auto &SP : S.getSUnit()->Preds)
if (SP.getSUnit() == &SU)
S.getSUnit()->removePred(SP);
}

using SUToCandSGsPair = std::pair<SUnit *, SmallVector<int, 4>>;
using SUsToCandSGsVec = SmallVector<SUToCandSGsPair, 4>;

@@ -459,7 +442,6 @@ void PipelineSolver::makePipeline() {
// Command line requested IGroupLP doesn't have SGBarr
if (!SGBarr)
continue;
resetEdges(*SGBarr, DAG);
SG.link(*SGBarr, false);
}
}
@@ -2611,7 +2593,6 @@ void IGroupLPDAGMutation::apply(ScheduleDAGInstrs *DAGInstrs) {
initSchedGroupBarrierPipelineStage(R);
FoundSB = true;
} else if (Opc == AMDGPU::IGLP_OPT) {
resetEdges(*R, DAG);
if (!FoundSB && !FoundIGLP) {
FoundIGLP = true;
ShouldApplyIGLP = initIGLPOpt(*R);
@@ -2633,7 +2614,6 @@ void IGroupLPDAGMutation::addSchedBarrierEdges(SUnit &SchedBarrier) {
assert(MI.getOpcode() == AMDGPU::SCHED_BARRIER);
// Remove all existing edges from the SCHED_BARRIER that were added due to the
// instruction having side effects.
resetEdges(SchedBarrier, DAG);
LLVM_DEBUG(dbgs() << "Building SchedGroup for SchedBarrier with Mask: "
<< MI.getOperand(0).getImm() << "\n");
auto InvertedMask =
@@ -2691,7 +2671,6 @@ void IGroupLPDAGMutation::initSchedGroupBarrierPipelineStage(
std::vector<SUnit>::reverse_iterator RIter) {
// Remove all existing edges from the SCHED_GROUP_BARRIER that were added due
// to the instruction having side effects.
resetEdges(*RIter, DAG);
MachineInstr &SGB = *RIter->getInstr();
assert(SGB.getOpcode() == AMDGPU::SCHED_GROUP_BARRIER);
int32_t SGMask = SGB.getOperand(0).getImm();
12 changes: 8 additions & 4 deletions llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -188,6 +188,12 @@ static void getRegisterPressures(
Pressure[AMDGPU::RegisterPressureSets::AGPR_32] = NewPressure.getAGPRNum();
}

// Return true if the instruction is mutually exclusive with all non-IGLP DAG
// mutations, requiring all other mutations to be disabled.
static bool isIGLPMutationOnly(unsigned Opcode) {
return Opcode == AMDGPU::SCHED_GROUP_BARRIER || Opcode == AMDGPU::IGLP_OPT;
}

void GCNSchedStrategy::initCandidate(SchedCandidate &Cand, SUnit *SU,
bool AtTop,
const RegPressureTracker &RPTracker,
@@ -1152,8 +1158,7 @@ bool GCNSchedStage::initGCNRegion() {
StageID == GCNSchedStageID::ILPInitialSchedule) {
for (auto &I : DAG) {
Unsched.push_back(&I);
if (I.getOpcode() == AMDGPU::SCHED_GROUP_BARRIER ||
I.getOpcode() == AMDGPU::IGLP_OPT)
if (isIGLPMutationOnly(I.getOpcode()))
DAG.RegionsWithIGLPInstrs[RegionIdx] = true;
}
} else {
@@ -1894,8 +1899,7 @@ void GCNScheduleDAGMILive::updateRegionBoundaries(

static bool hasIGLPInstrs(ScheduleDAGInstrs *DAG) {
return any_of(*DAG, [](MachineBasicBlock::iterator MI) {
unsigned Opc = MI->getOpcode();
return Opc == AMDGPU::SCHED_GROUP_BARRIER || Opc == AMDGPU::IGLP_OPT;
return isIGLPMutationOnly(MI->getOpcode());
});
}

7 changes: 7 additions & 0 deletions llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -10051,3 +10051,10 @@ void SIInstrInfo::enforceOperandRCAlignment(MachineInstr &MI,
Op.setSubReg(AMDGPU::sub0);
MI.addOperand(MachineOperand::CreateReg(NewVR, false, true));
}

bool SIInstrInfo::isGlobalMemoryObject(const MachineInstr *MI) const {
if (isIGLP(*MI))
return false;

return TargetInstrInfo::isGlobalMemoryObject(MI);
}
9 changes: 9 additions & 0 deletions llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -242,6 +242,8 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
bool areLoadsFromSameBasePtr(SDNode *Load0, SDNode *Load1, int64_t &Offset0,
int64_t &Offset1) const override;

bool isGlobalMemoryObject(const MachineInstr *MI) const override;

bool getMemOperandsWithOffsetWidth(
const MachineInstr &LdSt,
SmallVectorImpl<const MachineOperand *> &BaseOps, int64_t &Offset,
@@ -968,6 +970,13 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
return get(Opcode).TSFlags & SIInstrFlags::TiedSourceNotRead;
}

bool isIGLP(unsigned Opcode) const {
return Opcode == AMDGPU::SCHED_BARRIER ||
Opcode == AMDGPU::SCHED_GROUP_BARRIER || Opcode == AMDGPU::IGLP_OPT;
}

bool isIGLP(const MachineInstr &MI) const { return isIGLP(MI.getOpcode()); }

static unsigned getNonSoftWaitcntOpcode(unsigned Opcode) {
switch (Opcode) {
case AMDGPU::S_WAITCNT_soft:
6 changes: 3 additions & 3 deletions llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.small.mir
@@ -25,9 +25,6 @@
; GCN-NEXT: ; implicit-def: $vgpr79
; GCN-NEXT: ; implicit-def: $vgpr80
; GCN-NEXT: ; implicit-def: $vgpr91
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: s_waitcnt vmcnt(8)
; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ; kill: killed $sgpr16_sgpr17_sgpr18_sgpr19
; GCN-NEXT: ; iglp_opt mask(0x00000002)
; GCN-NEXT: s_nop 1
@@ -477,6 +474,9 @@
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: buffer_inv sc0 sc1
; GCN-NEXT: v_mfma_f32_32x32x8_f16 v[32:47], v[4:5], v[8:9], v[32:47]
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: s_waitcnt vmcnt(8)
; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: v_mov_b32_e32 v4, 0
; GCN-NEXT: v_mfma_f32_32x32x8_f16 v[32:47], v[6:7], v[0:1], v[32:47]
; GCN-NEXT: v_mfma_f32_32x32x8_f16 v[48:63], v[2:3], v[0:1], v[48:63]
35 changes: 35 additions & 0 deletions llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.ll
@@ -285,6 +285,41 @@ entry:
ret void
}

define amdgpu_kernel void @test_iglp_opt_asm_sideeffect(ptr addrspace(3) noalias %in, ptr addrspace(3) noalias %out) #0 {
; GCN-LABEL: test_iglp_opt_asm_sideeffect:
; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x24
; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
; GCN-NEXT: v_and_b32_e32 v0, 0xffc, v0
; GCN-NEXT: ; iglp_opt mask(0x00000000)
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_add_u32_e32 v1, s0, v0
; GCN-NEXT: ds_read_b32 v1, v1
; GCN-NEXT: v_add_u32_e32 v0, s1, v0
; GCN-NEXT: v_mov_b32_e32 v2, s0
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: ds_write_b32 v0, v1
; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: ds_read_b32 v0, v2 offset:256
; GCN-NEXT: v_mov_b32_e32 v1, s1
; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: ds_write_b32 v1, v0 offset:256
; GCN-NEXT: s_endpgm
entry:
%idx = call i32 @llvm.amdgcn.workitem.id.x()
%load.0.addr = getelementptr float, ptr addrspace(3) %in, i32 %idx
%load.0 = load float, ptr addrspace(3) %load.0.addr
%store.0.addr = getelementptr float, ptr addrspace(3) %out, i32 %idx
store float %load.0, ptr addrspace(3) %store.0.addr
call void asm sideeffect "", ""() #1
call void @llvm.amdgcn.iglp.opt(i32 0) #1
%load.1.addr = getelementptr float, ptr addrspace(3) %in, i32 64
%load.1 = load float, ptr addrspace(3) %load.1.addr
%store.1.addr = getelementptr float, ptr addrspace(3) %out, i32 64
store float %load.1, ptr addrspace(3) %store.1.addr
ret void
}

declare void @llvm.amdgcn.iglp.opt(i32) #1
declare i32 @llvm.amdgcn.workitem.id.x() #1