Skip to content

[MachinePipeliner] Improve loop carried dependence analysis #94185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion llvm/include/llvm/CodeGen/MachinePipeliner.h
Original file line number Diff line number Diff line change
Expand Up @@ -390,6 +390,9 @@ class SwingSchedulerDAG : public ScheduleDAGInstrs {

const SwingSchedulerDDG *getDDG() const { return DDG.get(); }

bool mayOverlapInLaterIter(const MachineInstr *BaseMI,
const MachineInstr *OtherMI) const;

private:
void addLoopCarriedDependences(AAResults *AA);
void updatePhiDependences();
Expand All @@ -409,7 +412,7 @@ class SwingSchedulerDAG : public ScheduleDAGInstrs {
void computeNodeOrder(NodeSetType &NodeSets);
void checkValidNodeOrder(const NodeSetType &Circuits) const;
bool schedulePipeline(SMSchedule &Schedule);
bool computeDelta(MachineInstr &MI, unsigned &Delta) const;
bool computeDelta(const MachineInstr &MI, int &Delta) const;
MachineInstr *findDefInLoop(Register Reg);
bool canUseLastOffsetValue(MachineInstr *MI, unsigned &BasePos,
unsigned &OffsetPos, unsigned &NewBase,
Expand Down
266 changes: 192 additions & 74 deletions llvm/lib/CodeGen/MachinePipeliner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2523,9 +2523,104 @@ bool SwingSchedulerDAG::schedulePipeline(SMSchedule &Schedule) {
return scheduleFound && Schedule.getMaxStageCount() > 0;
}

static Register findUniqueOperandDefinedInLoop(const MachineInstr &MI) {
const MachineRegisterInfo &MRI = MI.getParent()->getParent()->getRegInfo();
Register Result;
for (const MachineOperand &Use : MI.all_uses()) {
Register Reg = Use.getReg();
if (!Reg.isVirtual())
return Register();
if (MRI.getVRegDef(Reg)->getParent() != MI.getParent())
continue;
if (Result)
return Register();
Result = Reg;
}
return Result;
}

/// When Op is a value that is incremented recursively in a loop and there is a
/// unique instruction that increments it, returns true and sets Value.
static bool findLoopIncrementValue(const MachineOperand &Op, int &Value) {
if (!Op.isReg() || !Op.getReg().isVirtual())
return false;

Register OrgReg = Op.getReg();
Register CurReg = OrgReg;
const MachineBasicBlock *LoopBB = Op.getParent()->getParent();
const MachineRegisterInfo &MRI = LoopBB->getParent()->getRegInfo();

const TargetInstrInfo *TII =
LoopBB->getParent()->getSubtarget().getInstrInfo();
const TargetRegisterInfo *TRI =
LoopBB->getParent()->getSubtarget().getRegisterInfo();

MachineInstr *Phi = nullptr;
MachineInstr *Increment = nullptr;

// Traverse definitions until it reaches Op or an instruction that does not
// satisfy the condition.
// Acceptable example:
// bb.0:
// %0 = PHI %3, %bb.0, ...
// %2 = ADD %0, Value
// ... = LOAD %2(Op)
// %3 = COPY %2
while (true) {
if (!CurReg.isValid() || !CurReg.isVirtual())
return false;
MachineInstr *Def = MRI.getVRegDef(CurReg);
if (Def->getParent() != LoopBB)
return false;

if (Def->isCopy()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does REG_SEQUENCE also need to be supported to arrive at the definition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice explanation @ytmukai!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @quic-santdas!
I checked in llvm-test-suite for Hexagon with the following code to see if REG_SEQUENCE appears, but could not find it.

    } else if (Def->isRegSequence()) {
      LLVM_DEBUG({
        dbgs() << "RegSequence Found\n";
      });

Could you tell me if there is a typical pattern in which REG_SEQUENCE appears in the calculation of the inductive variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned REG_SEQUENCE since you are handling COPY instructions. I thought this might need handling too since it is used to combine subregisters. There is no instance/example in the test suite though.

The code looks good from my side. If any corner case gets missed, that can be added later I guess.

// Ignore copy instructions unless they contain subregisters
if (Def->getOperand(0).getSubReg() || Def->getOperand(1).getSubReg())
return false;
CurReg = Def->getOperand(1).getReg();
} else if (Def->isPHI()) {
// There must be just one Phi
if (Phi)
return false;
Phi = Def;
CurReg = getLoopPhiReg(*Def, LoopBB);
} else if (TII->getIncrementValue(*Def, Value)) {
// Potentially a unique increment
if (Increment)
// Multiple increments exist
return false;

const MachineOperand *BaseOp;
int64_t Offset;
bool OffsetIsScalable;
if (TII->getMemOperandWithOffset(*Def, BaseOp, Offset, OffsetIsScalable,
TRI)) {
// Pre/post increment instruction
CurReg = BaseOp->getReg();
} else {
// If only one of the operands is defined within the loop, it is assumed
// to be an incremented value.
CurReg = findUniqueOperandDefinedInLoop(*Def);
if (!CurReg.isValid())
return false;
}
Increment = Def;
} else {
return false;
}
if (CurReg == OrgReg)
break;
}

if (!Phi || !Increment)
return false;

return true;
}

/// Return true if we can compute the amount the instruction changes
/// during each iteration. Set Delta to the amount of the change.
bool SwingSchedulerDAG::computeDelta(MachineInstr &MI, unsigned &Delta) const {
bool SwingSchedulerDAG::computeDelta(const MachineInstr &MI, int &Delta) const {
const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
const MachineOperand *BaseOp;
int64_t Offset;
Expand All @@ -2540,24 +2635,7 @@ bool SwingSchedulerDAG::computeDelta(MachineInstr &MI, unsigned &Delta) const {
if (!BaseOp->isReg())
return false;

Register BaseReg = BaseOp->getReg();

MachineRegisterInfo &MRI = MF.getRegInfo();
// Check if there is a Phi. If so, get the definition in the loop.
MachineInstr *BaseDef = MRI.getVRegDef(BaseReg);
if (BaseDef && BaseDef->isPHI()) {
BaseReg = getLoopPhiReg(*BaseDef, MI.getParent());
BaseDef = MRI.getVRegDef(BaseReg);
}
if (!BaseDef)
return false;

int D = 0;
if (!TII->getIncrementValue(*BaseDef, D) && D >= 0)
return false;

Delta = D;
return true;
return findLoopIncrementValue(*BaseOp, Delta);
}

/// Check if we can change the instruction to use an offset value from the
Expand Down Expand Up @@ -2675,6 +2753,100 @@ MachineInstr *SwingSchedulerDAG::findDefInLoop(Register Reg) {
return Def;
}

/// Return false if there is no overlap between the region accessed by BaseMI in
/// an iteration and the region accessed by OtherMI in subsequent iterations.
bool SwingSchedulerDAG::mayOverlapInLaterIter(
const MachineInstr *BaseMI, const MachineInstr *OtherMI) const {
int DeltaB, DeltaO, Delta;
if (!computeDelta(*BaseMI, DeltaB) || !computeDelta(*OtherMI, DeltaO) ||
DeltaB != DeltaO)
return true;
Delta = DeltaB;

const MachineOperand *BaseOpB, *BaseOpO;
int64_t OffsetB, OffsetO;
bool OffsetBIsScalable, OffsetOIsScalable;
const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
if (!TII->getMemOperandWithOffset(*BaseMI, BaseOpB, OffsetB,
OffsetBIsScalable, TRI) ||
!TII->getMemOperandWithOffset(*OtherMI, BaseOpO, OffsetO,
OffsetOIsScalable, TRI))
return true;

if (OffsetBIsScalable || OffsetOIsScalable)
return true;

if (!BaseOpB->isIdenticalTo(*BaseOpO)) {
// Pass cases with different base operands but same initial values.
// Typically for when pre/post increment is used.

if (!BaseOpB->isReg() || !BaseOpO->isReg())
return true;
Register RegB = BaseOpB->getReg(), RegO = BaseOpO->getReg();
if (!RegB.isVirtual() || !RegO.isVirtual())
return true;

MachineInstr *DefB = MRI.getVRegDef(BaseOpB->getReg());
MachineInstr *DefO = MRI.getVRegDef(BaseOpO->getReg());
if (!DefB || !DefO || !DefB->isPHI() || !DefO->isPHI())
return true;

unsigned InitValB = 0;
unsigned LoopValB = 0;
unsigned InitValO = 0;
unsigned LoopValO = 0;
getPhiRegs(*DefB, BB, InitValB, LoopValB);
getPhiRegs(*DefO, BB, InitValO, LoopValO);
MachineInstr *InitDefB = MRI.getVRegDef(InitValB);
MachineInstr *InitDefO = MRI.getVRegDef(InitValO);

if (!InitDefB->isIdenticalTo(*InitDefO))
return true;
}

LocationSize AccessSizeB = (*BaseMI->memoperands_begin())->getSize();
LocationSize AccessSizeO = (*OtherMI->memoperands_begin())->getSize();

// This is the main test, which checks the offset values and the loop
// increment value to determine if the accesses may be loop carried.
if (!AccessSizeB.hasValue() || !AccessSizeO.hasValue())
return true;

LLVM_DEBUG({
dbgs() << "Overlap check:\n";
dbgs() << " BaseMI: ";
BaseMI->dump();
dbgs() << " Base + " << OffsetB << " + I * " << Delta
<< ", Len: " << AccessSizeB.getValue() << "\n";
dbgs() << " OtherMI: ";
OtherMI->dump();
dbgs() << " Base + " << OffsetO << " + I * " << Delta
<< ", Len: " << AccessSizeO.getValue() << "\n";
});

// Excessive overlap may be detected in strided patterns.
// For example, the memory addresses of the store and the load in
// for (i=0; i<n; i+=2) a[i+1] = a[i];
// are assumed to overlap.
if (Delta < 0) {
int64_t BaseMinAddr = OffsetB;
int64_t OhterNextIterMaxAddr = OffsetO + Delta + AccessSizeO.getValue() - 1;
if (BaseMinAddr > OhterNextIterMaxAddr) {
LLVM_DEBUG(dbgs() << " Result: No overlap\n");
return false;
}
} else {
int64_t BaseMaxAddr = OffsetB + AccessSizeB.getValue() - 1;
int64_t OtherNextIterMinAddr = OffsetO + Delta;
if (BaseMaxAddr < OtherNextIterMinAddr) {
LLVM_DEBUG(dbgs() << " Result: No overlap\n");
return false;
}
}
LLVM_DEBUG(dbgs() << " Result: Overlap\n");
return true;
}

/// Return true for an order or output dependence that is loop carried
/// potentially. A dependence is loop carried if the destination defines a value
/// that may be used or defined by the source in a subsequent iteration.
Expand Down Expand Up @@ -2706,61 +2878,7 @@ bool SwingSchedulerDAG::isLoopCarriedDep(
// The conservative assumption is that a dependence between memory operations
// may be loop carried. The following code checks when it can be proved that
// there is no loop carried dependence.
unsigned DeltaS, DeltaD;
if (!computeDelta(*SI, DeltaS) || !computeDelta(*DI, DeltaD))
return true;

const MachineOperand *BaseOpS, *BaseOpD;
int64_t OffsetS, OffsetD;
bool OffsetSIsScalable, OffsetDIsScalable;
const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
if (!TII->getMemOperandWithOffset(*SI, BaseOpS, OffsetS, OffsetSIsScalable,
TRI) ||
!TII->getMemOperandWithOffset(*DI, BaseOpD, OffsetD, OffsetDIsScalable,
TRI))
return true;

assert(!OffsetSIsScalable && !OffsetDIsScalable &&
"Expected offsets to be byte offsets");

MachineInstr *DefS = MRI.getVRegDef(BaseOpS->getReg());
MachineInstr *DefD = MRI.getVRegDef(BaseOpD->getReg());
if (!DefS || !DefD || !DefS->isPHI() || !DefD->isPHI())
return true;

unsigned InitValS = 0;
unsigned LoopValS = 0;
unsigned InitValD = 0;
unsigned LoopValD = 0;
getPhiRegs(*DefS, BB, InitValS, LoopValS);
getPhiRegs(*DefD, BB, InitValD, LoopValD);
MachineInstr *InitDefS = MRI.getVRegDef(InitValS);
MachineInstr *InitDefD = MRI.getVRegDef(InitValD);

if (!InitDefS->isIdenticalTo(*InitDefD))
return true;

// Check that the base register is incremented by a constant value for each
// iteration.
MachineInstr *LoopDefS = MRI.getVRegDef(LoopValS);
int D = 0;
if (!LoopDefS || !TII->getIncrementValue(*LoopDefS, D))
return true;

LocationSize AccessSizeS = (*SI->memoperands_begin())->getSize();
LocationSize AccessSizeD = (*DI->memoperands_begin())->getSize();

// This is the main test, which checks the offset values and the loop
// increment value to determine if the accesses may be loop carried.
if (!AccessSizeS.hasValue() || !AccessSizeD.hasValue())
return true;

if (DeltaS != DeltaD || DeltaS < AccessSizeS.getValue() ||
DeltaD < AccessSizeD.getValue())
return true;

return (OffsetS + (int64_t)AccessSizeS.getValue() <
OffsetD + (int64_t)AccessSizeD.getValue());
return mayOverlapInLaterIter(DI, SI);
}

void SwingSchedulerDAG::postProcessDAG() {
Expand Down
7 changes: 7 additions & 0 deletions llvm/test/CodeGen/Hexagon/swp-carried-dep1.mir
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,13 @@

# Test that the loop carried dependence check correctly identifies a recurrence.

# CHECK: Overlap check:
# CHECK-NEXT: BaseMI: S2_storerh_io %{{[0-9]+}}:intregs, 0, %{{[0-9]+}}:intregs :: (store (s16) into %ir.lsr.iv24)
# CHECK-NEXT: Base + 0 + I * 4, Len: 2
# CHECK-NEXT: OtherMI: %{{[0-9]+}}:intregs = L2_loadrh_io %{{[0-9]+}}:intregs, -8 :: (load (s16) from %ir.cgep10)
# CHECK-NEXT: Base + -8 + I * 4, Len: 2
# CHECK-NEXT: Result: Overlap

# CHECK: Rec NodeSet
# CHECK: Rec NodeSet
# CHECK: Rec NodeSet
Expand Down
24 changes: 18 additions & 6 deletions llvm/test/CodeGen/Hexagon/swp-carried-dep2.mir
Original file line number Diff line number Diff line change
@@ -1,14 +1,26 @@
# RUN: llc -mtriple=hexagon -run-pass pipeliner -debug-only=pipeliner %s -o /dev/null 2>&1 -pipeliner-experimental-cg=true | FileCheck %s
# REQUIRES: asserts

# Test that the loop carried dependence check correctly identifies a recurrence
# Test that the loop carried dependence check correctly identifies dependences
# when the loop variable decreases and the array index offset is negative.

# CHECK: Rec NodeSet
# CHECK: Rec NodeSet
# CHECK: SU(3)
# CHECK: SU(4)
# CHECK: SU(5)
# No dependence from the store to the load.
# CHECK: Overlap check:
# CHECK-NEXT: BaseMI: S2_storeri_io %{{[0-9]+}}:intregs, 0, %{{[0-9]+}}:intregs :: (store (s32) into %ir.lsr.iv1)
# CHECK-NEXT: Base + 0 + I * -4, Len: 4
# CHECK-NEXT: OtherMI: %{{[0-9]+}}:intregs = L2_loadri_io %{{[0-9]+}}:intregs, -8 :: (load (s32) from %ir.cgep)
# CHECK-NEXT: Base + -8 + I * -4, Len: 4
# CHECK-NEXT: Result: No overlap

# TODO: There is a loop carried dependence from the load to the store but it
# is not recognised. addLoopCarriedDependences() should be modified to
# recognise the dependence and enable the following checks.
# CHECK-AFTER-FIX: Overlap check:
# CHECK-AFTER-FIX-NEXT: BaseMI: %{{[0-9]+}}:intregs = L2_loadri_io %{{[0-9]+}}:intregs, -8 :: (load (s32) from %ir.cgep)
# CHECK-AFTER-FIX-NEXT: Base + -8 + I * -4, Len: 4
# CHECK-AFTER-FIX-NEXT: OtherMI: S2_storeri_io %{{[0-9]+}}:intregs, 0, %{{[0-9]+}}:intregs :: (store (s32) into %ir.lsr.iv1)
# CHECK-AFTER-FIX-NEXT: Base + 0 + I * -4, Len: 4
# CHECK-AFTER-FIX-NEXT: Result: Overlap!

--- |

Expand Down
7 changes: 7 additions & 0 deletions llvm/test/CodeGen/Hexagon/swp-carried-dep3.mir
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,13 @@
# requires to use a single CHECK-NOT to match such a Rec NodeSet. Fortunately
# the atom '.' does not match a newline but anything else on a line.

# CHECK: Overlap check:
# CHECK-NEXT: BaseMI: %13:intregs = S2_storerh_pi %12:intregs(tied-def 0), 2, %20:intregs :: (store (s16))
# CHECK-NEXT: Base + 0 + I * 2, Len: 2
# CHECK-NEXT: OtherMI: %19:intregs, %15:intregs = L2_loadrh_pi %14:intregs(tied-def 1), 2 :: (load (s16))
# CHECK-NEXT: Base + 0 + I * 2, Len: 2
# CHECK-NEXT: Result: No overlap

# CHECK-NOT: Rec NodeSet{{.+[[:space:]]}} SU(5){{.+[[:space:]]}} SU(7)

...
Expand Down
Loading
Loading