AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis #80003
Conversation
@llvm/pr-subscribers-llvm-globalisel

Author: Petar Avramovic (petar-avramovic)

Changes

Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in the same way SILowerI1Copies selects VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA, and all cases of uses outside of a cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with an sgpr register class and S1 LLT.

TODO: The general goal is that instructions created in this pass are fully instruction-selected, so that selection of lane mask phis is not split across multiple passes.

patch 3 from: #73337

Patch is 137.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80003.diff

21 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/MachineRegisterInfo.h b/llvm/include/llvm/CodeGen/MachineRegisterInfo.h
index 9bca74a1d4fc..ebc2b94ea465 100644
--- a/llvm/include/llvm/CodeGen/MachineRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineRegisterInfo.h
@@ -752,6 +752,17 @@ class MachineRegisterInfo {
Register createVirtualRegister(const TargetRegisterClass *RegClass,
StringRef Name = "");
+ /// All avilable attributes a virtual register can have.
+ struct RegisterAttributes {
+ RegClassOrRegBank RCOrRB;
+ LLT Ty;
+ };
+
+ /// createVirtualRegister - Create and return a new virtual register in the
+ /// function with the specified register attributes.
+ Register createVirtualRegister(RegisterAttributes RegAttr,
+ StringRef Name = "");
+
/// Create and return a new virtual register in the function with the same
/// attributes as the given register.
Register cloneVirtualRegister(Register VReg, StringRef Name = "");
diff --git a/llvm/include/llvm/CodeGen/MachineUniformityAnalysis.h b/llvm/include/llvm/CodeGen/MachineUniformityAnalysis.h
index e6da099751e7..1039ac4e5189 100644
--- a/llvm/include/llvm/CodeGen/MachineUniformityAnalysis.h
+++ b/llvm/include/llvm/CodeGen/MachineUniformityAnalysis.h
@@ -32,6 +32,25 @@ MachineUniformityInfo computeMachineUniformityInfo(
MachineFunction &F, const MachineCycleInfo &cycleInfo,
const MachineDomTree &domTree, bool HasBranchDivergence);
+/// Legacy analysis pass which computes a \ref MachineUniformityInfo.
+class MachineUniformityAnalysisPass : public MachineFunctionPass {
+ MachineUniformityInfo UI;
+
+public:
+ static char ID;
+
+ MachineUniformityAnalysisPass();
+
+ MachineUniformityInfo &getUniformityInfo() { return UI; }
+ const MachineUniformityInfo &getUniformityInfo() const { return UI; }
+
+ bool runOnMachineFunction(MachineFunction &F) override;
+ void getAnalysisUsage(AnalysisUsage &AU) const override;
+ void print(raw_ostream &OS, const Module *M = nullptr) const override;
+
+ // TODO: verify analysis
+};
+
} // namespace llvm
#endif // LLVM_CODEGEN_MACHINEUNIFORMITYANALYSIS_H
diff --git a/llvm/lib/CodeGen/MachineRegisterInfo.cpp b/llvm/lib/CodeGen/MachineRegisterInfo.cpp
index 087604af6a71..fb1043937ca5 100644
--- a/llvm/lib/CodeGen/MachineRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/MachineRegisterInfo.cpp
@@ -167,6 +167,17 @@ MachineRegisterInfo::createVirtualRegister(const TargetRegisterClass *RegClass,
return Reg;
}
+/// createVirtualRegister - Create and return a new virtual register in the
+/// function with the specified register attributes.
+Register MachineRegisterInfo::createVirtualRegister(RegisterAttributes RegAttr,
+ StringRef Name) {
+ Register Reg = createIncompleteVirtualRegister(Name);
+ VRegInfo[Reg].first = RegAttr.RCOrRB;
+ setType(Reg, RegAttr.Ty);
+ noteNewVirtualRegister(Reg);
+ return Reg;
+}
+
Register MachineRegisterInfo::cloneVirtualRegister(Register VReg,
StringRef Name) {
Register Reg = createIncompleteVirtualRegister(Name);
diff --git a/llvm/lib/CodeGen/MachineUniformityAnalysis.cpp b/llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
index 3e0fe2b1ba08..131138e0649e 100644
--- a/llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
+++ b/llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
@@ -165,25 +165,6 @@ MachineUniformityInfo llvm::computeMachineUniformityInfo(
namespace {
-/// Legacy analysis pass which computes a \ref MachineUniformityInfo.
-class MachineUniformityAnalysisPass : public MachineFunctionPass {
- MachineUniformityInfo UI;
-
-public:
- static char ID;
-
- MachineUniformityAnalysisPass();
-
- MachineUniformityInfo &getUniformityInfo() { return UI; }
- const MachineUniformityInfo &getUniformityInfo() const { return UI; }
-
- bool runOnMachineFunction(MachineFunction &F) override;
- void getAnalysisUsage(AnalysisUsage &AU) const override;
- void print(raw_ostream &OS, const Module *M = nullptr) const override;
-
- // TODO: verify analysis
-};
-
class MachineUniformityInfoPrinterPass : public MachineFunctionPass {
public:
static char ID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp
index 4cd8b1ec1051..4f65a95de82a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp
@@ -16,7 +16,11 @@
//===----------------------------------------------------------------------===//
#include "AMDGPU.h"
+#include "SILowerI1Copies.h"
+#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineUniformityAnalysis.h"
+#include "llvm/InitializePasses.h"
#define DEBUG_TYPE "amdgpu-global-isel-divergence-lowering"
@@ -42,14 +46,146 @@ class AMDGPUGlobalISelDivergenceLowering : public MachineFunctionPass {
void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();
+ AU.addRequired<MachineDominatorTree>();
+ AU.addRequired<MachinePostDominatorTree>();
+ AU.addRequired<MachineUniformityAnalysisPass>();
MachineFunctionPass::getAnalysisUsage(AU);
}
};
+class DivergenceLoweringHelper : public PhiLoweringHelper {
+public:
+ DivergenceLoweringHelper(MachineFunction *MF, MachineDominatorTree *DT,
+ MachinePostDominatorTree *PDT,
+ MachineUniformityInfo *MUI);
+
+private:
+ MachineUniformityInfo *MUI = nullptr;
+ MachineIRBuilder B;
+ Register buildRegCopyToLaneMask(Register Reg);
+
+public:
+ void markAsLaneMask(Register DstReg) const override;
+ void getCandidatesForLowering(
+ SmallVectorImpl<MachineInstr *> &Vreg1Phis) const override;
+ void collectIncomingValuesFromPhi(
+ const MachineInstr *MI,
+ SmallVectorImpl<Incoming> &Incomings) const override;
+ void replaceDstReg(Register NewReg, Register OldReg,
+ MachineBasicBlock *MBB) override;
+ void buildMergeLaneMasks(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator I, const DebugLoc &DL,
+ Register DstReg, Register PrevReg,
+ Register CurReg) override;
+ void constrainAsLaneMask(Incoming &In) override;
+};
+
+DivergenceLoweringHelper::DivergenceLoweringHelper(
+ MachineFunction *MF, MachineDominatorTree *DT,
+ MachinePostDominatorTree *PDT, MachineUniformityInfo *MUI)
+ : PhiLoweringHelper(MF, DT, PDT), MUI(MUI), B(*MF) {}
+
+// _(s1) -> SReg_32/64(s1)
+void DivergenceLoweringHelper::markAsLaneMask(Register DstReg) const {
+ assert(MRI->getType(DstReg) == LLT::scalar(1));
+
+ if (MRI->getRegClassOrNull(DstReg)) {
+ if (MRI->constrainRegClass(DstReg, ST->getBoolRC()))
+ return;
+ llvm_unreachable("Failed to constrain register class");
+ }
+
+ MRI->setRegClass(DstReg, ST->getBoolRC());
+}
+
+void DivergenceLoweringHelper::getCandidatesForLowering(
+ SmallVectorImpl<MachineInstr *> &Vreg1Phis) const {
+ LLT S1 = LLT::scalar(1);
+
+ // Add divergent i1 phis to the list
+ for (MachineBasicBlock &MBB : *MF) {
+ for (MachineInstr &MI : MBB.phis()) {
+ Register Dst = MI.getOperand(0).getReg();
+ if (MRI->getType(Dst) == S1 && MUI->isDivergent(Dst))
+ Vreg1Phis.push_back(&MI);
+ }
+ }
+}
+
+void DivergenceLoweringHelper::collectIncomingValuesFromPhi(
+ const MachineInstr *MI, SmallVectorImpl<Incoming> &Incomings) const {
+ for (unsigned i = 1; i < MI->getNumOperands(); i += 2) {
+ Incomings.emplace_back(MI->getOperand(i).getReg(),
+ MI->getOperand(i + 1).getMBB(), Register());
+ }
+}
+
+void DivergenceLoweringHelper::replaceDstReg(Register NewReg, Register OldReg,
+ MachineBasicBlock *MBB) {
+ BuildMI(*MBB, MBB->getFirstNonPHI(), {}, TII->get(AMDGPU::COPY), OldReg)
+ .addReg(NewReg);
+}
+
+// Copy Reg to new lane mask register, insert a copy after instruction that
+// defines Reg while skipping phis if needed.
+Register DivergenceLoweringHelper::buildRegCopyToLaneMask(Register Reg) {
+ Register LaneMask = createLaneMaskReg(MRI, LaneMaskRegAttrs);
+ MachineInstr *Instr = MRI->getVRegDef(Reg);
+ MachineBasicBlock *MBB = Instr->getParent();
+ B.setInsertPt(*MBB, MBB->SkipPHIsAndLabels(std::next(Instr->getIterator())));
+ B.buildCopy(LaneMask, Reg);
+ return LaneMask;
+}
+
+// bb.previous
+// %PrevReg = ...
+//
+// bb.current
+// %CurReg = ...
+//
+// %DstReg - not defined
+//
+// -> (wave32 example, new registers have sreg_32 reg class and S1 LLT)
+//
+// bb.previous
+// %PrevReg = ...
+// %PrevRegCopy:sreg_32(s1) = COPY %PrevReg
+//
+// bb.current
+// %CurReg = ...
+// %CurRegCopy:sreg_32(s1) = COPY %CurReg
+// ...
+// %PrevMaskedReg:sreg_32(s1) = ANDN2 %PrevRegCopy, ExecReg - active lanes 0
+// %CurMaskedReg:sreg_32(s1) = AND %ExecReg, CurRegCopy - inactive lanes to 0
+// %DstReg:sreg_32(s1) = OR %PrevMaskedReg, CurMaskedReg
+//
+// DstReg = for active lanes rewrite bit in PrevReg with bit from CurReg
+void DivergenceLoweringHelper::buildMergeLaneMasks(
+ MachineBasicBlock &MBB, MachineBasicBlock::iterator I, const DebugLoc &DL,
+ Register DstReg, Register PrevReg, Register CurReg) {
+ // DstReg = (PrevReg & !EXEC) | (CurReg & EXEC)
+ // TODO: check if inputs are constants or results of a compare.
+
+ Register PrevRegCopy = buildRegCopyToLaneMask(PrevReg);
+ Register CurRegCopy = buildRegCopyToLaneMask(CurReg);
+ Register PrevMaskedReg = createLaneMaskReg(MRI, LaneMaskRegAttrs);
+ Register CurMaskedReg = createLaneMaskReg(MRI, LaneMaskRegAttrs);
+
+ B.setInsertPt(MBB, I);
+ B.buildInstr(AndN2Op, {PrevMaskedReg}, {PrevRegCopy, ExecReg});
+ B.buildInstr(AndOp, {CurMaskedReg}, {ExecReg, CurRegCopy});
+ B.buildInstr(OrOp, {DstReg}, {PrevMaskedReg, CurMaskedReg});
+}
+
+void DivergenceLoweringHelper::constrainAsLaneMask(Incoming &In) { return; }
+
} // End anonymous namespace.
INITIALIZE_PASS_BEGIN(AMDGPUGlobalISelDivergenceLowering, DEBUG_TYPE,
"AMDGPU GlobalISel divergence lowering", false, false)
+INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
+INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
+INITIALIZE_PASS_DEPENDENCY(MachineUniformityAnalysisPass)
INITIALIZE_PASS_END(AMDGPUGlobalISelDivergenceLowering, DEBUG_TYPE,
"AMDGPU GlobalISel divergence lowering", false, false)
@@ -64,5 +200,12 @@ FunctionPass *llvm::createAMDGPUGlobalISelDivergenceLoweringPass() {
bool AMDGPUGlobalISelDivergenceLowering::runOnMachineFunction(
MachineFunction &MF) {
- return false;
+ MachineDominatorTree &DT = getAnalysis<MachineDominatorTree>();
+ MachinePostDominatorTree &PDT = getAnalysis<MachinePostDominatorTree>();
+ MachineUniformityInfo &MUI =
+ getAnalysis<MachineUniformityAnalysisPass>().getUniformityInfo();
+
+ DivergenceLoweringHelper Helper(&MF, &DT, &PDT, &MUI);
+
+ return Helper.lowerPhis();
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index f255d098b631..565788027996 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -210,6 +210,7 @@ bool AMDGPUInstructionSelector::selectCOPY(MachineInstr &I) const {
bool AMDGPUInstructionSelector::selectPHI(MachineInstr &I) const {
const Register DefReg = I.getOperand(0).getReg();
const LLT DefTy = MRI->getType(DefReg);
+
if (DefTy == LLT::scalar(1)) {
if (!AllowRiskySelect) {
LLVM_DEBUG(dbgs() << "Skipping risky boolean phi\n");
@@ -3552,8 +3553,6 @@ bool AMDGPUInstructionSelector::selectStackRestore(MachineInstr &MI) const {
}
bool AMDGPUInstructionSelector::select(MachineInstr &I) {
- if (I.isPHI())
- return selectPHI(I);
if (!I.isPreISelOpcode()) {
if (I.isCopy())
@@ -3696,6 +3695,8 @@ bool AMDGPUInstructionSelector::select(MachineInstr &I) {
return selectWaveAddress(I);
case AMDGPU::G_STACKRESTORE:
return selectStackRestore(I);
+ case AMDGPU::G_PHI:
+ return selectPHI(I);
default:
return selectImpl(I, *CoverageInfo);
}
diff --git a/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp b/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
index cfa0c21def79..59843438950a 100644
--- a/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
@@ -31,9 +31,9 @@
using namespace llvm;
-static Register insertUndefLaneMask(MachineBasicBlock *MBB,
- MachineRegisterInfo *MRI,
- Register LaneMaskRegAttrs);
+static Register
+insertUndefLaneMask(MachineBasicBlock *MBB, MachineRegisterInfo *MRI,
+ MachineRegisterInfo::RegisterAttributes LaneMaskRegAttrs);
namespace {
@@ -78,7 +78,7 @@ class Vreg1LoweringHelper : public PhiLoweringHelper {
MachineBasicBlock::iterator I, const DebugLoc &DL,
Register DstReg, Register PrevReg,
Register CurReg) override;
- void constrainIncomingRegisterTakenAsIs(Incoming &In) override;
+ void constrainAsLaneMask(Incoming &In) override;
bool lowerCopiesFromI1();
bool lowerCopiesToI1();
@@ -304,7 +304,8 @@ class LoopFinder {
/// blocks, so that the SSA updater doesn't have to search all the way to the
/// function entry.
void addLoopEntries(unsigned LoopLevel, MachineSSAUpdater &SSAUpdater,
- MachineRegisterInfo &MRI, Register LaneMaskRegAttrs,
+ MachineRegisterInfo &MRI,
+ MachineRegisterInfo::RegisterAttributes LaneMaskRegAttrs,
ArrayRef<Incoming> Incomings = {}) {
assert(LoopLevel < CommonDominators.size());
@@ -411,14 +412,15 @@ FunctionPass *llvm::createSILowerI1CopiesPass() {
return new SILowerI1Copies();
}
-Register llvm::createLaneMaskReg(MachineRegisterInfo *MRI,
- Register LaneMaskRegAttrs) {
- return MRI->cloneVirtualRegister(LaneMaskRegAttrs);
+Register llvm::createLaneMaskReg(
+ MachineRegisterInfo *MRI,
+ MachineRegisterInfo::RegisterAttributes LaneMaskRegAttrs) {
+ return MRI->createVirtualRegister(LaneMaskRegAttrs);
}
-static Register insertUndefLaneMask(MachineBasicBlock *MBB,
- MachineRegisterInfo *MRI,
- Register LaneMaskRegAttrs) {
+static Register
+insertUndefLaneMask(MachineBasicBlock *MBB, MachineRegisterInfo *MRI,
+ MachineRegisterInfo::RegisterAttributes LaneMaskRegAttrs) {
MachineFunction &MF = *MBB->getParent();
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();
@@ -619,7 +621,7 @@ bool PhiLoweringHelper::lowerPhis() {
for (auto &Incoming : Incomings) {
MachineBasicBlock &IMBB = *Incoming.Block;
if (PIA.isSource(IMBB)) {
- constrainIncomingRegisterTakenAsIs(Incoming);
+ constrainAsLaneMask(Incoming);
SSAUpdater.AddAvailableValue(&IMBB, Incoming.Reg);
} else {
Incoming.UpdatedReg = createLaneMaskReg(MRI, LaneMaskRegAttrs);
@@ -911,6 +913,4 @@ void Vreg1LoweringHelper::buildMergeLaneMasks(MachineBasicBlock &MBB,
}
}
-void Vreg1LoweringHelper::constrainIncomingRegisterTakenAsIs(Incoming &In) {
- return;
-}
+void Vreg1LoweringHelper::constrainAsLaneMask(Incoming &In) {}
diff --git a/llvm/lib/Target/AMDGPU/SILowerI1Copies.h b/llvm/lib/Target/AMDGPU/SILowerI1Copies.h
index 5099d39c2d14..0485f76d39a6 100644
--- a/llvm/lib/Target/AMDGPU/SILowerI1Copies.h
+++ b/llvm/lib/Target/AMDGPU/SILowerI1Copies.h
@@ -31,7 +31,9 @@ struct Incoming {
: Reg(Reg), Block(Block), UpdatedReg(UpdatedReg) {}
};
-Register createLaneMaskReg(MachineRegisterInfo *MRI, Register LaneMaskRegAttrs);
+Register
+createLaneMaskReg(MachineRegisterInfo *MRI,
+ MachineRegisterInfo::RegisterAttributes LaneMaskRegAttrs);
class PhiLoweringHelper {
public:
@@ -47,7 +49,7 @@ class PhiLoweringHelper {
MachineRegisterInfo *MRI = nullptr;
const GCNSubtarget *ST = nullptr;
const SIInstrInfo *TII = nullptr;
- Register LaneMaskRegAttrs;
+ MachineRegisterInfo::RegisterAttributes LaneMaskRegAttrs;
#ifndef NDEBUG
DenseSet<Register> PhiRegisters;
@@ -68,7 +70,8 @@ class PhiLoweringHelper {
getSaluInsertionAtEnd(MachineBasicBlock &MBB) const;
void initializeLaneMaskRegisterAttributes(Register LaneMask) {
- LaneMaskRegAttrs = LaneMask;
+ LaneMaskRegAttrs.RCOrRB = MRI->getRegClassOrRegBank(LaneMask);
+ LaneMaskRegAttrs.Ty = MRI->getType(LaneMask);
}
bool isLaneMaskReg(Register Reg) const {
@@ -91,7 +94,7 @@ class PhiLoweringHelper {
MachineBasicBlock::iterator I,
const DebugLoc &DL, Register DstReg,
Register PrevReg, Register CurReg) = 0;
- virtual void constrainIncomingRegisterTakenAsIs(Incoming &In) = 0;
+ virtual void constrainAsLaneMask(Incoming &In) = 0;
};
} // end namespace llvm
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
index 7a68aec1a1c5..06a8f80e6aa3 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
-; RUN: llc -global-isel -amdgpu-global-isel-risky-select -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 < %s | FileCheck -check-prefix=GFX10 %s
+; RUN: llc -global-isel -amdgpu-global-isel-risky-select -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX10 %s
+; REQUIRES: do-not-run-me
; Divergent phis that don't require lowering using lane mask merging
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
index d314ebe355f5..55f22b0bbb4d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
@@ -1,5 +1,9 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
-# RUN: llc -global-isel -mtriple=amdgcn-mesa-amdpal -mcpu=gfx1010 -run-pass=amdgpu-global-isel-divergence-lowering %s -o - | FileCheck -check-prefix=GFX10 %s
+# RUN: llc -global-isel -mtriple=amdgcn-mesa-amdpal -mcpu=gfx1010 -run-pass=amdgpu-global-isel-divergence-lowering -verify-machineinstrs %s -o - | FileCheck -check-prefix=GFX10 %s
+
+# Test is updated but copies between S1-register-with-reg-class and
+# register-with-reg-class-no-LLT fail machine verification
+# REQUIRES: do-not-run-me-with-machine-verifier
--- |
define void @divergent_i1_phi_uniform_branch() {ret void}
@@ -46,7 +50,7 @@ body: |
; GFX10-NEXT: bb.2:
; GFX10-NEXT: successors: %bb.4(0x80000000)
; GFX10-NEXT: {{ $}}
- ; GFX10-NEXT: [[PHI:%[0-9]+]]:_(s1) = G_PHI %14(s1), %bb.3, [[ICMP]](s1), %bb.0
+ ; GFX10-NEXT: [[PHI:%[0-9]+]]:sreg_32(s1) = G_PHI %14(s1), %bb.3, [[ICMP]](s1), %bb.0
; GFX10-NEXT: G_BR %bb.4
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: bb.3:
@@ -126,6 +130,7 @@ body: |
; GFX10-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $sgpr0
; GFX10-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 6
; GFX10-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY2]](s32), [[C]]
+ ; GFX10-NEXT: [[COPY4:%[0-9]+]]:sreg_32(s1) = COPY [[ICMP]](s1)
; GFX10-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
; GFX10-NEXT: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[COPY3]](s32), [[C1]]
; GFX10-NEXT: G_BRCOND [[ICMP1]](s1), %bb.2
@@ -136,12 +141,17 @@ body: |
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
; GFX10-NEXT: [[ICMP...
[truncated]
Third time's the charm. Fix for #78482.
@@ -752,6 +752,17 @@ class MachineRegisterInfo {
  Register createVirtualRegister(const TargetRegisterClass *RegClass,
                                 StringRef Name = "");
  /// All avilable attributes a virtual register can have.
typo: available
    LLT Ty;
  };
  /// createVirtualRegister - Create and return a new virtual register in the
No need to repeat function name in docs
@@ -167,6 +167,17 @@ MachineRegisterInfo::createVirtualRegister(const TargetRegisterClass *RegClass,
  return Reg;
}
/// createVirtualRegister - Create and return a new virtual register in the
here too
@@ -752,6 +752,17 @@ class MachineRegisterInfo {
  Register createVirtualRegister(const TargetRegisterClass *RegClass,
                                 StringRef Name = "");
  /// All avilable attributes a virtual register can have.
  struct RegisterAttributes {
I'm not sure I understand why this is needed. It seems like it's just an additional helper to create a new register with a type + RC/RB all in one, right?
If so, I think it should be contained to SILowerI1Copies; a new MachineRegisterInfo helper is not really needed, or at least not one with a new type just for it (plus the name "Attribute" is confusing in this context). I'd just pass two arguments to the function directly.
I'm not sure I understand why this is needed. It seems like it's just an additional helper to create a new register with a type + RC/RB all in one, right?
Yes.
"Attribute" was used to describe the same thing for constrainRegAttrs in MachineRegisterInfo.h.
RegisterAttributes was defined here because it is also needed in MachineSSAUpdater. The struct was used because it is better to pass around one struct than two arguments. And since the struct was already available, it was used in createVirtualRegister for LLT + RC/RB.
The bigger picture is that it is a nice abstraction that hides the need to explicitly deal with LLT in the SDAG path.
Renamed it to VRegAttrs and added getVRegAttrs to fully hide LLT/RegClass from MachineSSAUpdater
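For illustration, here is a minimal sketch of how the attribute bundle is meant to be used, based on the diff and the comments above. The helper names getLaneMaskAttrs and makeLaneMask are hypothetical, and the struct is shown under its original RegisterAttributes name since the VRegAttrs rename is not part of this diff:

#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

// Capture both attributes (RC/RB and LLT) of a known lane mask once...
static MachineRegisterInfo::RegisterAttributes
getLaneMaskAttrs(const MachineRegisterInfo &MRI, Register LaneMask) {
  return {MRI.getRegClassOrRegBank(LaneMask), MRI.getType(LaneMask)};
}

// ...then mint fresh lane masks from the saved attributes, with no template
// register kept alive just to clone from, and no explicit LLT handling at
// the call site.
static Register makeLaneMask(MachineRegisterInfo &MRI,
                             MachineRegisterInfo::RegisterAttributes Attrs) {
  return MRI.createVirtualRegister(Attrs);
}

This is what lets the same createLaneMaskReg call sites serve both the SDAG path (where the LLT is invalid) and GlobalISel (where it is S1).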
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in the same way SILowerI1Copies selects VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA, and all cases of uses outside of a cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with an sgpr register class and S1 LLT. TODO: The general goal is that instructions created in this pass are fully instruction-selected, so that selection of lane mask phis is not split across multiple passes. patch 3 from: llvm#73337
6989a06 to a47c5be
…0003) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in the same way SILowerI1Copies selects VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA, and all cases of uses outside of a cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with an sgpr register class and S1 LLT. TODO: The general goal is that instructions created in this pass are fully instruction-selected, so that selection of lane mask phis is not split across multiple passes. patch 3 from: llvm#73337
Basic implementation of lane mask merging for GlobalISel. Lane masks on GlobalISel are registers with an sgpr register class and S1 LLT, as required by machine uniformity analysis. Implements the equivalent of lowerPhis from SILowerI1Copies.cpp in:
patch 1: llvm#75340
patch 2: llvm#75349
patch 3: llvm#80003
patch 4: llvm#78431
patch 5 is in this commit: AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers. Previously, in PHIs that represent lane masks, incoming registers taken as-is were not selected as lane masks. Such registers are not being merged with another lane mask and most often only have S1 LLT. Implement constrainAsLaneMask by constraining incoming registers taken as-is with lane mask attributes, essentially transforming them into lane masks. This is the final step in having PHI instructions created in this pass be fully instruction-selected.
Basic implementation of lane mask merging for GlobalISel. Lane masks on GlobalISel are registers with an sgpr register class and S1 LLT, as required by machine uniformity analysis. Implements the equivalent of lowerPhis from SILowerI1Copies.cpp in:
patch 1: #75340
patch 2: #75349
patch 3: #80003
patch 4: #78431
patch 5 is in this commit: AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers. Previously, in PHIs that represent lane masks, incoming registers taken as-is were not selected as lane masks. Such registers are not being merged with another lane mask and most often only have S1 LLT. Implement constrainAsLaneMask by constraining incoming registers taken as-is with lane mask attributes, essentially transforming them into lane masks. This is the final step in having PHI instructions created in this pass be fully instruction-selected.
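A hedged sketch of what that constraining step could look like, modeled on the markAsLaneMask logic already in this patch; the free-function form and the name constrainRegAsLaneMask are illustrative, not the committed patch-5 code:

#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Support/ErrorHandling.h"

using namespace llvm;

// Illustrative only: force a register taken as-is into the wave-size bool
// register class used for lane masks, mirroring markAsLaneMask above.
static void constrainRegAsLaneMask(MachineRegisterInfo &MRI, Register Reg,
                                   const TargetRegisterClass *BoolRC) {
  if (MRI.getRegClassOrNull(Reg)) {
    // Already has a register class: it must be compatible with the bool RC.
    if (!MRI.constrainRegClass(Reg, BoolRC))
      report_fatal_error("failed to constrain lane mask register class");
    return;
  }
  // No register class yet (plain S1 LLT register): assign the bool RC.
  MRI.setRegClass(Reg, BoolRC);
}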
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in the same way SILowerI1Copies selects VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA, and all cases of uses outside of a cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with an sgpr register class and S1 LLT.
TODO: The general goal is that instructions created in this pass are fully instruction-selected, so that selection of lane mask phis is not split across multiple passes.
patch 3 from: #73337
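As a closing illustration, here is a scalar model of the per-wave bit arithmetic that buildMergeLaneMasks emits, with plain C++ standing in for the ANDN2/AND/OR instruction sequence; wave32 is assumed, so each lane is one bit of a uint32_t:

#include <cstdint>

// DstReg = (PrevReg & ~EXEC) | (CurReg & EXEC): lanes active in EXEC take
// their i1 bit from the current value, inactive lanes keep the bit they had
// in the previous value.
uint32_t mergeLaneMasks(uint32_t Prev, uint32_t Cur, uint32_t Exec) {
  uint32_t PrevMasked = Prev & ~Exec; // ANDN2: zero out active lanes in Prev
  uint32_t CurMasked = Exec & Cur;    // AND: zero out inactive lanes in Cur
  return PrevMasked | CurMasked;      // OR: combine the two halves
}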