Experimental patch for understanding the pre-RA MachineScheduler #136483

JonPsson1 · 2025-04-20T11:52:56Z

When I realized that the GenericScheduler caused a lot more spilling to happen in cactus along with a performance penalty (5%), I started to try to understand what was going wrong. After many experiments, this is now my method of how to analyze a test case in regards to the pre-ra scheduler and spilling, which I would like to share (not meant to be merged):

With this patch applied, build with '-mllvm -sched-print-pressures -mllvm -regalloc-stats' and the output (saved to a log) will have a lot of lines like

...
pp_sys.bcOOOOOOOPerl_pp_gmtimeOOOOOOOMBB67Region0       _GPR-3_FP2_GPRMax0_FPMax0_Size11_Stores1_Loads1_FPDefs1_OtherDefs3_FPTotLat4_OtherLat9
pp_sys.bcOOOOOOOPerl_pp_gmtimeOOOOOOOMBB95Region0       _GPR0_FP0_GPRMax0_FPMax0_Size4_Stores0_Loads0_FPDefs0_OtherDefs3_FPTotLat0_OtherLat9
REGALLOCSTAT: pp_sys.bcOOOOOOOPerl_pp_gmtime    _NumInstrs581_NumVirtRegs426_NumFPSpilled0_NumOtherSpilled1
...

The regalloc stats are per function, while the others are per scheduling region. The strings with "OOOO..."-delimiters are unique IDs for scheduling regions or functions that a script can parse along with the important numbers to the right.

So let's say that the interesting file is ML_BSSN_Advect (cactus) as it has more spilling with the pre-ra scheduler than without, and it would be good to see in which regions FP spilling increases:

grep ML_BSSN_Advect.bc BUILD.log | grep "_NumFPSpilled[1-9]" 
REGALLOCSTAT: ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd     _NumInstrs9857_NumVirtRegs10318_NumFPSpilled1848_NumOtherSpilled322

Compiling this one file once more per above with -mllvm -misched-inputorder added (same output as -enable-misched=false but with these dumps) shows some but much less FP spilling:
REGALLOCSTAT: ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd _NumInstrs9857_NumVirtRegs10318_NumFPSpilled483_NumOtherSpilled491

So this one function in the file spills 1848 FP vregs with the scheduler, but only 483 without it. To show the regions where the scheduler has caused more FP regs overlap than what was before scheduling:

grep ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd 2017_main_sched_print_pressures/BUILD.log | grep -v "_FP[0-]"

ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPdOOOOOOOMBB184Region0       _GPR-77_FP66_GPRMax0_FPMax0_Size56_Stores0_Loads28_FPDefs28_OtherDefs27_FPTotLat112_OtherLat27
ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPdOOOOOOOMBB186Region0       _GPR21113_FP81234_GPRMax0_FPMax-16_Size2512_Stores0_Loads787_FPDefs2512_OtherDefs0_FPTotLat13498_OtherLat0
ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPdOOOOOOOMBB187Region0       _GPR15664_FP67158_GPRMax0_FPMax-8_Size2056_Stores0_Loads631_FPDefs2056_OtherDefs0_FPTotLat11074_OtherLat0
ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPdOOOOOOOMBB188Region0       _GPR10214_FP43459_GPRMax0_FPMax-4_Size1603_Stores0_Loads478_FPDefs1603_OtherDefs0_FPTotLat8662_OtherLat0
ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPdOOOOOOOMBB189Region0       _GPR-480_FP16762_GPRMax0_FPMax-3_Size1150_Stores0_Loads325_FPDefs1150_OtherDefs0_FPTotLat6250_OtherLat0

The (positive) FP values here (66, 81234, ...) are the differences between the sums of overlapping FP vregs before and after scheduling. This sum is basically counted as the number of live FP regs at each MI in the region, summed together across all MIs in the region.

It is now time to try to reduce a region into something smaller in order to try to understand what is going wrong. The following will analyze the two logs and find the interesting difference:

env PROPERTY=FP compareOLs.pl LOG LOG_inputorder -50 -2
Function : ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPd
Region   : ML_BSSN_Advect.bcOOOOOOO_ZL19ML_BSSN_Advect_BodyPK4_cGHiiPKdS3_S3_PKiS5_iPKPdOOOOOOOMBB184Region0
Overlaping intervals improvement  : -66
Num spilled intervals improvement : -1365
Difference exists.

LOG:            Built with -mllvm -sched-print-pressures -mllvm -regalloc-stats
LOG_inputorder: Built with -mllvm -sched-print-pressures -mllvm -regalloc-stats -mllvm -misched-inputorder
3rd arg (-50):  The overlap improvement that LOG_inputorder should have for a region
4th arg (-2):   The number of spilled intervals (per REGALLOCSTAT) improvement the containing function should have.

With this "interestingness test", llvm-reduce can be used to do the actual reduction. A wrapper for this purpose is reduce_region.sh, that needs LLCBIN in the environment and the input file as an argument. The test uses llc, so the ML_BSSN_Advect.ll passed is the output aftr the middle-end passes have run (clang -emit-llvm...).

env LLCBIN=$LLVM_BUILD/bin/llc $LLVM_BUILD/bin/llvm-reduce -j 1  --test=[path-to]/reduce_region.sh ML_BSSN_Advect.ll

Then in order to get a visual overview of the scheduling effects with the reduced test case:

llc -O3 -mcpu=z16 -o out.s reduced.ll -sched-show-ints >& show_out   # use 'less -r show_out' for the escape sequences to work.

In reduce_region.sh above, the values of -50 and -2 may be good to start with, and as a second step a yet smaller test case could be made by changing the numbers to, let's say -10 -1 or even -10 0, which may be useful. To get a side-by-side overview of the scheduling effects, I use two panes in a tmux session and adjust the llc output to look at both before and after, but there is also a script that doesn't work perfectly (yet) due to escape sequences (may need to adjust some spacing in it): sidebyside_sched.sh show_out.

It is quite interesting to see visually how the registers have been become more overlapping with each other.

At the top: Register indexes printed vertically with a color.
A line below a number represents the register being live, with the same color as the number above it.
the "F number" to the right is how many FP registers are live/overlapping at that point ("G numbers" are also printed if
-showgr is passed to llc).

The schedule has changed to the right with more overlapping lines and higher F values, which should explain the extra spilling. The original schedule had the VL64s (load from memory) scheduled low, while the GenericScheduler has allowed them to float up, which wasn't so good in this case.

This type of reduction/visualization has given me many clues to potential improvements of the scheduler strategy. This file and example shows the probably still most important heuristic, namely to schedule loads (and other instructions defining but not redefining a reg, with no uses becoming live) low in a big region. It seems that this is basically all you need to do to not cause spilling - at least in big regions like this. There are of course a few files that do not like that, so there is still room for improvement in SystemZPreRASchedStrategy. Usually, but not always, can the original problem be understood in the reduced region and if fixed also handles the original file.

In the SystemZPreRASchedStrategy this has evolved into computeSULivenessScore() over a series of many, many experiments with this. I have tried many good and reasonable ideas where only some happened to give good benchmark results as well. I have experimented with (small) DFS trees but removed them again as they where not needed to give good results. It is probably worth another try with them at some point. I also tried reversing this "loads down" heuristic to "stores up", but that didn't work very well for one or other reason (same with bi-directional). This is probably also worth revisting at some point with a careful approach - storing a value right after the last def should reduce spilling and be an improvement. There are still many things to be tried, but the current strategy is at a good point with good results. FP benchmarks are improving but Int not so much (no regressions however).

I may be wrong, but my feeling is that to get best performance this probably has to be tuned and benchmarked for a specific target. On the other hand, probably some other target could still get an improvement with this strategy, maybe after some adjustments.

There is also the -schedfile option which I have tried by building all different policies (including disabling) and then recompiled a final time taking for each region the policy that gave the least spill (from a file). This could also be interesting to revisit, but so far it hasn't given any increased performance.

The scripts:
sched_scripts.tar.gz

@uweigand @dominik-steenken @stefan-sf-ibm @atrick @michaelmaitland @wangpc-pp @mshockwave

-schedfile -sched-show-ints -regalloc-stats

github-actions · 2025-04-20T11:55:14Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- llvm/include/llvm/CodeGen/MachineScheduler.h llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h llvm/include/llvm/CodeGen/SlotIndexes.h llvm/include/llvm/CodeGen/TargetRegisterInfo.h llvm/lib/CodeGen/MachineScheduler.cpp llvm/lib/CodeGen/RegAllocGreedy.cpp llvm/lib/CodeGen/RegAllocGreedy.h llvm/lib/CodeGen/TargetRegisterInfo.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.h

View the diff from clang-format here.

diff --git a/llvm/include/llvm/CodeGen/MachineScheduler.h b/llvm/include/llvm/CodeGen/MachineScheduler.h
index 8a94ac886..10fa1bc8e 100644
--- a/llvm/include/llvm/CodeGen/MachineScheduler.h
+++ b/llvm/include/llvm/CodeGen/MachineScheduler.h
@@ -427,7 +427,8 @@ class ScheduleDAGMILive : public ScheduleDAGMI {
 
   // Show liveness visually of registers before and after scheduling. Enabled
   // with -sched-show-ints and -misched-only-block=NUM.
-  void showIntervals(std::string Msg, std::string I, MachineBasicBlock *MBB) const;
+  void showIntervals(std::string Msg, std::string I,
+                     MachineBasicBlock *MBB) const;
 
 protected:
   RegisterClassInfo *RegClassInfo;
diff --git a/llvm/include/llvm/CodeGen/SlotIndexes.h b/llvm/include/llvm/CodeGen/SlotIndexes.h
index 6da9b858c..e4113ed93 100644
--- a/llvm/include/llvm/CodeGen/SlotIndexes.h
+++ b/llvm/include/llvm/CodeGen/SlotIndexes.h
@@ -100,8 +100,8 @@ class raw_ostream;
     unsigned getIndex() const {
       return listEntry()->getIndex() | getSlot();
     }
-  private:
 
+  private:
     /// Returns the slot for this SlotIndex.
     Slot getSlot() const {
       return static_cast<Slot>(lie.getInt());
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 90d5132d4..6dbaba5f5 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -962,6 +962,7 @@ public:
 private:
   mutable std::set<unsigned> PrioRegClasses;
   void initializePrioRegClasses() const;
+
 public:
   bool isPrioRC(unsigned RegClassID) const;
   bool isPrioVirtReg(Register Reg, const MachineRegisterInfo *MRI) const;
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 300a79a70..c6bb8e5b7 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -71,12 +71,12 @@
 #include <vector>
 
 #include "llvm/IR/Module.h"
-#include <sstream>
-#include <iomanip>
+#include <cstdio>
 #include <fstream>
+#include <iomanip>
+#include <sstream>
 #include <stdio.h>
 #include <unistd.h>
-#include <cstdio>
 
 using namespace llvm;
 
@@ -1644,14 +1644,13 @@ bool isDefedByImplicitDef(MachineInstr *UseMI, const LiveInterval &LI,
 
 // Get number of live vregs at MI, summarized for FP and GPR regs.
 static void getNumLive(MachineInstr &MI, LiveIntervals *LIS,
-                       MachineRegisterInfo &MRI,
-                       const TargetRegisterInfo *TRI,
-                       std::vector<Register> &HandledVRegs,
-                       unsigned &NumLiveFP, unsigned &NumLiveGPR) {
+                       MachineRegisterInfo &MRI, const TargetRegisterInfo *TRI,
+                       std::vector<Register> &HandledVRegs, unsigned &NumLiveFP,
+                       unsigned &NumLiveGPR) {
   MachineFunction *MF = MI.getParent()->getParent();
   const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
-  unsigned MIIdx = LIS->getSlotIndexes()->getInstructionIndex(MI)
-    .getRegSlot().getIndex();
+  unsigned MIIdx =
+      LIS->getSlotIndexes()->getInstructionIndex(MI).getRegSlot().getIndex();
   std::vector<unsigned> RCCounts(TRI->getNumRegClasses(), 0);
   NumLiveFP = NumLiveGPR = 0;
 
@@ -1695,9 +1694,9 @@ static void getNumLive(MachineInstr &MI, LiveIntervals *LIS,
     NumLiveGPR += (NumGPRLoHi - E) / 2;
 }
 
-void ScheduleDAGMILive::countRegOverlaps(RegOverlaps &RO,
-                                         MachineBasicBlock::iterator FirstItr,
-                                         MachineBasicBlock::iterator EndItr) const {
+void ScheduleDAGMILive::countRegOverlaps(
+    RegOverlaps &RO, MachineBasicBlock::iterator FirstItr,
+    MachineBasicBlock::iterator EndItr) const {
   std::vector<Register> HandledVRegs;
   for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I)
     HandledVRegs.push_back(I);
@@ -1744,8 +1743,7 @@ static void countProperties(Properties &P, ScheduleDAGMILive *DAG) {
     if (FPDef) {
       P.FPDefs++;
       P.FPTotLat += SU->Latency;
-    }
-    else if (IsDef) {
+    } else if (IsDef) {
       P.OtherDefs++;
       P.OtherLat += SU->Latency;
     }
@@ -1770,10 +1768,9 @@ std::string getRegionID(unsigned CurrRegionIdx, const MachineBasicBlock *MBB) {
   const Function *F = &MBB->getParent()->getFunction();
   std::string TmpStr;
   raw_string_ostream OS(TmpStr);
-  OS << F->getParent()->getModuleIdentifier() << "OOOOOOO"
-     << F->getName() << "OOOOOOO"
-     << "MBB" << MBB->getNumber()
-     << "Region" << CurrRegionIdx;
+  OS << F->getParent()->getModuleIdentifier() << "OOOOOOO" << F->getName()
+     << "OOOOOOO"
+     << "MBB" << MBB->getNumber() << "Region" << CurrRegionIdx;
   return OS.str();
 }
 
@@ -1803,8 +1800,7 @@ void ScheduleDAGMILive::showIntervals(std::string Msg,
       if ((pos = R.find("-")) != std::string::npos) {
         Start = std::stoi(R.substr(0, pos));
         End = std::stoi(R.substr(pos + 1, R.length() - 1));
-      }
-      else
+      } else
         Start = End = std::stoi(R);
       for (unsigned i = Start; i <= End; i++)
         VRegsToShow.push_back(i);
@@ -1833,18 +1829,22 @@ void ScheduleDAGMILive::showIntervals(std::string Msg,
   }
 
   std::vector<unsigned> Colors;
-  Colors.push_back(31);  Colors.push_back(32);  Colors.push_back(33);
-  Colors.push_back(34);  Colors.push_back(35);  Colors.push_back(36);
+  Colors.push_back(31);
+  Colors.push_back(32);
+  Colors.push_back(33);
+  Colors.push_back(34);
+  Colors.push_back(35);
+  Colors.push_back(36);
   Colors.push_back(37);
 
-  std::map<MachineInstr*, unsigned> NodeNum;
+  std::map<MachineInstr *, unsigned> NodeNum;
   for (unsigned Idx = 0, End = SUnits.size(); Idx != End; ++Idx) {
     const SUnit *SU = &SUnits[Idx];
     NodeNum[SU->getInstr()] = Idx;
   }
 
   // Print reg numbers vertically.
-  for(unsigned i = 0; i < 5; i++) {
+  for (unsigned i = 0; i < 5; i++) {
     for (auto Reg : VRegsToShow) {
       const LiveInterval &LI = LIS->getInterval(Register::index2VirtReg(Reg));
       if (LI.empty())
@@ -1859,8 +1859,8 @@ void ScheduleDAGMILive::showIntervals(std::string Msg,
   }
 
   for (auto &I : *MBB) {
-    unsigned CurrIdx = LIS->getSlotIndexes()->getInstructionIndex(I)
-      .getRegSlot().getIndex();
+    unsigned CurrIdx =
+        LIS->getSlotIndexes()->getInstructionIndex(I).getRegSlot().getIndex();
     // Indicate liveness for each vreg.
     for (auto Reg : VRegsToShow) {
       const LiveInterval &LI = LIS->getInterval(Register::index2VirtReg(Reg));
@@ -1869,7 +1869,7 @@ void ScheduleDAGMILive::showIntervals(std::string Msg,
       dbgs() << "\033[" << Colors[Reg % Colors.size()] << "m";
       bool reads, writes;
       std::tie(reads, writes) =
-        I.readsWritesVirtualRegister(Register::index2VirtReg(Reg));
+          I.readsWritesVirtualRegister(Register::index2VirtReg(Reg));
       if (writes)
         dbgs() << "\xE2\x94\xA2";
       else if (reads)
@@ -1909,11 +1909,11 @@ void ScheduleDAGMILive::showIntervals(std::string Msg,
 // different policies, compare and decide per region and write this file).
 static cl::opt<bool> SCHEDFILE("schedfile", cl::init(false));
 using std::string;
-string getcwd( void ) {
-   char buff[PATH_MAX];
-   getcwd( buff, PATH_MAX );
-   string cwd( buff );
-   return cwd;
+string getcwd(void) {
+  char buff[PATH_MAX];
+  getcwd(buff, PATH_MAX);
+  string cwd(buff);
+  return cwd;
 }
 string getNextSchedPolicy() {
   string CWD = getcwd();
@@ -1943,7 +1943,7 @@ string getNextSchedPolicy() {
 /// ScheduleDAGMILive then it will want to override this virtual method in order
 /// to update any specialized state.
 void ScheduleDAGMILive::schedule() {
-  GenericScheduler *Strategy = ((GenericScheduler*) SchedImpl.get());
+  GenericScheduler *Strategy = ((GenericScheduler *)SchedImpl.get());
   if (SCHEDFILE) {
     string NextPolicy = getNextSchedPolicy();
     string RegionID = getRegionID(CurrRegionIdx, BB);
@@ -1951,22 +1951,18 @@ void ScheduleDAGMILive::schedule() {
       dbgs() << RegionID << " BOTTOMUP\n";
       Strategy->RegionPolicy.OnlyTopDown = false;
       Strategy->RegionPolicy.OnlyBottomUp = true;
-    }
-    else if (NextPolicy.compare("TOPDW") == 0) {
+    } else if (NextPolicy.compare("TOPDW") == 0) {
       dbgs() << RegionID << " TOPDOWN\n";
       Strategy->RegionPolicy.OnlyTopDown = true;
       Strategy->RegionPolicy.OnlyBottomUp = false;
-    }
-    else if (NextPolicy.compare("BIDIR") == 0) {
+    } else if (NextPolicy.compare("BIDIR") == 0) {
       dbgs() << RegionID << " BIDIR\n";
       Strategy->RegionPolicy.OnlyTopDown = false;
       Strategy->RegionPolicy.OnlyBottomUp = false;
-    }
-    else if (NextPolicy.compare("NOSCH") == 0) {
+    } else if (NextPolicy.compare("NOSCH") == 0) {
       dbgs() << RegionID << " NOSCH\n";
       return;
-    }
-    else
+    } else
       assert(0 && "Bad read of next sched policy.");
   }
 
@@ -1999,12 +1995,12 @@ void ScheduleDAGMILive::schedule() {
   const SUnit *RegionLast = &SUnits[SUnits.size() - 1];
   bool IsFirstInMBB = RegionFirst->getInstr() == BB->begin();
   MachineBasicBlock::iterator BeforeRegItr;
-  MachineBasicBlock::iterator
-    AfterRegItr = std::next(RegionLast->getInstr()->getIterator());
+  MachineBasicBlock::iterator AfterRegItr =
+      std::next(RegionLast->getInstr()->getIterator());
   if (!IsFirstInMBB)
     BeforeRegItr = std::prev(RegionFirst->getInstr()->getIterator());
-  MachineBasicBlock::iterator
-    FirstItr = IsFirstInMBB ? BB->begin() : std::next(BeforeRegItr);
+  MachineBasicBlock::iterator FirstItr =
+      IsFirstInMBB ? BB->begin() : std::next(BeforeRegItr);
   RegOverlaps OL0, OL1;
   if (SchedPrintPressures)
     countRegOverlaps(OL0, FirstItr, AfterRegItr);
@@ -2047,8 +2043,8 @@ void ScheduleDAGMILive::schedule() {
     countProperties(P, this);
     printRegionID(CurrRegionIdx, BB);
     dbgs() << "_GPR" << OL1.GPR - OL0.GPR << "_FP" << OL1.FP - OL0.FP;
-    dbgs() << "_GPRMax" << OL1.GPRMax - OL0.GPRMax << "_FPMax" <<
-      OL1.FPMax - OL0.FPMax;
+    dbgs() << "_GPRMax" << OL1.GPRMax - OL0.GPRMax << "_FPMax"
+           << OL1.FPMax - OL0.FPMax;
     dbgs() << "_Size" << SUnits.size();
     dbgs() << "_Stores" << P.NumStores << "_Loads" << P.NumLoads;
     dbgs() << "_FPDefs" << P.FPDefs << "_OtherDefs" << P.OtherDefs;
@@ -3272,8 +3268,9 @@ void SchedBoundary::bumpNode(SUnit *SU) {
   unsigned IncMOps = SchedModel->getNumMicroOps(SU->getInstr());
   // Allow for input order by skipping this assertion:
   // assert(
-  //     (CurrMOps == 0 || (CurrMOps + IncMOps) <= SchedModel->getIssueWidth()) &&
-  //     "Cannot schedule this instruction's MicroOps in the current cycle.");
+  //     (CurrMOps == 0 || (CurrMOps + IncMOps) <= SchedModel->getIssueWidth())
+  //     && "Cannot schedule this instruction's MicroOps in the current
+  //     cycle.");
 
   unsigned ReadyCycle = (isTop() ? SU->TopReadyCycle : SU->BotReadyCycle);
   LLVM_DEBUG(dbgs() << "  Ready @" << ReadyCycle << "c\n");
diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index b0677843c..db82d8aeb 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -2878,8 +2878,7 @@ bool RAGreedy::hasVirtRegAlloc() {
 std::string getFunctionID(const Function *F) {
   std::string TmpStr;
   raw_string_ostream OS(TmpStr);
-  OS << F->getParent()->getModuleIdentifier() << "OOOOOOO"
-     << F->getName();
+  OS << F->getParent()->getModuleIdentifier() << "OOOOOOO" << F->getName();
   return OS.str();
 }
 static cl::opt<bool> RegallocStats("regalloc-stats", cl::init(false));
diff --git a/llvm/lib/CodeGen/RegAllocGreedy.h b/llvm/lib/CodeGen/RegAllocGreedy.h
index 3df92f7ad..46eaa3fc6 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.h
+++ b/llvm/lib/CodeGen/RegAllocGreedy.h
@@ -288,6 +288,7 @@ private:
 
   unsigned NumFPSpilled;
   unsigned NumOtherSpilled;
+
 public:
   RAGreedy(RequiredAnalyses &Analyses, const RegAllocFilterFunc F = nullptr);
 
diff --git a/llvm/lib/CodeGen/TargetRegisterInfo.cpp b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
index 10abf95e7..f0e2e6411 100644
--- a/llvm/lib/CodeGen/TargetRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
@@ -481,8 +481,8 @@ bool TargetRegisterInfo::isPrioRC(unsigned RegClassID) const {
   return PrioRegClasses.count(RegClassID);
 }
 
-bool TargetRegisterInfo::
-isPrioVirtReg(Register Reg, const MachineRegisterInfo *MRI) const {
+bool TargetRegisterInfo::isPrioVirtReg(Register Reg,
+                                       const MachineRegisterInfo *MRI) const {
   initializePrioRegClasses();
   return (Reg.isVirtual() &&
           PrioRegClasses.count(MRI->getRegClass(Reg)->getID()));

s-barannikov · 2025-04-21T02:11:35Z

Can you copy this to discource? It is the usual (and more convenient) place to discuss such things.
https://discourse.llvm.org/

JonPsson1 · 2025-04-24T08:44:01Z

Can you copy this to discource? It is the usual (and more convenient) place to discuss such things. https://discourse.llvm.org/

Thanks.
https://discourse.llvm.org/t/improving-pre-ra-machinescheduler/86028

-sched-print-pressures

ba24da9

-schedfile -sched-show-ints -regalloc-stats

JonPsson1 mentioned this pull request Apr 20, 2025

[SystemZ] Add a SystemZ specific pre-RA scheduling strategy. #135076

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Experimental patch for understanding the pre-RA MachineScheduler #136483

Experimental patch for understanding the pre-RA MachineScheduler #136483

Uh oh!

JonPsson1 commented Apr 20, 2025

Uh oh!

github-actions bot commented Apr 20, 2025

Uh oh!

s-barannikov commented Apr 21, 2025

Uh oh!

JonPsson1 commented Apr 24, 2025

Uh oh!

Uh oh!

Experimental patch for understanding the pre-RA MachineScheduler #136483

Are you sure you want to change the base?

Experimental patch for understanding the pre-RA MachineScheduler #136483

Uh oh!

Conversation

JonPsson1 commented Apr 20, 2025

Uh oh!

github-actions bot commented Apr 20, 2025

Uh oh!

s-barannikov commented Apr 21, 2025

Uh oh!

JonPsson1 commented Apr 24, 2025

Uh oh!

Uh oh!