[VPlan] Speed up VPSlotTracker by using ModuleSlotTracker #139881

igogo-x86 · 2025-05-14T11:37:02Z

Currently, when VPSlotTracker is initialized with a VPlan, its assignName method calls printAsOperand on each underlying instruction. Each such call recomputes slot numbers for the entire function, leading to O(N × M) complexity, where M is the number of instructions in the loop and N is the number of instructions in the function.

This results in slow debug output for large loops. For example, printing costs of all instructions becomes O(M² × N), which is especially painful when enabling verbose dumps.

This patch improves debugging performance by caching slot numbers using ModuleSlotTracker. It avoids redundant recomputation and makes debug output significantly faster.

igogo-x86 · 2025-05-14T11:37:13Z

Example - https://godbolt.org/z/z9T5qaafb

llvmbot · 2025-05-14T11:37:39Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Igor Kirillov (igogo-x86)

Changes

Currently, when VPSlotTracker is initialized with a VPlan, its assignName method calls printAsOperand on each underlying instruction. Each such call recomputes slot numbers for the entire function, leading to O(N × M) complexity, where M is the number of instructions in the loop and N is the number of instructions in the function.

This results in slow debug output for large loops. For example, printing costs of all instructions becomes O(M² × N), which is especially painful when enabling verbose dumps.

This patch improves debugging performance by caching slot numbers using ModuleSlotTracker. It avoids redundant recomputation and makes debug output significantly faster.

Full diff: https://github.com/llvm/llvm-project/pull/139881.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+4-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanHelpers.h (+13-1)

diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 167aff737d3fd..a1567fab4eff2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1441,7 +1441,10 @@ void VPSlotTracker::assignName(const VPValue *V) {
   std::string Name;
   if (UV) {
     raw_string_ostream S(Name);
-    UV->printAsOperand(S, false);
+    if (MST)
+      UV->printAsOperand(S, false, *MST);
+    else
+      UV->printAsOperand(S, false);
   } else
     Name = VPI->getName();
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHelpers.h b/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
index 1d42c8f5f3737..0aa0133f4fe1c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanHelpers.h
@@ -23,6 +23,7 @@
 #include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/IR/DebugLoc.h"
+#include "llvm/IR/ModuleSlotTracker.h"
 #include "llvm/Support/InstructionCost.h"
 
 namespace llvm {
@@ -382,14 +383,25 @@ class VPSlotTracker {
   /// Number to assign to the next VPValue without underlying value.
   unsigned NextSlot = 0;
 
+  /// Cache slot indexes to avoid recomputing them on each printAsOperand call.
+  std::unique_ptr<ModuleSlotTracker> MST;
+
   void assignName(const VPValue *V);
   void assignNames(const VPlan &Plan);
   void assignNames(const VPBasicBlock *VPBB);
 
 public:
   VPSlotTracker(const VPlan *Plan = nullptr) {
-    if (Plan)
+    if (Plan) {
+      // This check is required to support unit tests with incomplete IR.
+      if (Function *F =
+              Plan->getScalarHeader()->getIRBasicBlock()->getParent()) {
+        Module *M = F->getParent();
+        MST = std::make_unique<ModuleSlotTracker>(M);
+        MST->incorporateFunction(*F);
+      }
       assignNames(*Plan);
+    }
   }
 
   /// Returns the name assigned to \p V, if there is one, otherwise try to

david-arm · 2025-05-14T12:49:43Z

llvm/lib/Transforms/Vectorize/VPlanHelpers.h

+      if (Function *F =
+              Plan->getScalarHeader()->getIRBasicBlock()->getParent()) {
+        Module *M = F->getParent();
+        MST = std::make_unique<ModuleSlotTracker>(M);


It looks like we're invoking this from these functions:

raw_ostream &llvm::operator<<(raw_ostream &OS, const VPRecipeBase &R) { const VPBasicBlock *Parent = R.getParent(); VPSlotTracker SlotTracker(Parent ? Parent->getPlan() : nullptr); R.print(OS, "", SlotTracker); return OS; } void VPValue::dump() const { const VPRecipeBase *Instr = dyn_cast_or_null<VPRecipeBase>(this->Def); VPSlotTracker SlotTracker( (Instr && Instr->getParent()) ? Instr->getParent()->getPlan() : nullptr); print(dbgs(), SlotTracker); dbgs() << "\n"; } void VPDef::dump() const { const VPRecipeBase *Instr = dyn_cast_or_null<VPRecipeBase>(this); VPSlotTracker SlotTracker( (Instr && Instr->getParent()) ? Instr->getParent()->getPlan() : nullptr); print(dbgs(), "", SlotTracker); dbgs() << "\n"; } void VPBlockBase::print(raw_ostream &O) const { VPSlotTracker SlotTracker(getPlan()); print(O, "", SlotTracker); }

Isn't this actually making the general case a lot slower? It looks like incorporateFunction could be quite expensive. Wouldn't it be faster to only create this once, perhaps by adding the ModuleSlotTracker to VPlan and guarded by #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)?

incorporateFunction doesn't do any job beyond creating SlotTracker and assigning F. Any index calculation is delayed until printAsOperand is called in assignName.

Also, I don't think VPSlotTracker exists outside of debug/assert build - all print-related functions are already guarde by this define:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)

The version of incorporateFunction I'm looking at is this:

void ModuleSlotTracker::incorporateFunction(const Function &F) { // Using getMachine() may lazily create the slot tracker. if (!getMachine()) return; // Nothing to do if this is the right function already. if (this->F == &F) return; if (this->F) Machine->purgeFunction(); Machine->incorporateFunction(&F); this->F = &F; }

which looks like it's doing more than setting a couple of values?

IIUC this may slow down the case where numbering the function isn't neccessary, e.g. because all values are named or we only have VPValues w/o underlying IR. Could we delay incorporateFunction until it is definitely needed, i.e. printing an unnamed IR value?

Then we should get the best of both worlds, because otherwise printAsOperand would call incorporateFunction each time.

Okay, made it genuinely lazy

Currently, when VPSlotTracker is initialized with a VPlan, its assignName method calls printAsOperand on each underlying instruction. Each such call recomputes slot numbers for the entire function, leading to O(N × M) complexity, where M is the number of instructions in the loop and N is the number of instructions in the function. This results in slow debug output for large loops. For example, printing costs of all instructions becomes O(M² × N), which is especially painful when enabling verbose dumps. This patch improves debugging performance by caching slot numbers using ModuleSlotTracker. It avoids redundant recomputation and makes debug output significantly faster.

david-arm · 2025-05-16T14:08:36Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

+      auto *IUV = cast<Instruction>(UV);
+      // This check is required to support unit tests with incomplete IR.
+      if (IUV->getParent()) {
+        MST = std::make_unique<ModuleSlotTracker>(IUV->getModule());


I'm not sure how useful this level of caching really is, because we're still throwing away the cached object for every dump of VPValue. Although in fairness I think that also seems to be true for VPValue2Name, since that member variable is not static.

Can you not just make MST a static member that's initialised to nullptr and lazily updated to std::make_unique<ModuleSlotTracker>(nullptr)?

Sorry, I think I've misunderstood how the VPSlotTracker is most commonly used. I guess the performance we're most worried is on the path where we print the vplan, right?

LLVM_DUMP_METHOD void VPlan::print(raw_ostream &O) const { VPSlotTracker SlotTracker(this); O << "VPlan '" << getName() << "' {";

in which case the slot tracker is only instantiated once and passed down the hierarchy. Please ignore my comment and sorry for the noise! I just happened to notice that some dump() function also create a slot tracker, but perhaps these are rarely called.

The main slowdown comes from VPRecipeBase::cost, which prints debug output like:

dbgs() << "Cost of " << RecipeCost << " for VF " << VF << ": "; dump();

or:

Cost of 1 for VF 2: WIDEN ir<%1547> = add nsw ir<%1546>, ir<%10>

This is where we create and destroy a new VPSlotTracker for each line of output. VPSlotTracker assigns names to all VPValues in the VPlan every time it’s instantiated. That does not look okay, but I guess we can live with that.

The real issue is that for each underlying Value, VPSlotTracker internally creates a SlotTracker, which in turn traverses all instructions in the function. That makes dumping a value quadratic in complexity. The time spent initializing SlotTracker itself is minor compared to the much larger cost of running VPSlotTracker::assignNames.

david-arm · 2025-05-16T14:09:22Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

+      // This check is required to support unit tests with incomplete IR.
+      if (IUV->getParent()) {
+        MST = std::make_unique<ModuleSlotTracker>(IUV->getModule());
+        MST->incorporateFunction(*IUV->getFunction());


Is an instruction that has a parent (i.e. a basic block) also guaranteed to have a function? I assume so, but just curious.

david-arm · 2025-05-16T14:14:00Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

@@ -1441,7 +1441,23 @@ void VPSlotTracker::assignName(const VPValue *V) {
  std::string Name;
  if (UV) {
    raw_string_ostream S(Name);
-    UV->printAsOperand(S, false);
+    if (MST) {
+      UV->printAsOperand(S, false, *MST);


Perhaps it might be easier to have just two case if you rewrite like this:

if (!isa<Instruction>(UV) || UV->hasName()) UV->printAsOperand(S, false); else UV->printAsOperand(S, false, getModuleTracker(cast<Instruction>(UV)));

where a new helper function getModuleTracker can do the logic to lazily construct the module tracker. What do you think?

Might be worth pulling out the code that gets the name for a VPValue, can simplify have an early exit for the cases where no ModuleSlotTracker is needed and then create it if needed and use it.

@david-arm I’m not sure that actually helps much—MST is only used here and isn’t reused, so keeping the logic in one place feels simpler to me.

@fhahn, could you clarify what exactly you mean by “pulling out the code that gets the name for a VPValue”? I’m not sure which part you’re referring to or what the intended simplification would be.

I was thinking if having something like

getNameForVPValue() { if (UV->hasName()) { UV->printAsOperand(S, false, *MST); return Name; } if (!MST) { code to init slot tracker } UV->printAsOperand(S, false, *MST); return Name; }

igogo-x86 requested review from fhahn and david-arm May 14, 2025 11:37

llvmbot added vectorizers llvm:transforms labels May 14, 2025

david-arm reviewed May 14, 2025

View reviewed changes

igogo-x86 added 2 commits May 16, 2025 11:01

Initialize MST lazily

d8fe7f8

igogo-x86 force-pushed the vpslot-tracker-speedup branch from d9e8c34 to d8fe7f8 Compare May 16, 2025 12:02

david-arm reviewed May 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[VPlan] Speed up VPSlotTracker by using ModuleSlotTracker #139881

[VPlan] Speed up VPSlotTracker by using ModuleSlotTracker #139881

igogo-x86 commented May 14, 2025

Uh oh!

igogo-x86 commented May 14, 2025

Uh oh!

llvmbot commented May 14, 2025 •

edited

Loading

Uh oh!

david-arm May 14, 2025

Uh oh!

igogo-x86 May 14, 2025

Uh oh!

david-arm May 14, 2025

Uh oh!

fhahn May 14, 2025

Uh oh!

igogo-x86 May 16, 2025

Uh oh!

david-arm May 16, 2025

Uh oh!

david-arm May 16, 2025

Uh oh!

igogo-x86 May 19, 2025

Uh oh!

david-arm May 16, 2025

Uh oh!

david-arm May 16, 2025

Uh oh!

fhahn May 16, 2025

Uh oh!

igogo-x86 May 19, 2025

Uh oh!

fhahn May 27, 2025

Uh oh!

Uh oh!

[VPlan] Speed up VPSlotTracker by using ModuleSlotTracker #139881

Are you sure you want to change the base?

[VPlan] Speed up VPSlotTracker by using ModuleSlotTracker #139881

Conversation

igogo-x86 commented May 14, 2025

Uh oh!

igogo-x86 commented May 14, 2025

Uh oh!

llvmbot commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvmbot commented May 14, 2025 •

edited

Loading