Skip to content

[GVN] Look through select/phi when determining underlying object #99509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 22, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Jul 18, 2024

This addresses an optimization regression in Rust we have observed after #82458. We now only perform pointer replacement if they have the same underlying object. However, getUnderlyingObject() by default only looks through linear chains, not selects/phis. In particular, this means that we miss cases involving involving pointer induction variables.

This patch fixes this by introducing a new helper getUnderlyingObjectAggressive() which basically does what getUnderlyingObjects() does, just specialized to the case where we must arrive at a single underlying object in the end, and with a limit on the number of inspected values.

Doing this more expensive underlying object check has no measurable compile-time impact on CTMark.

@llvmbot
Copy link
Member

llvmbot commented Jul 18, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

This addresses an optimization regression in Rust we have observed after #82458. We now only perform pointer replacement if they have the same underlying object. However, getUnderlyingObject() by default only looks through linear chains, not selects/phis. In particular, this means that we miss cases involving involving pointer induction variables.

This patch fixes this by introducing a new helper
getUnderlyingObjectThroughPhisAndSelects() (let me know if you have a better name...) which basically does what getUnderlyingObjects() does, just specialized to the case where we must arrive at a single underlying object in the end, and with a limit on the number of inspected values.

Doing this more expensive underlying object check has no measurable compile-time impact on CTMark.


Full diff: https://github.com/llvm/llvm-project/pull/99509.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ValueTracking.h (+5)
  • (modified) llvm/lib/Analysis/Loads.cpp (+2-3)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+42)
  • (modified) llvm/test/Transforms/GVN/condprop.ll (+3-3)
diff --git a/llvm/include/llvm/Analysis/ValueTracking.h b/llvm/include/llvm/Analysis/ValueTracking.h
index 2c2f965a3cd6f..c689fd4851f12 100644
--- a/llvm/include/llvm/Analysis/ValueTracking.h
+++ b/llvm/include/llvm/Analysis/ValueTracking.h
@@ -736,6 +736,11 @@ inline Value *getUnderlyingObject(Value *V, unsigned MaxLookup = 6) {
   return const_cast<Value *>(getUnderlyingObject(VConst, MaxLookup));
 }
 
+/// Like getUnderlyingObject(), but will also look through phi/select nodes
+/// to establish a single underlying object.
+const Value *getUnderlyingObjectThroughPhisAndSelects(const Value *V,
+                                                      unsigned MaxLookup = 6);
+
 /// This method is similar to getUnderlyingObject except that it can
 /// look through phi and select instructions and return multiple objects.
 ///
diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index 1d54a66705a2a..cdb226a769074 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -743,9 +743,8 @@ static bool isPointerAlwaysReplaceable(const Value *From, const Value *To,
   if (isa<Constant>(To) &&
       isDereferenceablePointer(To, Type::getInt8Ty(To->getContext()), DL))
     return true;
-  if (getUnderlyingObject(From) == getUnderlyingObject(To))
-    return true;
-  return false;
+  return getUnderlyingObjectThroughPhisAndSelects(From) ==
+         getUnderlyingObjectThroughPhisAndSelects(To);
 }
 
 bool llvm::canReplacePointersInUseIfEqual(const Use &U, const Value *To,
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 535a248a5f1a2..369c821dd7335 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -6609,6 +6609,48 @@ void llvm::getUnderlyingObjects(const Value *V,
   } while (!Worklist.empty());
 }
 
+const Value *
+llvm::getUnderlyingObjectThroughPhisAndSelects(const Value *V,
+                                               unsigned MaxLookup) {
+  const unsigned MaxVisited = 8;
+
+  SmallPtrSet<const Value *, 8> Visited;
+  SmallVector<const Value *, 8> Worklist;
+  Worklist.push_back(V);
+  const Value *Object = nullptr, *FirstObject = nullptr;
+  do {
+    const Value *P = Worklist.pop_back_val();
+    P = getUnderlyingObject(P, MaxLookup);
+
+    if (!FirstObject)
+      FirstObject = P;
+
+    if (!Visited.insert(P).second)
+      continue;
+
+    if (Visited.size() == MaxVisited)
+      return FirstObject;
+
+    if (auto *SI = dyn_cast<SelectInst>(P)) {
+      Worklist.push_back(SI->getTrueValue());
+      Worklist.push_back(SI->getFalseValue());
+      continue;
+    }
+
+    if (auto *PN = dyn_cast<PHINode>(P)) {
+      append_range(Worklist, PN->incoming_values());
+      continue;
+    }
+
+    if (!Object)
+      Object = P;
+    else if (Object != P)
+      return FirstObject;
+  } while (!Worklist.empty());
+
+  return Object;
+}
+
 /// This is the function that does the work of looking through basic
 /// ptrtoint+arithmetic+inttoptr sequences.
 static const Value *getUnderlyingObjectFromInt(const Value *V) {
diff --git a/llvm/test/Transforms/GVN/condprop.ll b/llvm/test/Transforms/GVN/condprop.ll
index 02f4bb97d7ebd..3babdd8173a21 100644
--- a/llvm/test/Transforms/GVN/condprop.ll
+++ b/llvm/test/Transforms/GVN/condprop.ll
@@ -813,7 +813,7 @@ define void @select_same_obj(i1 %c, ptr %p, i64 %x) {
 ; CHECK-NEXT:    br i1 [[CMP]], label [[IF:%.*]], label [[EXIT:%.*]]
 ; CHECK:       if:
 ; CHECK-NEXT:    call void @use_ptr(ptr [[P]])
-; CHECK-NEXT:    call void @use_ptr(ptr [[P3]])
+; CHECK-NEXT:    call void @use_ptr(ptr [[P]])
 ; CHECK-NEXT:    ret void
 ; CHECK:       exit:
 ; CHECK-NEXT:    ret void
@@ -902,7 +902,7 @@ define void @phi_same_obj(i1 %c, ptr %p, i64 %x) {
 ; CHECK-NEXT:    br i1 [[CMP]], label [[IF2:%.*]], label [[EXIT:%.*]]
 ; CHECK:       if2:
 ; CHECK-NEXT:    call void @use_ptr(ptr [[P]])
-; CHECK-NEXT:    call void @use_ptr(ptr [[P3]])
+; CHECK-NEXT:    call void @use_ptr(ptr [[P]])
 ; CHECK-NEXT:    ret void
 ; CHECK:       exit:
 ; CHECK-NEXT:    ret void
@@ -975,7 +975,7 @@ define void @phi_same_obj_cycle(i1 %c, ptr %p, i64 %x) {
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp eq ptr [[P_IV]], [[P]]
 ; CHECK-NEXT:    br i1 [[CMP]], label [[IF:%.*]], label [[LOOP_LATCH]]
 ; CHECK:       if:
-; CHECK-NEXT:    call void @use_ptr(ptr [[P_IV]])
+; CHECK-NEXT:    call void @use_ptr(ptr [[P]])
 ; CHECK-NEXT:    call void @use_ptr(ptr [[P]])
 ; CHECK-NEXT:    br label [[LOOP_LATCH]]
 ; CHECK:       loop.latch:

@goldsteinn
Copy link
Contributor

goldsteinn commented Jul 18, 2024

getUnderlyingObjectThroughPhisAndSelects() (let me know if you have a better name...)

maybe getUnderlyingObjectAggressive to leave more room for future extensions :)

}

if (auto *PN = dyn_cast<PHINode>(P)) {
append_range(Worklist, PN->incoming_values());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe there should be a check that PN->getNumIncominig() < MaxVisisted. Seems a bit inefficient to essentially wait until you have analyzed MaxVisited edges to give up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As implemented, a phi with more than MaxVisited could be accepted as long as some of the arguments reduce to the same object (as returned by getUnderlyingObject).

Though I'd be open to make this a more strict limit on the number of getUnderlyingObject calls, regardless of what they return.

I mainly want to make sure we don't run into pathological cases. The cost of this analysis seems to be negligible for the average case.

continue;
}

if (auto *PN = dyn_cast<PHINode>(P)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the loop case, there's also a variant of llvm::getUnderlyingObjects( that tries to identify PHIs that track the same underlying object I think (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/ValueTracking.cpp#L6589)

Would that help/Should this be unified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that we don't need that isSameUnderlyingObjectInLoop() logic as long as we're reducing down to a single underlying object. What this handles is the case where you have underlying objects from different loop iterations that have the same SSA value -- this is the same basic issue as "value equal in potential cycles" in BasicAA.

However, if we're reducing to a single object, then this can't happen, because we have both a contribution from the pre-header and the loop. The loop contribution has to reduce back to the loop phi to end up with a single object. If it reduces to a loop variant value, then we'd have at least two underlying objects.

nikic added 2 commits July 18, 2024 18:18
This addresses an optimization regression in Rust we have observed
after llvm#82458. We now only
perform pointer replacement if they have the same underlying object.
However, getUnderlyingObject() by default only looks through
linear chains, not selects/phis. In particular, this means that
we miss cases involving involving pointer induction variables.

This patch fixes this by introducing a new helper
getUnderlyingObjectThroughPhisAndSelects() (let me know if you
have a better name...) which basically does what getUnderlyingObjects()
does, just specialized to the case where we must arrive at a single
underlying object in the end, and with a limit on the number of
inspected values.

Doing this more expensive underlying object check has no measurable
compile-time impact on CTMark.
@nikic nikic force-pushed the gvn-pointer-repl branch from bbbfa04 to 291ffd2 Compare July 18, 2024 16:18
@nikic
Copy link
Contributor Author

nikic commented Jul 18, 2024

getUnderlyingObjectThroughPhisAndSelects() (let me know if you have a better name...)

maybe getUnderlyingObjectAggressive to leave more room for future extensions :)

Sounds reasonable, done!

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jul 18, 2024
@tschuett
Copy link

Could you please update getUnderlyingObjectThroughPhisAndSelects() in the header.

const Value *Object = nullptr, *FirstObject = nullptr;
do {
const Value *P = Worklist.pop_back_val();
P = getUnderlyingObject(P, MaxLookup);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need a budget for the second parameter? We could end up with MaxVisited * MaxLookup .

@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 18, 2024

Compile-time impact:

Top 5 improvements:
  faiss/HNSW.cpp.ll 3445921402 3333911686 -3.25%
  php/jis0208.ll 308288236 299253584 -2.93%
  spike/regnames.ll 488153333 475353793 -2.62%
  faiss/IndexAdditiveQuantizerFastScan.cpp.ll 434431627 423208324 -2.58%
  lightgbm/boosting.cpp.ll 2999781046 2927390287 -2.41%
Top 5 regressions:
  hermes/StackPromotion.cpp.ll 5074724986 5817835860 +14.64%
  openjdk/ad_x86_gen.ll 18909114243 20121837367 +6.41%
  postgres/strftime.ll 404414332 421153688 +4.14%
  lightgbm/linear_tree_learner.cpp.ll 9902392573 10299377803 +4.01%
  meshlab/filter_screened_poisson.cpp.ll 26246454218 27101073771 +3.26%
Overall: 0.00777903

@nikic
Copy link
Contributor Author

nikic commented Jul 18, 2024

Top 5 regressions:
  hermes/StackPromotion.cpp.ll 5074724986 5817835860 +14.64%
  openjdk/ad_x86_gen.ll 18909114243 20121837367 +6.41%
  postgres/strftime.ll 404414332 421153688 +4.14%
  lightgbm/linear_tree_learner.cpp.ll 9902392573 10299377803 +4.01%
  meshlab/filter_screened_poisson.cpp.ll 26246454218 27101073771 +3.26%

I checked all of these, and time spent in getUnderlyingObjectsAggressive() was always tiny (think sub-0.01% level). So any regressions here are due to second-order effects (i.e. pointer replacement enabling other optimizations that take more time etc).

@@ -736,6 +736,10 @@ inline Value *getUnderlyingObject(Value *V, unsigned MaxLookup = 6) {
return const_cast<Value *>(getUnderlyingObject(VConst, MaxLookup));
}

/// Like getUnderlyingObject(), but will try harder to find a single underlying
/// object. In particular, this function also looks through selects and phis.
const Value *getUnderlyingObjectAggressive(const Value *V);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you still expose a parameter to control the aggressiveness? Aggressive without a parameter is kind of scary. E.g. the size of the visited set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should not expose unnecessary options. If we have reason to have different levels of aggressiveness for different callers in the future, we will add the option at the time. Until then, cutoffs should be centrally controlled.

@nikic
Copy link
Contributor Author

nikic commented Jul 22, 2024

Ping, I'd like to merge this one before the release branches.

P = getUnderlyingObject(P);

if (!FirstObject)
FirstObject = P;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pull this out of the loop if possible?

This would make it clear that we shouldn't end up in a situation where e.g. getUnderlyingObject returns nullptr for the initial pointer, and then we initialize FirstObject with the underlying object for one of the arms of a select and may end up returning an underlying object that just matches one side of the select.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally agree, but I'm not sure how to structure the code to do this without either repeating the getUnderlyingObject() call or duplicating a bunch of logic. Any ideas?

(Note that getUnderlyingObject cannot return null.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the nicest but:

 auto SetNextP = [&]() -> bool {
    if(Worklist.empty())
       return false;
    P = getUnderlyingObject(Worklist.pop_back_val());
    return true;
 };
 FirstObject = P = getUnderlyingObject(V);
 do {
     if (!Visited.insert(P).second)
     ...
  } while(SetNextP());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that's harder to follow than the current logic :(

I pushed a possible variant at f160e9c, which does repeat the getUnderlyingObject() call, but only in the case where FirstObject is actually needed. Maybe this is fine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the capturing lambda isn't nice.
I definetly do not prefer f160e9c, that really obfuscates the whole "cant be null" aspect.

Maybe just moving the flag? I.e

// Cant be null
FirstObject = getUnderlying(V);
// Instead of a seperate `First` bool, you could also give `P` scope outside of the loop and do `P == nullptr` as `First`.
bool First= true;
do {
  P = First ? FirstObject : getUnderlying(Worklist.pop_back_val());
  First = false;
}

Its basically same special case as you had but moved around to make the non-null condition clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So something like cac98c1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, latest version looks good.

    if (!FirstObject)
       FirstObject = P;

can be removed from the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that wasn't supposed to be there anymore... Dropped in 44714f0.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@nikic nikic merged commit 32cd189 into llvm:main Jul 22, 2024
5 of 7 checks passed
@nikic nikic deleted the gvn-pointer-repl branch July 22, 2024 14:22
dianqk added a commit that referenced this pull request Jul 23, 2024
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
)

Summary:
This addresses an optimization regression in Rust we have observed after
#82458. We now only perform
pointer replacement if they have the same underlying object. However,
getUnderlyingObject() by default only looks through linear chains, not
selects/phis. In particular, this means that we miss cases involving
involving pointer induction variables.

This patch fixes this by introducing a new helper
getUnderlyingObjectAggressive() which basically does what
getUnderlyingObjects() does, just specialized to the case where we must
arrive at a single underlying object in the end, and with a limit on the
number of inspected values.

Doing this more expensive underlying object check has no measurable
compile-time impact on CTMark.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251674
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…gressive` (#100102)

Summary:
Thanks to #99509, we can fix
rust-lang/rust#119573 too.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants