-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[GVN] Look through select/phi when determining underlying object #99509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: Nikita Popov (nikic) ChangesThis addresses an optimization regression in Rust we have observed after #82458. We now only perform pointer replacement if they have the same underlying object. However, getUnderlyingObject() by default only looks through linear chains, not selects/phis. In particular, this means that we miss cases involving involving pointer induction variables. This patch fixes this by introducing a new helper Doing this more expensive underlying object check has no measurable compile-time impact on CTMark. Full diff: https://github.com/llvm/llvm-project/pull/99509.diff 4 Files Affected:
diff --git a/llvm/include/llvm/Analysis/ValueTracking.h b/llvm/include/llvm/Analysis/ValueTracking.h
index 2c2f965a3cd6f..c689fd4851f12 100644
--- a/llvm/include/llvm/Analysis/ValueTracking.h
+++ b/llvm/include/llvm/Analysis/ValueTracking.h
@@ -736,6 +736,11 @@ inline Value *getUnderlyingObject(Value *V, unsigned MaxLookup = 6) {
return const_cast<Value *>(getUnderlyingObject(VConst, MaxLookup));
}
+/// Like getUnderlyingObject(), but will also look through phi/select nodes
+/// to establish a single underlying object.
+const Value *getUnderlyingObjectThroughPhisAndSelects(const Value *V,
+ unsigned MaxLookup = 6);
+
/// This method is similar to getUnderlyingObject except that it can
/// look through phi and select instructions and return multiple objects.
///
diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index 1d54a66705a2a..cdb226a769074 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -743,9 +743,8 @@ static bool isPointerAlwaysReplaceable(const Value *From, const Value *To,
if (isa<Constant>(To) &&
isDereferenceablePointer(To, Type::getInt8Ty(To->getContext()), DL))
return true;
- if (getUnderlyingObject(From) == getUnderlyingObject(To))
- return true;
- return false;
+ return getUnderlyingObjectThroughPhisAndSelects(From) ==
+ getUnderlyingObjectThroughPhisAndSelects(To);
}
bool llvm::canReplacePointersInUseIfEqual(const Use &U, const Value *To,
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 535a248a5f1a2..369c821dd7335 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -6609,6 +6609,48 @@ void llvm::getUnderlyingObjects(const Value *V,
} while (!Worklist.empty());
}
+const Value *
+llvm::getUnderlyingObjectThroughPhisAndSelects(const Value *V,
+ unsigned MaxLookup) {
+ const unsigned MaxVisited = 8;
+
+ SmallPtrSet<const Value *, 8> Visited;
+ SmallVector<const Value *, 8> Worklist;
+ Worklist.push_back(V);
+ const Value *Object = nullptr, *FirstObject = nullptr;
+ do {
+ const Value *P = Worklist.pop_back_val();
+ P = getUnderlyingObject(P, MaxLookup);
+
+ if (!FirstObject)
+ FirstObject = P;
+
+ if (!Visited.insert(P).second)
+ continue;
+
+ if (Visited.size() == MaxVisited)
+ return FirstObject;
+
+ if (auto *SI = dyn_cast<SelectInst>(P)) {
+ Worklist.push_back(SI->getTrueValue());
+ Worklist.push_back(SI->getFalseValue());
+ continue;
+ }
+
+ if (auto *PN = dyn_cast<PHINode>(P)) {
+ append_range(Worklist, PN->incoming_values());
+ continue;
+ }
+
+ if (!Object)
+ Object = P;
+ else if (Object != P)
+ return FirstObject;
+ } while (!Worklist.empty());
+
+ return Object;
+}
+
/// This is the function that does the work of looking through basic
/// ptrtoint+arithmetic+inttoptr sequences.
static const Value *getUnderlyingObjectFromInt(const Value *V) {
diff --git a/llvm/test/Transforms/GVN/condprop.ll b/llvm/test/Transforms/GVN/condprop.ll
index 02f4bb97d7ebd..3babdd8173a21 100644
--- a/llvm/test/Transforms/GVN/condprop.ll
+++ b/llvm/test/Transforms/GVN/condprop.ll
@@ -813,7 +813,7 @@ define void @select_same_obj(i1 %c, ptr %p, i64 %x) {
; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[EXIT:%.*]]
; CHECK: if:
; CHECK-NEXT: call void @use_ptr(ptr [[P]])
-; CHECK-NEXT: call void @use_ptr(ptr [[P3]])
+; CHECK-NEXT: call void @use_ptr(ptr [[P]])
; CHECK-NEXT: ret void
; CHECK: exit:
; CHECK-NEXT: ret void
@@ -902,7 +902,7 @@ define void @phi_same_obj(i1 %c, ptr %p, i64 %x) {
; CHECK-NEXT: br i1 [[CMP]], label [[IF2:%.*]], label [[EXIT:%.*]]
; CHECK: if2:
; CHECK-NEXT: call void @use_ptr(ptr [[P]])
-; CHECK-NEXT: call void @use_ptr(ptr [[P3]])
+; CHECK-NEXT: call void @use_ptr(ptr [[P]])
; CHECK-NEXT: ret void
; CHECK: exit:
; CHECK-NEXT: ret void
@@ -975,7 +975,7 @@ define void @phi_same_obj_cycle(i1 %c, ptr %p, i64 %x) {
; CHECK-NEXT: [[CMP:%.*]] = icmp eq ptr [[P_IV]], [[P]]
; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[LOOP_LATCH]]
; CHECK: if:
-; CHECK-NEXT: call void @use_ptr(ptr [[P_IV]])
+; CHECK-NEXT: call void @use_ptr(ptr [[P]])
; CHECK-NEXT: call void @use_ptr(ptr [[P]])
; CHECK-NEXT: br label [[LOOP_LATCH]]
; CHECK: loop.latch:
|
maybe |
} | ||
|
||
if (auto *PN = dyn_cast<PHINode>(P)) { | ||
append_range(Worklist, PN->incoming_values()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe there should be a check that PN->getNumIncominig() < MaxVisisted
. Seems a bit inefficient to essentially wait until you have analyzed MaxVisited
edges to give up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As implemented, a phi with more than MaxVisited could be accepted as long as some of the arguments reduce to the same object (as returned by getUnderlyingObject).
Though I'd be open to make this a more strict limit on the number of getUnderlyingObject calls, regardless of what they return.
I mainly want to make sure we don't run into pathological cases. The cost of this analysis seems to be negligible for the average case.
continue; | ||
} | ||
|
||
if (auto *PN = dyn_cast<PHINode>(P)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the loop case, there's also a variant of llvm::getUnderlyingObjects(
that tries to identify PHIs that track the same underlying object I think (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/ValueTracking.cpp#L6589)
Would that help/Should this be unified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that we don't need that isSameUnderlyingObjectInLoop() logic as long as we're reducing down to a single underlying object. What this handles is the case where you have underlying objects from different loop iterations that have the same SSA value -- this is the same basic issue as "value equal in potential cycles" in BasicAA.
However, if we're reducing to a single object, then this can't happen, because we have both a contribution from the pre-header and the loop. The loop contribution has to reduce back to the loop phi to end up with a single object. If it reduces to a loop variant value, then we'd have at least two underlying objects.
This addresses an optimization regression in Rust we have observed after llvm#82458. We now only perform pointer replacement if they have the same underlying object. However, getUnderlyingObject() by default only looks through linear chains, not selects/phis. In particular, this means that we miss cases involving involving pointer induction variables. This patch fixes this by introducing a new helper getUnderlyingObjectThroughPhisAndSelects() (let me know if you have a better name...) which basically does what getUnderlyingObjects() does, just specialized to the case where we must arrive at a single underlying object in the end, and with a limit on the number of inspected values. Doing this more expensive underlying object check has no measurable compile-time impact on CTMark.
Sounds reasonable, done! |
Could you please update |
llvm/lib/Analysis/ValueTracking.cpp
Outdated
const Value *Object = nullptr, *FirstObject = nullptr; | ||
do { | ||
const Value *P = Worklist.pop_back_val(); | ||
P = getUnderlyingObject(P, MaxLookup); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need a budget for the second parameter? We could end up with MaxVisited * MaxLookup
.
Compile-time impact:
|
I checked all of these, and time spent in getUnderlyingObjectsAggressive() was always tiny (think sub-0.01% level). So any regressions here are due to second-order effects (i.e. pointer replacement enabling other optimizations that take more time etc). |
@@ -736,6 +736,10 @@ inline Value *getUnderlyingObject(Value *V, unsigned MaxLookup = 6) { | |||
return const_cast<Value *>(getUnderlyingObject(VConst, MaxLookup)); | |||
} | |||
|
|||
/// Like getUnderlyingObject(), but will try harder to find a single underlying | |||
/// object. In particular, this function also looks through selects and phis. | |||
const Value *getUnderlyingObjectAggressive(const Value *V); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you still expose a parameter to control the aggressiveness? Aggressive without a parameter is kind of scary. E.g. the size of the visited set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we should not expose unnecessary options. If we have reason to have different levels of aggressiveness for different callers in the future, we will add the option at the time. Until then, cutoffs should be centrally controlled.
Ping, I'd like to merge this one before the release branches. |
llvm/lib/Analysis/ValueTracking.cpp
Outdated
P = getUnderlyingObject(P); | ||
|
||
if (!FirstObject) | ||
FirstObject = P; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pull this out of the loop if possible?
This would make it clear that we shouldn't end up in a situation where e.g. getUnderlyingObject returns nullptr for the initial pointer, and then we initialize FirstObject with the underlying object for one of the arms of a select and may end up returning an underlying object that just matches one side of the select.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally agree, but I'm not sure how to structure the code to do this without either repeating the getUnderlyingObject() call or duplicating a bunch of logic. Any ideas?
(Note that getUnderlyingObject cannot return null.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the nicest but:
auto SetNextP = [&]() -> bool {
if(Worklist.empty())
return false;
P = getUnderlyingObject(Worklist.pop_back_val());
return true;
};
FirstObject = P = getUnderlyingObject(V);
do {
if (!Visited.insert(P).second)
...
} while(SetNextP());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like that's harder to follow than the current logic :(
I pushed a possible variant at f160e9c, which does repeat the getUnderlyingObject() call, but only in the case where FirstObject is actually needed. Maybe this is fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the capturing lambda isn't nice.
I definetly do not prefer f160e9c, that really obfuscates the whole "cant be null" aspect.
Maybe just moving the flag? I.e
// Cant be null
FirstObject = getUnderlying(V);
// Instead of a seperate `First` bool, you could also give `P` scope outside of the loop and do `P == nullptr` as `First`.
bool First= true;
do {
P = First ? FirstObject : getUnderlying(Worklist.pop_back_val());
First = false;
}
Its basically same special case as you had but moved around to make the non-null condition clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So something like cac98c1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, latest version looks good.
if (!FirstObject)
FirstObject = P;
can be removed from the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, that wasn't supposed to be there anymore... Dropped in 44714f0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
…gressive` (#100102) Thanks to #99509, we can fix rust-lang/rust#119573 too.
) Summary: This addresses an optimization regression in Rust we have observed after #82458. We now only perform pointer replacement if they have the same underlying object. However, getUnderlyingObject() by default only looks through linear chains, not selects/phis. In particular, this means that we miss cases involving involving pointer induction variables. This patch fixes this by introducing a new helper getUnderlyingObjectAggressive() which basically does what getUnderlyingObjects() does, just specialized to the case where we must arrive at a single underlying object in the end, and with a limit on the number of inspected values. Doing this more expensive underlying object check has no measurable compile-time impact on CTMark. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251674
…gressive` (#100102) Summary: Thanks to #99509, we can fix rust-lang/rust#119573 too. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251024
This addresses an optimization regression in Rust we have observed after #82458. We now only perform pointer replacement if they have the same underlying object. However, getUnderlyingObject() by default only looks through linear chains, not selects/phis. In particular, this means that we miss cases involving involving pointer induction variables.
This patch fixes this by introducing a new helper getUnderlyingObjectAggressive() which basically does what getUnderlyingObjects() does, just specialized to the case where we must arrive at a single underlying object in the end, and with a limit on the number of inspected values.
Doing this more expensive underlying object check has no measurable compile-time impact on CTMark.