Skip to content

[LAA] Make Ptr argument optional in isNoWrap. #127410

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 19, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 24 additions & 17 deletions llvm/lib/Analysis/LoopAccessAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -800,8 +800,13 @@ getStrideFromAddRec(const SCEVAddRecExpr *AR, const Loop *Lp, Type *AccessTy,
Value *Ptr, PredicatedScalarEvolution &PSE) {
// The access function must stride over the innermost loop.
if (Lp != AR->getLoop()) {
LLVM_DEBUG(dbgs() << "LAA: Bad stride - Not striding over innermost loop "
<< *Ptr << " SCEV: " << *AR << "\n");
LLVM_DEBUG({
dbgs() << "LAA: Bad stride - Not striding over innermost loop ";
if (Ptr)
dbgs() << *Ptr << " ";

dbgs() << "SCEV: " << *AR << "\n";
});
return std::nullopt;
}

Expand All @@ -811,8 +816,12 @@ getStrideFromAddRec(const SCEVAddRecExpr *AR, const Loop *Lp, Type *AccessTy,
// Calculate the pointer stride and check if it is constant.
const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);
if (!C) {
LLVM_DEBUG(dbgs() << "LAA: Bad stride - Not a constant strided " << *Ptr
<< " SCEV: " << *AR << "\n");
LLVM_DEBUG({
dbgs() << "LAA: Bad stride - Not a constant strided ";
if (Ptr)
dbgs() << *Ptr << " ";
dbgs() << "SCEV: " << *AR << "\n";
});
return std::nullopt;
}

Expand All @@ -839,29 +848,29 @@ getStrideFromAddRec(const SCEVAddRecExpr *AR, const Loop *Lp, Type *AccessTy,
static bool isNoWrapGEP(Value *Ptr, PredicatedScalarEvolution &PSE,
const Loop *L);

/// Check whether \p AR is a non-wrapping AddRec, or if \p Ptr is a non-wrapping
/// GEP.
/// Check whether \p AR is a non-wrapping AddRec. If \p Ptr is not nullptr, use
/// informating from the IR pointer value to determine no-wrap.
static bool isNoWrap(PredicatedScalarEvolution &PSE, const SCEVAddRecExpr *AR,
Value *Ptr, Type *AccessTy, const Loop *L, bool Assume,
std::optional<int64_t> Stride = std::nullopt) {
// FIXME: This should probably only return true for NUW.
if (AR->getNoWrapFlags(SCEV::NoWrapMask))
return true;

if (PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW))
if (Ptr && PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW))
return true;

// The address calculation must not wrap. Otherwise, a dependence could be
// inverted.
if (isNoWrapGEP(Ptr, PSE, L))
if (Ptr && isNoWrapGEP(Ptr, PSE, L))
return true;

// An nusw getelementptr that is an AddRec cannot wrap. If it would wrap,
// the distance between the previously accessed location and the wrapped
// location will be larger than half the pointer index type space. In that
// case, the GEP would be poison and any memory access dependent on it would
// be immediate UB when executed.
if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
if (auto *GEP = dyn_cast_if_present<GetElementPtrInst>(Ptr);
GEP && GEP->hasNoUnsignedSignedWrap())
return true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stride check could be moved to the top and then we could have an early return false if Ptr is nullptr, but most of the checks above are simpler/cheaper than the stride check

Expand All @@ -877,7 +886,7 @@ static bool isNoWrap(PredicatedScalarEvolution &PSE, const SCEVAddRecExpr *AR,
return true;
}

if (Assume) {
if (Ptr && Assume) {
PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
LLVM_DEBUG(dbgs() << "LAA: Pointer may wrap:\n"
<< "LAA: Pointer: " << *Ptr << "\n"
Expand Down Expand Up @@ -1119,6 +1128,7 @@ bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,

SmallVector<PointerIntPair<const SCEV *, 1, bool>> TranslatedPtrs =
findForkedPointer(PSE, StridesMap, Ptr, TheLoop);
assert(!TranslatedPtrs.empty() && "must have some translated pointers");

/// Check whether all pointers can participate in a runtime bounds check. They
/// must either be invariant or AddRecs. If ShouldCheckWrap is true, they also
Expand All @@ -1144,13 +1154,10 @@ bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,

// When we run after a failing dependency check we have to make sure
// we don't have wrapping pointers.
if (ShouldCheckWrap) {
// Skip wrap checking when translating pointers.
if (TranslatedPtrs.size() > 1)
return false;

if (!isNoWrap(PSE, AR, Ptr, AccessTy, TheLoop, Assume))
return false;
if (ShouldCheckWrap &&
!isNoWrap(PSE, AR, TranslatedPtrs.size() == 1 ? Ptr : nullptr, AccessTy,
Copy link
Member

@Meinersbur Meinersbur Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A direct translation would be

Suggested change
!isNoWrap(PSE, AR, TranslatedPtrs.size() == 1 ? Ptr : nullptr, AccessTy,
!isNoWrap(PSE, AR, TranslatedPtrs.size() <= 1 ? Ptr : nullptr, AccessTy,

why exactly 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is exactly one, it means that we have translated the IR pointer directly to a corresponding SCEV, and we can use the IR value in the analysis.

There could be more translated pointers, e.g. if the IR pointer is something like %p = select %c, ptr %a, ptr %b. There's logic to look through things like select/phi and it may return multiple SCEVs, one for each possible option. For the select mentioned earlier, TranslatedPointers may contain 2 entries, one with the SCEV for %a and one with the SCEV for %b.

In that case we do not try to find IR expressions corresponding to the SCEVs, and most of the logic looking at the IR value would not be safe, as the IR value may never be executed (i.e. we cannot assume it being poison would cause UB)

Copy link
Member

@Meinersbur Meinersbur Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is was specifically thinking about the case TranslatedPtrs.empty(). Before the patch, isNoWrap would have been skipped. Now it is executed with Ptr==nullptr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. It should never be empty. I added an assert, as it would be incorrect if it would be empty, meaning we couldn't translate the input pointer.

If it is empty, then we should never enter the enclosing loop. I'd prefer to leave it checking == 1 as this is the only case where it should be safe to use the pointer. WDYT?

TheLoop, Assume)) {
return false;
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -83,10 +83,52 @@ exit:
define void @dependency_check_and_runtime_checks_needed_select_of_ptr_add_recs(ptr %a, ptr %b, ptr %c, i64 %offset, i64 %n) {
; CHECK-LABEL: 'dependency_check_and_runtime_checks_needed_select_of_ptr_add_recs'
; CHECK-NEXT: loop:
; CHECK-NEXT: Report: cannot check memory dependencies at runtime
; CHECK-NEXT: Memory dependences are safe with run-time checks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain where the behavioral change comes from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%select = select i1 %cmp, ptr %gep.b, ptr %gep.c will be translated to 2 AddRecs, one for %gep.b, and one for %gep.c. Before the patch, we would bail out if we need to check dependencies and there are multiple TranslatedPtrs for a source pointer.

With the patch, we now use can determine no-wrap for both of the AddRecs, without needing to look at the IR pointers.

; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:
; CHECK-NEXT: Check 0:
; CHECK-NEXT: Comparing group ([[GRP5:0x[0-9a-f]+]]):
; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
; CHECK-NEXT: Against group ([[GRP6:0x[0-9a-f]+]]):
; CHECK-NEXT: %select = select i1 %cmp, ptr %gep.b, ptr %gep.c
; CHECK-NEXT: Check 1:
; CHECK-NEXT: Comparing group ([[GRP5]]):
; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
; CHECK-NEXT: Against group ([[GRP7:0x[0-9a-f]+]]):
; CHECK-NEXT: %select = select i1 %cmp, ptr %gep.b, ptr %gep.c
; CHECK-NEXT: Check 2:
; CHECK-NEXT: Comparing group ([[GRP5]]):
; CHECK-NEXT: %gep.a.iv = getelementptr inbounds float, ptr %a, i64 %iv
; CHECK-NEXT: Against group ([[GRP8:0x[0-9a-f]+]]):
; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
; CHECK-NEXT: Check 3:
; CHECK-NEXT: Comparing group ([[GRP6]]):
; CHECK-NEXT: %select = select i1 %cmp, ptr %gep.b, ptr %gep.c
; CHECK-NEXT: Against group ([[GRP7]]):
; CHECK-NEXT: %select = select i1 %cmp, ptr %gep.b, ptr %gep.c
; CHECK-NEXT: Check 4:
; CHECK-NEXT: Comparing group ([[GRP6]]):
; CHECK-NEXT: %select = select i1 %cmp, ptr %gep.b, ptr %gep.c
; CHECK-NEXT: Against group ([[GRP8]]):
; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
; CHECK-NEXT: Check 5:
; CHECK-NEXT: Comparing group ([[GRP7]]):
; CHECK-NEXT: %select = select i1 %cmp, ptr %gep.b, ptr %gep.c
; CHECK-NEXT: Against group ([[GRP8]]):
; CHECK-NEXT: %gep.a.iv.off = getelementptr inbounds float, ptr %a, i64 %iv.offset
; CHECK-NEXT: Grouped accesses:
; CHECK-NEXT: Group [[GRP5]]:
; CHECK-NEXT: (Low: %a High: ((4 * %n) + %a))
; CHECK-NEXT: Member: {%a,+,4}<nuw><%loop>
; CHECK-NEXT: Group [[GRP6]]:
; CHECK-NEXT: (Low: %b High: ((4 * %n) + %b))
; CHECK-NEXT: Member: {%b,+,4}<%loop>
; CHECK-NEXT: Group [[GRP7]]:
; CHECK-NEXT: (Low: %c High: ((4 * %n) + %c))
; CHECK-NEXT: Member: {%c,+,4}<%loop>
; CHECK-NEXT: Group [[GRP8]]:
; CHECK-NEXT: (Low: ((4 * %offset) + %a) High: ((4 * %offset) + (4 * %n) + %a))
; CHECK-NEXT: Member: {((4 * %offset) + %a),+,4}<%loop>
; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:
Expand Down