-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LoopInterchange] Fix overflow in cost calculation #111807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: Sjoerd Meijer (sjoerdmeijer) ChangesIf the iteration count is really large, e.g. UINT_MAX, then the cost calculation can overflows and trigger an assert. So saturate the cost to INT_MAX if this is the case (the cost value is kept in a signed integer). This fixes #104761 Full diff: https://github.com/llvm/llvm-project/pull/111807.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/LoopCacheAnalysis.cpp b/llvm/lib/Analysis/LoopCacheAnalysis.cpp
index 7ca9f15ad5fca0..3e03b5ba268cff 100644
--- a/llvm/lib/Analysis/LoopCacheAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopCacheAnalysis.cpp
@@ -328,6 +328,8 @@ CacheCostTy IndexedReference::computeRefCost(const Loop &L,
const SCEV *TripCount =
computeTripCount(*AR->getLoop(), *Sizes.back(), SE);
Type *WiderType = SE.getWiderType(RefCost->getType(), TripCount->getType());
+ // For the multiplication result to fit, request a type twice as wide.
+ WiderType = WiderType->getExtendedType();
RefCost = SE.getMulExpr(SE.getNoopOrZeroExtend(RefCost, WiderType),
SE.getNoopOrZeroExtend(TripCount, WiderType));
}
@@ -338,8 +340,11 @@ CacheCostTy IndexedReference::computeRefCost(const Loop &L,
assert(RefCost && "Expecting a valid RefCost");
// Attempt to fold RefCost into a constant.
+ // CacheCostTy is a signed integer, but the tripcount value can be large
+ // and may not fit, so saturate/limit the value to the maximum signed
+ // integer value.
if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))
- return ConstantCost->getValue()->getZExtValue();
+ return (CacheCostTy)ConstantCost->getValue()->getLimitedValue(~0ULL >> 1);
LLVM_DEBUG(dbgs().indent(4)
<< "RefCost is not a constant! Setting to RefCost=InvalidCost "
@@ -712,7 +717,12 @@ CacheCost::computeLoopCacheCost(const Loop &L,
CacheCostTy LoopCost = 0;
for (const ReferenceGroupTy &RG : RefGroups) {
CacheCostTy RefGroupCost = computeRefGroupCacheCost(RG, L);
- LoopCost += RefGroupCost * TripCountsProduct;
+
+ // Saturate the cost to INT MAX if the value can overflow.
+ if (RefGroupCost > (std::numeric_limits<int64_t>::max() / TripCountsProduct))
+ LoopCost = std::numeric_limits<int64_t>::max();
+ else
+ LoopCost += RefGroupCost * TripCountsProduct;
}
LLVM_DEBUG(dbgs().indent(2) << "Loop '" << L.getName()
diff --git a/llvm/test/Transforms/LoopInterchange/refcost-overflow.ll b/llvm/test/Transforms/LoopInterchange/refcost-overflow.ll
new file mode 100644
index 00000000000000..7a823b85ce32e7
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/refcost-overflow.ll
@@ -0,0 +1,44 @@
+; REQUIRES: asserts
+
+; RUN: opt < %s -passes=loop-interchange -S -debug 2>&1 | FileCheck %s
+
+; For a loop with a very large iteration count, make sure the cost
+; calculation does not overflow:
+;
+; void a(int b) {
+; for (int c;; c += b)
+; for (long d = 0; d < -3ULL; d += 2ULL)
+; A[c][d][d] = 0;
+; }
+
+; CHECK: Access is not consecutive: RefCost=922337203685477580700
+; CHECK: Loop 'for.cond' has cost=9223372036854775807
+; CHECK: TripCount=9223372036854775807
+; CHECK: Access is not consecutive: RefCost=9223372036854775807
+
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+
+@A = local_unnamed_addr global [11 x [11 x [11 x i32]]] zeroinitializer, align 16
+
+define void @foo(i32 noundef %b) {
+entry:
+ %0 = sext i32 %b to i64
+ br label %for.cond
+
+for.cond:
+ %indvars.iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ]
+ br label %for.body
+
+for.cond.cleanup:
+ %indvars.iv.next = add nsw i64 %indvars.iv, %0
+ br label %for.cond
+
+for.body:
+ %d.010 = phi i64 [ 0, %for.cond ], [ %add, %for.body ]
+ %arrayidx3 = getelementptr inbounds [11 x [11 x [11 x i32]]], ptr @A, i64 0, i64 %indvars.iv, i64 %d.010, i64 %d.010
+ store i32 0, ptr %arrayidx3, align 4
+ %add = add nuw i64 %d.010, 2
+ %cmp = icmp ult i64 %d.010, -5
+ br i1 %cmp, label %for.body, label %for.cond.cleanup
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to test this more targeted via an analysis test in llvm/test/Analysis/LoopCacheAnalysis/
?
aec5a3a
to
39ac95d
Compare
Thanks for the suggestion, that worked, have moved it. |
39ac95d
to
3265392
Compare
Would it make sense to use |
Thanks for the suggestion, I think that would make a lot of sense, which might as simple as aliasing CacheCostTy to InstructionCost (it's just an int at the moment: |
3265392
to
6983261
Compare
Now using InstructionCost. |
Friendly ping |
The change looks good to me. |
@@ -31,7 +32,7 @@ class ScalarEvolution; | |||
class SCEV; | |||
class TargetTransformInfo; | |||
|
|||
using CacheCostTy = int64_t; | |||
using CacheCostTy = InstructionCost; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to now just obscure the concrete type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. I personally don't mind the CacheCostTy
alias here, think it is quite appropriately named. Thinking about it, maybe it's actually a better name than using InstructionCost here? But I don't have strong opinions so certainly don't mind a search/replace of CacheCostTy with InstructionCost assuming that's what you're suggesting, is that right @fhahn ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I find it more inconvenient to hide the actual type behind another level of indirection, but changing it in this patch would unnecessarily increase the diff.
if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost)) | ||
return ConstantCost->getValue()->getZExtValue(); | ||
return ConstantCost->getValue()->getLimitedValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed or sufficient to let wrapping handle by InstructionCost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is unfortunately still needed. Helper getZExtValue()
returns an uint64_t and checks if the value fits in this number of bits.
br label %for.cond | ||
|
||
for.cond: | ||
%indvars.iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%indvars.iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ] | |
%iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ] |
%indvars.iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ] | ||
br label %for.body | ||
|
||
for.cond.cleanup: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for.cond.cleanup: | |
outer.latch: |
can also move after inner loop to make things easier to read
%indvars.iv.next = add nsw i64 %indvars.iv, %0 | ||
br label %for.cond | ||
|
||
for.body: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for.body: | |
inner.loop: |
br label %for.cond | ||
|
||
for.body: | ||
%d.010 = phi i64 [ 0, %for.cond ], [ %add, %for.body ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%d.010 = phi i64 [ 0, %for.cond ], [ %add, %for.body ] | |
%inner.iv = phi i64 [ 0, %for.cond ], [ %add, %for.body ] |
%0 = sext i32 %b to i64 | ||
br label %for.cond | ||
|
||
for.cond: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for.cond: | |
outer.header: |
If the iteration count is really large, e.g. UINT_MAX, then the cost calculation can overflows and trigger an assert. So saturate the cost to INT_MAX if this is the case (the cost value is kept in a signed integer). This fixes llvm#104761
6983261
to
dfa195c
Compare
Have addressed the comments about the test case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. Personally I don't have any strong opinion on using CacheCostTy instead of InstructionCost.
@fhahn do you have any other comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@@ -31,7 +32,7 @@ class ScalarEvolution; | |||
class SCEV; | |||
class TargetTransformInfo; | |||
|
|||
using CacheCostTy = int64_t; | |||
using CacheCostTy = InstructionCost; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I find it more inconvenient to hide the actual type behind another level of indirection, but changing it in this patch would unnecessarily increase the diff.
This is a work in progress patch to enable loop-interchange by default and is a continuation of the RFC: https://discourse.llvm.org/t/enabling-loop-interchange/82589 Basically, we promised to fix any compile-time and correctness issues in the different components involved here (loop-interchange and dependence analaysis.) before discussing enabling interchange by default. We think are close to complete this; I would like to explain where we are and wanted to check if there are any thoughts or concerns. A quick overview of the correctness and compile-time improvements that we have made include: Correctness: - [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345) - [LoopInterchange] Fix overflow in cost calculation (llvm#111807) - [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj - [DA] disambiguate evolution of base addresses (llvm#116628) Compile-times: - [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973) - [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128) - [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247) And in terms of remaining work, we think we are very close to fixing these depenence analysis issues: - [DA] do not handle array accesses of different offsets (llvm#123436) - [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630) - [DA] use NSW arithmetic llvm#116632 The compile-time increase with a geomean increase of 0.19% looks good (after committing llvm#124247), I think: stage1-O3: Benchmark kimwitu++ +0.10% sqlite3 +0.14% consumer-typeset +0.07% Bullet +0.06% tramp3d-v4 +0.21% mafft +0.39% ClamAVi +0.06% lencod +0.61% SPASS +0.17% 7zip +0.08% geomean +0.19% See also: http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au We might want to look into lencod to see if we can improve more, but not sure it is strictly necessary.
If the iteration count is really large, e.g. UINT_MAX, then the cost calculation can overflows and trigger an assert. So saturate the cost to INT_MAX if this is the case (the cost value is kept in a signed integer).
This fixes #104761