Skip to content

[LoopInterchange] Fix overflow in cost calculation #111807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 14, 2024

Conversation

sjoerdmeijer
Copy link
Collaborator

If the iteration count is really large, e.g. UINT_MAX, then the cost calculation can overflows and trigger an assert. So saturate the cost to INT_MAX if this is the case (the cost value is kept in a signed integer).

This fixes #104761

@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2024

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Sjoerd Meijer (sjoerdmeijer)

Changes

If the iteration count is really large, e.g. UINT_MAX, then the cost calculation can overflows and trigger an assert. So saturate the cost to INT_MAX if this is the case (the cost value is kept in a signed integer).

This fixes #104761


Full diff: https://github.com/llvm/llvm-project/pull/111807.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/LoopCacheAnalysis.cpp (+12-2)
  • (added) llvm/test/Transforms/LoopInterchange/refcost-overflow.ll (+44)
diff --git a/llvm/lib/Analysis/LoopCacheAnalysis.cpp b/llvm/lib/Analysis/LoopCacheAnalysis.cpp
index 7ca9f15ad5fca0..3e03b5ba268cff 100644
--- a/llvm/lib/Analysis/LoopCacheAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopCacheAnalysis.cpp
@@ -328,6 +328,8 @@ CacheCostTy IndexedReference::computeRefCost(const Loop &L,
       const SCEV *TripCount =
           computeTripCount(*AR->getLoop(), *Sizes.back(), SE);
       Type *WiderType = SE.getWiderType(RefCost->getType(), TripCount->getType());
+      // For the multiplication result to fit, request a type twice as wide.
+      WiderType = WiderType->getExtendedType();
       RefCost = SE.getMulExpr(SE.getNoopOrZeroExtend(RefCost, WiderType),
                               SE.getNoopOrZeroExtend(TripCount, WiderType));
     }
@@ -338,8 +340,11 @@ CacheCostTy IndexedReference::computeRefCost(const Loop &L,
   assert(RefCost && "Expecting a valid RefCost");
 
   // Attempt to fold RefCost into a constant.
+  // CacheCostTy is a signed integer, but the tripcount value can be large
+  // and may not fit, so saturate/limit the value to the maximum signed
+  // integer value.
   if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))
-    return ConstantCost->getValue()->getZExtValue();
+    return (CacheCostTy)ConstantCost->getValue()->getLimitedValue(~0ULL >> 1);
 
   LLVM_DEBUG(dbgs().indent(4)
              << "RefCost is not a constant! Setting to RefCost=InvalidCost "
@@ -712,7 +717,12 @@ CacheCost::computeLoopCacheCost(const Loop &L,
   CacheCostTy LoopCost = 0;
   for (const ReferenceGroupTy &RG : RefGroups) {
     CacheCostTy RefGroupCost = computeRefGroupCacheCost(RG, L);
-    LoopCost += RefGroupCost * TripCountsProduct;
+
+    // Saturate the cost to INT MAX if the value can overflow.
+    if (RefGroupCost > (std::numeric_limits<int64_t>::max() / TripCountsProduct))
+      LoopCost = std::numeric_limits<int64_t>::max();
+    else
+      LoopCost += RefGroupCost * TripCountsProduct;
   }
 
   LLVM_DEBUG(dbgs().indent(2) << "Loop '" << L.getName()
diff --git a/llvm/test/Transforms/LoopInterchange/refcost-overflow.ll b/llvm/test/Transforms/LoopInterchange/refcost-overflow.ll
new file mode 100644
index 00000000000000..7a823b85ce32e7
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/refcost-overflow.ll
@@ -0,0 +1,44 @@
+; REQUIRES: asserts
+
+; RUN: opt < %s -passes=loop-interchange -S -debug 2>&1 | FileCheck %s
+
+; For a loop with a very large iteration count, make sure the cost
+; calculation does not overflow:
+;
+; void a(int b) {
+;   for (int c;; c += b)
+;     for (long d = 0; d < -3ULL; d += 2ULL)
+;       A[c][d][d] = 0;
+; }
+
+; CHECK:  Access is not consecutive: RefCost=922337203685477580700
+; CHECK:  Loop 'for.cond' has cost=9223372036854775807
+; CHECK:  TripCount=9223372036854775807
+; CHECK:  Access is not consecutive: RefCost=9223372036854775807
+
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+
+@A = local_unnamed_addr global [11 x [11 x [11 x i32]]] zeroinitializer, align 16
+
+define void @foo(i32 noundef %b) {
+entry:
+  %0 = sext i32 %b to i64
+  br label %for.cond
+
+for.cond:
+  %indvars.iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ]
+  br label %for.body
+
+for.cond.cleanup:
+  %indvars.iv.next = add nsw i64 %indvars.iv, %0
+  br label %for.cond
+
+for.body:
+  %d.010 = phi i64 [ 0, %for.cond ], [ %add, %for.body ]
+  %arrayidx3 = getelementptr inbounds [11 x [11 x [11 x i32]]], ptr @A, i64 0, i64 %indvars.iv, i64 %d.010, i64 %d.010
+  store i32 0, ptr %arrayidx3, align 4
+  %add = add nuw i64 %d.010, 2
+  %cmp = icmp ult i64 %d.010, -5
+  br i1 %cmp, label %for.body, label %for.cond.cleanup
+}

Copy link

github-actions bot commented Oct 10, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to test this more targeted via an analysis test in llvm/test/Analysis/LoopCacheAnalysis/?

@sjoerdmeijer sjoerdmeijer force-pushed the interchange-zext-assert branch from aec5a3a to 39ac95d Compare October 10, 2024 11:04
@sjoerdmeijer
Copy link
Collaborator Author

Is it possible to test this more targeted via an analysis test in llvm/test/Analysis/LoopCacheAnalysis/?

Thanks for the suggestion, that worked, have moved it.

@sjoerdmeijer sjoerdmeijer force-pushed the interchange-zext-assert branch from 39ac95d to 3265392 Compare October 10, 2024 11:20
@paulwalker-arm
Copy link
Collaborator

Would it make sense to use InstructionCost which already accounts for saturation?

@sjoerdmeijer
Copy link
Collaborator Author

Would it make sense to use InstructionCost which already accounts for saturation?

Thanks for the suggestion, I think that would make a lot of sense, which might as simple as aliasing CacheCostTy to InstructionCost (it's just an int at the moment: using CacheCostTy = int64_t;). Will look into it.

@sjoerdmeijer sjoerdmeijer force-pushed the interchange-zext-assert branch from 3265392 to 6983261 Compare October 14, 2024 08:58
@sjoerdmeijer
Copy link
Collaborator Author

Now using InstructionCost.

@sjoerdmeijer
Copy link
Collaborator Author

Friendly ping

@sebpop
Copy link
Contributor

sebpop commented Nov 4, 2024

The change looks good to me.

@@ -31,7 +32,7 @@ class ScalarEvolution;
class SCEV;
class TargetTransformInfo;

using CacheCostTy = int64_t;
using CacheCostTy = InstructionCost;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to now just obscure the concrete type?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I personally don't mind the CacheCostTy alias here, think it is quite appropriately named. Thinking about it, maybe it's actually a better name than using InstructionCost here? But I don't have strong opinions so certainly don't mind a search/replace of CacheCostTy with InstructionCost assuming that's what you're suggesting, is that right @fhahn ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find it more inconvenient to hide the actual type behind another level of indirection, but changing it in this patch would unnecessarily increase the diff.

if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))
return ConstantCost->getValue()->getZExtValue();
return ConstantCost->getValue()->getLimitedValue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed or sufficient to let wrapping handle by InstructionCost?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is unfortunately still needed. Helper getZExtValue() returns an uint64_t and checks if the value fits in this number of bits.

br label %for.cond

for.cond:
%indvars.iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
%indvars.iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ]
%iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ]

%indvars.iv = phi i64 [ %indvars.iv.next, %for.cond.cleanup ], [ 0, %entry ]
br label %for.body

for.cond.cleanup:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for.cond.cleanup:
outer.latch:

can also move after inner loop to make things easier to read

%indvars.iv.next = add nsw i64 %indvars.iv, %0
br label %for.cond

for.body:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for.body:
inner.loop:

br label %for.cond

for.body:
%d.010 = phi i64 [ 0, %for.cond ], [ %add, %for.body ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
%d.010 = phi i64 [ 0, %for.cond ], [ %add, %for.body ]
%inner.iv = phi i64 [ 0, %for.cond ], [ %add, %for.body ]

%0 = sext i32 %b to i64
br label %for.cond

for.cond:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for.cond:
outer.header:

If the iteration count is really large, e.g. UINT_MAX, then the cost
calculation can overflows and trigger an assert. So saturate the cost to
INT_MAX if this is the case (the cost value is kept in a signed integer).

This fixes llvm#104761
@sjoerdmeijer sjoerdmeijer force-pushed the interchange-zext-assert branch from 6983261 to dfa195c Compare November 7, 2024 13:58
@sjoerdmeijer
Copy link
Collaborator Author

Have addressed the comments about the test case.
As mentioned, I haven't renamed the type, but if you think it's better I am happy to rename things.

@mshockwave mshockwave self-requested a review November 9, 2024 01:13
Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Personally I don't have any strong opinion on using CacheCostTy instead of InstructionCost.

@fhahn do you have any other comment?

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@@ -31,7 +32,7 @@ class ScalarEvolution;
class SCEV;
class TargetTransformInfo;

using CacheCostTy = int64_t;
using CacheCostTy = InstructionCost;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find it more inconvenient to hide the actual type behind another level of indirection, but changing it in this patch would unnecessarily increase the diff.

@sjoerdmeijer sjoerdmeijer merged commit 2e6deb1 into llvm:main Nov 14, 2024
8 checks passed
@sjoerdmeijer sjoerdmeijer deleted the interchange-zext-assert branch November 14, 2024 08:40
sjoerdmeijer added a commit to sjoerdmeijer/llvm-project that referenced this pull request Jan 29, 2025
This is a work in progress patch to enable loop-interchange by default
and is a continuation of the RFC:

https://discourse.llvm.org/t/enabling-loop-interchange/82589

Basically, we promised to fix any compile-time and correctness issues in
the different components involved here (loop-interchange and dependence
analaysis.) before discussing enabling interchange by default. We think
are close to complete this; I would like to explain where we are and
wanted to check if there are any thoughts or concerns. A quick overview
of the correctness and compile-time improvements that we have made include:

Correctness:
- [LoopInterchange] Remove 'S' Scalar Dependencies (llvm#119345)
- [LoopInterchange] Fix overflow in cost calculation (llvm#111807)
- [LoopInterchange] Handle LE and GE correctly (PR llvm#124901) @kasuga-fj
- [DA] disambiguate evolution of base addresses (llvm#116628)

Compile-times:
- [LoopInterchange] Constrain number of load/stores in a loop (llvm#118973)
- [LoopInterchange] Bail out early if minimum loop nest is not met (llvm#115128)
- [LoopInterchange] Hoist isComputableLoopNest() in the control flow (llvm#124247)

And in terms of remaining work, we think we are very close to fixing
these depenence analysis issues:
- [DA] do not handle array accesses of different offsets (llvm#123436)
- [DA] Dependence analysis does not handle array accesses of different sizes (llvm#116630)
- [DA] use NSW arithmetic llvm#116632

The compile-time increase with a geomean increase of 0.19% looks good
(after committing llvm#124247), I think:

  stage1-O3:
  Benchmark
  kimwitu++        +0.10%
  sqlite3          +0.14%
  consumer-typeset +0.07%
  Bullet           +0.06%
  tramp3d-v4       +0.21%
  mafft            +0.39%
  ClamAVi          +0.06%
  lencod           +0.61%
  SPASS            +0.17%
  7zip             +0.08%
  geomean          +0.19%

See also:
http://llvm-compile-time-tracker.com/compare.php?from=19a7fe03b4f58c4f73ea91d5e63bc4c6e61f987b&to=b24f1367d68ee675ea93ecda4939208c6b68ae4b&stat=instructions%3Au

We might want to look into lencod to see if we can improve more, but not
sure it is strictly necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LoopInterchange] Assertion `getActiveBits() <= 64 && "Too many bits for uint64_t"
7 participants