Skip to content

[llvm][ipsccp/sccp] Strengthen range analysis in SCCPSolver #111716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

grigorypas
Copy link
Contributor

Resolves #108906

Currently, when a PHI node is visited excessively, the corresponding ValueLatticeElement is marked as "overdefined". This pull request proposes a more refined approach that leverages additional information from the control flow to improve the precision of the analysis.

Key ideas are:

  1. GuaranteedBoundsPropagator: When a variable x passes through a conditional branch (e.g., x < 10), we can now infer that x will always be less than 10 in that branch and use that instead of "overdefined".

  2. MonotonicityTracker: If a variable x undergoes a series of transformations within a loop that are either always increasing or always decreasing, we can determine a lower or upper bound for x based on its initial value.

cc: @dcci, @nikic, @fhahn

Copy link

github-actions bot commented Oct 9, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Oct 9, 2024

@llvm/pr-subscribers-function-specialization

@llvm/pr-subscribers-llvm-transforms

Author: Grigory Pastukhov (grigorypas)

Changes

Resolves #108906

Currently, when a PHI node is visited excessively, the corresponding ValueLatticeElement is marked as "overdefined". This pull request proposes a more refined approach that leverages additional information from the control flow to improve the precision of the analysis.

Key ideas are:

  1. GuaranteedBoundsPropagator: When a variable x passes through a conditional branch (e.g., x < 10), we can now infer that x will always be less than 10 in that branch and use that instead of "overdefined".

  2. MonotonicityTracker: If a variable x undergoes a series of transformations within a loop that are either always increasing or always decreasing, we can determine a lower or upper bound for x based on its initial value.

cc: @dcci, @nikic, @fhahn


Patch is 30.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/111716.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/SCCPSolver.cpp (+344-3)
  • (modified) llvm/test/Transforms/SCCP/conditions-ranges.ll (+2-2)
  • (added) llvm/test/Transforms/SCCP/loop-removal.ll (+207)
  • (modified) llvm/test/Transforms/SCCP/undef-resolve.ll (+3-10)
  • (modified) llvm/test/Transforms/SCCP/widening.ll (+2-2)
diff --git a/llvm/lib/Transforms/Utils/SCCPSolver.cpp b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
index 101d60525f4161..f76638a8c56fd6 100644
--- a/llvm/lib/Transforms/Utils/SCCPSolver.cpp
+++ b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
@@ -383,6 +383,125 @@ void SCCPSolver::inferArgAttributes() const {
   }
 }
 
+// GuaranteedBoundsPropagator is a class that propagates
+// guaranteed bounds for values. Typically, a ValueLatticeElement
+// associated with a frequently visited Phi node is marked as "overdefined" to
+// prevent excessive iterations. However, GuaranteedBoundsPropagator enhances
+// this by propagating guaranteed bounds up to the Phi node, potentially
+// improving precision.
+// Consider a scenario where a variable 'x' is evaluated in a branch with the
+// condition 'x < 10'. In this case, we can confidently assert that 'x' will not
+// exceed 10. GuaranteedBoundsPropagator leverages this information by
+// propagating such guaranteed bounds up to the relevant Phi node. If all
+// incoming values to the Phi node have guaranteed bounds, the union of these
+// bounds will represent the guaranteed bounds for the Phi node itself. Once
+// these bounds are established for the Phi node, they can be propagated further
+// to other values that depend on this Phi node.
+// However, if not all incoming branches to the Phi node have been explored or
+// are active, the bounds for the Phi node cannot be fully guaranteed. In such
+// cases, the propagator may still apply the best available bounds to the Phi
+// node instead of marking it as overdefined. These bounds remain valid unless
+// new branches become active. If any active incoming branches lack guaranteed
+// bounds, the Phi node's state need to be adjusted to overdefined.
+class GuaranteedBoundsPropagator {
+  DenseMap<Value *, ConstantRange> GuaranteedBounds;
+
+public:
+  std::optional<ConstantRange> getGuaranteedBounds(Value *V) {
+    if (!V->getType()->isIntegerTy())
+      return {};
+    if (Constant *C = dyn_cast<Constant>(V))
+      return C->toConstantRange();
+    auto It = GuaranteedBounds.find(V);
+    if (It == GuaranteedBounds.end())
+      return {};
+    auto &Range = It->second;
+    if (Range.isFullSet())
+      return {};
+    return Range;
+  }
+
+  void insertOrUpdate(Value *V, const ConstantRange &CR) {
+    if (CR.isFullSet())
+      return;
+    auto It = GuaranteedBounds.find(V);
+    if (It == GuaranteedBounds.end()) {
+      GuaranteedBounds.insert({V, CR});
+    } else {
+      It->second = CR;
+    }
+  }
+
+  // If ImposedCR is not full set, then we update guaranteed bounds
+  // of OutputValue. If in addition InputValue has guaranteed bounds,
+  // we update the guaranteed bounds of OutputValue to be the intersection
+  // of the two.
+  void processConditionalBranch(Value *OutputValue, Value *InputValue,
+                                const ConstantRange &ImposedCR);
+
+  // Updates the guaranteed bounds of the corresponding value if the
+  // operands have guaranteed bounds.
+  void processBinaryOp(Instruction *I);
+
+  // If guaranteed bounds from all incoming edges are known, the union of all
+  // of the bounds is returned. The flag \p IsBoundGuaranteed is set to true.
+  // If all the incoming edges were not explored yet, but the ones that were
+  // all have guaranteed bounds, the union of the bounds is returned and the
+  // flag \p IsBoundGuaranteed is set to false. If some of the incoming edges
+  // do not have guaranteed bounds (or we failed to calculate union),
+  // the function returns std::nullopt.
+  std::optional<ConstantRange> processPhiNode(
+      PHINode *PN,
+      const SmallVector<Value *, 8> &IncomingValuesFromActiveBranches);
+};
+
+// The MonotonicityTracker class evaluates whether a variable x, which is
+// subject to various functional transformations within a loop, maintains a
+// consistent pattern of monotonicity. Monotonicity, in this context, refers
+// to the property of a sequence where its elements consistently increase
+// or decrease. When a variable x enters a loop and is transformed by a series
+// of operations before it merges back at a PHI node, this class checks if all
+// transformations applied to x preserve the same type of monotonicity
+// (either increasing or decreasing). If the monotonicity is consistent and
+// increasing, for instance, it can be inferred that the value of x at the end
+// of the loop will always be greater than its initial value at the start
+// of the loop.
+class MonotonicityTracker {
+public:
+  enum class Monotonicity {
+    UNKNOWN,
+    CONSTANT,
+    INCREASING,
+    DECREASING,
+  };
+
+  Monotonicity getMonotonicity(Value *V) const {
+    if (isa<Constant>(V))
+      return Monotonicity::CONSTANT;
+    auto It = MonotonicityMap.find(V);
+    if (It == MonotonicityMap.end())
+      return Monotonicity::UNKNOWN;
+    return It->second;
+  }
+
+  void processSSACopyIntrinsic(CallBase &CB) {
+    Value *Op = CB.getOperand(0);
+    Monotonicity OpMonotonicity = getMonotonicity(Op);
+    if (OpMonotonicity == Monotonicity::UNKNOWN)
+      return;
+    MonotonicityMap[&CB] = OpMonotonicity;
+  }
+
+  void processPhiNode(
+      PHINode &PN,
+      const SmallVector<Value *, 8> &IncomingValuesFromActiveBranches);
+
+  void processBinaryOp(Instruction &I);
+
+private:
+  DenseMap<Value *, Monotonicity> MonotonicityMap;
+};
+
 /// Helper class for SCCPSolver. This implements the instruction visitor and
 /// holds all the state.
 class SCCPInstVisitor : public InstVisitor<SCCPInstVisitor> {
@@ -450,6 +569,10 @@ class SCCPInstVisitor : public InstVisitor<SCCPInstVisitor> {
 
   DenseMap<Value *, SmallSetVector<User *, 2>> AdditionalUsers;
 
+  GuaranteedBoundsPropagator BoundsPropagator;
+
+  MonotonicityTracker MonotonicityTrackerObj;
+
   LLVMContext &Ctx;
 
 private:
@@ -915,6 +1038,11 @@ class SCCPInstVisitor : public InstVisitor<SCCPInstVisitor> {
     }
     Invalidated.clear();
   }
+
+  bool mergeInWithBoundsOverrideForPhi(
+      PHINode &PN,
+      const SmallVector<Value *, 8> &IncomingValuesFromActiveBranches,
+      ValueLatticeElement &PhiState);
 };
 
 } // namespace llvm
@@ -1223,6 +1351,65 @@ bool SCCPInstVisitor::isEdgeFeasible(BasicBlock *From, BasicBlock *To) const {
   return KnownFeasibleEdges.count(Edge(From, To));
 }
 
+bool SCCPInstVisitor::mergeInWithBoundsOverrideForPhi(
+    PHINode &PN,
+    const SmallVector<Value *, 8> &IncomingValuesFromActiveBranches,
+    ValueLatticeElement &PhiState) {
+  if (!PhiState.isConstantRange(false))
+    return false;
+  auto &OldState = getValueState(&PN);
+  if (!OldState.isConstantRange(false))
+    return false;
+  ConstantRange OldStateRange = OldState.getConstantRange();
+  ConstantRange NewStateRange = PhiState.getConstantRange();
+  unsigned NumActiveIncoming = IncomingValuesFromActiveBranches.size();
+  if (OldStateRange == NewStateRange ||
+      OldState.getNumRangeExtensions() <= NumActiveIncoming)
+    return false;
+
+  // At this point we visited Phi node too many times and need
+  // to extend the range
+  MonotonicityTracker::Monotonicity Monotonicity =
+      MonotonicityTrackerObj.getMonotonicity(&PN);
+
+  // Monotonicity cannot be CONSTANT, otherwise the range should not
+  // have changed
+  if (Monotonicity == MonotonicityTracker::Monotonicity::INCREASING &&
+      OldStateRange.getUpper() == NewStateRange.getUpper()) {
+    mergeInValue(&PN, PhiState);
+    return true;
+  }
+
+  if (Monotonicity == MonotonicityTracker::Monotonicity::DECREASING &&
+      OldStateRange.getLower() == NewStateRange.getLower()) {
+    mergeInValue(&PN, PhiState);
+    return true;
+  }
+
+  auto OptionalBestBounds =
+      BoundsPropagator.processPhiNode(&PN, IncomingValuesFromActiveBranches);
+  if (!OptionalBestBounds)
+    return false;
+  if (Monotonicity == MonotonicityTracker::Monotonicity::INCREASING) {
+    mergeInValue(
+        &PN, ValueLatticeElement::getRange(ConstantRange(
+                 NewStateRange.getLower(), OptionalBestBounds->getUpper())));
+    return true;
+  }
+
+  if (Monotonicity == MonotonicityTracker::Monotonicity::DECREASING &&
+      OptionalBestBounds) {
+    mergeInValue(
+        &PN, ValueLatticeElement::getRange(ConstantRange(
+                 OptionalBestBounds->getLower(), NewStateRange.getUpper())));
+    return true;
+  }
+
+  PhiState = ValueLatticeElement::getRange(*OptionalBestBounds);
+  mergeInValue(&PN, PhiState);
+  return true;
+}
+
 // visit Implementations - Something changed in this instruction, either an
 // operand made a transition, or the instruction is newly executable.  Change
 // the value type of I to reflect these changes if appropriate.  This method
@@ -1255,6 +1442,7 @@ void SCCPInstVisitor::visitPHINode(PHINode &PN) {
     return (void)markOverdefined(&PN);
 
   unsigned NumActiveIncoming = 0;
+  SmallVector<Value *, 8> IncomingValuesFromActiveBranches;
 
   // Look at all of the executable operands of the PHI node.  If any of them
   // are overdefined, the PHI becomes overdefined as well.  If they are all
@@ -1265,7 +1453,7 @@ void SCCPInstVisitor::visitPHINode(PHINode &PN) {
   for (unsigned i = 0, e = PN.getNumIncomingValues(); i != e; ++i) {
     if (!isEdgeFeasible(PN.getIncomingBlock(i), PN.getParent()))
       continue;
-
+    IncomingValuesFromActiveBranches.push_back(PN.getIncomingValue(i));
     ValueLatticeElement IV = getValueState(PN.getIncomingValue(i));
     PhiState.mergeIn(IV);
     NumActiveIncoming++;
@@ -1273,6 +1461,15 @@ void SCCPInstVisitor::visitPHINode(PHINode &PN) {
       break;
   }
 
+  MonotonicityTrackerObj.processPhiNode(PN, IncomingValuesFromActiveBranches);
+
+  // If we have visited this PHI node too many times, we first check
+  // if there is a known bounds we could use. If not, the state will
+  // be maked as overdefined.
+  if (mergeInWithBoundsOverrideForPhi(PN, IncomingValuesFromActiveBranches,
+                                      PhiState))
+    return;
+
   // We allow up to 1 range extension per active incoming value and one
   // additional extension. Note that we manually adjust the number of range
   // extensions to match the number of active incoming values. This helps to
@@ -1280,7 +1477,7 @@ void SCCPInstVisitor::visitPHINode(PHINode &PN) {
   // incoming values are equal.
   mergeInValue(&PN, PhiState,
                ValueLatticeElement::MergeOptions().setMaxWidenSteps(
-                   NumActiveIncoming + 1));
+                   NumActiveIncoming + 2));
   ValueLatticeElement &PhiStateRef = getValueState(&PN);
   PhiStateRef.setNumRangeExtensions(
       std::max(NumActiveIncoming, PhiStateRef.getNumRangeExtensions()));
@@ -1546,6 +1743,8 @@ void SCCPInstVisitor::visitBinaryOperator(Instruction &I) {
   if (V1State.isOverdefined() && V2State.isOverdefined())
     return (void)markOverdefined(&I);
 
+  MonotonicityTrackerObj.processBinaryOp(I);
+
   // If either of the operands is a constant, try to fold it to a constant.
   // TODO: Use information from notconstant better.
   if ((V1State.isConstant() || V2State.isConstant())) {
@@ -1585,6 +1784,8 @@ void SCCPInstVisitor::visitBinaryOperator(Instruction &I) {
     R = A.overflowingBinaryOp(BO->getOpcode(), B, OBO->getNoWrapKind());
   else
     R = A.binaryOp(BO->getOpcode(), B);
+
+  BoundsPropagator.processBinaryOp(&I);
   mergeInValue(&I, ValueLatticeElement::getRange(R));
 
   // TODO: Currently we do not exploit special values that produce something
@@ -1849,6 +2050,7 @@ void SCCPInstVisitor::handleCallResult(CallBase &CB) {
 
   if (auto *II = dyn_cast<IntrinsicInst>(&CB)) {
     if (II->getIntrinsicID() == Intrinsic::ssa_copy) {
+      MonotonicityTrackerObj.processSSACopyIntrinsic(CB);
       if (ValueState[&CB].isOverdefined())
         return;
 
@@ -1880,9 +2082,12 @@ void SCCPInstVisitor::handleCallResult(CallBase &CB) {
             ConstantRange::getFull(DL.getTypeSizeInBits(CopyOf->getType()));
 
         // Get the range imposed by the condition.
-        if (CondVal.isConstantRange())
+        if (CondVal.isConstantRange()) {
           ImposedCR = ConstantRange::makeAllowedICmpRegion(
               Pred, CondVal.getConstantRange());
+          if (BoundsPropagator.getGuaranteedBounds(OtherOp))
+            BoundsPropagator.processConditionalBranch(&CB, CopyOf, ImposedCR);
+        }
 
         // Combine range info for the original value with the new range from the
         // condition.
@@ -2252,3 +2457,139 @@ void SCCPSolver::markFunctionUnreachable(Function *F) {
 void SCCPSolver::visit(Instruction *I) { Visitor->visit(I); }
 
 void SCCPSolver::visitCall(CallInst &I) { Visitor->visitCall(I); }
+
+void GuaranteedBoundsPropagator::processConditionalBranch(
+    Value *OutputValue, Value *InputValue, const ConstantRange &ImposedCR) {
+  auto OptionalInputValueBounds = getGuaranteedBounds(InputValue);
+  if (OptionalInputValueBounds)
+    insertOrUpdate(OutputValue,
+                   ImposedCR.intersectWith(*OptionalInputValueBounds));
+  else
+    insertOrUpdate(OutputValue, ImposedCR);
+}
+
+void GuaranteedBoundsPropagator::processBinaryOp(Instruction *I) {
+  auto *BO = cast<BinaryOperator>(I);
+  assert(BO && "Expected binary op");
+  auto OptionalLHSBounds = getGuaranteedBounds(BO->getOperand(0));
+  auto OptionalRHSBounds = getGuaranteedBounds(BO->getOperand(1));
+  if (!OptionalLHSBounds || !OptionalRHSBounds)
+    return;
+  ConstantRange R =
+      ConstantRange::getEmpty(I->getType()->getScalarSizeInBits());
+  if (auto *OBO = dyn_cast<OverflowingBinaryOperator>(BO))
+    R = OptionalLHSBounds->overflowingBinaryOp(
+        BO->getOpcode(), *OptionalRHSBounds, OBO->getNoWrapKind());
+  else
+    R = OptionalLHSBounds->binaryOp(BO->getOpcode(), *OptionalRHSBounds);
+  insertOrUpdate(I, R);
+}
+
+std::optional<ConstantRange> GuaranteedBoundsPropagator::processPhiNode(
+    PHINode *PN,
+    const SmallVector<Value *, 8> &IncomingValuesFromActiveBranches) {
+  auto OptionalExistingBounds = getGuaranteedBounds(PN);
+  if (OptionalExistingBounds)
+    return *OptionalExistingBounds;
+
+  ConstantRange R =
+      ConstantRange::getEmpty(PN->getType()->getScalarSizeInBits());
+  for (Value *IncomingValue : IncomingValuesFromActiveBranches) {
+    auto OptionalIncomingBounds = getGuaranteedBounds(IncomingValue);
+    if (!OptionalIncomingBounds)
+      return {};
+    // TODO: Handle disjoint ranges in the future, if needed.
+    auto OptionalUnion = R.exactUnionWith(*OptionalIncomingBounds);
+    if (!OptionalUnion)
+      return {};
+    R = *OptionalUnion;
+  }
+  if (PN->getNumIncomingValues() == IncomingValuesFromActiveBranches.size())
+    insertOrUpdate(PN, R);
+  return R;
+}
+
+void MonotonicityTracker::processPhiNode(
+    PHINode &PN,
+    const SmallVector<Value *, 8> &IncomingValuesFromActiveBranches) {
+  Monotonicity MI = Monotonicity::UNKNOWN;
+  // If all incoming values have the same monotonicity, then the phi
+  // node will be assinged the same monotonicity as well. However, if
+  // there are different monotonicities or unknown for some incoming values,
+  // then the phi node will be assigned UNKNOWN.
+  for (Value *IncomingValue : IncomingValuesFromActiveBranches) {
+    auto IncomingMonotonicity = getMonotonicity(IncomingValue);
+    if (IncomingMonotonicity == Monotonicity::UNKNOWN) {
+      MI = IncomingMonotonicity;
+      break;
+    }
+    if (IncomingMonotonicity == Monotonicity::CONSTANT &&
+        MI != Monotonicity::UNKNOWN)
+      continue;
+    if (MI != IncomingMonotonicity && MI != Monotonicity::UNKNOWN &&
+        MI != Monotonicity::CONSTANT) {
+      MI = Monotonicity::UNKNOWN;
+      break;
+    }
+    if (MI == Monotonicity::UNKNOWN || MI == Monotonicity::CONSTANT)
+      MI = IncomingMonotonicity;
+  }
+  MonotonicityMap[&PN] = MI;
+}
+
+static bool increasingOrConst(MonotonicityTracker::Monotonicity M) {
+  return M == MonotonicityTracker::Monotonicity::INCREASING ||
+         M == MonotonicityTracker::Monotonicity::CONSTANT;
+}
+
+static bool decreasingOrConst(MonotonicityTracker::Monotonicity M) {
+  return M == MonotonicityTracker::Monotonicity::DECREASING ||
+         M == MonotonicityTracker::Monotonicity::CONSTANT;
+}
+
+void MonotonicityTracker::processBinaryOp(Instruction &I) {
+  auto *BO = dyn_cast<OverflowingBinaryOperator>(&I);
+  if (!I.getType()->isIntegerTy())
+    return;
+  if (!BO || !(BO->hasNoSignedWrap() || BO->hasNoUnsignedWrap()))
+    return;
+  if (isa<Constant>(BO->getOperand(0)) && isa<Constant>(BO->getOperand(1))) {
+    MonotonicityMap[&I] = Monotonicity::CONSTANT;
+    return;
+  }
+  if (!isa<Constant>(BO->getOperand(0)) && !isa<Constant>(BO->getOperand(1))) {
+    MonotonicityMap[&I] = Monotonicity::UNKNOWN;
+    return;
+  }
+  // Here one operand must be a constant and the other must be a variable.
+  ConstantInt *C;
+  Value *NonConstOp;
+  if (isa<Constant>(BO->getOperand(0))) {
+    C = cast<ConstantInt>(BO->getOperand(0));
+    NonConstOp = BO->getOperand(1);
+  } else {
+    C = cast<ConstantInt>(BO->getOperand(1));
+    NonConstOp = BO->getOperand(0);
+  }
+  Monotonicity NonConstOpMonotonicity = getMonotonicity(NonConstOp);
+  if (BO->getOpcode() == Instruction::Add) {
+    if (increasingOrConst(NonConstOpMonotonicity) && !C->isNegative()) {
+      MonotonicityMap[&I] = Monotonicity::INCREASING;
+      return;
+    } else if (decreasingOrConst(NonConstOpMonotonicity) && C->isNegative()) {
+      MonotonicityMap[&I] = Monotonicity::DECREASING;
+      return;
+    }
+  }
+  // Handle subtraction only if it is in the form of "x - constant".
+  if (BO->getOpcode() == Instruction::Sub &&
+      !isa<Constant>(BO->getOperand(0))) {
+    if (decreasingOrConst(NonConstOpMonotonicity) && !C->isNegative()) {
+      MonotonicityMap[&I] = Monotonicity::DECREASING;
+      return;
+    } else if (increasingOrConst(NonConstOpMonotonicity) && C->isNegative()) {
+      MonotonicityMap[&I] = Monotonicity::INCREASING;
+      return;
+    }
+  }
+}
diff --git a/llvm/test/Transforms/SCCP/conditions-ranges.ll b/llvm/test/Transforms/SCCP/conditions-ranges.ll
index bb3764160f7246..829598c7b14f61 100644
--- a/llvm/test/Transforms/SCCP/conditions-ranges.ll
+++ b/llvm/test/Transforms/SCCP/conditions-ranges.ll
@@ -675,7 +675,7 @@ define void @loop_1() {
 ; CHECK:       for.cond11:
 ; CHECK-NEXT:    br label [[FOR_COND_CLEANUP13]]
 ; CHECK:       for.cond.cleanup13:
-; CHECK-NEXT:    [[INC27]] = add nsw i32 [[I_0]], 1
+; CHECK-NEXT:    [[INC27]] = add nuw nsw i32 [[I_0]], 1
 ; CHECK-NEXT:    br label [[FOR_COND]]
 ;
 entry:
@@ -718,7 +718,7 @@ define void @loop() {
 ; CHECK-NEXT:    [[CMP12:%.*]] = icmp slt i32 [[J_0]], 2
 ; CHECK-NEXT:    br i1 [[CMP12]], label [[FOR_BODY14]], label [[FOR_COND_CLEANUP13]]
 ; CHECK:       for.cond.cleanup13:
-; CHECK-NEXT:    [[INC27]] = add nsw i32 [[I_0]], 1
+; CHECK-NEXT:    [[INC27]] = add nuw nsw i32 [[I_0]], 1
 ; CHECK-NEXT:    br label [[FOR_COND]]
 ; CHECK:       for.body14:
 ; CHECK-NEXT:    [[INC]] = add nuw nsw i32 [[J_0]], 1
diff --git a/llvm/test/Transforms/SCCP/loop-removal.ll b/llvm/test/Transforms/SCCP/loop-removal.ll
new file mode 100644
index 00000000000000..5a933d22742c9f
--- /dev/null
+++ b/llvm/test/Transforms/SCCP/loop-removal.ll
@@ -0,0 +1,207 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=ipsccp -S | FileCheck %s
+
+define i32 @foo() {
+; CHECK-LABEL: @foo(
+; CHECK-NEXT:  init:
+; CHECK-NEXT:    br label %[[OUTER_LOOP_CONTROL:.*]]
+; CHECK:       outer.loop.control:
+; CHECK-NEXT:    [[X_0:%.*]] = phi i32 [ 0, [[INIT:%.*]] ], [ [[X_OUTER:%.*]], [[OUTER_LOOP_INC:%.*]] ]
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp slt i32 [[X_0]], 10
+; CHECK-NEXT:    br i1 [[TMP0]], label %[[INNER_LOOP_CONTROL:.*]], label %[[EXIT:.*]]
+; CHECK:       inner.loop.control:
+; CHECK-NEXT:    br label [[OUTER_LOOP_INC]]
+; CHECK:       outer.loop.inc:
+; CHECK-NEXT:    [[X_OUTER]] = add nuw nsw i32 [[X_0]], 2
+; CHECK-NEXT:    br label %[[OUTER_LOOP_CONTROL]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i32 [[X_0]]
+;
+init:
+  br label %outer.loop.control
+
+outer.loop.control:                                                ; preds = %init, %outer.loop.inc
+  %x.0 = phi i32 [ 0, %init ], [ %x.outer, %outer.loop.inc ]
+  %0 = icmp slt i32 %x.0, 10
+  br i1 %0, label %inner.loop.control, label %exit
+
+inner.loop.control:                                                ; preds = %outer.loop.control, %inn...
[truncated]

@dcci dcci requested a review from nikic October 9, 2024 17:14
@nikic
Copy link
Contributor

nikic commented Oct 9, 2024

Can you please drop the last commit from this PR? It will be easier to review if it doesn't mix two features.

@dtcxzyw dtcxzyw requested review from fhahn and dcci October 9, 2024 23:52

inner.loop.control: ; preds = %outer.loop.control, %inner.loop.body
%x.1 = phi i32 [ %x.0, %outer.loop.control ], [ %x.inner, %inner.loop.body ]
%1 = icmp sgt i32 %x.1, 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should eliminate this comparison in ConstraintElimination.

The constraint system

%x.0 s< 10
%x.1 s<= %x.0 (%x.1 is monotonically decreasing)
%x.1 s> 20

is unsatisfiable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestion. I was under the impression that ConstraintElimination pass cannot handle loops well. I will take a closer look at ConstraintElimination pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConstraintElimination analyses loops and induction variables with SCEV. Related code:

void State::addInfoForInductions(BasicBlock &BB) {
auto *L = LI.getLoopFor(&BB);
if (!L || L->getHeader() != &BB)
return;
Value *A;
Value *B;
CmpInst::Predicate Pred;
if (!match(BB.getTerminator(),
m_Br(m_ICmp(Pred, m_Value(A), m_Value(B)), m_Value(), m_Value())))
return;
PHINode *PN = dyn_cast<PHINode>(A);
if (!PN) {
Pred = CmpInst::getSwappedPredicate(Pred);
std::swap(A, B);
PN = dyn_cast<PHINode>(A);
}
if (!PN || PN->getParent() != &BB || PN->getNumIncomingValues() != 2 ||
!SE.isSCEVable(PN->getType()))
return;
BasicBlock *InLoopSucc = nullptr;
if (Pred == CmpInst::ICMP_NE)
InLoopSucc = cast<BranchInst>(BB.getTerminator())->getSuccessor(0);
else if (Pred == CmpInst::ICMP_EQ)
InLoopSucc = cast<BranchInst>(BB.getTerminator())->getSuccessor(1);
else
return;
if (!L->contains(InLoopSucc) || !L->isLoopExiting(&BB) || InLoopSucc == &BB)
return;
auto *AR = dyn_cast_or_null<SCEVAddRecExpr>(SE.getSCEV(PN));
BasicBlock *LoopPred = L->getLoopPredecessor();
if (!AR || AR->getLoop() != L || !LoopPred)
return;
const SCEV *StartSCEV = AR->getStart();
Value *StartValue = nullptr;
if (auto *C = dyn_cast<SCEVConstant>(StartSCEV)) {
StartValue = C->getValue();
} else {
StartValue = PN->getIncomingValueForBlock(LoopPred);
assert(SE.getSCEV(StartValue) == StartSCEV && "inconsistent start value");
}
DomTreeNode *DTN = DT.getNode(InLoopSucc);
auto IncUnsigned = SE.getMonotonicPredicateType(AR, CmpInst::ICMP_UGT);
auto IncSigned = SE.getMonotonicPredicateType(AR, CmpInst::ICMP_SGT);
bool MonotonicallyIncreasingUnsigned =
IncUnsigned && *IncUnsigned == ScalarEvolution::MonotonicallyIncreasing;
bool MonotonicallyIncreasingSigned =
IncSigned && *IncSigned == ScalarEvolution::MonotonicallyIncreasing;
// If SCEV guarantees that AR does not wrap, PN >= StartValue can be added
// unconditionally.
if (MonotonicallyIncreasingUnsigned)
WorkList.push_back(
FactOrCheck::getConditionFact(DTN, CmpInst::ICMP_UGE, PN, StartValue));
if (MonotonicallyIncreasingSigned)
WorkList.push_back(
FactOrCheck::getConditionFact(DTN, CmpInst::ICMP_SGE, PN, StartValue));
APInt StepOffset;
if (auto *C = dyn_cast<SCEVConstant>(AR->getStepRecurrence(SE)))
StepOffset = C->getAPInt();
else
return;
// Make sure the bound B is loop-invariant.
if (!L->isLoopInvariant(B))
return;
// Handle negative steps.
if (StepOffset.isNegative()) {
// TODO: Extend to allow steps > -1.
if (!(-StepOffset).isOne())
return;
// AR may wrap.
// Add StartValue >= PN conditional on B <= StartValue which guarantees that
// the loop exits before wrapping with a step of -1.
WorkList.push_back(FactOrCheck::getConditionFact(
DTN, CmpInst::ICMP_UGE, StartValue, PN,
ConditionTy(CmpInst::ICMP_ULE, B, StartValue)));
WorkList.push_back(FactOrCheck::getConditionFact(
DTN, CmpInst::ICMP_SGE, StartValue, PN,
ConditionTy(CmpInst::ICMP_SLE, B, StartValue)));
// Add PN > B conditional on B <= StartValue which guarantees that the loop
// exits when reaching B with a step of -1.
WorkList.push_back(FactOrCheck::getConditionFact(
DTN, CmpInst::ICMP_UGT, PN, B,
ConditionTy(CmpInst::ICMP_ULE, B, StartValue)));
WorkList.push_back(FactOrCheck::getConditionFact(
DTN, CmpInst::ICMP_SGT, PN, B,
ConditionTy(CmpInst::ICMP_SLE, B, StartValue)));
return;
}
// Make sure AR either steps by 1 or that the value we compare against is a
// GEP based on the same start value and all offsets are a multiple of the
// step size, to guarantee that the induction will reach the value.
if (StepOffset.isZero() || StepOffset.isNegative())
return;
if (!StepOffset.isOne()) {
// Check whether B-Start is known to be a multiple of StepOffset.
const SCEV *BMinusStart = SE.getMinusSCEV(SE.getSCEV(B), StartSCEV);
if (isa<SCEVCouldNotCompute>(BMinusStart) ||
!SE.getConstantMultiple(BMinusStart).urem(StepOffset).isZero())
return;
}
// AR may wrap. Add PN >= StartValue conditional on StartValue <= B which
// guarantees that the loop exits before wrapping in combination with the
// restrictions on B and the step above.
if (!MonotonicallyIncreasingUnsigned)
WorkList.push_back(FactOrCheck::getConditionFact(
DTN, CmpInst::ICMP_UGE, PN, StartValue,
ConditionTy(CmpInst::ICMP_ULE, StartValue, B)));
if (!MonotonicallyIncreasingSigned)
WorkList.push_back(FactOrCheck::getConditionFact(
DTN, CmpInst::ICMP_SGE, PN, StartValue,
ConditionTy(CmpInst::ICMP_SLE, StartValue, B)));
WorkList.push_back(FactOrCheck::getConditionFact(
DTN, CmpInst::ICMP_ULT, PN, B,
ConditionTy(CmpInst::ICMP_ULE, StartValue, B)));
WorkList.push_back(FactOrCheck::getConditionFact(
DTN, CmpInst::ICMP_SLT, PN, B,
ConditionTy(CmpInst::ICMP_SLE, StartValue, B)));
// Try to add condition from header to the exit blocks. When exiting either
// with EQ or NE in the header, we know that the induction value must be u<=
// B, as other exits may only exit earlier.
assert(!StepOffset.isNegative() && "induction must be increasing");
assert((Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
"unsupported predicate");
ConditionTy Precond = {CmpInst::ICMP_ULE, StartValue, B};
SmallVector<BasicBlock *> ExitBBs;
L->getExitBlocks(ExitBBs);
for (BasicBlock *EB : ExitBBs) {
WorkList.emplace_back(FactOrCheck::getConditionFact(
DT.getNode(EB), CmpInst::ICMP_ULE, A, B, Precond));
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But in the original issue (#108906) nested while loops are not SCEVable. Do you think that the code below should be handled by ConstraintElimination? My approach does remove the dead loop.

long patatino() {
    long x = 0;
    for (int i = 0; i < 5; ++i) {
        while (x < 10) {
            if (x % 2 == 0) {
                x += 2;
            } else {
                x += 1;
            }
            // Dead while loop
            while ((x > 20) && (i % 3 == 0) && (x % 5 == 0)) {
                x -= 5;
            }
        }
    }
    return x;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I don't want to pay for the complexity to handle random-generated cases :(
cc @nikic for more guidance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand and appreciate your comment -- I think, if anything, the testcase shows a hole/gap in the range analysis infra in LLVM. I don't have a strong opinion on how we should fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in the original issue (#108906) nested while loops are not SCEVable.

@grigorypas IIRC ConstraintElimination only handles monotonically increasing indvars. It would be easy to support monotonically decreasing cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding #108906, I agree that we should not try to optimize such cases. If I ever find the time, I'll write a new section for the developer policy that specifies requirements for fuzzer-generated issue.

As for the change itself, I think the idea behind it is fairly reasonable, and I expect that it will help more realistic cases as well. Unfortunately, we're currently pretty bad with bounds checks elimination in loops for a combination of different reasons (SCEV being glacially slow with context sensitive facts, LVI going to overdefined in cycles, ConstraintElimination only having very basic loop reasoning, etc).

I see some code size changes from this patch: http://llvm-compile-time-tracker.com/compare.php?from=eaea5f6f952b6059cebfe87ea9800a3a6516f9ed&to=6358352b3b4a9912e4dd312fad057b02833a7faa&stat=size-text And also significant compile-time regressions, so we'll have to see if we can make it cheap enough: http://llvm-compile-time-tracker.com/compare.php?from=eaea5f6f952b6059cebfe87ea9800a3a6516f9ed&to=6358352b3b4a9912e4dd312fad057b02833a7faa&stat=instructions%3Au I haven't had time to look at the implementation yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding #108906, I agree that we should not try to optimize such cases. If I ever find the time, I'll write a new section for the developer policy that specifies requirements for fuzzer-generated issue.

While I agree that LLVM shouldn't try to optimize crazy cases, I still do believe the code in #108906 is not that unreasonable to optimize. I found out several instances in real codebases where a line of dead code was preventing other optimizations to kick in, causing code bloat.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, on a more general note, I think a range analysis infra shouldn't necessarily be concerned with how the code looks like as input, as long as it has reasonable capabilities to reason about equalities and inequalities. I looked at some point and found out the limitation you point out (for LVI/CE), and I wonder if instead of fixing these problems one by one we might want to rethink the bigger picture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missed optimization for a loop with dead code
5 participants