Skip to content

[-Wcompletion-handler] Fix a non-termination issue #78380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 26, 2024

Conversation

ziqingluo-90
Copy link
Contributor

The analysis goes to infinity on the example below:

// Suppose f is a callable that should be called exactly once:
do {
    escape(f);    
    f(); 
} while (cond);

A simplified CFG will look like this:

B0:  loop entry -> B1
B1:  loop body -> B0, B2
B2:  exit

The analysis goes backwards so it starts from B2 with output state NotCalled. Then B1's output state is DefinitelyCalled (Note that escape is visited but it only overrides error states). Next, B0 outputs DefinitelyCalled as well. Then B1 is visited again: the input is the join over DefinitelyCalled and NotCalled resulting in MaybeCalled; it gets updated to Reported at f() and Escape later at escape(f). So B0 now outputs Escape as well. Then again, the input of B1 is the join over Escape and NotCalled resulting in NotCalled, and so on to the infinity.

The key issue is that the join operator defines a partial order over the states, in which Escape < any error states. Meanwhile, the analyzer allows Escape to override error states during a CFG Block visit. Therefore, the output state of a CFG block can change from a "greater" state to a "smaller" state (i.e., it does not change monotonically).

My humble understand of why to design the states this way is that if Escape happens on a single path, we do not report anything on that path for being less noisy while Escape on one path should not prevent the analyzer from warning the other paths.
To preserve such an idea, I propose to use sets of states as CFG block outputs. The set is finite and grows in one direction, so it guarantees termination. For obtaining an input from such an output, a single state can be "extracted" from a state set. The extraction is basically a join over the set and it will let Escape to override error states.

The Called-Once dataflow analysis could never terminate as a
consequence of non-monotonic update on states.  States of kind Escape
can override states leading to non-monotonic update.

The fix uses finite state sets instead of single states to represent
CFG block outputs, which grows in uni-direction.  To preserve the
behavior of the analyzer, a single state can be extracted from a state
set. The extraction follows the idea that Escape can override error
states within one block but obeys to the partial-order for join.

rdar://119671856
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:analysis labels Jan 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-analysis

Author: Ziqing Luo (ziqingluo-90)

Changes

The analysis goes to infinity on the example below:

// Suppose f is a callable that should be called exactly once:
do {
    escape(f);    
    f(); 
} while (cond);

A simplified CFG will look like this:

B0:  loop entry -&gt; B1
B1:  loop body -&gt; B0, B2
B2:  exit

The analysis goes backwards so it starts from B2 with output state NotCalled. Then B1's output state is DefinitelyCalled (Note that escape is visited but it only overrides error states). Next, B0 outputs DefinitelyCalled as well. Then B1 is visited again: the input is the join over DefinitelyCalled and NotCalled resulting in MaybeCalled; it gets updated to Reported at f() and Escape later at escape(f). So B0 now outputs Escape as well. Then again, the input of B1 is the join over Escape and NotCalled resulting in NotCalled, and so on to the infinity.

The key issue is that the join operator defines a partial order over the states, in which Escape &lt; any error states. Meanwhile, the analyzer allows Escape to override error states during a CFG Block visit. Therefore, the output state of a CFG block can change from a "greater" state to a "smaller" state (i.e., it does not change monotonically).

My humble understand of why to design the states this way is that if Escape happens on a single path, we do not report anything on that path for being less noisy while Escape on one path should not prevent the analyzer from warning the other paths.
To preserve such an idea, I propose to use sets of states as CFG block outputs. The set is finite and grows in one direction, so it guarantees termination. For obtaining an input from such an output, a single state can be "extracted" from a state set. The extraction is basically a join over the set and it will let Escape to override error states.


Full diff: https://github.com/llvm/llvm-project/pull/78380.diff

2 Files Affected:

  • (modified) clang/lib/Analysis/CalledOnceCheck.cpp (+110-10)
  • (modified) clang/test/SemaObjC/warn-called-once.m (+10)
diff --git a/clang/lib/Analysis/CalledOnceCheck.cpp b/clang/lib/Analysis/CalledOnceCheck.cpp
index 04c5f6aa9c7450..3fab93b1d09cdf 100644
--- a/clang/lib/Analysis/CalledOnceCheck.cpp
+++ b/clang/lib/Analysis/CalledOnceCheck.cpp
@@ -163,7 +163,7 @@ class ParameterStatus {
     NotVisited = 0x8, /* 1000 */
     // We already reported a violation and stopped tracking calls for this
     // parameter.
-    Reported = 0x15, /* 1111 */
+    Reported = 0xF, /* 1111 */
     LLVM_MARK_AS_BITMASK_ENUM(/* LargestValue = */ Reported)
   };
 
@@ -215,6 +215,55 @@ class ParameterStatus {
   const Expr *Call = nullptr;
 };
 
+// Represents a set of `ParameterStatus`s collected in a single CFGBlock.
+class ParamStatusSet {
+private:
+  // The kinds of `States` spans from 0x0 to 0x15, so 16 bits are enough:
+  llvm::BitVector Set{16, false};
+  // Following the exisitng idea: if there are multiple calls of interest in one
+  // block, recording one of them is good enough.
+  const Expr *Call = nullptr;
+
+public:
+  ParamStatusSet() {}
+
+  // Add a new `ParameterStatus` to the set.  Returns true iff the adding status
+  // was new to the set.
+  bool add(ParameterStatus PS) {
+    assert(PS.getKind() != ParameterStatus::Kind::NotVisited &&
+           "the status cannot be NotVisited after visiting a block");
+    if (Set.test(PS.getKind()))
+      return false;
+    Set.set(PS.getKind());
+    if (PS.seenAnyCalls())
+      Call = &PS.getCall();
+    return true;
+  }
+
+  // Get one `ParameterStatus` to represent the set of them.  The idea is to
+  // take a join on them but let ESCAPE overrides error statuses.
+  ParameterStatus summaryStatus() {
+    unsigned Summary = 0x0;
+
+    for (unsigned Idx :
+         llvm::make_range(Set.set_bits_begin(), Set.set_bits_end()))
+      Summary |= Idx;
+    assert(Summary == 0x0 || Summary == 0x1 || Summary == 0x3 ||
+           Summary == 0x5 || Summary == 0x7 ||
+           Summary == 0xF && "expecting a defined value");
+
+    ParameterStatus::Kind SummaryKind = ParameterStatus::Kind(Summary);
+
+    if (SummaryKind > ParameterStatus::Kind::NON_ERROR_STATUS &&
+        Set.test(ParameterStatus::Kind::Escaped)) {
+      SummaryKind = ParameterStatus::Kind::Escaped;
+    }
+    if (ParameterStatus::seenAnyCalls(SummaryKind))
+      return {SummaryKind, Call};
+    return {SummaryKind};
+  }
+};
+
 /// State aggregates statuses of all tracked parameters.
 class State {
 public:
@@ -274,6 +323,53 @@ class State {
 
 private:
   ParamSizedVector<ParameterStatus> ParamData;
+
+  friend class CFGBlockOutputState;
+
+  State(ParamSizedVector<ParameterStatus> ParamData) : ParamData(ParamData) {}
+
+  unsigned size() const { return ParamData.size(); }
+};
+
+// A different kind of "state" in addition to `State`.  `CFGBlockOutputState`
+// are created for dealing with the non-termination issue due to `State`s are
+// not being updated monotonically at the output of each CFGBlock.
+// A `CFGBlockOutputState` is in fact a finite set of `State`s that
+// grows monotonically.  One can extract a `State` from a `CFGBlockOutputState`.
+// Note that the `State`-extraction  does NOT guarantee monotone but it
+// respects the existing semantics.  Specifically, ESCAPE is "greater than"
+// other error states in a single path but is "less than" them at JOIN.
+//
+// `CFGBlockOutputState` will be used to terminate the fix-point computation.
+class CFGBlockOutputState {
+private:
+  ParamSizedVector<ParamStatusSet> StatusSets;
+
+public:
+  CFGBlockOutputState(unsigned Size) : StatusSets{Size} {};
+
+  // Update this `CFGBlockOutputState` with a newly computed `State`.  Return
+  // true iff `CFGBlockOutputState` changed.
+  bool update(const State &NewState) {
+    bool Changed = false;
+
+    for (unsigned Idx = 0; Idx < NewState.size(); ++Idx) {
+      if (StatusSets[Idx].add(NewState.getStatusFor(Idx))) {
+        Changed = true;
+      }
+    }
+    return Changed;
+  }
+
+  // Return a `State` that represents the `CFGBlockOutputState`.
+  State getState() {
+    ParamSizedVector<ParameterStatus> ParamData{StatusSets.size()};
+
+    for (unsigned Idx = 0; Idx < ParamData.size(); ++Idx) {
+      ParamData[Idx] = StatusSets[Idx].summaryStatus();
+    }
+    return State{ParamData};
+  }
 };
 
 /// A simple class that finds DeclRefExpr in the given expression.
@@ -639,6 +735,8 @@ class CalledOnceChecker : public ConstStmtVisitor<CalledOnceChecker> {
     if (size() != 0) {
       States =
           CFGSizedVector<State>(FunctionCFG.getNumBlockIDs(), State(size()));
+      CFGBlockOutputStates = CFGSizedVector<CFGBlockOutputState>(
+          FunctionCFG.getNumBlockIDs(), CFGBlockOutputState(size()));
     }
   }
 
@@ -1305,17 +1403,17 @@ class CalledOnceChecker : public ConstStmtVisitor<CalledOnceChecker> {
   }
   /// \}
 
-  /// Assign status to the given basic block.
-  ///
-  /// Returns true when the stored status changed.
+  /// Update output state for the CFGBlock.
+  /// Returns true when the stored state changed.
   bool assignState(const CFGBlock *BB, const State &ToAssign) {
-    State &Current = getState(BB);
-    if (Current == ToAssign) {
-      return false;
-    }
+    CFGBlockOutputState &OldOutputState =
+        CFGBlockOutputStates[BB->getBlockID()];
 
-    Current = ToAssign;
-    return true;
+    if (OldOutputState.update(ToAssign)) {
+      getState(BB) = OldOutputState.getState();
+      return true;
+    }
+    return false;
   }
 
   /// Join all incoming statuses for the given basic block.
@@ -1692,6 +1790,8 @@ class CalledOnceChecker : public ConstStmtVisitor<CalledOnceChecker> {
   State CurrentState;
   ParamSizedVector<const ParmVarDecl *> TrackedParams;
   CFGSizedVector<State> States;
+  // The mapping from each `CFGBlock` to its `CFGBlockOutputState`:
+  CFGSizedVector<CFGBlockOutputState> CFGBlockOutputStates;
 };
 
 } // end anonymous namespace
diff --git a/clang/test/SemaObjC/warn-called-once.m b/clang/test/SemaObjC/warn-called-once.m
index dbe8dc1cf1ae17..f23e41e00ee298 100644
--- a/clang/test/SemaObjC/warn-called-once.m
+++ b/clang/test/SemaObjC/warn-called-once.m
@@ -1194,6 +1194,16 @@ - (void)test_escape_after_branch:(int)cond
   escape(handler);
 }
 
+- (void)test_termination:(int)cond
+                  withCompletion:(void (^)(void))handler {
+  // The code below was able to cause non-termination but should be
+  // fixed now:
+  do {
+    escape(handler);    
+    handler();    // expected-warning{{completion handler is called twice}} expected-note{{previous call is here; set to nil to indicate it cannot be called afterwards}}
+  } while (cond);
+}
+
 typedef void (^DeferredBlock)(void);
 static inline void DefferedCallback(DeferredBlock *inBlock) { (*inBlock)(); }
 #define _DEFERCONCAT(a, b) a##b

@SavchenkoValeriy
Copy link
Member

SavchenkoValeriy commented Jan 17, 2024

Hey @ziqingluo-90, thank you for such a detailed description of the problem. It helped me to get up to speed quickly. 🙏
However, I must say that I find this change too intrusive a this point, while not being 100% convinced that it is necessary.

Considering these rules (even be it for joining multiple paths) https://github.com/llvm/llvm-project/blob/main/clang/lib/Analysis/CalledOnceCheck.cpp#L119-L145 we shouldn't override Reported with Escaped. At the same time, we do that within a single block thanks to this code:

if (CurrentParamStatus.isErrorStatus()) {

We do consider Reported to be an error state, and overwrite it. Maybe changing this logic and excluding Reported from that condition (i.e. x.isErrorStatus() && x != Reported) is the way to go? The reason for that is that we already found a problem on that path and the fact that it escaped before it shouldn't affect our conclusion. Another reason is more formal, Reported is supposed to be a top element of the lattice and this logic defies it.

Thanks again!

The Called-Once dataflow analysis could never terminate as a
consequence of non-monotonic update on states.  States of kind Escape
can override states leading to non-monotonic update.

This fix disallows the `Escape` state to override the `Reported`
state.

rdar://119671856
Copy link

github-actions bot commented Jan 25, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ziqingluo-90
Copy link
Contributor Author

@SavchenkoValeriy, sorry for the late response.
Your suggested fix works for my minimal example as well as my local project where this bug was originated.
And I failed to come up with a new counter-example for the fix so I believe it will also prevent future non-termination issues.
I have updated the PR with respect to the simpler fix.

Copy link
Member

@SavchenkoValeriy SavchenkoValeriy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thank you for fixing the problem and for describing it so in detail.

@ziqingluo-90 ziqingluo-90 merged commit 2a06850 into llvm:main Jan 26, 2024
ziqingluo-90 added a commit to ziqingluo-90/apple-llvm-project that referenced this pull request Jan 30, 2024
The Called-Once dataflow analysis could never terminate as a
consequence of non-monotonic update on states.  States of kind Escape
can override states leading to non-monotonic update.

This fix disallows the `Escape` state to override the `Reported`
state.

rdar://119671856
(cherry picked from commit 2a06850)
ziqingluo-90 added a commit to swiftlang/llvm-project that referenced this pull request Jan 31, 2024
The Called-Once dataflow analysis could never terminate as a
consequence of non-monotonic update on states.  States of kind Escape
can override states leading to non-monotonic update.

This fix disallows the `Escape` state to override the `Reported`
state.

rdar://119671856
(cherry picked from commit 2a06850)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants