Skip to content

[clang][nullability] Don't discard expression state before end of full-expression. #82611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 21 additions & 4 deletions clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
Original file line number Diff line number Diff line change
Expand Up @@ -58,19 +58,36 @@ class ControlFlowContext {
return BlockReachable[B.getBlockID()];
}

/// Returns whether `B` contains an expression that is consumed in a
/// different block than `B` (i.e. the parent of the expression is in a
/// different block).
/// This happens if there is control flow within a full-expression (triggered
/// by `&&`, `||`, or the conditional operator). Note that the operands of
/// these operators are not the only expressions that can be consumed in a
/// different block. For example, in the function call
/// `f(&i, cond() ? 1 : 0)`, `&i` is in a different block than the `CallExpr`.
bool containsExprConsumedInDifferentBlock(const CFGBlock &B) const {
return ContainsExprConsumedInDifferentBlock.contains(&B);
}

private:
ControlFlowContext(const Decl &D, std::unique_ptr<CFG> Cfg,
llvm::DenseMap<const Stmt *, const CFGBlock *> StmtToBlock,
llvm::BitVector BlockReachable)
ControlFlowContext(
const Decl &D, std::unique_ptr<CFG> Cfg,
llvm::DenseMap<const Stmt *, const CFGBlock *> StmtToBlock,
llvm::BitVector BlockReachable,
llvm::DenseSet<const CFGBlock *> ContainsExprConsumedInDifferentBlock)
: ContainingDecl(D), Cfg(std::move(Cfg)),
StmtToBlock(std::move(StmtToBlock)),
BlockReachable(std::move(BlockReachable)) {}
BlockReachable(std::move(BlockReachable)),
ContainsExprConsumedInDifferentBlock(
std::move(ContainsExprConsumedInDifferentBlock)) {}

/// The `Decl` containing the statement used to construct the CFG.
const Decl &ContainingDecl;
std::unique_ptr<CFG> Cfg;
llvm::DenseMap<const Stmt *, const CFGBlock *> StmtToBlock;
llvm::BitVector BlockReachable;
llvm::DenseSet<const CFGBlock *> ContainsExprConsumedInDifferentBlock;
};

} // namespace dataflow
Expand Down
11 changes: 10 additions & 1 deletion clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
Original file line number Diff line number Diff line change
Expand Up @@ -240,6 +240,14 @@ class Environment {
bool equivalentTo(const Environment &Other,
Environment::ValueModel &Model) const;

/// How to treat expression state (`ExprToLoc` and `ExprToVal`) in a join.
/// If the join happens within a full expression, expression state should be
/// kept; otherwise, we can discard it.
enum ExprJoinBehavior {
DiscardExprState,
KeepExprState,
};

/// Joins two environments by taking the intersection of storage locations and
/// values that are stored in them. Distinct values that are assigned to the
/// same storage locations in `EnvA` and `EnvB` are merged using `Model`.
Expand All @@ -248,7 +256,8 @@ class Environment {
///
/// `EnvA` and `EnvB` must use the same `DataflowAnalysisContext`.
static Environment join(const Environment &EnvA, const Environment &EnvB,
Environment::ValueModel &Model);
Environment::ValueModel &Model,
ExprJoinBehavior ExprBehavior);

/// Widens the environment point-wise, using `PrevEnv` as needed to inform the
/// approximation.
Expand Down
38 changes: 37 additions & 1 deletion clang/lib/Analysis/FlowSensitive/ControlFlowContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,38 @@ static llvm::BitVector findReachableBlocks(const CFG &Cfg) {
return BlockReachable;
}

static llvm::DenseSet<const CFGBlock *>
buildContainsExprConsumedInDifferentBlock(
const CFG &Cfg,
const llvm::DenseMap<const Stmt *, const CFGBlock *> &StmtToBlock) {
llvm::DenseSet<const CFGBlock *> Result;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, could be a bitset that contains the block ids. Although, since we probably expect this set to be sparse/small, maybe the current solution is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I'm not sure which way this goes. This feature doesn't appear to be a bit performance issue one way or the other, so I'm inclined to keep it as is.


auto CheckChildExprs = [&Result, &StmtToBlock](const Stmt *S,
const CFGBlock *Block) {
for (const Stmt *Child : S->children()) {
if (!isa<Expr>(Child))
continue;
const CFGBlock *ChildBlock = StmtToBlock.lookup(Child);
if (ChildBlock != Block)
Result.insert(ChildBlock);
}
};

for (const CFGBlock *Block : Cfg) {
if (Block == nullptr)
continue;

for (const CFGElement &Element : *Block)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we early terminate as soon as we found an expression that is consumed in a different block? Also, in case we do early termination, would it make sense to go backwards and start with the terminator conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we early terminate as soon as we found an expression that is consumed in a different block?

That doesn't work, because we insert ChildBlock, not Block. This block might contain expressions from various other blocks, and we want to insert all of those blocks into Result, so we can't terminate the loop early.

if (auto S = Element.getAs<CFGStmt>())
CheckChildExprs(S->getStmt(), Block);

if (const Stmt *TerminatorCond = Block->getTerminatorCondition())
CheckChildExprs(TerminatorCond, Block);
}

return Result;
}

llvm::Expected<ControlFlowContext>
ControlFlowContext::build(const FunctionDecl &Func) {
if (!Func.doesThisDeclarationHaveABody())
Expand Down Expand Up @@ -140,8 +172,12 @@ ControlFlowContext::build(const Decl &D, Stmt &S, ASTContext &C) {

llvm::BitVector BlockReachable = findReachableBlocks(*Cfg);

llvm::DenseSet<const CFGBlock *> ContainsExprConsumedInDifferentBlock =
buildContainsExprConsumedInDifferentBlock(*Cfg, StmtToBlock);

return ControlFlowContext(D, std::move(Cfg), std::move(StmtToBlock),
std::move(BlockReachable));
std::move(BlockReachable),
std::move(ContainsExprConsumedInDifferentBlock));
}

} // namespace dataflow
Expand Down
28 changes: 24 additions & 4 deletions clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,24 @@ static llvm::DenseMap<const ValueDecl *, StorageLocation *> intersectDeclToLoc(
return Result;
}

// Performs a join on either `ExprToLoc` or `ExprToVal`.
// The maps must be consistent in the sense that any entries for the same
// expression must map to the same location / value. This is the case if we are
// performing a join for control flow within a full-expression (which is the
// only case when this function should be used).
template <typename MapT> MapT joinExprMaps(const MapT &Map1, const MapT &Map2) {
MapT Result = Map1;

for (const auto &Entry : Map2) {
[[maybe_unused]] auto [It, Inserted] = Result.insert(Entry);
// If there was an existing entry, its value should be the same as for the
// entry we were trying to insert.
assert(It->second == Entry.second);
}

return Result;
}

// Whether to consider equivalent two values with an unknown relation.
//
// FIXME: this function is a hack enabling unsoundness to support
Expand Down Expand Up @@ -627,7 +645,8 @@ LatticeJoinEffect Environment::widen(const Environment &PrevEnv,
}

Environment Environment::join(const Environment &EnvA, const Environment &EnvB,
Environment::ValueModel &Model) {
Environment::ValueModel &Model,
ExprJoinBehavior ExprBehavior) {
assert(EnvA.DACtx == EnvB.DACtx);
assert(EnvA.ThisPointeeLoc == EnvB.ThisPointeeLoc);
assert(EnvA.CallStack == EnvB.CallStack);
Expand Down Expand Up @@ -675,9 +694,10 @@ Environment Environment::join(const Environment &EnvA, const Environment &EnvB,
JoinedEnv.LocToVal =
joinLocToVal(EnvA.LocToVal, EnvB.LocToVal, EnvA, EnvB, JoinedEnv, Model);

// We intentionally leave `JoinedEnv.ExprToLoc` and `JoinedEnv.ExprToVal`
// empty, as we never need to access entries in these maps outside of the
// basic block that sets them.
if (ExprBehavior == KeepExprState) {
JoinedEnv.ExprToVal = joinExprMaps(EnvA.ExprToVal, EnvB.ExprToVal);
JoinedEnv.ExprToLoc = joinExprMaps(EnvA.ExprToLoc, EnvB.ExprToLoc);
}

return JoinedEnv;
}
Expand Down
32 changes: 25 additions & 7 deletions clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -221,18 +221,21 @@ class PrettyStackTraceCFGElement : public llvm::PrettyStackTraceEntry {
// Avoids unneccesary copies of the environment.
class JoinedStateBuilder {
AnalysisContext &AC;
Environment::ExprJoinBehavior JoinBehavior;
std::vector<const TypeErasedDataflowAnalysisState *> All;
std::deque<TypeErasedDataflowAnalysisState> Owned;

TypeErasedDataflowAnalysisState
join(const TypeErasedDataflowAnalysisState &L,
const TypeErasedDataflowAnalysisState &R) {
return {AC.Analysis.joinTypeErased(L.Lattice, R.Lattice),
Environment::join(L.Env, R.Env, AC.Analysis)};
Environment::join(L.Env, R.Env, AC.Analysis, JoinBehavior)};
}

public:
JoinedStateBuilder(AnalysisContext &AC) : AC(AC) {}
JoinedStateBuilder(AnalysisContext &AC,
Environment::ExprJoinBehavior JoinBehavior)
: AC(AC), JoinBehavior(JoinBehavior) {}

void addOwned(TypeErasedDataflowAnalysisState State) {
Owned.push_back(std::move(State));
Expand All @@ -248,12 +251,12 @@ class JoinedStateBuilder {
// initialize the state of each basic block differently.
return {AC.Analysis.typeErasedInitialElement(), AC.InitEnv.fork()};
if (All.size() == 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guard on ExprBehavior as well, like if (All.size() == 1 && ExprBehavior == DiscardExprState)? Then, could you drop the FIXME and "if desired"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guard on ExprBehavior as well, like if (All.size() == 1 && ExprBehavior == DiscardExprState)?

We would also need to return the existing state in the case All.size() == 1 && ExprBehavior == KeepExprState, right? IOW, the code would become something like this:

    if (All.size() == 1) {
      if (ExprBehavior == KeepExprState)
        return All[0];
      // Join the environment with itself so that we discard expression state.
      // FIXME: We could consider writing special-case code for this that only
      // does the discarding, but it's not clear if this is worth it.
      return {All[0]->Lattice, Environment::join(All[0]->Env, All[0]->Env,
                                                 AC.Analysis, DiscardExprState)};
    }

And it doesn't deal with the FIXME, as we still need some way to discard the expression state.

I'm not sure that, all in all, changing the code in this way would be preferable.

The upside is that we're now a bit more efficient in the case where All.size() == 1 and we want to keep expression state.

The downside is that we're treating KeepExprState specially when join() could have handled it for us. Right now, join() is the only place that needs to care about what to do for KeepExprState versus DiscardExprState, and it would be nice to keep the behavior localized in this way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with keeping as is for reducing complexity.

// Join the environment with itself so that we discard the entries from
// `ExprToLoc` and `ExprToVal`.
// Join the environment with itself so that we discard expression state if
// desired.
// FIXME: We could consider writing special-case code for this that only
// does the discarding, but it's not clear if this is worth it.
return {All[0]->Lattice,
Environment::join(All[0]->Env, All[0]->Env, AC.Analysis)};
return {All[0]->Lattice, Environment::join(All[0]->Env, All[0]->Env,
AC.Analysis, JoinBehavior)};

auto Result = join(*All[0], *All[1]);
for (unsigned I = 2; I < All.size(); ++I)
Expand Down Expand Up @@ -307,7 +310,22 @@ computeBlockInputState(const CFGBlock &Block, AnalysisContext &AC) {
}
}

JoinedStateBuilder Builder(AC);
// If any of the predecessor blocks contains an expression consumed in a
// different block, we need to keep expression state.
// Note that in this case, we keep expression state for all predecessors,
// rather than only those predecessors that actually contain an expression
// consumed in a different block. While this is potentially suboptimal, it's
// actually likely, if we have control flow within a full expression, that
// all predecessors have expression state consumed in a different block.
Environment::ExprJoinBehavior JoinBehavior = Environment::DiscardExprState;
for (const CFGBlock *Pred : Preds) {
if (Pred && AC.CFCtx.containsExprConsumedInDifferentBlock(*Pred)) {
JoinBehavior = Environment::KeepExprState;
break;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a property of the CFG, can we compute this just once per block (rather than every time we visit the block)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out -- essentially, I added a method to ControlFlowContext that allows querying what the ExprJoinBehavior should be for a given block. However, I'm not convinced that the result was actually better that what we have now.

ControlFlowContext is essentially an "embellished" CFG -- it contains the CFG along with additional information derived from the CFG that the CFG itself doesn't contain. (BTW, what do you think of renaming ControlFlowContext to EmbellishedCFG? I think it describes the class better, and it would leave us with one fewer Context, of which we have confusingly many.)

I would argue that exprJoinBehaviorForBlock() isn't a direct property of the CFG block -- yes, it's a property that we associate with the block, but that property only makes sense in the context of how we join input states for the block, and as such seems more closely associated with computeBlockInputState() and the Environment than with the ControlFlowContext.

On the other hand, containsExprConsumedInDifferentBlock() does express a very direct property of the block. It seems more natural to have this be the property that is exposed by ControlFlowContext, and to have computeBlockInputState() derive from this the information it needs on how to join states from the predecessor blocks.

Yes, this means we compute this information each time we visit the block, but this appears pretty insignificant compared to the work we do actually processing the block. I ran the Crubit nullability check benchmarks with the draft change I mentioned, and the speedup is too small to distinguish from measurement noise at the CPU time level, though a small significant change can be seen at the cycles level. I don't think this justifies changing the current design, which I think has better separation of concerns.

Copy link
Collaborator

@ymand ymand Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: EmbellishedCFG -- sure.

Ok, I'm fine leaving as is, given the benchmarks. But, I would think there's a simple compromise (say for a followup patch) of just computing it locally in this file (as now), but caching it in AnalysisContext (which is local to that file), since its a fixed property of the block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure this is worth it - this would add additional stored state (even if it's non-mutable), and I'm not sure this is worth it for the small gain we get. Will commit as-is, and we can discuss whether we want to create a followup patch for this.


JoinedStateBuilder Builder(AC, JoinBehavior);
for (const CFGBlock *Pred : Preds) {
// Skip if the `Block` is unreachable or control flow cannot get past it.
if (!Pred || Pred->hasNoReturnElement())
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,8 @@ TEST_F(EnvironmentTest, JoinRecords) {
Env2.setValue(Loc, Val2);

Environment::ValueModel Model;
Environment EnvJoined = Environment::join(Env1, Env2, Model);
Environment EnvJoined =
Environment::join(Env1, Env2, Model, Environment::DiscardExprState);
auto *JoinedVal = cast<RecordValue>(EnvJoined.getValue(Loc));
EXPECT_NE(JoinedVal, &Val1);
EXPECT_NE(JoinedVal, &Val2);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -244,62 +244,98 @@ TEST_F(DiscardExprStateTest, WhileStatement) {
EXPECT_NE(NotEqOpState.Env.getValue(NotEqOp), nullptr);

// In the block that calls `foo(p)`, the value for `p != nullptr` is discarded
// because it is not consumed by this block.
// because it is not consumed outside the block it is in.
const auto &CallFooState = blockStateForStmt(BlockStates, CallFoo);
EXPECT_EQ(CallFooState.Env.getValue(NotEqOp), nullptr);
}

TEST_F(DiscardExprStateTest, BooleanOperator) {
std::string Code = R"(
bool target(bool b1, bool b2) {
return b1 && b2;
void f();
void target(bool b1, bool b2) {
if (b1 && b2)
f();
}
)";
auto BlockStates = llvm::cantFail(runAnalysis<NoopAnalysis>(
Code, [](ASTContext &C) { return NoopAnalysis(C); }));

const auto &AndOp =
matchNode<BinaryOperator>(binaryOperator(hasOperatorName("&&")));
const auto &Return = matchNode<ReturnStmt>(returnStmt());
const auto &CallF =
matchNode<CallExpr>(callExpr(callee(functionDecl(hasName("f")))));

// In the block that evaluates the LHS of the `&&` operator, the LHS is
// associated with a value, while the right-hand side is not (unsurprisingly,
// as it hasn't been evaluated yet).
const auto &LHSState = blockStateForStmt(BlockStates, *AndOp.getLHS());
auto *LHSValue = cast<BoolValue>(LHSState.Env.getValue(*AndOp.getLHS()));
ASSERT_NE(LHSValue, nullptr);
EXPECT_NE(LHSValue, nullptr);
EXPECT_EQ(LHSState.Env.getValue(*AndOp.getRHS()), nullptr);

// In the block that evaluates the RHS, the RHS is associated with a
// value. The value for the LHS has been discarded as it is not consumed by
// this block.
// In the block that evaluates the RHS, both the LHS and RHS are associated
// with values, as they are both subexpressions of the `&&` operator, which
// is evaluated in a later block.
const auto &RHSState = blockStateForStmt(BlockStates, *AndOp.getRHS());
EXPECT_EQ(RHSState.Env.getValue(*AndOp.getLHS()), nullptr);
auto *RHSValue = cast<BoolValue>(RHSState.Env.getValue(*AndOp.getRHS()));
ASSERT_NE(RHSValue, nullptr);

// In the block that evaluates the return statement, the expression `b1 && b2`
// is associated with a value (and check that it's the right one).
// The expressions `b1` and `b2` are _not_ associated with a value in this
// block, even though they are consumed by the block, because:
// * This block has two prececessor blocks (the one that evaluates `b1` and
// the one that evaluates `b2`).
// * `b1` is only associated with a value in the block that evaluates `b1` but
// not the block that evalutes `b2`, so the join operation discards the
// value for `b1`.
// * `b2` is only associated with a value in the block that evaluates `b2` but
// not the block that evaluates `b1`, the the join operation discards the
// value for `b2`.
// Nevertheless, the analysis generates the correct formula for `b1 && b2`
// because the transfer function for the `&&` operator retrieves the values
// for its operands from the environments for the blocks that compute the
// operands, rather than from the environment for the block that contains the
// `&&`.
const auto &ReturnState = blockStateForStmt(BlockStates, Return);
EXPECT_EQ(ReturnState.Env.getValue(*AndOp.getLHS()), nullptr);
EXPECT_EQ(ReturnState.Env.getValue(*AndOp.getRHS()), nullptr);
EXPECT_EQ(ReturnState.Env.getValue(AndOp),
&ReturnState.Env.makeAnd(*LHSValue, *RHSValue));
EXPECT_EQ(RHSState.Env.getValue(*AndOp.getLHS()), LHSValue);
auto *RHSValue = RHSState.Env.get<BoolValue>(*AndOp.getRHS());
EXPECT_NE(RHSValue, nullptr);

// In the block that evaluates `b1 && b2`, the `&&` as well as its operands
// are associated with values.
const auto &AndOpState = blockStateForStmt(BlockStates, AndOp);
EXPECT_EQ(AndOpState.Env.getValue(*AndOp.getLHS()), LHSValue);
EXPECT_EQ(AndOpState.Env.getValue(*AndOp.getRHS()), RHSValue);
EXPECT_EQ(AndOpState.Env.getValue(AndOp),
&AndOpState.Env.makeAnd(*LHSValue, *RHSValue));

// In the block that calls `f()`, none of `b1`, `b2`, or `b1 && b2` should be
// associated with values.
const auto &CallFState = blockStateForStmt(BlockStates, CallF);
EXPECT_EQ(CallFState.Env.getValue(*AndOp.getLHS()), nullptr);
EXPECT_EQ(CallFState.Env.getValue(*AndOp.getRHS()), nullptr);
EXPECT_EQ(CallFState.Env.getValue(AndOp), nullptr);
}

TEST_F(DiscardExprStateTest, ConditionalOperator) {
std::string Code = R"(
void f(int*, int);
void g();
bool cond();

void target() {
int i = 0;
if (cond())
f(&i, cond() ? 1 : 0);
g();
}
)";
auto BlockStates = llvm::cantFail(runAnalysis<NoopAnalysis>(
Code, [](ASTContext &C) { return NoopAnalysis(C); }));

const auto &AddrOfI =
matchNode<UnaryOperator>(unaryOperator(hasOperatorName("&")));
const auto &CallF =
matchNode<CallExpr>(callExpr(callee(functionDecl(hasName("f")))));
const auto &CallG =
matchNode<CallExpr>(callExpr(callee(functionDecl(hasName("g")))));

// In the block that evaluates `&i`, it should obviously have a value.
const auto &AddrOfIState = blockStateForStmt(BlockStates, AddrOfI);
auto *AddrOfIVal = AddrOfIState.Env.get<PointerValue>(AddrOfI);
EXPECT_NE(AddrOfIVal, nullptr);

// Because of the conditional operator, the `f(...)` call is evaluated in a
// different block than `&i`, but `&i` still needs to have a value here
// because it's a subexpression of the call.
const auto &CallFState = blockStateForStmt(BlockStates, CallF);
EXPECT_NE(&CallFState, &AddrOfIState);
EXPECT_EQ(CallFState.Env.get<PointerValue>(AddrOfI), AddrOfIVal);

// In the block that calls `g()`, `&i` should no longer be associated with a
// value.
const auto &CallGState = blockStateForStmt(BlockStates, CallG);
EXPECT_EQ(CallGState.Env.get<PointerValue>(AddrOfI), nullptr);
}

struct NonConvergingLattice {
Expand Down