-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[analyzer][NFC] Introduce framework for checker families #139256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The checker classes (i.e. classes derived from `CheckerBase` via the utility template `Checker<...>`) act as intermediates between the user and the analyzer engine, so they have two interfaces: - On the frontend side, they have a public name, can be enabled or disabled, can accept checker options and can be reported as the source of bug reports. - On the backend side, they can handle various checker callbacks and they "leave a mark" on the `ExplodedNode`s that are created by them. (These `ProgramPointTag` marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.) In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of `mutable std::unique_ptr<BugType>` members and redundant data members (when a checker used its custom `CheckNames` array and ignored the inherited single `Name`). My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness. This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with `{parent checker object, index of part}` pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member. This is implemented by separating the functionality of `CheckerBase` into two new classes: `CheckerFrontend` and `CheckerBackend`. The name `CheckerBase` is kept (as a class derived from both `CheckerFrontend` and `CheckerBackend`), so "simple" checkers that use `CheckerBase` and `Checker<...>` continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation: - The class `CheckerFamily<...>` works exactly like `Checker<...>` but inherits from `CheckerBackend` instead of `CheckerBase`, so it won't have a superfluous single `Name` member. - Classes deriving from `CheckerFamily` can freely own multiple `CheckerFrontend` data members, which are enabled within the registration methods corresponding to their name and can be used to initialize the `BugType`s that they can emit. In this scheme each `CheckerFamily` needs to override the pure virtual method `ProgramPointTag::getTagDescription()` which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.) I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this `CheckerFamily` framework.
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang Author: Donát Nagy (NagyDonat) ChangesThe checker classes (i.e. classes derived from
In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness. This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with This is implemented by separating the functionality of
In this scheme each I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this Patch is 32.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139256.diff 11 Files Affected:
diff --git a/clang/include/clang/Analysis/ProgramPoint.h b/clang/include/clang/Analysis/ProgramPoint.h
index c40aa3d8ffb72..d81b8e845cb48 100644
--- a/clang/include/clang/Analysis/ProgramPoint.h
+++ b/clang/include/clang/Analysis/ProgramPoint.h
@@ -40,8 +40,8 @@ class ProgramPointTag {
ProgramPointTag(void *tagKind = nullptr) : TagKind(tagKind) {}
virtual ~ProgramPointTag();
- /// The description of this program point which will be displayed when the
- /// ExplodedGraph is dumped in DOT format for debugging.
+ /// The description of this program point which will be dumped for debugging
+ /// purposes. Do not use in user-facing output!
virtual StringRef getTagDescription() const = 0;
/// Used to implement 'isKind' in subclasses.
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
index 8e1d25b3eefa1..33d37febc7327 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
@@ -643,9 +643,10 @@ class BugReporter {
/// reports.
virtual void emitReport(std::unique_ptr<BugReport> R);
- void EmitBasicReport(const Decl *DeclWithIssue, const CheckerBase *Checker,
- StringRef BugName, StringRef BugCategory,
- StringRef BugStr, PathDiagnosticLocation Loc,
+ void EmitBasicReport(const Decl *DeclWithIssue,
+ const CheckerFrontend *Checker, StringRef BugName,
+ StringRef BugCategory, StringRef BugStr,
+ PathDiagnosticLocation Loc,
ArrayRef<SourceRange> Ranges = {},
ArrayRef<FixItHint> Fixits = {});
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 3a635e0d0125a..b05360904f86d 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -27,11 +27,7 @@ class BugReporter;
class BugType {
private:
- struct CheckerPartRef {
- const CheckerBase *Checker;
- CheckerPartIdx Idx;
- };
- using CheckerNameInfo = std::variant<CheckerNameRef, CheckerPartRef>;
+ using CheckerNameInfo = std::variant<CheckerNameRef, const CheckerFrontend *>;
const CheckerNameInfo CheckerName;
const std::string Description;
@@ -43,7 +39,7 @@ class BugType {
public:
// Straightforward constructor where the checker name is specified directly.
// TODO: As far as I know all applications of this constructor involve ugly
- // hacks that could be avoided by switching to a different constructor.
+ // hacks that could be avoided by switching to the other constructor.
// When those are all eliminated, this constructor should be removed to
// eliminate the `variant` and simplify this class.
BugType(CheckerNameRef CheckerName, StringRef Desc,
@@ -52,18 +48,11 @@ class BugType {
SuppressOnSink(SuppressOnSink) {}
// Constructor that can be called from the constructor of a checker object.
// At that point the checker name is not yet available, but we can save a
- // pointer to the checker and later use that to query the name.
- BugType(const CheckerBase *Checker, StringRef Desc,
+ // pointer to the checker and use that to query the name.
+ BugType(const CheckerFrontend *CF, StringRef Desc,
StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
- : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
- Category(Cat), SuppressOnSink(SuppressOnSink) {}
- // Constructor that can be called from the constructor of a checker object
- // when it has multiple parts with separate names. We save the name and the
- // part index to be able to query the name of that part later.
- BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
- StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
- : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
- Category(Cat), SuppressOnSink(SuppressOnSink) {}
+ : CheckerName(CF), Description(Desc), Category(Cat),
+ SuppressOnSink(SuppressOnSink) {}
virtual ~BugType() = default;
StringRef getDescription() const { return Description; }
@@ -72,8 +61,7 @@ class BugType {
if (const auto *CNR = std::get_if<CheckerNameRef>(&CheckerName))
return *CNR;
- auto [Checker, Idx] = std::get<CheckerPartRef>(CheckerName);
- return Checker->getName(Idx);
+ return std::get<const CheckerFrontend *>(CheckerName)->getName();
}
/// isSuppressOnSink - Returns true if bug reports associated with this bug
@@ -82,6 +70,24 @@ class BugType {
bool isSuppressOnSink() const { return SuppressOnSink; }
};
+/// Trivial convenience class for the common case when a certain checker
+/// frontend always uses the same bug type. This way instead of writing
+/// ```
+/// CheckerFrontend LongCheckerFrontendName;
+/// BugType LongCheckerFrontendNameBug{LongCheckerFrontendName, "..."};
+/// ```
+/// we can use `CheckerFrontendWithBugType LongCheckerFrontendName{"..."}`.
+class CheckerFrontendWithBugType : public CheckerFrontend {
+ BugType BT;
+
+public:
+ CheckerFrontendWithBugType(StringRef Desc,
+ StringRef Cat = categories::LogicError,
+ bool SuppressOnSink = false)
+ : BT(this, Desc, Cat, SuppressOnSink) {}
+ const BugType &getBT() const { return BT; }
+};
+
} // namespace ento
} // end clang namespace
diff --git a/clang/include/clang/StaticAnalyzer/Core/Checker.h b/clang/include/clang/StaticAnalyzer/Core/Checker.h
index a54c5bee612f6..45b0398f3aca5 100644
--- a/clang/include/clang/StaticAnalyzer/Core/Checker.h
+++ b/clang/include/clang/StaticAnalyzer/Core/Checker.h
@@ -484,83 +484,87 @@ class Call {
} // end eval namespace
-class CheckerBase : public ProgramPointTag {
- /// A single checker class (i.e. a subclass of `CheckerBase`) can implement
- /// multiple user-facing checkers that have separate names and can be enabled
- /// separately, but are backed by the same singleton checker object.
- SmallVector<std::optional<CheckerNameRef>, 1> RegisteredNames;
-
- friend class ::clang::ento::CheckerManager;
+/// A `CheckerFrontend` instance is what the user recognizes as "one checker":
+/// it has a public canonical name (injected from the `CheckerManager`), can be
+/// enabled or disabled, can have associated checker options and can be printed
+/// as the "source" of bug reports.
+/// The singleton instance of a simple `Checker<...>` is-a `CheckerFrontend`
+/// (for historical reasons, to preserve old straightforward code), while the
+/// singleton instance of a `CheckerFamily<...>` class owns multiple
+/// `CheckerFrontend` instances as data members.
+/// Modeling checkers that are hidden from the user but can be enabled or
+/// disabled separately (as dependencies of other checkers) are also considered
+/// to be `CheckerFrontend`s.
+class CheckerFrontend {
+ /// The `Name` is nullopt if and only if the checker is disabled.
+ std::optional<CheckerNameRef> Name;
public:
- CheckerNameRef getName(CheckerPartIdx Idx = DefaultPart) const {
- assert(Idx < RegisteredNames.size() && "Checker part index is too large!");
- std::optional<CheckerNameRef> Name = RegisteredNames[Idx];
- assert(Name && "Requested checker part is not registered!");
- return *Name;
- }
-
- bool isPartEnabled(CheckerPartIdx Idx) const {
- return Idx < RegisteredNames.size() && RegisteredNames[Idx].has_value();
- }
-
- void registerCheckerPart(CheckerPartIdx Idx, CheckerNameRef Name) {
- // Paranoia: notice if e.g. UINT_MAX is passed as a checker part index.
- assert(Idx < 256 && "Checker part identifiers should be small integers.");
-
- if (Idx >= RegisteredNames.size())
- RegisteredNames.resize(Idx + 1, std::nullopt);
-
- assert(!RegisteredNames[Idx] && "Repeated registration of checker a part!");
- RegisteredNames[Idx] = Name;
- }
-
- StringRef getTagDescription() const override {
- // When the ExplodedGraph is dumped for debugging (in DOT format), this
- // method is called to attach a description to nodes created by this
- // checker _class_. Ideally this should be recognizable identifier of the
- // whole class, but for this debugging purpose it's sufficient to use the
- // name of the first registered checker part.
- for (const auto &OptName : RegisteredNames)
- if (OptName)
- return *OptName;
-
- return "Unregistered checker";
+ void enable(CheckerManager &Mgr) {
+ assert(!Name && "Checker part registered twice!");
+ Name = Mgr.getCurrentCheckerName();
}
+ bool isEnabled() const { return static_cast<bool>(Name); }
+ CheckerNameRef getName() const { return *Name; }
+};
+/// `CheckerBackend` is an abstract base class that serves as the common
+/// ancestor of all the `Checker<...>` and `CheckerFamily<...>` classes which
+/// can create `ExplodedNode`s (by acting as a `ProgramPointTag`) and can be
+/// registered to handle various checker callbacks. (Moreover the debug
+/// callback `printState` is also introduced here.)
+class CheckerBackend : public ProgramPointTag {
+public:
/// Debug state dump callback, see CheckerManager::runCheckersForPrintState.
/// Default implementation does nothing.
virtual void printState(raw_ostream &Out, ProgramStateRef State,
const char *NL, const char *Sep) const;
};
-/// Dump checker name to stream.
-raw_ostream& operator<<(raw_ostream &Out, const CheckerBase &Checker);
-
-/// Tag that can use a checker name as a message provider
-/// (see SimpleProgramPointTag).
-class CheckerProgramPointTag : public SimpleProgramPointTag {
+/// The non-templated common ancestor of all the simple `Checker<...>` classes.
+class CheckerBase : public CheckerFrontend, public CheckerBackend {
public:
- CheckerProgramPointTag(StringRef CheckerName, StringRef Msg);
- CheckerProgramPointTag(const CheckerBase *Checker, StringRef Msg);
+ /// Attached to nodes created by this checker class when the ExplodedGraph is
+ /// dumped for debugging.
+ StringRef getTagDescription() const override;
};
-template <typename CHECK1, typename... CHECKs>
-class Checker : public CHECK1, public CHECKs..., public CheckerBase {
+// Template magic to implement the static method `_register()` which registers
+// the `Checker` or `CheckerFamily` for all the implemented callbacks.
+template <typename CHECKER, typename CHECK1, typename... CHECKs>
+static void registerImpl(CHECKER *Chk, CheckerManager &Mgr) {
+ CHECK1::_register(Chk, Mgr);
+ registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
+}
+
+template <typename CHECKER>
+static void registerImpl(CHECKER *Chk, CheckerManager &Mgr) {}
+
+/// Simple checker classes that implement one frontend (i.e. checker name)
+/// should derive from this template and specify all the implemented callbacks
+/// (i.e. classes like `check::PreStmt` or `eval::Call`) as template arguments
+/// of `Checker`.
+template <typename... CHECKs>
+class Checker : public CheckerBase, public CHECKs... {
public:
template <typename CHECKER>
- static void _register(CHECKER *checker, CheckerManager &mgr) {
- CHECK1::_register(checker, mgr);
- Checker<CHECKs...>::_register(checker, mgr);
+ static void _register(CHECKER *Chk, CheckerManager &Mgr) {
+ registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
}
};
-template <typename CHECK1>
-class Checker<CHECK1> : public CHECK1, public CheckerBase {
+/// Checker families (where a single backend class implements multiple related
+/// frontends) should derive from this template, specify all the implemented
+/// callbacks (i.e. classes like `check::PreStmt` or `eval::Call`) as template
+/// arguments of `FamilyChecker` and implement the pure virtual method
+/// `StringRef getTagDescription()` which is inherited from `ProgramPointTag`
+/// and should return a string identifying the class for debugging purposes.
+template <typename... CHECKs>
+class CheckerFamily : public CheckerBackend, public CHECKs... {
public:
template <typename CHECKER>
- static void _register(CHECKER *checker, CheckerManager &mgr) {
- CHECK1::_register(checker, mgr);
+ static void _register(CHECKER *Chk, CheckerManager &Mgr) {
+ registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
}
};
@@ -581,6 +585,20 @@ class EventDispatcher {
}
};
+/// Tag that can use a checker name as a message provider
+/// (see SimpleProgramPointTag).
+/// FIXME: This is a cargo cult class which is copied into several checkers but
+/// does not provide anything useful.
+/// The only added functionality provided by this class (compared to
+/// SimpleProgramPointTag) is that it composes the tag description string from
+/// two arguments -- but tag descriptions only appear in debug output so there
+/// is no reason to bother with this.
+class CheckerProgramPointTag : public SimpleProgramPointTag {
+public:
+ CheckerProgramPointTag(StringRef CheckerName, StringRef Msg);
+ CheckerProgramPointTag(const CheckerBase *Checker, StringRef Msg);
+};
+
/// We dereferenced a location that may be null.
struct ImplicitNullDerefEvent {
SVal Location;
diff --git a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
index 03ffadd346d0b..8fa122f004bfe 100644
--- a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
@@ -39,7 +39,8 @@ class AnalysisManager;
class CXXAllocatorCall;
class BugReporter;
class CallEvent;
-class CheckerBase;
+class CheckerFrontend;
+class CheckerBackend;
class CheckerContext;
class CheckerRegistry;
struct CheckerRegistryData;
@@ -64,9 +65,9 @@ class CheckerFn<RET(Ps...)> {
Func Fn;
public:
- CheckerBase *Checker;
+ CheckerBackend *Checker;
- CheckerFn(CheckerBase *checker, Func fn) : Fn(fn), Checker(checker) {}
+ CheckerFn(CheckerBackend *checker, Func fn) : Fn(fn), Checker(checker) {}
RET operator()(Ps... ps) const {
return Fn(Checker, ps...);
@@ -116,19 +117,6 @@ class CheckerNameRef {
operator StringRef() const { return Name; }
};
-/// A single checker class (and its singleton instance) can act as the
-/// implementation of several (user-facing or modeling) checker parts that
-/// have shared state and logic, but have their own names and can be enabled or
-/// disabled separately.
-/// Each checker class that implement multiple parts introduces its own enum
-/// type to assign small numerical indices (0, 1, 2 ...) to their parts. The
-/// type alias 'CheckerPartIdx' is conceptually the union of these enum types.
-using CheckerPartIdx = unsigned;
-
-/// If a checker doesn't have multiple parts, then its single part is
-/// represented by this index.
-constexpr inline CheckerPartIdx DefaultPart = 0;
-
enum class ObjCMessageVisitKind {
Pre,
Post,
@@ -193,14 +181,7 @@ class CheckerManager {
/// Emits an error through a DiagnosticsEngine about an invalid user supplied
/// checker option value.
- void reportInvalidCheckerOptionValue(const CheckerBase *C,
- StringRef OptionName,
- StringRef ExpectedValueDesc) const {
- reportInvalidCheckerOptionValue(C, DefaultPart, OptionName,
- ExpectedValueDesc);
- }
-
- void reportInvalidCheckerOptionValue(const CheckerBase *C, CheckerPartIdx Idx,
+ void reportInvalidCheckerOptionValue(const CheckerFrontend *CP,
StringRef OptionName,
StringRef ExpectedValueDesc) const;
@@ -210,28 +191,15 @@ class CheckerManager {
// Checker registration.
//===--------------------------------------------------------------------===//
- /// Construct the singleton instance of a checker, register it for the
- /// supported callbacks and record its name with `registerCheckerPart()`.
- /// Arguments passed to this function are forwarded to the constructor of the
- /// checker.
- ///
- /// If `CHECKER` has multiple parts, then the constructor call and the
- /// callback registration only happen within the first `registerChecker()`
- /// call; while the subsequent calls only enable additional parts of the
- /// existing checker object (while registering their names).
- ///
- /// \returns a pointer to the checker object.
- template <typename CHECKER, CheckerPartIdx Idx = DefaultPart, typename... AT>
- CHECKER *registerChecker(AT &&...Args) {
- // This assert could be removed but then we need to make sure that calls
- // registering different parts of the same checker pass the same arguments.
- static_assert(
- Idx == DefaultPart || !sizeof...(AT),
- "Argument forwarding isn't supported with multi-part checkers!");
-
+ /// If the the singleton instance of a checker class is not yet constructed,
+ /// then construct it (with the supplied arguments), register it for the
+ /// callbacks that are supported by it, and return it. Otherwise, just return
+ /// a pointer to the existing instance.
+ template <typename CHECKER, typename... AT>
+ CHECKER *getChecker(AT &&...Args) {
CheckerTag Tag = getTag<CHECKER>();
- std::unique_ptr<CheckerBase> &Ref = CheckerTags[Tag];
+ std::unique_ptr<CheckerBackend> &Ref = CheckerTags[Tag];
if (!Ref) {
std::unique_ptr<CHECKER> Checker =
std::make_unique<CHECKER>(std::forward<AT>(Args)...);
@@ -239,18 +207,18 @@ class CheckerManager {
Ref = std::move(Checker);
}
- CHECKER *Result = static_cast<CHECKER *>(Ref.get());
- Result->registerCheckerPart(Idx, CurrentCheckerName);
- return Result;
+ return static_cast<CHECKER *>(Ref.get());
}
- template <typename CHECKER>
- CHECKER *getChecker() {
- CheckerTag Tag = getTag<CHECKER>();
- std::unique_ptr<CheckerBase> &Ref = CheckerTags[Tag];
- assert(Ref && "Requested checker is not registered! Maybe you should add it"
- " as a dependency in Checkers.td?");
- return static_cast<CHECKER *>(Ref.get());
+ /// Register a single-part checker (derived from `Checker`): construct its
+ /// singleton instance, register it for the supported callbacks and record
+ /// its name (with `CheckerFrontend::enable`). Calling this multiple times
+ /// triggers an assertion failure.
+ template <typename CHECKER, typename... AT>
+ CHECKER *registerChecker(AT &&...Args) {
+ CHECKER *Chk = getChecker<CHECKER>(std::forward<AT>(Args)...);
+ Chk->enable(*this);
+ return Chk;
}
template <typename CHECKER> bool isRegisteredChecker() {
@@ -482,7 +450,7 @@ class CheckerManager {
/// Run checkers for debug-printing a ProgramState.
///
/// Unlike most other callbacks, any checker can simply implement the virtual
- /// method CheckerBase::printState if it has custom data to print.
+ /// method CheckerBackend::printState if it has custom data to print.
///
/// \param Out The output stream
/// \param State The state being printed
@@ -651,7 +619,7 @@ class CheckerManager {
template <typename T>
static void *getTag() { static int tag; return &tag; }
- llvm::DenseMap<CheckerTag, std::unique_ptr<CheckerBase>> CheckerTags;
+ llvm::DenseMap<CheckerTag, std::unique_ptr<CheckerBackend>> CheckerTags;
struct DeclCheckerInfo {
CheckDeclFunc CheckFn;
diff --git a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
index 3dd57732305b2..7672c63a646e4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
@@ -25,7 +25,7 @@ using namespace ento;
using namespace taint;
namespace {
-class DivZeroChecker : public Checker<check::PreStmt<BinaryOperator>> {
+class DivZeroChecker : public CheckerFamily<check::PreStmt<BinaryOperator>> {
void reportBug(StringRef Msg, ProgramStateRef StateZero,
CheckerContext &C) const;
void reportTaintBug(StringRef Msg, ProgramStateRef StateZero,
@@ -33,17 +33,15 @@ class DivZeroChecker : public Checker<check::PreStmt<BinaryOperator>> {
llvm::ArrayRef<SymbolRef> TaintedSym...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is a really good direction. I like it.
I left a couple minor remarks inline.
I handled all the inline comments. I have one minor architectural question: we should standardize a way to assign a single tag description (that is, an identifier that can be used in debug dumps) to each checker family. The old code automatically used the name of the first checker part as the tag description (which was not very elegant...); while in the current version of this patch subclasses of |
I'll think about this, but overall I'd prefer simplicity for defining Checkers - which may make magic behind the doors slightly more involved. |
I prototyped a "get the name of template argument as string" solution which is sufficient for our use cases (where the checker family is a non-templated class type) and works on all supported versions of all three compilers (GCC, clang, MSVC) that are supported for LLVM compilation: #include <iostream>
#include <string>
class MyFancyCheckerFamily {};
template <class T>
std::string getTypeName() {
#ifndef _MSC_VER
std::string res = __PRETTY_FUNCTION__;
int start = res.find('='), end = res.find(';');
if (end == -1)
end = res.find(']');
if (start == -1 || end == -1)
return "unknown-checker-family"; // paranoia, should not happen
return res.substr(start + 2, end - start - 2);
#else
std::string res = __FUNCSIG__;
int start = res.find("<class "), end = res.find('>');
if (start == -1 || end == -1)
return "unknown-checker-family"; // paranoia, should not happen
return res.substr(start + 7, end - start - 7);
#endif
}
int main() {
std::cout << getTypeName<MyFancyCheckerFamily>() << std::endl;
return 0;
} If I invoke this machinery from the registration functions of the checker families (where each checker type is available as a template argument), then it can "magically" ensure that This is a bit ugly, because there is no standard way to stringify a type, so I need to rely on the macros (By the way, note that "simple" single-part @steakhal (or anybody else) What do you think about this? |
I don't like this. LLVM is used as a library for wide set of tools and compiled with an even wider set of compilers. As a user, I wouldn't mind a solution like this, but as a library vendor, I'd vote very much against this. |
🤔 These strings are only relevant for debugging, so this whole machinery could be limited to debug builds (or builds with an off-by-default flag). That way these debug names would be still available for the very specific situation when somebody is debugging the analyzer, but wouldn't cause problems for anybody else. However, I'm also open to other ideas and I'd be happy to accept a better solution if you can find one. I hope that I can merge the |
I think I can't push to your branch. So here is what I'd do as git format-patch: 0001-NFC-Rename-getTagDescription-to-getDebugName.patch.txt |
Thanks for the patches! I originally thought that renaming I see that your code the status quo by using the name of an arbitrary sub-checker (the one that is registered first) as the debug name of the full checker family. Previously I have tried to avoid this solution because:
However, if you think that these issues are acceptable, then I'm not opposed to using your solution because I agree that the I also thought about tweaking |
I think the order is deterministic, and due to how checker dependencies are resolved, the backend checker would be always the one that is registered first. I have not checked this. Frankly, I would have preferred if the internal checker class name would be printed there instead of the user-facing name of the checker - but given that I despise defining this for every checker as a mandatory virtual function for checker families, I opted for this solution given my very limited time frame and that I shouldn't block the progress of this PR.
I firmly believe that as this is a debugging feature, the best name to print is the class that gets registered. |
You're correct wrt the current state of the code but I'm planning to get rid of many "backend/modeling" checkers that have no real purpose 😅 -- in my opinion having a superfluous checker definition in There are a few modeling checkers that fulfill a valid role because they act as dependencies of checkers defined in other files/classes, but I want to eliminate the pattern where a checker class defines a modeling checker which is the dependency of the other "real" checkers from the class (to be registered first) and does nothing useful.
Sorry for adding time pressure...
I agree, but unfortunately there is no way to get the class name automatically in a platform-independent way without a third-party dependency like the type_name boost library. (Do I understand it correctly that we cannot just drop in an external dependency like this?) Using the name of one subchecker is probably the least bad available approach and your code is an elegant implementation for it, so I'll probably apply it to get this PR merged. If you're not opposed, I would like to add a |
I'd recommend you to look at the Just have a look at And thanks again for sticking around. I can understand if this thread already got far longer than it should have taken. |
For a moment I was very happy to see this solution, but unfortunately XXX is not the name of the checker class (= backend = full family), but instead it is just yet another kind of "internal" name that is assigned to each individual checker (= frontend = one checker from the family). You're right that for the simple single-part checkers this kind of name usually coincides with the checker class name, but it is still not a single name for the full checker family. For example consider the class
In this particular case using "debug name = internal name of the first checker part" would work (and it is a core checker so we can say that it won't be disabled), but if we have a checker family where all the parts are non-core checkers, then this solution cannot assign a single stable name to the whole family.
Don't worry about this -- your feedback is valuable (and it's completely understandable that you also need to work on different things) and I agree with you that it is important to implement this framework well even if it takes a bit more time. |
Accommodating the modeling checkers in the checker family frameworkI thought a bit about the place of the modeling checkers in the checker family framework (which is introduced in this PR). This question is a bit architectural, so in theory this post would "belong to" discourse, but I'm posting this here because I don't want to fragment the discussion. As far as I see currently the modeling checkers can fulfill two purposes:
Unfortunately the modeling checkers were introduced gradually in an ad hoc fashion, so this architecture has several drawbacks:
"Default" planIn my opinion this is the best and least disruptive way for handling the modeling checkers in the "checker family" framework:
This approach avoids disruptive changes and eliminates a significant amount of boilerplate code (all the superfluous checker families). Alternative plan
This approach would be more disruptive (we'd need to change the parsing of @steakhal @ anybody else What do you think about this question? How would you integrate modeling checkers into the checker family framework? |
To me what is important with this debug name is to be able to easily connect it with an implementation file to look at. This debug name should be the XXX part of a registerXXX fn generated for each checker (let it be modeling or reporting), and if it's a checker family, then the debug name of all the "parts" should be the name of the family. This would directly point to what implementation to look at no matter how the checker is surfaced to the users, under what aliases or what have we. Reflecting on your proposal: It's a great property if the exploded graph remains sort of the same no matter what checkers are enabled. It makes it easier to reason about it and when and why we have sink nodes, affecting what paths explored etc. So the problem was that we had to hack together multiple "checker frontends" and conditionally report a fatal bug or a sink node on an error. I reread both outlines solutions and both seemed better than what we have today. Neither are for free, but the second resonated more with me, and was aligned with my expectations. This makes me question why we want to improve the ergonomics of defining and organizing checkers? Is this the most important pain point we want to focus on right now? To me btw, this checker families and what we had before with raw bool flags are the same thing. It's just now a lot less ugly. |
I'm aware of this goal that a checker should emit sinks even if the reports are suppressed, and I agree that this would be useful as the effect of disabling a "well-behaving", trustworthy checker. However, I would guess that checkers are usually disabled when they mismodel the analyzed code and produce crazy garbage (or e.g. it's an
I agree and I hope that the checker family framework will provide a way towards implementing this.
Actually I introduced the class
No, this is not the most important pain point -- this was just annoying me personally, so I tried to fix it quickly and got carried away... I wouldn't say that this is completely useless (we will be able to build new functionality onto the new framework in a way that wouldn't be possible with the old swamp of hacks), but this isn't an urgent project.
I agree, this is after all a non-functional change -- I'm proposing it because I got fed up with ugly code.
I agree here as well. When I started to describe the "Default" and "Alternative" plans I initially felt that we need to pick between them right now; but as I thought about it more, it seems that they are not opposed and in fact the best path for eventually implementing the "Alternative" plan (which is indeed more appealing intuitively) would be first implementing the "Default" plan (which cleans up lots of hacks, but leaves the modeling checkers unchanged) and then later introducing the "Alternative" plan if we feel that we need it. Based on this I'll quickly incorporate your patches and the "use the internal classname-like checker name" logic onto the branch, and then I think we can merge it as a good enough implementation of the Default plan framework. |
This fragment is often the same as the class name.
In commit 13f4a30 I implemented your suggestion that the debug name of the checker name should be derived from the name fragment of the registration functions (which is called I pushed this commit mostly to show its convoluted nature, I would strongly prefer rejecting it, because it does not solve the fundamental issue (namely, that the debug name of the checker family is something "taken from" one subchecker and will change depending on which subcheckers are registered) and introduces several additional complications:
If I sort the possible implementations of
@steakhal What do you think about this list? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed answer to my previous reply.
I think we are completely aligned on that front.
Thank you for accepting my patches, and also for implementing exactly what I had in mind for the debug names.
It's awesome. I don't think it's too complicated or ugly TBH.
I agree it's not nice, but we can either merge this or discuss how else we could do something even better.
I'm happy as this PR looks right now, except for having that backward compatibility overload for plugins.
We can just move on with life and let them migrate once the next clang is out. To me, the upgrade path looks straight forward and this isn't the only API they will need to adjust anyway. They should brace for impact.
Now, to the alternative options you layed out.
(Thanks for thinking about this btw.)
(PS: Sorry about my seemingly unstructured response here to the list. To me, these options are not orthogonal and it makes it difficult to reflect.)
I'm fine with option 6, aka. just merge this PR as-is.
- Use the
CLASS
field of the checker specifications inCheckers.td
(as currently implemented by 13f4a30)
Very misleading, looks like a nice proper identifier (the class name) but may be different in rare cases.
Requires touching the fragile "register checkers from analyzer plugins" logic.
"Identifier" is not stable: depends on the set of currently registered subcheckers.
Could you give an example when it's misleading and not refers to a class name? I figured the tablegen construct we have ensures this.
We should be fine touching the plugin registration. This shouldn't be disruptive.
WDYM by that this identifier is not stable? We could have a tablegen property like ImplementedBy<"MyChecker">
for the few checker families we currently have, and say that by default they are implemented by CLASS
.
I reject options 1 and 2, as we already discussed.
I agree with your statement in option 3 about the identifier of a class. And I think I already layed out something like this part of the previous section in this reply.
I'm fine with option 4, but I want to avoid this if possible.
I reject option 5, as I find it a buggy implementation of option 6 (or option 4).
/// Adds a checker to the registry. This overload doesn't take a `DebugName` | ||
/// (which usually looks like `DivZeroChecker`), so it uses the user-facing | ||
/// `FullName` (which usually looks like ``core.DivideZero`) as a debug name. | ||
/// THIS IS DEPRECATED and is only provided to preserve compatibility with | ||
/// legacy plugins. | ||
/// TODO: Eventually remove this from the codebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can break compatibility. In fact, a clang plugin is only compatible with the same major version of clang already.
There is no such thing in Clang as API or ABI compatibility across major releases.
I'd recommend dropping this compatibility overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will remove this method variant from the codebase (I'll need to update a unit test).
Unfortunately this "no DebugName
, we need to pass the FullName
issue" also affects the templated variant of the method which is below this one, and that one is used by a dozen unit tests, so modifying it would be bothersome (I would like to skip updating it, but even if I do it perhaps it should be in a separate commit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you point me to some examples? That way we could use collective intelligence and maybe I could figure out a way forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be straightforward to update the templated variant of addChecker
by adding an extra DebugName
parameter and updating all the code that calls it – but it is called from 20+ locations, so I felt that it would be bothersome to update them all:
clang/include/clang/StaticAnalyzer/Frontend/CheckerRegistry.h |141 Col 8| void addChecker(StringRef FullName, StringRef Desc, StringRef DocsUri,
clang/lib/Analysis/plugins/CheckerDependencyHandling/CheckerDependencyHandling.cpp |20 Col 12| registry.addChecker<Dependency>("example.Dependency", "", "");
clang/lib/Analysis/plugins/CheckerDependencyHandling/CheckerDependencyHandling.cpp |21 Col 12| registry.addChecker<DependendentChecker>("example.DependendentChecker", "",
clang/lib/Analysis/plugins/SampleAnalyzer/MainCallChecker.cpp |48 Col 12| registry.addChecker<MainCallChecker>(
clang/unittests/StaticAnalyzer/BugReportInterestingnessTest.cpp |122 Col 16| Registry.addChecker<InterestingnessTestChecker>("test.Interestingness",
clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp |623 Col 14| Registry.addChecker<CallDescChecker>("test.CallDescChecker", "Description",
clang/unittests/StaticAnalyzer/CallEventTest.cpp |58 Col 14| Registry.addChecker<CXXDeallocatorChecker>("test.CXXDeallocator",
clang/unittests/StaticAnalyzer/ExprEngineVisitTest.cpp |80 Col 14| Registry.addChecker<ExprEngineVisitPreChecker>("ExprEngineVisitPreChecker",
clang/unittests/StaticAnalyzer/ExprEngineVisitTest.cpp |89 Col 14| Registry.addChecker<ExprEngineVisitPostChecker>(
clang/unittests/StaticAnalyzer/ExprEngineVisitTest.cpp |98 Col 14| Registry.addChecker<MemAccessChecker>("MemAccessChecker", "Desc",
clang/unittests/StaticAnalyzer/ObjcBug-124477.cpp |39 Col 14| Registry.addChecker<FlipFlagOnCheckLocation>("test.FlipFlagOnCheckLocation",
clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp |47 Col 14| Registry.addChecker<CustomChecker>("test.CustomChecker", "Description", "");
clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp |76 Col 14| Registry.addChecker<CustomChecker>("test.LocIncDecChecker", "Description",
clang/unittests/StaticAnalyzer/SValSimplifyerTest.cpp |71 Col 14| Registry.addChecker<SimplifyChecker>("SimplifyChecker", "EmptyDescription",
clang/unittests/StaticAnalyzer/SValTest.cpp |160 Col 1| SVAL_TEST(GetConstType, R"(
clang/unittests/StaticAnalyzer/SValTest.cpp |180 Col 1| SVAL_TEST(GetLocAsIntType, R"(
clang/unittests/StaticAnalyzer/SValTest.cpp |202 Col 1| SVAL_TEST(GetSymExprType, R"(
clang/unittests/StaticAnalyzer/SValTest.cpp |224 Col 1| SVAL_TEST(GetPointerType, R"(
clang/unittests/StaticAnalyzer/SValTest.cpp |278 Col 1| SVAL_TEST(GetCompoundType, R"(
clang/unittests/StaticAnalyzer/SValTest.cpp |330 Col 1| SVAL_TEST(GetStringType, R"(
clang/unittests/StaticAnalyzer/SValTest.cpp |342 Col 1| SVAL_TEST(GetThisType, R"(
clang/unittests/StaticAnalyzer/SValTest.cpp |359 Col 1| SVAL_TEST(GetFunctionPtrType, R"(
clang/unittests/StaticAnalyzer/SValTest.cpp |372 Col 1| SVAL_TEST(GetLabelType, R"(
clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp |52 Col 16| Registry.addChecker<TestReturnValueUnderConstructionChecker>(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Maybe we could expose a macro to fill the debug name with the stringified token of the class?
I'm not a fan of macros but this could be a good use of them if it makes it more ergonomic.
And btw, could we assert that when adding a checker the debug names are unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Maybe we could expose a macro to fill the debug name with the stringified token of the class?
I'm not a fan of macros but this could be a good use of them if it makes it more ergonomic.
Most of these are mock checkers where the debug name is completely irrelevant, they won't be used for actual analysis. In fact I created a method called addMockChecker(StringRef FullName)
which is sufficient for these simple situations.
And btw, could we assert that when adding a checker the debug names are unique?
Perhaps, although providing unique debug names for the mock checkers in the unit tests is pointless. Overall, the whole "debug name" concept is an extremely unimportant feature that is only used during manual investigation of bugs, so IMO it doesn't deserve complex assertions and we should stop making it the centerpiece of our architecture that demands reorganizations and constrains otherwise good plans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Maybe we could expose a macro to fill the debug name with the stringified token of the class?
I'm not a fan of macros but this could be a good use of them if it makes it more ergonomic.Most of these are mock checkers where the debug name is completely irrelevant, they won't be used for actual analysis. In fact I created a method called
addMockChecker(StringRef FullName)
which is sufficient for these simple situations.And btw, could we assert that when adding a checker the debug names are unique?
Perhaps, although providing unique debug names for the mock checkers in the unit tests is pointless. Overall, the whole "debug name" concept is an extremely unimportant feature that is only used during manual investigation of bugs, so IMO it doesn't deserve complex assertions and we should stop making it the centerpiece of our architecture that demands reorganizations and constrains otherwise good plans.
Makes sense. Lets not enforce this.
I first created commit 84eb5af which eliminated the compatibility overloads and introduced Unfortunately these changes add ±100 changed lines, which makes this commit even larger -- but at least I got to fix some minor problems (e.g. obsolete docs and one case of the
The second argument of the CHECKER("core.DivideZero", DivZeroChecker, "Check for division by zero", "https://...", false)
CHECKER("optin.taint.TaintedDiv", TaintedDivChecker, "Check for divisions ...", "https://...", false) and you can see that
Ok, noted -- I wasn't familiar with the plugin ecosystem.
Under this plan 6, the "identifier" = debug name of the checker family is the As the
As I described above, we would need these Based on these, I'm still pleading you to reconsider your rejection of option 1, because it is still the most clear solution to ensure that If the class name of the Moreover, placing the class name into the current CheckerFrontend-centric
Another reason for supporting plan 1 is that it is a good "placeholder" that leaves space for later improvements that could provide an actually good way of injecting the checker class name. For example in the future we could move towards my "alternative" plan, where I suggested that we should
In this system each checker family would correspond to a single lightweight |
I have yet another implementation idea where:
The minimal-boilerplate version looks like template <typename T>
StringRef getCheckerFamilyName();
#define REGISTER_CHECKER_FAMILY(NAME) template <> StringRef getCheckerFamilyName<NAME>() { return #NAME; }
// in the body of CheckerFamily
template <typename CHECKER>
static void _register(CHECKER *Chk, CheckerManager &Mgr) {
Chk->CheckerBackendName = getCheckerFamilyName<CHECKER>();
(CHECKs::_register(Chk, Mgr), ...);
}
// in each file that implements a checker family
REGISTER_CHECKER_FAMILY(MyBeautifulCheckerFamily) If we assume that each checker family class is mentioned in #define GET_CHECKERS
#define CHECKER(FULLNAME, CLASS, HELPTEXT, DOC_URI, IS_HIDDEN) template <> StringRef getCheckerFamilyName<typename CLASS>() { return #CLASS; }
#include "Checkers.inc"
#undef CHECKER
#undef GET_CHECKERS ... however this is a hack, and I would prefer explicit (Disclaimer: I wrote this at home without testing, so the code may need tweaks before it can be compiled.) @steakhal WDYT about this one? |
I don't think I have time for this. I'm really sorry. I read the first couple paragraphs then gave up. I'm really short of time. What would it take to merge this ASAP? |
I see, and I understand that this review process became much longer than the ideal. Unfortunately, I have major concerns with this PR and would not like to merge it as is. Based on your comments I suspect that you are misunderstanding the situation (especially about the contents of
Just please let me return to the trivial solution where each I would not be opposed to eliminate this one last piece of boilerplate in follow-up commits if there is a clear solution but the current state of this PR is NOT a clear solution. |
Fine, lets do that. I hope we can untangle this in the future. |
Thanks for your understanding – and sorry for stretching this review process for so long. I will push and merge the reduced commit on Monday (to be able to follow up its effect in the CI). I will probably also create a spinoff PR that contains some minor NFC code quality improvements (introducing |
The checker classes (i.e. classes derived from
CheckerBase
via the utility templateChecker<...>
) act as intermediates between the user and the analyzer engine, so they have two interfaces:ExplodedNode
s that are created by them. (TheseProgramPointTag
marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.)In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of
mutable std::unique_ptr<BugType>
members and redundant data members (when a checker used its customCheckNames
array and ignored the inherited singleName
).My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness.
This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with
{parent checker object, index of part}
pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member.This is implemented by separating the functionality of
CheckerBase
into two new classes:CheckerFrontend
andCheckerBackend
. The nameCheckerBase
is kept (as a class derived from bothCheckerFrontend
andCheckerBackend
), so "simple" checkers that useCheckerBase
andChecker<...>
continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation:CheckerFamily<...>
works exactly likeChecker<...>
but inherits fromCheckerBackend
instead ofCheckerBase
, so it won't have a superfluous singleName
member.CheckerFamily
can freely own multipleCheckerFrontend
data members, which are enabled within the registration methods corresponding to their name and can be used to initialize theBugType
s that they can emit.In this scheme each
CheckerFamily
needs to override the pure virtual methodProgramPointTag::getTagDescription()
which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.)I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this
CheckerFamily
framework.