-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[-Wunsafe-buffer-usage] Warning Libc functions #101583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[-Wunsafe-buffer-usage] Warning Libc functions #101583
Conversation
@llvm/pr-subscribers-clang-analysis @llvm/pr-subscribers-clang Author: Ziqing Luo (ziqingluo-90) ChangesPatch is 30.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101583.diff 9 Files Affected:
diff --git a/clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h b/clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
index 228b4ae1e3e11..32305d413749b 100644
--- a/clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
+++ b/clang/include/clang/Analysis/Analyses/UnsafeBufferUsage.h
@@ -147,10 +147,15 @@ class UnsafeBufferUsageHandler {
virtual bool isSafeBufferOptOut(const SourceLocation &Loc) const = 0;
/// \return true iff unsafe uses in containers should NOT be reported at
- /// `Loc`; false otherwise.
+ /// `Loc`; false otherwise. Controlled by `-Wunsafe-buffer-usage-in-container`
virtual bool
ignoreUnsafeBufferInContainer(const SourceLocation &Loc) const = 0;
+ /// \return true iff unsafe uses in string_views should NOT be reported at
+ /// `Loc`; false otherwise. Controlled by `-Wunsafe-buffer-usage-in-string-view`
+ virtual bool
+ ignoreUnsafeBufferInStringView(const SourceLocation &Loc) const = 0;
+
virtual std::string
getUnsafeBufferUsageAttributeTextAt(SourceLocation Loc,
StringRef WSSuffix = "") const = 0;
diff --git a/clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def b/clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
index 242ad763ba62b..7f3fd24b5cdc7 100644
--- a/clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
+++ b/clang/include/clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def
@@ -18,6 +18,12 @@
#define WARNING_GADGET(name) GADGET(name)
#endif
+/// Make `WARNING_LIBC_GADGET` self a subset of WARNING_GADGET so that it can be
+/// tuned specially according to whether we warn `StringViewUnsafeConstructor`.
+#ifndef WARNING_LIBC_GADGET
+#define WARNING_LIBC_GADGET(name) WARNING_GADGET(name)
+#endif
+
/// A `WARNING_GADGET` subset, where the code pattern of each gadget
/// corresponds uses of a (possibly hardened) contatiner (e.g., `std::span`).
#ifndef WARNING_CONTAINER_GADGET
@@ -38,7 +44,9 @@ WARNING_GADGET(PointerArithmetic)
WARNING_GADGET(UnsafeBufferUsageAttr)
WARNING_GADGET(UnsafeBufferUsageCtorAttr)
WARNING_GADGET(DataInvocation)
+WARNING_LIBC_GADGET(UnsafeLibcFunctionCall)
WARNING_CONTAINER_GADGET(SpanTwoParamConstructor) // Uses of `std::span(arg0, arg1)`
+WARNING_CONTAINER_GADGET(StringViewUnsafeConstructor) // `std::string_view` constructed from std::string guarantees null-termination
FIXABLE_GADGET(ULCArraySubscript) // `DRE[any]` in an Unspecified Lvalue Context
FIXABLE_GADGET(DerefSimplePtrArithFixable)
FIXABLE_GADGET(PointerDereference)
@@ -52,5 +60,7 @@ FIXABLE_GADGET(PointerInit)
#undef FIXABLE_GADGET
#undef WARNING_GADGET
+#undef WARNING_LIBC_GADGET
#undef WARNING_CONTAINER_GADGET
+#undef WARNING_STRINGVIEW_GADGET
#undef GADGET
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index 19c3f1e043349..68413fab136fe 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -1551,7 +1551,8 @@ def HLSLAvailability : DiagGroup<"hlsl-availability">;
def ReadOnlyPlacementChecks : DiagGroup<"read-only-types">;
// Warnings and fixes to support the "safe buffers" programming model.
-def UnsafeBufferUsageInContainer : DiagGroup<"unsafe-buffer-usage-in-container">;
+def UnsafeBufferUsageInStringView : DiagGroup<"unsafe-buffer-usage-in-string-view">;
+def UnsafeBufferUsageInContainer : DiagGroup<"unsafe-buffer-usage-in-container", [UnsafeBufferUsageInStringView]>;
def UnsafeBufferUsage : DiagGroup<"unsafe-buffer-usage", [UnsafeBufferUsageInContainer]>;
// Warnings and notes related to the function effects system underlying
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 581434d33c5c9..713a3b10b413b 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -12388,6 +12388,9 @@ def note_safe_buffer_usage_suggestions_disabled : Note<
def warn_unsafe_buffer_usage_in_container : Warning<
"the two-parameter std::span construction is unsafe as it can introduce mismatch between buffer size and the bound information">,
InGroup<UnsafeBufferUsageInContainer>, DefaultIgnore;
+def warn_unsafe_buffer_usage_in_string_view : Warning<
+ "construct string_view from raw pointers does not guarantee null-termination, construct from std::string instead">,
+ InGroup<UnsafeBufferUsageInStringView>, DefaultIgnore;
#ifndef NDEBUG
// Not a user-facing diagnostic. Useful for debugging false negatives in
// -fsafe-buffer-usage-suggestions (i.e. lack of -Wunsafe-buffer-usage fixits).
diff --git a/clang/lib/Analysis/UnsafeBufferUsage.cpp b/clang/lib/Analysis/UnsafeBufferUsage.cpp
index 866222380974b..9f92b5acc4a61 100644
--- a/clang/lib/Analysis/UnsafeBufferUsage.cpp
+++ b/clang/lib/Analysis/UnsafeBufferUsage.cpp
@@ -9,20 +9,25 @@
#include "clang/Analysis/Analyses/UnsafeBufferUsage.h"
#include "clang/AST/ASTContext.h"
#include "clang/AST/Decl.h"
+#include "clang/AST/DeclCXX.h"
#include "clang/AST/Expr.h"
+#include "clang/AST/ExprCXX.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/AST/Stmt.h"
#include "clang/AST/StmtVisitor.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"
-#include "clang/Basic/CharInfo.h"
#include "clang/Basic/SourceLocation.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
#include "llvm/ADT/APSInt.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/iterator_range.h"
#include "llvm/Support/Casting.h"
+#include <cstddef>
#include <memory>
#include <optional>
#include <queue>
@@ -242,11 +247,16 @@ AST_MATCHER_P(Stmt, notInSafeBufferOptOut, const UnsafeBufferUsageHandler *,
return !Handler->isSafeBufferOptOut(Node.getBeginLoc());
}
-AST_MATCHER_P(Stmt, ignoreUnsafeBufferInContainer,
+AST_MATCHER_P(Stmt, ignoreSpanTwoParamConstructor,
const UnsafeBufferUsageHandler *, Handler) {
return Handler->ignoreUnsafeBufferInContainer(Node.getBeginLoc());
}
+AST_MATCHER_P(Stmt, ignoreStringViewUnsafeConstructor,
+ const UnsafeBufferUsageHandler *, Handler) {
+ return Handler->ignoreUnsafeBufferInStringView(Node.getBeginLoc());
+}
+
AST_MATCHER_P(CastExpr, castSubExpr, internal::Matcher<Expr>, innerMatcher) {
return innerMatcher.matches(*Node.getSubExpr(), Finder, Builder);
}
@@ -443,6 +453,270 @@ AST_MATCHER(ArraySubscriptExpr, isSafeArraySubscript) {
return false;
}
+AST_MATCHER_P(CallExpr, isUnsafeLibcFunctionCall, internal::Matcher<Expr>,
+ hasUnsafeStringView) {
+ static const std::set<StringRef> PredefinedNames{
+ // numeric conversion:
+ "atof",
+ "atoi",
+ "atol",
+ "atoll",
+ "strtol",
+ "strtoll",
+ "strtoul",
+ "strtoull",
+ "strtof",
+ "strtod",
+ "strtold",
+ "strtoimax",
+ "strtoumax",
+ // "strfromf", "strfromd", "strfroml", // C23?
+ // string manipulation:
+ "strcpy",
+ "strncpy",
+ "strlcpy",
+ "strcat",
+ "strncat",
+ "strlcat",
+ "strxfrm",
+ "strdup",
+ "strndup",
+ // string examination:
+ "strlen",
+ "strnlen",
+ "strcmp",
+ "strncmp",
+ "stricmp",
+ "strcasecmp",
+ "strcoll",
+ "strchr",
+ "strrchr",
+ "strspn",
+ "strcspn",
+ "strpbrk",
+ "strstr",
+ "strtok",
+ // "mem-" functions
+ "memchr",
+ "wmemchr",
+ "memcmp",
+ "wmemcmp",
+ "memcpy",
+ "memccpy",
+ "mempcpy",
+ "wmemcpy",
+ "memmove",
+ "wmemmove",
+ "memset",
+ "wmemset",
+ // IO:
+ "fread",
+ "fwrite",
+ "fgets",
+ "fgetws",
+ "gets",
+ "fputs",
+ "fputws",
+ "puts",
+ // others
+ "strerror_s",
+ "strerror_r",
+ "bcopy",
+ "bzero",
+ "bsearch",
+ "qsort",
+ };
+
+ // A tiny name parser for unsafe libc function names with additional
+ // checks for `printf`s:
+ struct FuncNameMatch {
+ const CallExpr *const Call;
+ const bool hasUnsafeStringView;
+ FuncNameMatch(const CallExpr *Call, bool hasUnsafeStringView)
+ : Call(Call), hasUnsafeStringView(hasUnsafeStringView) {}
+
+ // For a name `S` in `PredefinedNames` or a member of the printf/scanf
+ // family, define matching function names with `S` by the grammar below:
+ //
+ // CoreName := S | S["wcs"/"str"]
+ // LibcName := CoreName | CoreName + "_s"
+ // MatchingName := "__builtin_" + LibcName |
+ // "__builtin___" + LibcName + "_chk" |
+ // "__asan_" + LibcName
+ //
+ // (Note S["wcs"/"str"] means substitute "str" with "wcs" in S.)
+ bool matchName(StringRef FunName) {
+ // Try to match __builtin_:
+ if (FunName.starts_with("__builtin_"))
+ // Then either it is __builtin_LibcName or __builtin___LibcName_chk or
+ // no match:
+ return matchLibcNameOrBuiltinChk(
+ FunName.drop_front(10 /* truncate "__builtin_" */));
+ // Try to match __asan_:
+ if (FunName.starts_with("__asan_"))
+ return matchLibcName(FunName.drop_front(7 /* truncate of "__asan_" */));
+ return matchLibcName(FunName);
+ }
+
+ // Parameter `Name` is the substring after stripping off the prefix
+ // "__builtin_".
+ bool matchLibcNameOrBuiltinChk(StringRef Name) {
+ if (Name.starts_with("__") && Name.ends_with("_chk"))
+ return matchLibcName(
+ Name.drop_front(2).drop_back(4) /* truncate "__" and "_chk" */);
+ return matchLibcName(Name);
+ }
+
+ bool matchLibcName(StringRef Name) {
+ if (Name.ends_with("_s"))
+ return matchCoreName(Name.drop_back(2 /* truncate "_s" */));
+ return matchCoreName(Name);
+ }
+
+ bool matchCoreName(StringRef Name) {
+ if (PredefinedNames.find(Name.str()) != PredefinedNames.end())
+ return !isSafeStrlen(Name); // match unless it's a safe strlen call
+
+ std::string NameWCS = Name.str();
+ size_t WcsPos = NameWCS.find("wcs");
+
+ while (WcsPos != std::string::npos) {
+ NameWCS[WcsPos++] = 's';
+ NameWCS[WcsPos++] = 't';
+ NameWCS[WcsPos++] = 'r';
+ WcsPos = NameWCS.find("wcs", WcsPos);
+ }
+ if (PredefinedNames.find(NameWCS) != PredefinedNames.end())
+ return !isSafeStrlen(NameWCS);
+ return matchPrintfOrScanfFamily(Name);
+ }
+
+ bool matchPrintfOrScanfFamily(StringRef Name) {
+ if (Name.ends_with("scanf"))
+ return true; // simply warn about scanf functions
+ if (!Name.ends_with("printf"))
+ return false; // neither printf nor scanf
+ if (Name.starts_with("v"))
+ // cannot check args for va_list, so `vprintf`s are treated as regular
+ // unsafe libc calls:
+ return true;
+
+ // Truncate "printf", focus on prefixes. There are different possible
+ // name prefixes: "k", "f", "s", "sn", "fw", ..., "snw". We strip off the
+ // 'w' and handle printfs differently by "k", "f", "s", "sn" or no prefix:
+ StringRef Prefix = Name.drop_back(6);
+
+ if (Prefix.ends_with("w"))
+ Prefix = Prefix.drop_back(1);
+ return isUnsafePrintf(Prefix);
+ }
+
+ // A pointer type expression is known to be null-terminated, if it has the
+ // form: E.c_str(), for any expression E of `std::string` type.
+ static bool isNullTermPointer(const Expr *Ptr,
+ bool UnsafeStringView = true) {
+ if (isa<StringLiteral>(Ptr->IgnoreParenImpCasts()))
+ return true;
+ if (auto *MCE = dyn_cast<CXXMemberCallExpr>(Ptr->IgnoreParenImpCasts())) {
+ const CXXMethodDecl *MD = MCE->getMethodDecl();
+ const CXXRecordDecl *RD = MCE->getRecordDecl()->getCanonicalDecl();
+
+ if (MD && RD && RD->isInStdNamespace()) {
+ if (MD->getName() == "c_str" && RD->getName() == "basic_string")
+ return true;
+ if (!UnsafeStringView && MD->getName() == "data" &&
+ RD->getName() == "basic_string_view")
+ return true;
+ }
+ }
+ return false;
+ }
+
+ // Check safe patterns for printfs w.r.t their prefixes:
+ bool isUnsafePrintf(StringRef Prefix /* empty, 'k', 'f', 's', or 'sn' */) {
+ const bool hasUnsafeStringView = this->hasUnsafeStringView;
+ auto AnyUnsafeStrPtr = [&hasUnsafeStringView](const Expr *Arg) -> bool {
+ return Arg->getType()->isPointerType() &&
+ !isNullTermPointer(Arg, hasUnsafeStringView);
+ };
+
+ if (Prefix.empty() || Prefix == "k") // printf: all pointer args should be null-terminated
+ return llvm::any_of(Call->arguments(), AnyUnsafeStrPtr);
+
+ if (Prefix == "f" && Call->getNumArgs() > 1) {
+ // fprintf, same as printf but skip the first arg
+ auto Range = llvm::make_range(Call->arg_begin() + 1, Call->arg_end());
+ return llvm::any_of(Range, AnyUnsafeStrPtr);
+ }
+ if (Prefix == "sn" && Call->getNumArgs() > 2) {
+ // The first two arguments need to be in safe patterns, which is checked
+ // by `isSafeSizedby`:
+ auto Range = llvm::make_range(Call->arg_begin() + 2, Call->arg_end());
+ return !isSafeSizedby(*Call->arg_begin(), *(Call->arg_begin() + 1)) ||
+ llvm::any_of(Range, AnyUnsafeStrPtr);
+ }
+ // Note `sprintf`s should be changed to `snprintf`s, so they are treated
+ // as regular unsafe libc calls:
+ return true;
+ }
+
+ // Checks if the two Exprs `SizedByPtr` and `Size` are in the pattern:
+ // SizedByPtr := DRE.data()
+ // Size := DRE.size_bytes(), for a same DRE of sized-container/view
+ // type.
+ bool isSafeSizedby(const Expr *SizedByPtr, const Expr *Size) {
+ static StringRef SizedObjs[] = {"span", "array", "vector",
+ "basic_string_view", "basic_string"};
+ if (auto *MCEPtr = dyn_cast<CXXMemberCallExpr>(SizedByPtr))
+ if (auto *MCESize = dyn_cast<CXXMemberCallExpr>(Size)) {
+ if (auto *DREPtr =
+ dyn_cast<DeclRefExpr>(MCEPtr->getImplicitObjectArgument()))
+ if (auto *DRESize =
+ dyn_cast<DeclRefExpr>(MCESize->getImplicitObjectArgument()))
+ if (DREPtr->getDecl() ==
+ DRESize->getDecl()) // coming out of the same variable
+ if (MCEPtr->getMethodDecl()->getName() == "data")
+ if (MCESize->getMethodDecl()->getName() == "size_bytes")
+ for (StringRef SizedObj : SizedObjs)
+ if (MCEPtr->getRecordDecl()->isInStdNamespace() &&
+ MCEPtr->getRecordDecl()
+ ->getCanonicalDecl()
+ ->getName() == SizedObj)
+ return true;
+ }
+ return false;
+ }
+
+ // This is safe: strlen("hello"). We don't want to be noisy on this case.
+ bool isSafeStrlen(StringRef Name) {
+ return Name == "strlen" && Call->getNumArgs() == 1 &&
+ isa<StringLiteral>(Call->getArg(0)->IgnoreParenImpCasts());
+ }
+ } FuncNameMatch{&Node, hasUnsafeStringView.matches(Node, Finder, Builder)};
+
+ const FunctionDecl *FD = Node.getDirectCallee();
+ const IdentifierInfo *II;
+
+ if (!FD)
+ return false;
+ II = FD->getIdentifier();
+ // If this is a special C++ name without IdentifierInfo, it can't be a
+ // C library function.
+ if (!II)
+ return false;
+
+ // Look through 'extern "C"' and anything similar invented in the future.
+ // If this function is not in TU directly, it is not a C library function.
+ if (!FD->getDeclContext()->getRedeclContext()->isTranslationUnit())
+ return false;
+
+ // If this function is not externally visible, it is not a C library function.
+ // Note that we make an exception for inline functions, which may be
+ // declared in header files without external linkage.
+ if (!FD->isInlined() && !FD->isExternallyVisible())
+ return false;
+ return FuncNameMatch.matchName(II->getName());
+}
} // namespace clang::ast_matchers
namespace {
@@ -778,6 +1052,47 @@ class SpanTwoParamConstructorGadget : public WarningGadget {
}
};
+class StringViewUnsafeConstructorGadget : public WarningGadget {
+ static constexpr const char *const Tag = "StringViewUnsafeConstructor";
+ const CXXConstructExpr *Ctor;
+
+public:
+ StringViewUnsafeConstructorGadget(const MatchFinder::MatchResult &Result)
+ : WarningGadget(Kind::StringViewUnsafeConstructor),
+ Ctor(Result.Nodes.getNodeAs<CXXConstructExpr>(Tag)) {}
+
+ static bool classof(const Gadget *G) {
+ return G->getKind() == Kind::StringViewUnsafeConstructor;
+ }
+
+ // Matches any `basic_string_view` constructor call that is not a copy
+ // constructor or on a string literal:
+ static Matcher matcher() {
+ auto isStringViewDecl = hasDeclaration(cxxConstructorDecl(
+ hasDeclContext(isInStdNamespace()), hasName("basic_string_view")));
+
+ return stmt(
+ cxxConstructExpr(
+ isStringViewDecl,
+ unless(
+ hasArgument(0, expr(ignoringParenImpCasts(stringLiteral())))),
+ unless(hasDeclaration(cxxConstructorDecl(isCopyConstructor()))))
+ .bind(Tag));
+ }
+
+ const Stmt *getBaseStmt() const override { return Ctor; }
+
+ DeclUseList getClaimedVarUseSites() const override {
+ // If the constructor call is of the form `std::basic_string_view{ptrVar,
+ // ...}`, `ptrVar` is considered an unsafe variable.
+ if (auto *DRE = dyn_cast<DeclRefExpr>(Ctor->getArg(0))) {
+ if (DRE->getType()->isPointerType() && isa<VarDecl>(DRE->getDecl()))
+ return {DRE};
+ }
+ return {};
+ }
+};
+
/// A pointer initialization expression of the form:
/// \code
/// int *p = q;
@@ -1025,6 +1340,26 @@ class DataInvocationGadget : public WarningGadget {
DeclUseList getClaimedVarUseSites() const override { return {}; }
};
+class UnsafeLibcFunctionCallGadget : public WarningGadget {
+ const CallExpr *const Call;
+ constexpr static const char *const Tag = "UnsafeLibcFunctionCall";
+
+public:
+ UnsafeLibcFunctionCallGadget(const MatchFinder::MatchResult &Result)
+ : WarningGadget(Kind::UnsafeLibcFunctionCall),
+ Call(Result.Nodes.getNodeAs<CallExpr>(Tag)) {}
+
+ static Matcher
+ matcher(ast_matchers::internal::Matcher<Expr> HasUnsafeStringView) {
+ return stmt(
+ callExpr(isUnsafeLibcFunctionCall(HasUnsafeStringView)).bind(Tag));
+ }
+
+ const Stmt *getBaseStmt() const override { return Call; }
+
+ DeclUseList getClaimedVarUseSites() const override { return {}; }
+};
+
// Represents expressions of the form `DRE[*]` in the Unspecified Lvalue
// Context (see `isInUnspecifiedLvalueContext`).
// Note here `[]` is the built-in subscript operator.
@@ -1447,10 +1782,13 @@ findGadgets(const Decl *D, const UnsafeBufferUsageHandler &Handler,
#define WARNING_GADGET(x) \
allOf(x ## Gadget::matcher().bind(#x), \
notInSafeBufferOptOut(&Handler)),
+#define WARNING_LIBC_GADGET(x) \
+ allOf(x ## Gadget::matcher(ignoreStringViewUnsafeConstructor(&Handler)).bind(#x), \
+ notInSafeBufferOptOut(&Handler)),
#define WARNING_CONTAINER_GADGET(x) \
allOf(x ## Gadget::matcher().bind(#x), \
notInSafeBufferOptOut(&Handler), \
- unless(ignoreUnsafeBufferInContainer(&Handler))),
+ unless(ignore ## x(&Handler))),
#include "clang/Analysis/Analyses/UnsafeBufferUsageGadgets.def"
// Avoid a hanging comma.
unless(stmt())
diff --git a/clang/lib/Sema/AnalysisBasedWarnings.cpp b/clang/lib/Sema/AnalysisBasedWarnings.cpp
index 0f604c61fa3af..04ad798b809b2 100644
--- a/clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ b/clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -2256,6 +2256,18 @@ class Unsafe...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
979910f
to
2dff88d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooo that's a lot of functions!
First round of comments, will try to look at the next commit tomorrow.
// LibcName := CoreName | CoreName + "_s" | ||
// MatchingName := "__builtin_" + LibcName | | ||
// "__builtin___" + LibcName + "_chk" | | ||
// "__asan_" + LibcName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For builtins, I think in compiler proper we're supposed to consult Call->getBuiltinCallee()
and then match the BuiltinID produced this way to the list in clang/Basic/Builtins.def
.
But that's like, incredibly tedious for a list of that size. There's probably a shortcut we can take right? (... right?) Like, maybe double-check that this is "a" builtin by confirming that its BuiltinID is non-zero, but then continue matching by name?
|
||
// Look through 'extern "C"' and anything similar invented in the future. | ||
// If this function is not in TU directly, it is not a C library function. | ||
if (!FD->getDeclContext()->getRedeclContext()->isTranslationUnit()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also consider that some of these functions are "redeclared" in the std::
namespace. As in, std::printf
is a valid way to invoke printf
.
} | ||
|
||
void f(char * p, char * q, std::span<char> s) { | ||
memcpy(); // expected-warning{{function introduces unsafe buffer manipulation}} expected-note{{pass -fsafe-buffer-usage-suggestions to receive code hardening suggestions}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mildly in favor of emitting a specialized warning message here, given how much custom logic we're implementing for these functions.
Or, at the very least, it may be a good idea to include the name of the function in the message, no matter what caused it to be displayed.
const CXXMethodDecl *MD = MCE->getMethodDecl(); | ||
const CXXRecordDecl *RD = MCE->getRecordDecl()->getCanonicalDecl(); | ||
|
||
if (MD && RD && RD->isInStdNamespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this correctly covers both std::basic_string
and std::__1::basic_string
but it might be a good idea to double-check and add a test because I remember encountering problems with this in the past. (https://stackoverflow.com/questions/29293394/where-does-the-1-symbol-come-from-when-using-llvms-libc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider re-wording to: "string_view construction with raw pointers does not guarantee null-termination, construct with std::string instead"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though we probably don't need this warning anymore but this is really a great catch of wording issues!
0d2d25e
to
979619e
Compare
FuncNameMatch::ResultKind RK = | ||
FuncNameMatch.matchName(II->getName(), FD->getBuiltinID()); | ||
|
||
// Bind extra strings for additional information passing to Gadgets: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this, effectively, a way to implement a custom matcher that has a .bind()
directive in a custom position?
Can we refactor this code into three different matchers instead, just to make the use sites of this matcher more readable, make it easier to tell which .bind()
directives are present? Or would it be too annoying to deduplicate code this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, maybe we can make something more elegant like
stmt(unsafeLibcCall(anyOf(unsafeSprintf().bind(...), unsafePrintfs().bind(...), unsafeOthers().bind(...))))
How about I do this re-factoring in a follow-up PR once this one is good? Otherwise it would change the code a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this, effectively, a way to implement a custom matcher that has a
.bind()
directive in a custom position?Can we refactor this code into three different matchers instead, just to make the use sites of this matcher more readable, make it easier to tell which
.bind()
directives are present? Or would it be too annoying to deduplicate code this way?
I refactored the code, please take another look @haoNoQ
Warning about calls to libc functions involving buffer access. Warned functions are hardcoded by names. (rdar://117182250)
f92da0f
to
a7a3aff
Compare
- Predefined name macros such as `__PRETTY_FUNCTION__` are considered safe. - We need to distinguish between `%s` and `%p` as both accept pointers but the latter is not a buffer operation. This leaves us no choice other than parsing the format string. Fortunately, the building blocks of format parsing have already existed and are quite handy.
a7a3aff
to
e9a5b7a
Compare
StringRef Name = Call->getDirectCallee()->getName(); | ||
|
||
S.Diag(Call->getBeginLoc(), diag::warn_unsafe_buffer_libc_call) | ||
<< Name << Call->getSourceRange(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pass const Decl *D = Call->getDirectCallee()
directly into S.Diag()
. It even automatically adds quotes around the name this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Maintaining the invariant that all matchers under | ||
// `libc_fun_disjoint_inner_matchers` are disjoint. | ||
AST_MATCHER_P(CallExpr, predefinedUnsafeLibcFunCall, StringRef, CoreName) { | ||
static const std::set<StringRef> PredefinedNames{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't have nice things because of https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors - if we need this to store this as a set (and I think we do because of the sheer size of it), we have to store it as a raw pointer and lazy-initialize it on first use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/// \param E The end of the C format string; | ||
/// \param ArgIndex The argument index of the last found `s` specifier; Or the | ||
/// argument index of the formatter in initial case. | ||
unsigned ParseFormatStringFirstSArgIndex(const char *&I, const char *E, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The *&
part is probably going to be very confusing to the users of this utility function.
Maybe just use the handler directly in our code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
clang/lib/AST/PrintfFormatString.cpp
Outdated
ParsePrintfSpecifier(H, I, E, argIndex, LO, Target, false, false); | ||
// Did a fail-stop error of any kind occur when parsing the specifier? | ||
// If so, don't do any more processing. | ||
if (FSR.shouldStop()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this any easier than simply overriding HandlePrintfSpecifier()
and HandleScanfSpecifier()
in FormatStringHandler
to tell you everything you need in a simple mostly-statically-typed manner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
}; | ||
|
||
// Match predefined names: | ||
if (PredefinedNames.find(CoreName.str()) != PredefinedNames.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, .str()
is an O(N) copy of the string. Which isn't a lot but llvm::StringSet
would probably work much better here given how it works with StringRef
s directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
static bool isNullTermPointer(const Expr *Ptr) { | ||
if (isa<StringLiteral>(Ptr->IgnoreParenImpCasts())) | ||
return true; | ||
if (isa<PredefinedExpr>(Ptr->IgnoreParenImpCasts())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked up PredefinedExpr
on Doxygen and was very confused about what did they mean by "A predefined identifier such as func". Turns out they meant __func__
but it got markdown'd 😄
Yeah nice one, this has to be safe.
// In this case, this call is considered unsafe if at least one argument | ||
// (including the format argument) is unsafe pointer. | ||
return llvm::any_of( | ||
llvm::make_range(Call->arg_begin() + FmtArgIdx, Call->arg_end()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think llvm::any_of(llvm::make_range(b, e, λ))
is equivalent to std::any_of(b, e, λ)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried but it seems doesn't work that way🤔
} | ||
// If format is not a string literal, we cannot analyze the format string. | ||
// In this case, this call is considered unsafe if at least one argument | ||
// (including the format argument) is unsafe pointer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that it's not particularly important whether the target-buffer argument of sprintf
/snprintf
is null-terminated. They don't consider a null terminator when they print, they treat it as a simple buffer. Both of them are equally happy "printing" into a buffer filled with zeros or a completely uninitialized buffer. They won't treat each zero in a zero-filled buffer as a terminator and they won't have a problem with an uninitialized buffer even if it never had a null terminator.
Though, in our case, we don't really consider a string null-terminated if it's not constant. And a constant string can't be a target string. So there's no real bug here, just a bit surprising to read and may cause problems in the future if we decide to relax the conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that it's not particularly important whether the target-buffer argument of
sprintf
/snprintf
is null-terminated. They don't consider a null terminator when they print, they treat it as a simple buffer.
Yes, this is why the checking range is llvm::make_range(Call->arg_begin() + FmtArgIdx, Call->arg_end()),
. It starts from the format string argument to the rest of them. If it's snprintf
, it starts from the 3rd argument, skipping the first two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, in our case, we don't really consider a string null-terminated if it's not constant.
Then is this message "use 'std::string::c_str' or string literal as string pointer to guarantee null-termination" delivering correct information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right, not a problem then.
// The application results in an inner matcher that matches the call node with | ||
// respect to the core name of the callee. | ||
AST_MATCHER_P(CallExpr, ignoreLibcPrefixAndSuffix, | ||
std::function<internal::Matcher<CallExpr>(StringRef)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you're dynamically generating the inner matcher in the middle of the match? I probably need to process this more, and it's likely that there could be a more "compile-time" way to do this, let me think 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored this part.
06294be
to
fa21260
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM! I don't have major concerns.
// va-list, which cannot be checked at compile-time: | ||
callExpr(callee(functionDecl( | ||
libc_func_matchers::isUnsafeVaListPrintfFunc()))) | ||
.bind(UnsafeVaListTag), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like you care which node carries the tag, only that the tag exists in the match result, right? So if you attach the .bind()
to functionDecl()
instead, would it help you simplify the code like this?:
callExpr(callee(anyOf(
functionDecl(...),
functionDecl(...).bind(UnsafeVaListTag),
...
)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, that's a good point!
return hasUnsafeFormatOrSArg(&Node, 2, Ctx, false); | ||
} | ||
// It is printf but the format string is passed by pointer. The only thing we | ||
// can do is to require all pointers to be null-terminated: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This emits false positives in presence of %p
right?
Should we outright ban passing format string by pointer instead? (We already have a warning for this: -Wformat-nonliteral
. Felix worked on it so it's likely that we can simply recommend it to be enabled alongside -Wunsafe-buffer-usage
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could mention this in the guidance.
} | ||
// If format is not a string literal, we cannot analyze the format string. | ||
// In this case, this call is considered unsafe if at least one argument | ||
// (including the format argument) is unsafe pointer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right, not a problem then.
|
||
if (0 < ArgIdx && ArgIdx < Call->getNumArgs()) | ||
if (!isNullTermPointer(Call->getArg(ArgIdx))) | ||
return false; // stop parsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it doesn't just stop parsing, it also changes the return value of ParsePrintfString()
from false
to true
right? This is how you get away without storing the intermediate answer anywhere in the visitor.
I think it's a valid way to use it but it probably deserves a slightly more verbose comment.
InGroup<UnsafeBufferUsage>, DefaultIgnore; | ||
def note_unsafe_buffer_printf_call : Note< | ||
"%select{| change to 'snprintf' for explicit bounds checking | buffer pointer and size may not match" | ||
"| use 'std::string::c_str' or string literal as string pointer to guarantee null-termination" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this warning in particular, it's valuable to point the user to the specific %s
argument. To, at least, make sure that they know we don't mean the snprintf's target string or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if -Wunsafe-buffer-usage
is enabled in C (for any obscure reason), would we still recommend c_str
here? Maybe this recommendation should be guarded by the "emit suggestions" flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to keep this patch minimal then I suggest splitting the note off as a follow-up improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Communicating what parameters are used in an unsafe way wouldn't be strictly necessary for the initial patch if the warning as is pushed the user to do the right thing.
Have we tried getting data from a real project?
I can also imagine that for some functions it won't be as simple as saying "pointer parameter 1, 3 and 5" as it could be interplay between pointers, integer parameters and buffer content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we tried getting data from a real project?
Yes! Basically these special handlings are all corresponding to actual cases in our downstream projects.
I think for this warning in particular, it's valuable to point the user to the specific %s argument. To, at least, make sure that they know we don't mean the snprintf's target string or something.
I will let the notes point to specific argument source locations. For better note messages, we could do it in a follow-up patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also imagine that for some functions it won't be as simple as saying "pointer parameter 1, 3 and 5" as it could be interplay between pointers, integer parameters and buffer content.
Not sure if I understand your concern. Could you give an example? @jkorous-apple
printf("%s%d", p, *p); // expected-warning{{function 'printf' introduces unsafe buffer access}} expected-note{{use 'std::string::c_str' or string literal as string pointer to guarantee null-termination}} | ||
sprintf(q, "%s%d", "hello", *p); // expected-warning{{function 'sprintf' introduces unsafe buffer access}} expected-note{{change to 'snprintf' for explicit bounds checking}} | ||
swprintf(q, "%s%d", "hello", *p); // expected-warning{{function 'swprintf' introduces unsafe buffer access}} expected-note{{change to 'snprintf' for explicit bounds checking}} | ||
snprintf(q, 10, "%s%d", "hello", *p); // expected-warning{{function 'snprintf' introduces unsafe buffer access}} expected-note{{buffer pointer and size may not match}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we (and/or should we) support
char arr[10];
snprintf(arr, sizeof(arr), ...);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, this case actually happens a lot in our downstream projects. I planed to handle it in a follow-up patch: #105383
|
||
/* Test printfs */ | ||
fprintf((FILE*)p, "%s%d", p, *p); // expected-warning{{function 'fprintf' introduces unsafe buffer access}} expected-note{{use 'std::string::c_str' or string literal as string pointer to guarantee null-termination}} | ||
printf("%s%d", p, *p); // expected-warning{{function 'printf' introduces unsafe buffer access}} expected-note{{use 'std::string::c_str' or string literal as string pointer to guarantee null-termination}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to "... or a string literal .."
snprintf(s.data(), s2.size(), "%s%d", "hello", *p); // expected-warning{{function 'snprintf' introduces unsafe buffer access}} expected-note{{buffer pointer and size may not match}} | ||
snwprintf(s.data(), s2.size(), "%s%d", "hello", *p); // expected-warning{{function 'snwprintf' introduces unsafe buffer access}} expected-note{{buffer pointer and size may not match}} | ||
snwprintf_s(s.data(), s2.size(), "%s%d", "hello", *p); // expected-warning{{function 'snwprintf_s' introduces unsafe buffer access}} expected-note{{buffer pointer and size may not match}} | ||
vsnprintf(s.data(), s.size_bytes(), "%s%d", "hello", *p); // expected-warning{{function 'vsnprintf' introduces unsafe buffer access}} expected-note{{do not use va_list that cannot be checked at compile-time for bounds safety}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to "... va_list, which cannot ..."
@@ -2256,6 +2256,17 @@ class UnsafeBufferUsageReporter : public UnsafeBufferUsageHandler { | |||
Range = UO->getSubExpr()->getSourceRange(); | |||
MsgParam = 1; | |||
} | |||
} else if (const auto *CtorExpr = dyn_cast<CXXConstructExpr>(Operation)) { | |||
if (CtorExpr->getConstructor()->getCanonicalDecl()->getNameAsString() == | |||
"span") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ignore if the code above already takes care of this but can we compare the fully qualified name against "std::span" to make sure that we don't get tripped up by classes like my::nonstd::span
?
InGroup<UnsafeBufferUsage>, DefaultIgnore; | ||
def note_unsafe_buffer_printf_call : Note< | ||
"%select{| change to 'snprintf' for explicit bounds checking | buffer pointer and size may not match" | ||
"| use 'std::string::c_str' or string literal as string pointer to guarantee null-termination" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to keep this patch minimal then I suggest splitting the note off as a follow-up improvement.
InGroup<UnsafeBufferUsage>, DefaultIgnore; | ||
def note_unsafe_buffer_printf_call : Note< | ||
"%select{| change to 'snprintf' for explicit bounds checking | buffer pointer and size may not match" | ||
"| use 'std::string::c_str' or string literal as string pointer to guarantee null-termination" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Communicating what parameters are used in an unsafe way wouldn't be strictly necessary for the initial patch if the warning as is pushed the user to do the right thing.
Have we tried getting data from a real project?
I can also imagine that for some functions it won't be as simple as saying "pointer parameter 1, 3 and 5" as it could be interplay between pointers, integer parameters and buffer content.
Addressed comments. @haoNoQ please take another look. |
Btw a question about the new warning: So with -Wunsafe-buffer-usage-in-libc-call clang now warns on the following?
It says
Is that as expected? If so, how should snprintf be used to avoid the warning? |
The commit d7dd2c4 crashes for such an example: ``` void printf() { printf(); } ``` Because it assumes `printf` must have arguments. This commit fixes this issue. (rdar://117182250)
Thanks for finding the bug and making a reproducer. It has been fixed in de88d7d. |
Yes, this is expected. According to the C++ Safe Buffers programming model, buffer pointers should be changed to |
But this new warning is given also when compiling C code. As in the example from @mikaelholmen. So I doubt that the C++ programming model applies. |
But as @bjope said, we get the warning also for C code, even if I explicitly say e.g. "-std=c11". |
Which is kind of the point of |
… Warning Libc functions (#101583)" StringLiteral::getString() is not applicable to strings of wide characters. Added handling for that. (rdar://117182250)
… Warning Libc functions (llvm#101583)" StringLiteral::getString() is not applicable to strings of wide characters. Added handling for that. (rdar://117182250)
|
||
assert(PtrTy->isPointerType()); | ||
|
||
QualType PteTy = (cast<PointerType>(PtrTy))->getPointeeType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast<PointerType>(PtrTy)
is only safe to do if isa<PointerType>(PtrTy)
is true. PtrTy->isPointerType()
is true also for various sugar types that wrap PointerType
. To unwrap them, use PtrTy->getAs<PointerType>()
.
[-Wunsafe-buffer-usage] Add warn on unsafe calls to libc functions Warning about calls to libc functions involving buffer access. Warned functions are hardcoded by names. (rdar://117182250) (cherry picked from commit 0fffdeb)
… Warning Libc functions (llvm#101583)" StringLiteral::getString() is not applicable to strings of wide characters. Added handling for that. (rdar://117182250) (cherry picked from commit 48498ec)
The first commit is the basic warning for libc functions that does not assume
string_view
guarantees null-termination or warn aboutstring_view
construction.