Skip to content

[clang] Constant-evaluate format strings as last resort #135864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
34 changes: 34 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -382,6 +382,40 @@ related warnings within the method body.
``format_matches`` accepts an example valid format string as its third
argument. For more information, see the Clang attributes documentation.

- Clang can now verify format strings that can be constant-folded even if they
do not resolve to a string literal. For instance, all of these can now be
verified:

.. code-block:: c++

const char format[] = {'h', 'e', 'l', 'l', 'o', ' ', '%', 's', 0};
printf(format, "world");
// no warning

printf(format, 123);
// warning: format specifies type 'char *' but the argument has type 'int'

printf(("%"s + "i"s).c_str(), "world");
// warning: format specifies type 'int' but the argument has type 'char *'

When the format expression does not evaluate to a string literal, Clang
points diagnostics into a pseudo-file called ``<scratch space>`` that contains
the format string literal as it evaluated, like so:

.. code-block:: text

example.c:6:17: warning: format specifies type 'char *' but the argument has type 'int' [-Wformat]
6 | printf(format, 123);
| ~~~~~~ ^~~
<scratch space>:1:4: note: format string computed from non-literal expression
1 | "hello %s"
| ^~
| %d

This may mean that format strings which were previously unverified (or which
triggered ``-Wformat-nonliteral``) are now verified by ``-Wformat`` and its
allies.

- Introduced a new statement attribute ``[[clang::atomic]]`` that enables
fine-grained control over atomic code generation on a per-statement basis.
Supported options include ``[no_]remote_memory``,
Expand Down
28 changes: 24 additions & 4 deletions clang/include/clang/AST/Expr.h
Original file line number Diff line number Diff line change
Expand Up @@ -791,10 +791,30 @@ class Expr : public ValueStmt {
const Expr *PtrExpression, ASTContext &Ctx,
EvalResult &Status) const;

/// If the current Expr can be evaluated to a pointer to a null-terminated
/// constant string, return the constant string (without the terminating
/// null).
std::optional<std::string> tryEvaluateString(ASTContext &Ctx) const;
/// Represents the result of a string compile-time evaluation query. Can tell
/// whether the expression evaluated to a string literal, which is often
/// preferable if diagnostics are involved (since a string literal has
/// source location info), and whether the string was null-terminated.
class StringEvalResult {
std::string Storage;
const StringLiteral *SL;
uint64_t Offset;
bool HasNullTerminator;

public:
StringEvalResult(std::string Contents, bool HasNullTerminator);
StringEvalResult(const StringLiteral *SL, uint64_t Offset,
bool HasNullTerminator);

llvm::StringRef getString() const;
bool hasNullTerminator() const { return HasNullTerminator; }
bool getStringLiteral(const StringLiteral *&SL, uint64_t &Offset) const;
};

/// If the current \c Expr can be evaluated to a pointer to a constant string,
/// return the constant string. The string may not be NUL-terminated.
std::optional<StringEvalResult>
tryEvaluateString(ASTContext &Ctx, bool InConstantContext = false) const;

/// Enumeration used to describe the kind of Null pointer constant
/// returned from \c isNullPointerConstant().
Expand Down
4 changes: 3 additions & 1 deletion clang/include/clang/Basic/DiagnosticSemaKinds.td
Original file line number Diff line number Diff line change
Expand Up @@ -9268,7 +9268,7 @@ def err_expected_callable_argument : Error<
def note_building_builtin_dump_struct_call : Note<
"in call to printing function with arguments '(%0)' while dumping struct">;
def err_builtin_verbose_trap_arg : Error<
"argument to __builtin_verbose_trap must %select{be a pointer to a constant string|not contain $}0">;
"argument to '__builtin_verbose_trap' must %select{be a pointer to a constant null-terminated string|not contain $}0">;

def err_atomic_load_store_uses_lib : Error<
"atomic %select{load|store}0 requires runtime support that is not "
Expand Down Expand Up @@ -10354,6 +10354,8 @@ def warn_format_bool_as_character : Warning<
"using '%0' format specifier, but argument has boolean value">,
InGroup<Format>;
def note_format_string_defined : Note<"format string is defined here">;
def note_format_string_evaluated_to : Note<
"format string computed from non-literal expression">;
def note_format_fix_specifier : Note<"did you mean to use '%0'?">;
def note_printf_c_str: Note<"did you mean to call the %0 method?">;
def note_format_security_fixit: Note<
Expand Down
178 changes: 143 additions & 35 deletions clang/lib/AST/ExprConstant.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1920,9 +1920,23 @@ static bool EvaluateComplex(const Expr *E, ComplexValue &Res, EvalInfo &Info);
static bool EvaluateAtomic(const Expr *E, const LValue *This, APValue &Result,
EvalInfo &Info);
static bool EvaluateAsRValue(EvalInfo &Info, const Expr *E, APValue &Result);
static bool EvaluateStringAsLValue(EvalInfo &Info, const Expr *E,
QualType &CharTy, LValue &String);
static const StringLiteral *GetLValueAsStringLiteralAndOffset(EvalInfo &Info,
LValue &String,
QualType CharTy,
uint64_t &Offset);

/// Call \c Action (which must be like \c bool(int) ) on each character in a
/// string \c LValue . Iteration stops "normally" when \c Action returns
/// \c false . This function returns \c true if iteration stopped normally; if
/// it runs out of characters before \c Action breaks, it returns \c false.
template <typename CharAction>
static bool ForEachCharacterInStringLValue(EvalInfo &Info, const Expr *E,
QualType CharTy, LValue &String,
CharAction &&Action);
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
EvalInfo &Info,
std::string *StringResult = nullptr);
EvalInfo &Info);

/// Evaluate an integer or fixed point expression into an APResult.
static bool EvaluateFixedPointOrInteger(const Expr *E, APFixedPoint &Result,
Expand Down Expand Up @@ -17950,19 +17964,45 @@ bool Expr::tryEvaluateObjectSize(uint64_t &Result, ASTContext &Ctx,
return tryEvaluateBuiltinObjectSize(this, Type, Info, Result);
}

static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
EvalInfo &Info, std::string *StringResult) {
if (!E->getType()->hasPointerRepresentation() || !E->isPRValue())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better early exit would be:

if (!Ty->hasPointerRepresentation() && !Ty->canDecayToPointerType())
  return false;

This would eliminate the need for the else { return false; } below.

I think leaving the if (!E->isPRValue()) separate might be cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh, I don't love checking the same thing twice. I reorganized things a little differently to avoid the else { return false } branch.

static bool EvaluateStringAsLValue(EvalInfo &Info, const Expr *E,
QualType &CharTy, LValue &String) {
QualType Ty = E->getType();
if (!E->isPRValue())
return false;

LValue String;
if (Ty->canDecayToPointerType()) {
if (E->isGLValue()) {
if (!EvaluateLValue(E, String, Info))
return false;
} else {
APValue &Value = Info.CurrentCall->createTemporary(
E, Ty, ScopeKind::FullExpression, String);
if (!EvaluateInPlace(Value, Info, String, E))
return false;
}
// The result is a pointer to the first element of the array.
auto *AT = Info.Ctx.getAsArrayType(Ty);
CharTy = AT->getElementType();
if (auto *CAT = dyn_cast<ConstantArrayType>(AT))
String.addArray(Info, E, CAT);
else
String.addUnsizedArray(Info, E, CharTy);
return true;
}

if (!EvaluatePointer(E, String, Info))
return false;
if (Ty->hasPointerRepresentation()) {
if (!EvaluatePointer(E, String, Info))
return false;
CharTy = Ty->getPointeeType();
return true;
}

QualType CharTy = E->getType()->getPointeeType();
return false;
}

// Fast path: if it's a string literal, search the string value.
static const StringLiteral *
GetLValueAsStringLiteralAndOffset(EvalInfo &Info, LValue &String,
QualType CharTy, uint64_t &Offset) {
if (const StringLiteral *S = dyn_cast_or_null<StringLiteral>(
String.getLValueBase().dyn_cast<const Expr *>())) {
StringRef Str = S->getBytes();
Expand All @@ -17971,46 +18011,114 @@ static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
S->getCharByteWidth() == 1 &&
// FIXME: Add fast-path for wchar_t too.
Info.Ctx.hasSameUnqualifiedType(CharTy, Info.Ctx.CharTy)) {
Str = Str.substr(Off);

StringRef::size_type Pos = Str.find(0);
if (Pos != StringRef::npos)
Str = Str.substr(0, Pos);

Result = Str.size();
if (StringResult)
*StringResult = Str;
return true;
Offset = static_cast<uint64_t>(Off);
return S;
}

// Fall through to slow path.
}
return nullptr;
}

// Slow path: scan the bytes of the string looking for the terminating 0.
for (uint64_t Strlen = 0; /**/; ++Strlen) {
template <typename CharAction>
static bool ForEachCharacterInStringLValue(EvalInfo &Info, const Expr *E,
QualType CharTy, LValue &String,
CharAction &&Action) {
while (true) {
APValue Char;
if (!handleLValueToRValueConversion(Info, E, CharTy, String, Char) ||
!Char.isInt())
return false;
if (!Char.getInt()) {
Result = Strlen;
if (!Action(Char.getInt().getExtValue()))
return true;
} else if (StringResult)
StringResult->push_back(Char.getInt().getExtValue());
if (!HandleLValueArrayAdjustment(Info, E, String, CharTy, 1))
return false;
}
}

std::optional<std::string> Expr::tryEvaluateString(ASTContext &Ctx) const {
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
EvalInfo &Info) {
LValue String;
QualType CharTy;
if (!EvaluateStringAsLValue(Info, E, CharTy, String))
return false;

// Fast path: if it's a string literal, search the string value.
uint64_t Off;
if (const auto *S =
GetLValueAsStringLiteralAndOffset(Info, String, CharTy, Off)) {
StringRef Str = S->getBytes().substr(Off);

StringRef::size_type Pos = Str.find(0);
if (Pos != StringRef::npos)
Str = Str.substr(0, Pos);

Result = Str.size();
return true;
}

// Slow path: scan the bytes of the string looking for the terminating 0.
Result = 0;
return ForEachCharacterInStringLValue(Info, E, CharTy, String, [&](int Char) {
if (Char) {
Result++;
return true;
} else
return false;
});
}

Expr::StringEvalResult::StringEvalResult(const StringLiteral *SL,
uint64_t Offset, bool NullTerm)
: SL(SL), Offset(Offset), HasNullTerminator(NullTerm) {}

Expr::StringEvalResult::StringEvalResult(std::string Contents, bool NullTerm)
: Storage(std::move(Contents)), SL(nullptr), Offset(0),
HasNullTerminator(NullTerm) {}

llvm::StringRef Expr::StringEvalResult::getString() const {
return SL ? SL->getBytes().substr(Offset) : Storage;
}

bool Expr::StringEvalResult::getStringLiteral(const StringLiteral *&SL,
uint64_t &Offset) const {
if (this->SL) {
SL = this->SL;
Offset = this->Offset;
return true;
}
return false;
}

std::optional<Expr::StringEvalResult>
Expr::tryEvaluateString(ASTContext &Ctx, bool InConstantContext) const {
Expr::EvalStatus Status;
EvalInfo Info(Ctx, Status, EvalInfo::EM_ConstantFold);
uint64_t Result;
std::string StringResult;
EvalInfo Info(Ctx, Status,
(InConstantContext &&
(Ctx.getLangOpts().CPlusPlus || Ctx.getLangOpts().C23))
? EvalInfo::EM_ConstantExpression
: EvalInfo::EM_ConstantFold);
Info.InConstantContext = InConstantContext;
LValue String;
QualType CharTy;
if (!EvaluateStringAsLValue(Info, this, CharTy, String))
return {};

uint64_t Off;
if (const auto *S =
GetLValueAsStringLiteralAndOffset(Info, String, CharTy, Off)) {
return StringEvalResult(S, Off, true);
}

std::string Result;
bool NTFound =
ForEachCharacterInStringLValue(Info, this, CharTy, String, [&](int Char) {
if (Char) {
Result.push_back(Char);
return true;
} else
return false;
});

if (EvaluateBuiltinStrLen(this, Result, Info, &StringResult))
return StringResult;
return {};
return StringEvalResult(Result, NTFound);
}

template <typename T>
Expand Down
5 changes: 3 additions & 2 deletions clang/lib/Analysis/UnsafeBufferUsage.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -726,12 +726,13 @@ static bool hasUnsafeFormatOrSArg(const CallExpr *Call, const Expr *&UnsafeArg,
const Expr *Fmt = Call->getArg(FmtArgIdx);

if (auto *SL = dyn_cast<clang::StringLiteral>(Fmt->IgnoreParenImpCasts())) {
std::optional<Expr::StringEvalResult> SER;
StringRef FmtStr;

if (SL->getCharByteWidth() == 1)
FmtStr = SL->getString();
else if (auto EvaledFmtStr = SL->tryEvaluateString(Ctx))
FmtStr = *EvaledFmtStr;
else if ((SER = SL->tryEvaluateString(Ctx)))
FmtStr = SER->getString();
else
goto CHECK_UNSAFE_PTR;

Expand Down
5 changes: 3 additions & 2 deletions clang/lib/CodeGen/CGBuiltin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3477,8 +3477,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
llvm::DILocation *TrapLocation = Builder.getCurrentDebugLocation();
if (getDebugInfo()) {
TrapLocation = getDebugInfo()->CreateTrapFailureMessageFor(
TrapLocation, *E->getArg(0)->tryEvaluateString(getContext()),
*E->getArg(1)->tryEvaluateString(getContext()));
TrapLocation,
E->getArg(0)->tryEvaluateString(getContext())->getString(),
E->getArg(1)->tryEvaluateString(getContext())->getString());
}
ApplyDebugLocation ApplyTrapDI(*this, TrapLocation);
// Currently no attempt is made to prevent traps from being merged.
Expand Down
Loading