Skip to content

[clang] Constant-evaluate format strings as last resort #135864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 34 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -382,6 +382,40 @@ related warnings within the method body.
``format_matches`` accepts an example valid format string as its third
argument. For more information, see the Clang attributes documentation.

- Clang can now verify format strings that can be constant-folded even if they
do not resolve to a string literal. For instance, all of these can now be
verified:

.. code-block:: c++

const char format[] = {'h', 'e', 'l', 'l', 'o', ' ', '%', 's', 0};
printf(format, "world");
// no warning

printf(format, 123);
// warning: format specifies type 'char *' but the argument has type 'int'

printf(("%"s + "i"s).c_str(), "world");
// warning: format specifies type 'int' but the argument has type 'char *'

When the format expression does not evaluate to a string literal, Clang
points diagnostics into a pseudo-file called ``<scratch space>`` that contains
the format string literal as it evaluated, like so:

.. code-block:: text

example.c:6:17: warning: format specifies type 'char *' but the argument has type 'int' [-Wformat]
6 | printf(format, 123);
| ~~~~~~ ^~~
<scratch space>:1:4: note: format string computed from non-literal expression
1 | "hello %s"
| ^~
| %d

This may mean that format strings which were previously unverified (or which
triggered ``-Wformat-nonliteral``) are now verified by ``-Wformat`` and its
allies.

- Introduced a new statement attribute ``[[clang::atomic]]`` that enables
fine-grained control over atomic code generation on a per-statement basis.
Supported options include ``[no_]remote_memory``,
Expand Down
24 changes: 20 additions & 4 deletions clang/include/clang/AST/Expr.h
Original file line number Diff line number Diff line change
Expand Up @@ -791,10 +791,26 @@ class Expr : public ValueStmt {
const Expr *PtrExpression, ASTContext &Ctx,
EvalResult &Status) const;

/// If the current Expr can be evaluated to a pointer to a null-terminated
/// constant string, return the constant string (without the terminating
/// null).
std::optional<std::string> tryEvaluateString(ASTContext &Ctx) const;
class StringEvalResult {
std::string Storage;
const StringLiteral *SL;
uint64_t Offset;

public:
StringEvalResult(std::string Contents);
StringEvalResult(const StringLiteral *SL, uint64_t Offset);

llvm::StringRef getString() const;
bool getStringLiteral(const StringLiteral *&SL, uint64_t &Offset) const;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could that just an aggregate, that we could use with structured binding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a polite union of std::string and StringLiteral: Storage is empty when SL is set, and vice versa. getString() encapsulates useful logic to decide how to get a StringRef back, which I think justifies having a separate type instead of an aggregate. I will add a comment on top of it.


/// If the current \c Expr can be evaluated to a pointer to a constant string,
/// return the constant string. The string may not be NUL-terminated. If
/// \c NullTerminated is supplied, it is set to whether there is at least one
/// NUL character in the string.
std::optional<StringEvalResult>
tryEvaluateString(ASTContext &Ctx, bool *NullTerminated = nullptr,
bool InConstantContext = false) const;

/// Enumeration used to describe the kind of Null pointer constant
/// returned from \c isNullPointerConstant().
Expand Down
4 changes: 3 additions & 1 deletion clang/include/clang/Basic/DiagnosticSemaKinds.td
Original file line number Diff line number Diff line change
Expand Up @@ -9268,7 +9268,7 @@ def err_expected_callable_argument : Error<
def note_building_builtin_dump_struct_call : Note<
"in call to printing function with arguments '(%0)' while dumping struct">;
def err_builtin_verbose_trap_arg : Error<
"argument to __builtin_verbose_trap must %select{be a pointer to a constant string|not contain $}0">;
"argument to '__builtin_verbose_trap' must %select{be a pointer to a constant null-terminated string|not contain $}0">;

def err_atomic_load_store_uses_lib : Error<
"atomic %select{load|store}0 requires runtime support that is not "
Expand Down Expand Up @@ -10354,6 +10354,8 @@ def warn_format_bool_as_character : Warning<
"using '%0' format specifier, but argument has boolean value">,
InGroup<Format>;
def note_format_string_defined : Note<"format string is defined here">;
def note_format_string_evaluated_to : Note<
"format string computed from non-literal expression">;
def note_format_fix_specifier : Note<"did you mean to use '%0'?">;
def note_printf_c_str: Note<"did you mean to call the %0 method?">;
def note_format_security_fixit: Note<
Expand Down
176 changes: 141 additions & 35 deletions clang/lib/AST/ExprConstant.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1920,9 +1920,17 @@ static bool EvaluateComplex(const Expr *E, ComplexValue &Res, EvalInfo &Info);
static bool EvaluateAtomic(const Expr *E, const LValue *This, APValue &Result,
EvalInfo &Info);
static bool EvaluateAsRValue(EvalInfo &Info, const Expr *E, APValue &Result);
static bool EvaluateStringAsLValue(EvalInfo &Info, const Expr *E,
QualType &CharTy, LValue &String);
static const StringLiteral *StringLValueIsLiteral(EvalInfo &Info,
LValue &String,
QualType CharTy,
uint64_t &Offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty confusing interface.
Can we have for example?

  GetLValueAsStringLiteral(LValue) -> StringLiteral*;
  EvaluateStringLiteral(StringLiteral*, Offset) -> unsigned;
```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to rename StringLValueIsLiteral to GetLValueAsStringLiteral, but I don't understand what is being asked for EvaluateStringLiteral. What is being evaluated and what is the unsigned returned value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, I meant EvaluateStringLiteralSize - ie split the method in 2. Pseudo code

if(StringLiteral* SL = GetLValueAsStringLiteral(LValue)) 
    Size =  EvaluateStringLiteralSize(SL, Offset);

I believe the second method never fails but I might be wrong about that, use optional if it can.
Does that make some sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation makes sense now, thank you. From what I can tell, though, GetLValueAsStringLiteral would just be dyn_cast_or_null<StringLiteral>(String.getLValueBase().dyn_cast<const Expr *>()), and EvaluateStringLiteralSize would be the rest of the current StringLValueIsLiteral. It would also need to take all of the current EvalInfo, LValue and QualType arguments instead of the StringLiteral value. I think this makes the separation rather awkward. I'm still planning to rename it to GetLValueAsStringLiteralAndOffset or something like that.

template <typename CharAction>
static bool IterateStringLValue(EvalInfo &Info, const Expr *E, QualType CharTy,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment? Maybe call it ForEachElementInStringExpression or something

LValue &String, CharAction &&Action);
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
EvalInfo &Info,
std::string *StringResult = nullptr);
EvalInfo &Info);

/// Evaluate an integer or fixed point expression into an APResult.
static bool EvaluateFixedPointOrInteger(const Expr *E, APFixedPoint &Result,
Expand Down Expand Up @@ -17950,19 +17958,46 @@ bool Expr::tryEvaluateObjectSize(uint64_t &Result, ASTContext &Ctx,
return tryEvaluateBuiltinObjectSize(this, Type, Info, Result);
}

static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
EvalInfo &Info, std::string *StringResult) {
if (!E->getType()->hasPointerRepresentation() || !E->isPRValue())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better early exit would be:

if (!Ty->hasPointerRepresentation() && !Ty->canDecayToPointerType())
  return false;

This would eliminate the need for the else { return false; } below.

I think leaving the if (!E->isPRValue()) separate might be cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh, I don't love checking the same thing twice. I reorganized things a little differently to avoid the else { return false } branch.

static bool EvaluateStringAsLValue(EvalInfo &Info, const Expr *E,
QualType &CharTy, LValue &String) {
QualType Ty = E->getType();
if (!E->isPRValue())
return false;

LValue String;
if (Ty->canDecayToPointerType()) {
if (E->isGLValue()) {
if (!EvaluateLValue(E, String, Info))
return false;
} else {
APValue &Value = Info.CurrentCall->createTemporary(
E, Ty, ScopeKind::FullExpression, String);
if (!EvaluateInPlace(Value, Info, String, E))
return false;
}
// The result is a pointer to the first element of the array.
auto *AT = Info.Ctx.getAsArrayType(Ty);
CharTy = AT->getElementType();
if (auto *CAT = dyn_cast<ConstantArrayType>(AT))
String.addArray(Info, E, CAT);
else
String.addUnsizedArray(Info, E, CharTy);
return true;
}

if (!EvaluatePointer(E, String, Info))
return false;
if (Ty->hasPointerRepresentation()) {
if (!EvaluatePointer(E, String, Info))
return false;
CharTy = Ty->getPointeeType();
return true;
}

QualType CharTy = E->getType()->getPointeeType();
return false;
}

// Fast path: if it's a string literal, search the string value.
static const StringLiteral *StringLValueIsLiteral(EvalInfo &Info,
LValue &String,
QualType CharTy,
uint64_t &Offset) {
if (const StringLiteral *S = dyn_cast_or_null<StringLiteral>(
String.getLValueBase().dyn_cast<const Expr *>())) {
StringRef Str = S->getBytes();
Expand All @@ -17971,46 +18006,117 @@ static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
S->getCharByteWidth() == 1 &&
// FIXME: Add fast-path for wchar_t too.
Info.Ctx.hasSameUnqualifiedType(CharTy, Info.Ctx.CharTy)) {
Str = Str.substr(Off);

StringRef::size_type Pos = Str.find(0);
if (Pos != StringRef::npos)
Str = Str.substr(0, Pos);

Result = Str.size();
if (StringResult)
*StringResult = Str;
return true;
Offset = static_cast<uint64_t>(Off);
return S;
}

// Fall through to slow path.
}
return nullptr;
}

// Slow path: scan the bytes of the string looking for the terminating 0.
for (uint64_t Strlen = 0; /**/; ++Strlen) {
template <typename CharAction>
static bool IterateStringLValue(EvalInfo &Info, const Expr *E, QualType CharTy,
LValue &String, CharAction &&Action) {
while (true) {
APValue Char;
if (!handleLValueToRValueConversion(Info, E, CharTy, String, Char) ||
!Char.isInt())
return false;
if (!Char.getInt()) {
Result = Strlen;
if (!Action(Char.getInt().getExtValue()))
return true;
} else if (StringResult)
StringResult->push_back(Char.getInt().getExtValue());
if (!HandleLValueArrayAdjustment(Info, E, String, CharTy, 1))
return false;
}
}

std::optional<std::string> Expr::tryEvaluateString(ASTContext &Ctx) const {
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
EvalInfo &Info) {
LValue String;
QualType CharTy;
if (!EvaluateStringAsLValue(Info, E, CharTy, String))
return false;

// Fast path: if it's a string literal, search the string value.
uint64_t Off;
if (const auto *S = StringLValueIsLiteral(Info, String, CharTy, Off)) {
StringRef Str = S->getBytes().substr(Off);

StringRef::size_type Pos = Str.find(0);
if (Pos != StringRef::npos)
Str = Str.substr(0, Pos);

Result = Str.size();
return true;
}

// Slow path: scan the bytes of the string looking for the terminating 0.
Result = 0;
return IterateStringLValue(Info, E, CharTy, String, [&](int Char) {
if (Char) {
Result++;
return true;
} else
return false;
});
}

Expr::StringEvalResult::StringEvalResult(const StringLiteral *SL,
uint64_t Offset)
: SL(SL), Offset(Offset) {}

Expr::StringEvalResult::StringEvalResult(std::string Contents)
: Storage(std::move(Contents)), SL(nullptr), Offset(0) {}

llvm::StringRef Expr::StringEvalResult::getString() const {
return SL ? SL->getBytes().substr(Offset) : Storage;
}

bool Expr::StringEvalResult::getStringLiteral(const StringLiteral *&SL,
uint64_t &Offset) const {
if (this->SL) {
SL = this->SL;
Offset = this->Offset;
return true;
}
return false;
}

std::optional<Expr::StringEvalResult>
Expr::tryEvaluateString(ASTContext &Ctx, bool *NullTerminated,
bool InConstantContext) const {
if (NullTerminated)
*NullTerminated = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put that in StringEvalResult instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally, not sure what was going through my head that it wasn't there to begin with.


Expr::EvalStatus Status;
EvalInfo Info(Ctx, Status, EvalInfo::EM_ConstantFold);
uint64_t Result;
std::string StringResult;
EvalInfo Info(Ctx, Status,
(InConstantContext &&
(Ctx.getLangOpts().CPlusPlus || Ctx.getLangOpts().C23))
? EvalInfo::EM_ConstantExpression
: EvalInfo::EM_ConstantFold);
Info.InConstantContext = InConstantContext;
LValue String;
QualType CharTy;
if (!EvaluateStringAsLValue(Info, this, CharTy, String))
return {};

uint64_t Off;
if (const auto *S = StringLValueIsLiteral(Info, String, CharTy, Off)) {
if (NullTerminated)
*NullTerminated = true;
return StringEvalResult(S, Off);
}

std::string Result;
bool NTFound = IterateStringLValue(Info, this, CharTy, String, [&](int Char) {
if (Char) {
Result.push_back(Char);
return true;
} else
return false;
});

if (EvaluateBuiltinStrLen(this, Result, Info, &StringResult))
return StringResult;
return {};
if (NullTerminated)
*NullTerminated = NTFound;
return StringEvalResult(Result);
}

template <typename T>
Expand Down
5 changes: 3 additions & 2 deletions clang/lib/Analysis/UnsafeBufferUsage.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -726,12 +726,13 @@ static bool hasUnsafeFormatOrSArg(const CallExpr *Call, const Expr *&UnsafeArg,
const Expr *Fmt = Call->getArg(FmtArgIdx);

if (auto *SL = dyn_cast<clang::StringLiteral>(Fmt->IgnoreParenImpCasts())) {
std::optional<Expr::StringEvalResult> SER;
StringRef FmtStr;

if (SL->getCharByteWidth() == 1)
FmtStr = SL->getString();
else if (auto EvaledFmtStr = SL->tryEvaluateString(Ctx))
FmtStr = *EvaledFmtStr;
else if ((SER = SL->tryEvaluateString(Ctx)))
FmtStr = SER->getString();
else
goto CHECK_UNSAFE_PTR;

Expand Down
5 changes: 3 additions & 2 deletions clang/lib/CodeGen/CGBuiltin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3477,8 +3477,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
llvm::DILocation *TrapLocation = Builder.getCurrentDebugLocation();
if (getDebugInfo()) {
TrapLocation = getDebugInfo()->CreateTrapFailureMessageFor(
TrapLocation, *E->getArg(0)->tryEvaluateString(getContext()),
*E->getArg(1)->tryEvaluateString(getContext()));
TrapLocation,
E->getArg(0)->tryEvaluateString(getContext())->getString(),
E->getArg(1)->tryEvaluateString(getContext())->getString());
}
ApplyDebugLocation ApplyTrapDI(*this, TrapLocation);
// Currently no attempt is made to prevent traps from being merged.
Expand Down
Loading