-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[TySan] A Type Sanitizer (Clang) #76260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-compiler-rt-sanitizer @llvm/pr-subscribers-clang-driver Author: Florian Hahn (fhahn) ChangesThis patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. The Clang changes seems mostly formulaic, the one specific change being that when the TBAA sanitizer is enabled, TBAA is always generated, even at -O0. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. Based on https://reviews.llvm.org/D32199. Patch is 22.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76260.diff 17 Files Affected:
diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def
index 06efac0cf1abd7..6a1bc04963af68 100644
--- a/clang/include/clang/Basic/Features.def
+++ b/clang/include/clang/Basic/Features.def
@@ -98,6 +98,7 @@ FEATURE(nullability_nullable_result, true)
FEATURE(memory_sanitizer,
LangOpts.Sanitize.hasOneOf(SanitizerKind::Memory |
SanitizerKind::KernelMemory))
+FEATURE(type_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Type))
FEATURE(thread_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Thread))
FEATURE(dataflow_sanitizer, LangOpts.Sanitize.has(SanitizerKind::DataFlow))
FEATURE(scudo, LangOpts.Sanitize.hasOneOf(SanitizerKind::Scudo))
diff --git a/clang/include/clang/Basic/Sanitizers.def b/clang/include/clang/Basic/Sanitizers.def
index c2137e3f61f645..a0f54d9e86327a 100644
--- a/clang/include/clang/Basic/Sanitizers.def
+++ b/clang/include/clang/Basic/Sanitizers.def
@@ -73,6 +73,9 @@ SANITIZER("fuzzer", Fuzzer)
// libFuzzer-required instrumentation, no linking.
SANITIZER("fuzzer-no-link", FuzzerNoLink)
+// TypeSanitizer
+SANITIZER("type", Type)
+
// ThreadSanitizer
SANITIZER("thread", Thread)
diff --git a/clang/include/clang/Driver/SanitizerArgs.h b/clang/include/clang/Driver/SanitizerArgs.h
index 07070ec4fc0653..52b482a0e8a1a9 100644
--- a/clang/include/clang/Driver/SanitizerArgs.h
+++ b/clang/include/clang/Driver/SanitizerArgs.h
@@ -86,6 +86,7 @@ class SanitizerArgs {
bool needsHwasanAliasesRt() const {
return needsHwasanRt() && HwasanUseAliases;
}
+ bool needsTysanRt() const { return Sanitizers.has(SanitizerKind::Type); }
bool needsTsanRt() const { return Sanitizers.has(SanitizerKind::Thread); }
bool needsMsanRt() const { return Sanitizers.has(SanitizerKind::Memory); }
bool needsFuzzer() const { return Sanitizers.has(SanitizerKind::Fuzzer); }
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 480410db1021b7..d7c233b4da3ff5 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -79,6 +79,7 @@
#include "llvm/Transforms/Instrumentation/SanitizerBinaryMetadata.h"
#include "llvm/Transforms/Instrumentation/SanitizerCoverage.h"
#include "llvm/Transforms/Instrumentation/ThreadSanitizer.h"
+#include "llvm/Transforms/Instrumentation/TypeSanitizer.h"
#include "llvm/Transforms/ObjCARC.h"
#include "llvm/Transforms/Scalar/EarlyCSE.h"
#include "llvm/Transforms/Scalar/GVN.h"
@@ -685,6 +686,11 @@ static void addSanitizers(const Triple &TargetTriple,
MPM.addPass(createModuleToFunctionPassAdaptor(ThreadSanitizerPass()));
}
+ if (LangOpts.Sanitize.has(SanitizerKind::Type)) {
+ MPM.addPass(ModuleTypeSanitizerPass());
+ MPM.addPass(createModuleToFunctionPassAdaptor(TypeSanitizerPass()));
+ }
+
auto ASanPass = [&](SanitizerMask Mask, bool CompileKernel) {
if (LangOpts.Sanitize.has(Mask)) {
bool UseGlobalGC = asanUseGlobalsGC(TargetTriple, CodeGenOpts);
diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp
index a5da0aa2965a00..3f8e58db65fe1b 100644
--- a/clang/lib/CodeGen/CGDecl.cpp
+++ b/clang/lib/CodeGen/CGDecl.cpp
@@ -481,7 +481,8 @@ void CodeGenFunction::EmitStaticVarDecl(const VarDecl &D,
LocalDeclMap.find(&D)->second = Address(castedAddr, elemTy, alignment);
CGM.setStaticLocalDeclAddress(&D, castedAddr);
- CGM.getSanitizerMetadata()->reportGlobal(var, D);
+ CGM.getSanitizerMetadata()->reportGlobalToASan(var, D);
+ CGM.getSanitizerMetadata()->reportGlobalToTySan(var, D);
// Emit global variable debug descriptor for static vars.
CGDebugInfo *DI = getDebugInfo();
diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp
index e08a1e5f42df20..08b3e06cb5a450 100644
--- a/clang/lib/CodeGen/CGDeclCXX.cpp
+++ b/clang/lib/CodeGen/CGDeclCXX.cpp
@@ -472,6 +472,10 @@ llvm::Function *CodeGenModule::CreateGlobalInitOrCleanUpFunction(
!isInNoSanitizeList(SanitizerKind::MemtagStack, Fn, Loc))
Fn->addFnAttr(llvm::Attribute::SanitizeMemTag);
+ if (getLangOpts().Sanitize.has(SanitizerKind::Type) &&
+ !isInNoSanitizeList(SanitizerKind::Type, Fn, Loc))
+ Fn->addFnAttr(llvm::Attribute::SanitizeType);
+
if (getLangOpts().Sanitize.has(SanitizerKind::Thread) &&
!isInNoSanitizeList(SanitizerKind::Thread, Fn, Loc))
Fn->addFnAttr(llvm::Attribute::SanitizeThread);
diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index 2199d7b58fb96e..26825802fc3d93 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -790,6 +790,8 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy,
Fn->addFnAttr(llvm::Attribute::SanitizeMemTag);
if (SanOpts.has(SanitizerKind::Thread))
Fn->addFnAttr(llvm::Attribute::SanitizeThread);
+ if (SanOpts.has(SanitizerKind::Type))
+ Fn->addFnAttr(llvm::Attribute::SanitizeType);
if (SanOpts.hasOneOf(SanitizerKind::Memory | SanitizerKind::KernelMemory))
Fn->addFnAttr(llvm::Attribute::SanitizeMemory);
}
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index d78f2594a23764..891af7252b24a6 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -391,8 +391,8 @@ CodeGenModule::CodeGenModule(ASTContext &C,
if (LangOpts.HLSL)
createHLSLRuntime();
- // Enable TBAA unless it's suppressed. ThreadSanitizer needs TBAA even at O0.
- if (LangOpts.Sanitize.has(SanitizerKind::Thread) ||
+ // Enable TBAA unless it's suppressed. TSan and TySan need TBAA even at O0.
+ if (LangOpts.Sanitize.hasOneOf(SanitizerKind::Thread | SanitizerKind::Type) ||
(!CodeGenOpts.RelaxedAliasing && CodeGenOpts.OptimizationLevel > 0))
TBAA.reset(new CodeGenTBAA(Context, TheModule, CodeGenOpts, getLangOpts(),
getCXXABI().getMangleContext()));
@@ -4924,7 +4924,7 @@ CodeGenModule::GetOrCreateLLVMGlobal(StringRef MangledName, llvm::Type *Ty,
}
if (D)
- SanitizerMD->reportGlobal(GV, *D);
+ SanitizerMD->reportGlobalToASan(GV, *D);
LangAS ExpectedAS =
D ? D->getType().getAddressSpace()
@@ -5465,7 +5465,8 @@ void CodeGenModule::EmitGlobalVarDefinition(const VarDecl *D,
if (NeedsGlobalCtor || NeedsGlobalDtor)
EmitCXXGlobalVarDeclInitFunc(D, GV, NeedsGlobalCtor);
- SanitizerMD->reportGlobal(GV, *D, NeedsGlobalCtor);
+ SanitizerMD->reportGlobalToASan(GV, *D, NeedsGlobalCtor);
+ SanitizerMD->reportGlobalToTySan(GV, *D);
// Emit global variable debug information.
if (CGDebugInfo *DI = getModuleDebugInfo())
@@ -6341,7 +6342,8 @@ CodeGenModule::GetAddrOfConstantStringFromLiteral(const StringLiteral *S,
if (Entry)
*Entry = GV;
- SanitizerMD->reportGlobal(GV, S->getStrTokenLoc(0), "<string literal>");
+ SanitizerMD->reportGlobalToASan(GV, S->getStrTokenLoc(0), "<string literal>");
+ // FIXME: Should we also report to the TySan?
return ConstantAddress(castStringLiteralToDefaultAddressSpace(*this, GV),
GV->getValueType(), Alignment);
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index dc288bc3f6157a..861b2b35a27d13 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -226,8 +226,10 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
}
llvm::MDNode *CodeGenTBAA::getTypeInfo(QualType QTy) {
- // At -O0 or relaxed aliasing, TBAA is not emitted for regular types.
- if (CodeGenOpts.OptimizationLevel == 0 || CodeGenOpts.RelaxedAliasing)
+ // At -O0 or relaxed aliasing, TBAA is not emitted for regular types (unless
+ // we're running TypeSanitizer).
+ if (!Features.Sanitize.has(SanitizerKind::Type) &&
+ (CodeGenOpts.OptimizationLevel == 0 || CodeGenOpts.RelaxedAliasing))
return nullptr;
// If the type has the may_alias attribute (even on a typedef), it is
diff --git a/clang/lib/CodeGen/SanitizerMetadata.cpp b/clang/lib/CodeGen/SanitizerMetadata.cpp
index 53161c316c58a4..c0d38d3c747dea 100644
--- a/clang/lib/CodeGen/SanitizerMetadata.cpp
+++ b/clang/lib/CodeGen/SanitizerMetadata.cpp
@@ -34,11 +34,11 @@ SanitizerMask expandKernelSanitizerMasks(SanitizerMask Mask) {
return Mask;
}
-void SanitizerMetadata::reportGlobal(llvm::GlobalVariable *GV,
- SourceLocation Loc, StringRef Name,
- QualType Ty,
- SanitizerMask NoSanitizeAttrMask,
- bool IsDynInit) {
+void SanitizerMetadata::reportGlobalToASan(llvm::GlobalVariable *GV,
+ SourceLocation Loc, StringRef Name,
+ QualType Ty,
+ SanitizerMask NoSanitizeAttrMask,
+ bool IsDynInit) {
SanitizerSet FsanitizeArgument = CGM.getLangOpts().Sanitize;
if (!isAsanHwasanOrMemTag(FsanitizeArgument))
return;
@@ -75,8 +75,8 @@ void SanitizerMetadata::reportGlobal(llvm::GlobalVariable *GV,
GV->setSanitizerMetadata(Meta);
}
-void SanitizerMetadata::reportGlobal(llvm::GlobalVariable *GV, const VarDecl &D,
- bool IsDynInit) {
+void SanitizerMetadata::reportGlobalToASan(llvm::GlobalVariable *GV,
+ const VarDecl &D, bool IsDynInit) {
if (!isAsanHwasanOrMemTag(CGM.getLangOpts().Sanitize))
return;
std::string QualName;
@@ -94,10 +94,34 @@ void SanitizerMetadata::reportGlobal(llvm::GlobalVariable *GV, const VarDecl &D,
return NoSanitizeMask;
};
- reportGlobal(GV, D.getLocation(), OS.str(), D.getType(), getNoSanitizeMask(D),
- IsDynInit);
+ reportGlobalToASan(GV, D.getLocation(), OS.str(), D.getType(),
+ getNoSanitizeMask(D), IsDynInit);
+}
+
+void SanitizerMetadata::reportGlobalToTySan(llvm::GlobalVariable *GV,
+ const VarDecl &D) {
+ if (!CGM.getLangOpts().Sanitize.has(SanitizerKind::Type))
+ return;
+
+ for (auto Attr : D.specific_attrs<NoSanitizeAttr>())
+ if (Attr->getMask() & SanitizerKind::Type)
+ return;
+
+ QualType QTy = D.getType();
+ llvm::MDNode *TBAAInfo = CGM.getTBAATypeInfo(QTy);
+ if (!TBAAInfo || TBAAInfo == CGM.getTBAATypeInfo(CGM.getContext().CharTy))
+ return;
+
+ llvm::Metadata *GlobalMetadata[] = {llvm::ConstantAsMetadata::get(GV),
+ TBAAInfo};
+
+ llvm::MDNode *ThisGlobal =
+ llvm::MDNode::get(CGM.getLLVMContext(), GlobalMetadata);
+ llvm::NamedMDNode *TysanGlobals =
+ CGM.getModule().getOrInsertNamedMetadata("llvm.tysan.globals");
+ TysanGlobals->addOperand(ThisGlobal);
}
void SanitizerMetadata::disableSanitizerForGlobal(llvm::GlobalVariable *GV) {
- reportGlobal(GV, SourceLocation(), "", QualType(), SanitizerKind::All);
+ reportGlobalToASan(GV, SourceLocation(), "", QualType(), SanitizerKind::All);
}
diff --git a/clang/lib/CodeGen/SanitizerMetadata.h b/clang/lib/CodeGen/SanitizerMetadata.h
index 000f02cf8dcf11..9de087c518c6ad 100644
--- a/clang/lib/CodeGen/SanitizerMetadata.h
+++ b/clang/lib/CodeGen/SanitizerMetadata.h
@@ -37,12 +37,13 @@ class SanitizerMetadata {
public:
SanitizerMetadata(CodeGenModule &CGM);
- void reportGlobal(llvm::GlobalVariable *GV, const VarDecl &D,
- bool IsDynInit = false);
- void reportGlobal(llvm::GlobalVariable *GV, SourceLocation Loc,
- StringRef Name, QualType Ty = {},
- SanitizerMask NoSanitizeAttrMask = {},
- bool IsDynInit = false);
+ void reportGlobalToASan(llvm::GlobalVariable *GV, const VarDecl &D,
+ bool IsDynInit = false);
+ void reportGlobalToASan(llvm::GlobalVariable *GV, SourceLocation Loc,
+ StringRef Name, QualType Ty = {},
+ SanitizerMask NoSanitizeAttrMask = {},
+ bool IsDynInit = false);
+ void reportGlobalToTySan(llvm::GlobalVariable *GV, const VarDecl &D);
void disableSanitizerForGlobal(llvm::GlobalVariable *GV);
};
} // end namespace CodeGen
diff --git a/clang/lib/Driver/SanitizerArgs.cpp b/clang/lib/Driver/SanitizerArgs.cpp
index ad68c086b71790..6865451b8bb7e4 100644
--- a/clang/lib/Driver/SanitizerArgs.cpp
+++ b/clang/lib/Driver/SanitizerArgs.cpp
@@ -42,14 +42,14 @@ static const SanitizerMask NotAllowedWithExecuteOnly =
static const SanitizerMask RequiresPIE =
SanitizerKind::DataFlow | SanitizerKind::Scudo;
static const SanitizerMask NeedsUnwindTables =
- SanitizerKind::Address | SanitizerKind::HWAddress | SanitizerKind::Thread |
- SanitizerKind::Memory | SanitizerKind::DataFlow;
+ SanitizerKind::Address | SanitizerKind::HWAddress | SanitizerKind::Type |
+ SanitizerKind::Thread | SanitizerKind::Memory | SanitizerKind::DataFlow;
static const SanitizerMask SupportsCoverage =
SanitizerKind::Address | SanitizerKind::HWAddress |
SanitizerKind::KernelAddress | SanitizerKind::KernelHWAddress |
- SanitizerKind::MemtagStack | SanitizerKind::MemtagHeap |
- SanitizerKind::MemtagGlobals | SanitizerKind::Memory |
- SanitizerKind::KernelMemory | SanitizerKind::Leak |
+ SanitizerKind::Type | SanitizerKind::MemtagStack |
+ SanitizerKind::MemtagHeap | SanitizerKind::MemtagGlobals |
+ SanitizerKind::Memory | SanitizerKind::KernelMemory | SanitizerKind::Leak |
SanitizerKind::Undefined | SanitizerKind::Integer | SanitizerKind::Bounds |
SanitizerKind::ImplicitConversion | SanitizerKind::Nullability |
SanitizerKind::DataFlow | SanitizerKind::Fuzzer |
@@ -178,6 +178,7 @@ static void addDefaultIgnorelists(const Driver &D, SanitizerMask Kinds,
{"memtag_ignorelist.txt", SanitizerKind::MemTag},
{"msan_ignorelist.txt", SanitizerKind::Memory},
{"tsan_ignorelist.txt", SanitizerKind::Thread},
+ {"tysan_blacklist.txt", SanitizerKind::Type},
{"dfsan_abilist.txt", SanitizerKind::DataFlow},
{"cfi_ignorelist.txt", SanitizerKind::CFI},
{"ubsan_ignorelist.txt",
@@ -516,6 +517,10 @@ SanitizerArgs::SanitizerArgs(const ToolChain &TC,
std::pair<SanitizerMask, SanitizerMask> IncompatibleGroups[] = {
std::make_pair(SanitizerKind::Address,
SanitizerKind::Thread | SanitizerKind::Memory),
+ std::make_pair(SanitizerKind::Type,
+ SanitizerKind::Address | SanitizerKind::KernelAddress |
+ SanitizerKind::Memory | SanitizerKind::Leak |
+ SanitizerKind::Thread | SanitizerKind::KernelAddress),
std::make_pair(SanitizerKind::Thread, SanitizerKind::Memory),
std::make_pair(SanitizerKind::Leak,
SanitizerKind::Thread | SanitizerKind::Memory),
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 6eb0ed8f3fed9a..bb2b85c0553d72 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1337,8 +1337,10 @@ collectSanitizerRuntimes(const ToolChain &TC, const ArgList &Args,
if (SanArgs.needsScudoRt()) {
SharedRuntimes.push_back("scudo_standalone");
}
- if (SanArgs.needsTsanRt())
+ if (SanArgs.needsTsanRt() && SanArgs.linkRuntimes())
SharedRuntimes.push_back("tsan");
+ if (SanArgs.needsTysanRt())
+ StaticRuntimes.push_back("tysan");
if (SanArgs.needsHwasanRt()) {
if (SanArgs.needsHwasanAliasesRt())
SharedRuntimes.push_back("hwasan_aliases");
@@ -1403,6 +1405,8 @@ collectSanitizerRuntimes(const ToolChain &TC, const ArgList &Args,
if (SanArgs.linkCXXRuntimes())
StaticRuntimes.push_back("tsan_cxx");
}
+ if (!SanArgs.needsSharedRt() && SanArgs.needsTysanRt())
+ StaticRuntimes.push_back("tysan");
if (!SanArgs.needsSharedRt() && SanArgs.needsUbsanRt()) {
if (SanArgs.requiresMinimalRuntime()) {
StaticRuntimes.push_back("ubsan_minimal");
diff --git a/clang/lib/Driver/ToolChains/Darwin.cpp b/clang/lib/Driver/ToolChains/Darwin.cpp
index 65846cace461e3..02028bc1a50cc7 100644
--- a/clang/lib/Driver/ToolChains/Darwin.cpp
+++ b/clang/lib/Driver/ToolChains/Darwin.cpp
@@ -1505,6 +1505,8 @@ void DarwinClang::AddLinkRuntimeLibArgs(const ArgList &Args,
"Static sanitizer runtimes not supported");
AddLinkSanitizerLibArgs(Args, CmdArgs, "tsan");
}
+ if (Sanitize.needsTysanRt())
+ AddLinkSanitizerLibArgs(Args, CmdArgs, "tysan");
if (Sanitize.needsFuzzer() && !Args.hasArg(options::OPT_dynamiclib)) {
AddLinkSanitizerLibArgs(Args, CmdArgs, "fuzzer", /*shared=*/false);
@@ -3373,6 +3375,9 @@ SanitizerMask Darwin::getSupportedSanitizers() const {
isTargetTvOSSimulator() || isTargetWatchOSSimulator())) {
Res |= SanitizerKind::Thread;
}
+ if ((IsX86_64 || IsAArch64) && isTargetMacOSBased()) {
+ Res |= SanitizerKind::Type;
+ }
return Res;
}
diff --git a/clang/lib/Driver/ToolChains/Linux.cpp b/clang/lib/Driver/ToolChains/Linux.cpp
index 735af54f114cef..97a8d2a29f1b95 100644
--- a/clang/lib/Driver/ToolChains/Linux.cpp
+++ b/clang/lib/Driver/ToolChains/Linux.cpp
@@ -803,6 +803,8 @@ SanitizerMask Linux::getSupportedSanitizers() const {
if (IsX86_64 || IsMIPS64 || IsAArch64 || IsPowerPC64 || IsSystemZ ||
IsLoongArch64 || IsRISCV64)
Res |= SanitizerKind::Thread;
+ if (IsX86_64 || IsAArch64)
+ Res |= SanitizerKind::Type;
if (IsX86_64 || IsSystemZ)
Res |= SanitizerKind::KernelMemory;
if (IsX86_64 || IsMIPS64 || IsAArch64 || IsX86 || IsMIPS || IsArmArch ||
diff --git a/clang/test/CodeGen/sanitize-type-attr.cpp b/clang/test/CodeGen/sanitize-type-attr.cpp
new file mode 100644
index 00000000000000..4da8488e1f9486
--- /dev/null
+++ b/clang/test/CodeGen/sanitize-type-attr.cpp
@@ -0,0 +1,74 @@
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s | FileCheck -check-prefix=WITHOUT %s
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s -fsanitize=type | FileCheck -check-prefix=TYSAN %s
+// RUN: echo "src:%s" | sed -e 's/\\/\\\\/g' > %t
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s -fsanitize=type -fsanitize-blacklist=%t | FileCheck -check-prefix=BL %s
+
+// The sanitize_type attribute should be attached to functions
+// when TypeSanitizer is enabled, unless no_sanitize("type") attribute
+// is present.
+
+// WITHOUT: NoTYSAN1{{.*}}) [[NOATTR:#[0-9]+]]
+// BL: NoTYSAN1{{.*}}) [[NOATTR:#[0-9]+]]
+// TYSAN: NoTYSAN1{{.*}}) [[NOATTR:#[0-9]+]]
+__attribute__((no_sanitize("type"))) int NoTYSAN1(int *a) { return *a; }
+
+// WITHOUT: NoTYSAN2{{.*}}) [[NOATTR]]
+// BL: NoTYSAN2{{.*}}) [[NOATTR]]
+// TYSAN: NoTYSAN2{{.*}}) [[NOATTR]]
+__attribute__((no_sanitize("type"))) int NoTYSAN2(int *a);
+int NoTYSAN2(int *a) { return *a; }
+
+// WITHOUT: NoTYSAN3{{.*}}) [[NOATTR:#[0-9]+]]
+// BL: NoTYSAN3{{.*}}) [[NOATTR:#[0-9]+]]
+// TYSAN: NoTYSAN3{{.*}}) [[NOATTR:#[0-9]+]]
+__attribute__((no_sanitize("type"))) int NoTYSAN3(int *a) { return *a; }
+
+// WITHOUT: TYSANOk{{.*}}) [[NOATTR]]
+// BL: TYSANOk{{.*}}) [[NOATTR]]
+// TYSAN: TYSANOk{{.*}}) [[WITH:#[0-9]+]]
+int TYSANOk(int *a) { return *a; }
+
+// WITHOUT: TemplateTYSANOk{{.*}}) [[NOATTR]]
+// BL: TemplateTYSANOk{{.*}}) [[NOATTR]]
+// TYSAN: TemplateTYSANOk{{.*}}) [[WITH]]
+template <int i>
+int TemplateTYSANOk() { return i; }
+
+// WITHOUT: TemplateNoTYSAN{{.*}}) [[NOATTR]]
+// BL: TemplateNoTYSAN{{.*}}) [[NOATTR]]
+// TYSAN: TemplateNoTYSAN{{.*}}) [[NOATTR]]
+template <int i>
+__attribute__((no_sanitize("type"))) int TemplateNoTYSAN() { return i; }
+
+int force_instance = TemplateTYSANOk<42>() + TemplateNoTYSAN<42>();
+
+// Check that __cxx_global_var_init* get the sanitize_type attribute.
+int global1 = 0;
+int global2 = *(int *)((char *)&global1 + 1);
+// WITHOUT: @__cxx_global_var_init{{.*}}[[NOATTR:#[0-9]+]]
+// BL: @__cxx_global_var_init{{.*}}[[NOATTR:#[0-9]+]]
+// TYSAN: @__cxx_global_var_init{{.*}}[[WITH:#[0-9]+]]
+
+// Make sure that we don't add globals to the list for which we don't have a
+// specific type description.
+// FIXME: We now have a type description for this type and a global...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clang changes are ok, but this needs some level of documentation/release notes, which I don't see in the clang release. As this is a part of a larger feature, do we intend to push that later?
Also, the clang-format suggestion makes sense.
76b7017
to
96912ae
Compare
7f93cb0
to
ee7ed21
Compare
eb97b07
to
b1465a7
Compare
ee7ed21
to
fb7da09
Compare
b1465a7
to
a6d44dd
Compare
fb7da09
to
9cbb116
Compare
Sorry for the long delay! Added a release note and fixed the formatting. This is part of larger feature and depends on an LLVM part (#76259) and compiler-rt support (#76261) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pair of minor changes requested, else this looks about right? Not sure who the right person to approve this is though
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
@@ -5740,7 +5740,8 @@ void CodeGenModule::EmitGlobalVarDefinition(const VarDecl *D, | |||
if (NeedsGlobalCtor || NeedsGlobalDtor) | |||
EmitCXXGlobalVarDeclInitFunc(D, GV, NeedsGlobalCtor); | |||
|
|||
SanitizerMD->reportGlobal(GV, *D, NeedsGlobalCtor); | |||
SanitizerMD->reportGlobalToASan(GV, *D, NeedsGlobalCtor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has happened a few times, I would suggest keeping reportGlobal
and documenting that it does BOTH of these things, and in the few cases you only need ASan, do reportGlobalToASan
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually on current main
, reportGlobal
reports not only to Asan, but also memsan and hwsan. Updated to include reporting globals to TySan, thanks!
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
@@ -6630,7 +6631,8 @@ CodeGenModule::GetAddrOfConstantStringFromLiteral(const StringLiteral *S, | |||
if (Entry) | |||
*Entry = GV; | |||
|
|||
SanitizerMD->reportGlobal(GV, S->getStrTokenLoc(0), "<string literal>"); | |||
SanitizerMD->reportGlobalToASan(GV, S->getStrTokenLoc(0), "<string literal>"); | |||
// FIXME: Should we also report to the TySan? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, merged the code to report globals again, thanks!
clang/docs/ReleaseNotes.rst
Outdated
@@ -1027,6 +1027,10 @@ Sanitizers | |||
<https://clang.llvm.org/docs/SanitizerSpecialCaseList.html>`_. See that link | |||
for examples. | |||
|
|||
- Introduced an experimental Type Sanitizer, activated by using the | |||
-fsanitize=type flag. This sanitizer detects violations of C/C++ type-based |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-fsanitize=type flag. This sanitizer detects violations of C/C++ type-based | |
``-fsanitize=type`` flag. This sanitizer detects violations of C/C++ type-based |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
@@ -102,6 +102,7 @@ FEATURE(numerical_stability_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Nume | |||
FEATURE(memory_sanitizer, | |||
LangOpts.Sanitize.hasOneOf(SanitizerKind::Memory | | |||
SanitizerKind::KernelMemory)) | |||
FEATURE(type_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these sanitizers are features. :-( I think this is the correct thing to do for consistency, but at the same time, we keep adding more sanitizers and we keep making this problem worse.
Nothing to change here, just me grumbling. :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would be the right place to add those? Might be good to at least file an issue to clean this up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, as this depends on LLVM & compiler-rt patches, so it should only be merged once those other PRs are also approved.
Motivation for the sanitizer is enabling -fpointer-tbaa
by default (#117244)
clang/docs/ReleaseNotes.rst
Outdated
@@ -1027,6 +1027,10 @@ Sanitizers | |||
<https://clang.llvm.org/docs/SanitizerSpecialCaseList.html>`_. See that link | |||
for examples. | |||
|
|||
- Introduced an experimental Type Sanitizer, activated by using the | |||
-fsanitize=type flag. This sanitizer detects violations of C/C++ type-based |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
@@ -102,6 +102,7 @@ FEATURE(numerical_stability_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Nume | |||
FEATURE(memory_sanitizer, | |||
LangOpts.Sanitize.hasOneOf(SanitizerKind::Memory | | |||
SanitizerKind::KernelMemory)) | |||
FEATURE(type_sanitizer, LangOpts.Sanitize.has(SanitizerKind::Type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would be the right place to add those? Might be good to at least file an issue to clean this up?
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
@@ -6630,7 +6631,8 @@ CodeGenModule::GetAddrOfConstantStringFromLiteral(const StringLiteral *S, | |||
if (Entry) | |||
*Entry = GV; | |||
|
|||
SanitizerMD->reportGlobal(GV, S->getStrTokenLoc(0), "<string literal>"); | |||
SanitizerMD->reportGlobalToASan(GV, S->getStrTokenLoc(0), "<string literal>"); | |||
// FIXME: Should we also report to the TySan? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, merged the code to report globals again, thanks!
clang/lib/CodeGen/CodeGenModule.cpp
Outdated
@@ -5740,7 +5740,8 @@ void CodeGenModule::EmitGlobalVarDefinition(const VarDecl *D, | |||
if (NeedsGlobalCtor || NeedsGlobalDtor) | |||
EmitCXXGlobalVarDeclInitFunc(D, GV, NeedsGlobalCtor); | |||
|
|||
SanitizerMD->reportGlobal(GV, *D, NeedsGlobalCtor); | |||
SanitizerMD->reportGlobalToASan(GV, *D, NeedsGlobalCtor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually on current main
, reportGlobal
reports not only to Asan, but also memsan and hwsan. Updated to include reporting globals to TySan, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with comments addressed. (I haven't looked carefully at the non-clang parts; this is just approval for the clang changes.)
SharedRuntimes.push_back("tsan"); | ||
if (SanArgs.needsTysanRt()) | ||
StaticRuntimes.push_back("tysan"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean SharedRuntimes.push_back
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
@@ -1441,8 +1441,10 @@ collectSanitizerRuntimes(const ToolChain &TC, const ArgList &Args, | |||
if (SanArgs.needsScudoRt()) { | |||
SharedRuntimes.push_back("scudo_standalone"); | |||
} | |||
if (SanArgs.needsTsanRt()) | |||
if (SanArgs.needsTsanRt() && SanArgs.linkRuntimes()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this change to tsan related to tysan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I think this was left over from a merge, removed, thanks!
2235299
to
694cb2c
Compare
8fa7bac
to
13c4092
Compare
694cb2c
to
9af0ba5
Compare
This patch introduces the LLVM components of a type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32198. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. When the sanitizer is active, we disable actually using the TBAA metadata for AA. This way we're less likely to use TBAA to remove memory accesses that we'd like to verify. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. It goes together with the corresponding clang changes (#76260) and compiler-rt changes (#76261) PR: #76259
…ype-sanitizer-clang
All 3 parts have been approved and the LLVM part just landed. I am planning on landing the Clang part and the compiler-rt part later today |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/12442 Here is the relevant piece of the build log for the reference
|
This patch introduces the runtime components for type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32197. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. This includes build fixes for Linux from Mingjie Xu. Depends on #76260 (Clang support), #76259 (LLVM support) PR: #76261
@@ -105,11 +106,32 @@ void SanitizerMetadata::reportGlobal(llvm::GlobalVariable *GV, | |||
GV, Loc, Ty, "init"); | |||
|
|||
GV->setSanitizerMetadata(Meta); | |||
|
|||
if (Ty.isNull() || !CGM.getLangOpts().Sanitize.has(SanitizerKind::Type) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to exclude globals defined externally as well I think: #120565
Sorry for the very late review but do we have clang documentation for this? |
+1, we do need documentation for this before Clang 20 ships. CC @fhahn |
This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.
C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.
The Clang changes seems mostly formulaic, the one specific change being that when the TBAA sanitizer is enabled, TBAA is always generated, even at -O0.
Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.
Based on https://reviews.llvm.org/D32199.
#76259 (LLVM support)