Commit a487b79

[TySan] Add initial Type Sanitizer (LLVM) (#76259)
This patch introduces the LLVM components of a type sanitizer: a sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does).

The instrumentation handles the "fast path" (where the types match exactly and no partial overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte).

The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the bytes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.

When the sanitizer is active, we disable actually using the TBAA metadata for AA. This way we're less likely to use TBAA to remove memory accesses that we'd like to verify.

As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.

It goes together with the corresponding clang changes (#76260) and compiler-rt changes (#76261)

PR: #76259
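The fast-path checks described in the commit message can be modeled in a few lines. The sketch below is a toy, not the code the pass emits: `ShadowModel`, `Outcome`, and the hash map standing in for real shadow memory are all hypothetical names. It also assumes the shadow is updated after an unknown-type access whether or not the runtime was consulted, which is only one reading of the message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// One shadow cell per application byte; a cell holds a type-descriptor
// pointer value, 0 (unknown type) or -1 (interior byte of a typed access).
using ShadowValue = std::intptr_t;
constexpr ShadowValue kUnknown = 0;
constexpr ShadowValue kInterior = -1;

enum class Outcome { FastPath, CallRuntime };

// Hypothetical model: a map replaces the real shadow memory region.
struct ShadowModel {
  std::unordered_map<std::uintptr_t, ShadowValue> Cells;

  ShadowValue get(std::uintptr_t Addr) const {
    auto It = Cells.find(Addr);
    return It == Cells.end() ? kUnknown : It->second;
  }

  // True if every shadow cell after the first one in [Addr, Addr+Size) is V.
  bool restIs(std::uintptr_t Addr, std::size_t Size, ShadowValue V) const {
    for (std::size_t I = 1; I < Size; ++I)
      if (get(Addr + I) != V)
        return false;
    return true;
  }

  // Models the shadow reset done for memset/memcpy/memmove and allocas.
  void reset(std::uintptr_t Addr, std::size_t Size) {
    for (std::size_t I = 0; I < Size; ++I)
      Cells.erase(Addr + I);
  }

  // Models an access of Size bytes at Addr with type descriptor TD.
  Outcome access(std::uintptr_t Addr, std::size_t Size, ShadowValue TD) {
    assert(Size > 0 && TD != kUnknown && TD != kInterior);
    ShadowValue First = get(Addr);
    if (First == TD) // exact match: remaining bytes must all be interior
      return restIs(Addr, Size, kInterior) ? Outcome::FastPath
                                           : Outcome::CallRuntime;
    if (First != kUnknown) // neither an exact match nor unknown
      return Outcome::CallRuntime;
    // Unknown type: remaining bytes must be unknown too, then record TD.
    Outcome Result = restIs(Addr, Size, kUnknown) ? Outcome::FastPath
                                                  : Outcome::CallRuntime;
    Cells[Addr] = TD;
    for (std::size_t I = 1; I < Size; ++I)
      Cells[Addr + I] = kInterior;
    return Result;
  }
};
```

In this model, a first 4-byte access records the descriptor and marks three interior bytes, a repeat access with the same descriptor stays on the fast path, and an access with a different descriptor (or one that starts on an interior byte) is handed to the runtime.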
1 parent f6f4744 commit a487b79

27 files changed, +2275 -8 lines changed

llvm/include/llvm/Analysis/TypeBasedAliasAnalysis.h

Lines changed: 12 additions & 0 deletions
@@ -29,7 +29,15 @@ class MemoryLocation;

 /// A simple AA result that uses TBAA metadata to answer queries.
 class TypeBasedAAResult : public AAResultBase {
+  /// True if type sanitizer is enabled. When TypeSanitizer is used, don't use
+  /// TBAA information for alias analysis as this might cause us to remove
+  /// memory accesses that we need to verify at runtime.
+  bool UsingTypeSanitizer;
+
 public:
+  TypeBasedAAResult(bool UsingTypeSanitizer)
+      : UsingTypeSanitizer(UsingTypeSanitizer) {}
+
   /// Handle invalidation events from the new pass manager.
   ///
   /// By definition, this result is stateless and so remains valid.
@@ -52,6 +60,10 @@ class TypeBasedAAResult : public AAResultBase {

 private:
   bool Aliases(const MDNode *A, const MDNode *B) const;
+
+  /// Returns true if TBAA metadata should be used, that is if TBAA is enabled
+  /// and type sanitizer is not used.
+  bool shouldUseTBAA() const;
 };

 /// Analysis pass providing a never-invalidated alias analysis result.

llvm/include/llvm/Bitcode/LLVMBitCodes.h

Lines changed: 1 addition & 0 deletions
@@ -787,6 +787,7 @@ enum AttributeKindCodes {
   ATTR_KIND_CORO_ELIDE_SAFE = 98,
   ATTR_KIND_NO_EXT = 99,
   ATTR_KIND_NO_DIVERGENCE_SOURCE = 100,
+  ATTR_KIND_SANITIZE_TYPE = 101,
 };

 enum ComdatSelectionKindCodes {

llvm/include/llvm/IR/Attributes.td

Lines changed: 4 additions & 0 deletions
@@ -317,6 +317,9 @@ def SanitizeAddress : EnumAttr<"sanitize_address", IntersectPreserve, [FnAttr]>;
 /// ThreadSanitizer is on.
 def SanitizeThread : EnumAttr<"sanitize_thread", IntersectPreserve, [FnAttr]>;

+/// TypeSanitizer is on.
+def SanitizeType : EnumAttr<"sanitize_type", IntersectPreserve, [FnAttr]>;
+
 /// MemorySanitizer is on.
 def SanitizeMemory : EnumAttr<"sanitize_memory", IntersectPreserve, [FnAttr]>;

@@ -425,6 +428,7 @@ class CompatRuleStrAttr<string F, string Attr> : CompatRule<F> {

 def : CompatRule<"isEqual<SanitizeAddressAttr>">;
 def : CompatRule<"isEqual<SanitizeThreadAttr>">;
+def : CompatRule<"isEqual<SanitizeTypeAttr>">;
 def : CompatRule<"isEqual<SanitizeMemoryAttr>">;
 def : CompatRule<"isEqual<SanitizeHWAddressAttr>">;
 def : CompatRule<"isEqual<SanitizeMemTagAttr>">;
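As far as I understand the `isEqual<...>` compatibility rules, they require a caller and a callee to agree on the attribute before the two are treated as compatible for inlining. A rough standalone model of that idea follows. `ToyFunction`, the string-based attribute set, and the shortened rule list are assumptions of this sketch and do not reflect how LLVM stores attributes.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for llvm::Function: just a set of attribute names.
struct ToyFunction {
  std::set<std::string> Attrs;
  bool has(const std::string &Name) const { return Attrs.count(Name) != 0; }
};

// Models an isEqual<...> rule: both functions carry the attribute or neither.
bool isEqual(const ToyFunction &Caller, const ToyFunction &Callee,
             const std::string &Attr) {
  return Caller.has(Attr) == Callee.has(Attr);
}

// Every rule must hold for the pair to be considered compatible.
bool areInlineCompatible(const ToyFunction &Caller, const ToyFunction &Callee) {
  static const char *const Rules[] = {"sanitize_address", "sanitize_thread",
                                      "sanitize_type", "sanitize_memory"};
  for (const char *Attr : Rules)
    if (!isEqual(Caller, Callee, Attr))
      return false;
  return true;
}

// Small usage example: mixing instrumented and plain code is rejected.
void inlineCompatExample() {
  ToyFunction Plain;
  ToyFunction TySan;
  TySan.Attrs.insert("sanitize_type");
  assert(areInlineCompatible(TySan, TySan));
  assert(!areInlineCompatible(TySan, Plain));
}
```

The intent of such a rule is that a `sanitize_type` function does not absorb uninstrumented code, and the reverse.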

llvm/include/llvm/Transforms/Instrumentation/TypeSanitizer.h

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+//===- Transforms/Instrumentation/TypeSanitizer.h - TySan Pass -----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the type sanitizer pass.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_INSTRUMENTATION_TYPESANITIZER_H
+#define LLVM_TRANSFORMS_INSTRUMENTATION_TYPESANITIZER_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+class Function;
+class FunctionPass;
+class Module;
+
+/// A function pass for tysan instrumentation.
+struct TypeSanitizerPass : public PassInfoMixin<TypeSanitizerPass> {
+  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM);
+  static bool isRequired() { return true; }
+};
+
+/// A module pass for tysan instrumentation.
+///
+/// Create ctor and init functions.
+struct ModuleTypeSanitizerPass : public PassInfoMixin<ModuleTypeSanitizerPass> {
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+  static bool isRequired() { return true; }
+};
+
+} // namespace llvm
+#endif /* LLVM_TRANSFORMS_INSTRUMENTATION_TYPESANITIZER_H */

llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp

Lines changed: 11 additions & 7 deletions
@@ -375,7 +375,7 @@ static bool isStructPathTBAA(const MDNode *MD) {
 AliasResult TypeBasedAAResult::alias(const MemoryLocation &LocA,
                                      const MemoryLocation &LocB,
                                      AAQueryInfo &AAQI, const Instruction *) {
-  if (!EnableTBAA)
+  if (!shouldUseTBAA())
     return AliasResult::MayAlias;

   if (Aliases(LocA.AATags.TBAA, LocB.AATags.TBAA))
@@ -388,7 +388,7 @@ AliasResult TypeBasedAAResult::alias(const MemoryLocation &LocA,
 ModRefInfo TypeBasedAAResult::getModRefInfoMask(const MemoryLocation &Loc,
                                                 AAQueryInfo &AAQI,
                                                 bool IgnoreLocals) {
-  if (!EnableTBAA)
+  if (!shouldUseTBAA())
     return ModRefInfo::ModRef;

   const MDNode *M = Loc.AATags.TBAA;
@@ -406,7 +406,7 @@ ModRefInfo TypeBasedAAResult::getModRefInfoMask(const MemoryLocation &Loc,

 MemoryEffects TypeBasedAAResult::getMemoryEffects(const CallBase *Call,
                                                   AAQueryInfo &AAQI) {
-  if (!EnableTBAA)
+  if (!shouldUseTBAA())
     return MemoryEffects::unknown();

   // If this is an "immutable" type, the access is not observable.
@@ -426,7 +426,7 @@ MemoryEffects TypeBasedAAResult::getMemoryEffects(const Function *F) {
 ModRefInfo TypeBasedAAResult::getModRefInfo(const CallBase *Call,
                                             const MemoryLocation &Loc,
                                             AAQueryInfo &AAQI) {
-  if (!EnableTBAA)
+  if (!shouldUseTBAA())
     return ModRefInfo::ModRef;

   if (const MDNode *L = Loc.AATags.TBAA)
@@ -440,7 +440,7 @@ ModRefInfo TypeBasedAAResult::getModRefInfo(const CallBase *Call,
 ModRefInfo TypeBasedAAResult::getModRefInfo(const CallBase *Call1,
                                             const CallBase *Call2,
                                             AAQueryInfo &AAQI) {
-  if (!EnableTBAA)
+  if (!shouldUseTBAA())
     return ModRefInfo::ModRef;

   if (const MDNode *M1 = Call1->getMetadata(LLVMContext::MD_tbaa))
@@ -705,10 +705,14 @@ bool TypeBasedAAResult::Aliases(const MDNode *A, const MDNode *B) const {
   return matchAccessTags(A, B);
 }

+bool TypeBasedAAResult::shouldUseTBAA() const {
+  return EnableTBAA && !UsingTypeSanitizer;
+}
+
 AnalysisKey TypeBasedAA::Key;

 TypeBasedAAResult TypeBasedAA::run(Function &F, FunctionAnalysisManager &AM) {
-  return TypeBasedAAResult();
+  return TypeBasedAAResult(F.hasFnAttribute(Attribute::SanitizeType));
 }

 char TypeBasedAAWrapperPass::ID = 0;
@@ -724,7 +728,7 @@ TypeBasedAAWrapperPass::TypeBasedAAWrapperPass() : ImmutablePass(ID) {
 }

 bool TypeBasedAAWrapperPass::doInitialization(Module &M) {
-  Result.reset(new TypeBasedAAResult());
+  Result.reset(new TypeBasedAAResult(/*UsingTypeSanitizer=*/false));
   return false;
 }
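The effect of replacing `EnableTBAA` with `shouldUseTBAA()` can be seen in a small standalone model. This is only an illustration: `ToyTypeBasedAAResult`, the plain global standing in for the `EnableTBAA` flag, and the `TagsProveNoAlias` parameter (a placeholder for the outcome of comparing TBAA tags) are hypothetical and not part of LLVM.

```cpp
#include <cassert>

// Assumption: a plain global stands in for the EnableTBAA flag in the diff.
static bool EnableTBAA = true;

enum class AliasResult { NoAlias, MayAlias };

// Toy counterpart of TypeBasedAAResult that keeps only the gating logic.
class ToyTypeBasedAAResult {
  bool UsingTypeSanitizer;

public:
  explicit ToyTypeBasedAAResult(bool UsingTypeSanitizer)
      : UsingTypeSanitizer(UsingTypeSanitizer) {}

  // Same condition as the shouldUseTBAA() added in the diff.
  bool shouldUseTBAA() const { return EnableTBAA && !UsingTypeSanitizer; }

  // TagsProveNoAlias is a placeholder for the real TBAA tag comparison.
  AliasResult alias(bool TagsProveNoAlias) const {
    if (!shouldUseTBAA())
      return AliasResult::MayAlias; // conservative answer, nothing removed
    return TagsProveNoAlias ? AliasResult::NoAlias : AliasResult::MayAlias;
  }
};

// Usage example: with the sanitizer on, TBAA never reports NoAlias.
void gatingExample() {
  ToyTypeBasedAAResult Normal(/*UsingTypeSanitizer=*/false);
  ToyTypeBasedAAResult Sanitized(/*UsingTypeSanitizer=*/true);
  assert(Normal.alias(true) == AliasResult::NoAlias);
  assert(Sanitized.alias(true) == AliasResult::MayAlias);
}
```

Keeping the check in one helper means each query method falls back to its conservative answer under the same condition, which matches the stated goal of not removing accesses that the sanitizer should verify.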

llvm/lib/Bitcode/Reader/BitcodeReader.cpp

Lines changed: 2 additions & 0 deletions
@@ -2192,6 +2192,8 @@ static Attribute::AttrKind getAttrFromCode(uint64_t Code) {
     return Attribute::SanitizeHWAddress;
   case bitc::ATTR_KIND_SANITIZE_THREAD:
     return Attribute::SanitizeThread;
+  case bitc::ATTR_KIND_SANITIZE_TYPE:
+    return Attribute::SanitizeType;
   case bitc::ATTR_KIND_SANITIZE_MEMORY:
     return Attribute::SanitizeMemory;
   case bitc::ATTR_KIND_SANITIZE_NUMERICAL_STABILITY:

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp

Lines changed: 2 additions & 0 deletions
@@ -851,6 +851,8 @@ static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) {
     return bitc::ATTR_KIND_SANITIZE_HWADDRESS;
   case Attribute::SanitizeThread:
     return bitc::ATTR_KIND_SANITIZE_THREAD;
+  case Attribute::SanitizeType:
+    return bitc::ATTR_KIND_SANITIZE_TYPE;
   case Attribute::SanitizeMemory:
     return bitc::ATTR_KIND_SANITIZE_MEMORY;
   case Attribute::SanitizeNumericalStability:
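The reader and writer changes are two halves of one mapping: the writer turns `Attribute::SanitizeType` into the on-disk code 101 and the reader turns 101 back into the attribute kind. A reduced standalone model of that round trip is sketched below. The `toy` namespace, the four-entry table (limited to the codes visible in the LLVMBitCodes.h hunk), and the `None` result for unknown codes are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

namespace toy {
// Hypothetical subset of attribute kinds; None marks "not recognized".
enum class AttrKind { None, CoroElideSafe, NoExt, NoDivergenceSource,
                      SanitizeType };

// Codes copied from the LLVMBitCodes.h hunk in this commit.
enum AttributeKindCodes : std::uint64_t {
  ATTR_KIND_CORO_ELIDE_SAFE = 98,
  ATTR_KIND_NO_EXT = 99,
  ATTR_KIND_NO_DIVERGENCE_SOURCE = 100,
  ATTR_KIND_SANITIZE_TYPE = 101,
};

// Writer side: attribute kind to stable on-disk code.
std::uint64_t getAttrKindEncoding(AttrKind Kind) {
  switch (Kind) {
  case AttrKind::CoroElideSafe:
    return ATTR_KIND_CORO_ELIDE_SAFE;
  case AttrKind::NoExt:
    return ATTR_KIND_NO_EXT;
  case AttrKind::NoDivergenceSource:
    return ATTR_KIND_NO_DIVERGENCE_SOURCE;
  case AttrKind::SanitizeType:
    return ATTR_KIND_SANITIZE_TYPE;
  case AttrKind::None:
    break;
  }
  assert(false && "no encoding for this kind in the toy table");
  return 0;
}

// Reader side: on-disk code back to attribute kind.
AttrKind getAttrFromCode(std::uint64_t Code) {
  switch (Code) {
  case ATTR_KIND_CORO_ELIDE_SAFE:
    return AttrKind::CoroElideSafe;
  case ATTR_KIND_NO_EXT:
    return AttrKind::NoExt;
  case ATTR_KIND_NO_DIVERGENCE_SOURCE:
    return AttrKind::NoDivergenceSource;
  case ATTR_KIND_SANITIZE_TYPE:
    return AttrKind::SanitizeType;
  default:
    return AttrKind::None; // unknown code in this toy model
  }
}
} // namespace toy
```

If either half were missing, a module carrying `sanitize_type` could not survive a bitcode write followed by a read, which is why the enum value, the reader case, and the writer case are added together.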

llvm/lib/CodeGen/ShrinkWrap.cpp

Lines changed: 1 addition & 0 deletions
@@ -986,6 +986,7 @@ bool ShrinkWrap::isShrinkWrapEnabled(const MachineFunction &MF) {
            !(MF.getFunction().hasFnAttribute(Attribute::SanitizeAddress) ||
              MF.getFunction().hasFnAttribute(Attribute::SanitizeThread) ||
              MF.getFunction().hasFnAttribute(Attribute::SanitizeMemory) ||
+             MF.getFunction().hasFnAttribute(Attribute::SanitizeType) ||
              MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress));
   // If EnableShrinkWrap is set, it takes precedence on whatever the
   // target sets. The rational is that we assume we want to test

llvm/lib/Passes/PassBuilder.cpp

Lines changed: 1 addition & 0 deletions
@@ -225,6 +225,7 @@
 #include "llvm/Transforms/Instrumentation/SanitizerBinaryMetadata.h"
 #include "llvm/Transforms/Instrumentation/SanitizerCoverage.h"
 #include "llvm/Transforms/Instrumentation/ThreadSanitizer.h"
+#include "llvm/Transforms/Instrumentation/TypeSanitizer.h"
 #include "llvm/Transforms/ObjCARC.h"
 #include "llvm/Transforms/Scalar/ADCE.h"
 #include "llvm/Transforms/Scalar/AlignmentFromAssumptions.h"

llvm/lib/Passes/PassRegistry.def

Lines changed: 2 additions & 0 deletions
@@ -155,6 +155,7 @@ MODULE_PASS("strip-nonlinetable-debuginfo", StripNonLineTableDebugInfoPass())
 MODULE_PASS("trigger-crash-module", TriggerCrashModulePass())
 MODULE_PASS("trigger-verifier-error", TriggerVerifierErrorPass())
 MODULE_PASS("tsan-module", ModuleThreadSanitizerPass())
+MODULE_PASS("tysan-module", ModuleTypeSanitizerPass())
 MODULE_PASS("verify", VerifierPass())
 MODULE_PASS("view-callgraph", CallGraphViewerPass())
 MODULE_PASS("wholeprogramdevirt", WholeProgramDevirtPass())
@@ -480,6 +481,7 @@ FUNCTION_PASS("transform-warning", WarnMissedTransformationsPass())
 FUNCTION_PASS("trigger-crash-function", TriggerCrashFunctionPass())
 FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())
 FUNCTION_PASS("tsan", ThreadSanitizerPass())
+FUNCTION_PASS("tysan", TypeSanitizerPass())
 FUNCTION_PASS("typepromotion", TypePromotionPass(TM))
 FUNCTION_PASS("unify-loop-exits", UnifyLoopExitsPass())
 FUNCTION_PASS("vector-combine", VectorCombinePass())

llvm/lib/Transforms/Instrumentation/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ add_llvm_component_library(LLVMInstrumentation
   SanitizerBinaryMetadata.cpp
   ValueProfileCollector.cpp
   ThreadSanitizer.cpp
+  TypeSanitizer.cpp
   HWAddressSanitizer.cpp
   RealtimeSanitizer.cpp
