-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[TySan] A Type Sanitizer (Runtime Library) #76261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TySan] A Type Sanitizer (Runtime Library) #76261
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Florian Hahn (fhahn) ChangesThis patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. (This includes build fixes for Linux from Mingjie Xu) Based on https://reviews.llvm.org/D32197. Patch is 53.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76261.diff 28 Files Affected:
diff --git a/clang/runtime/CMakeLists.txt b/clang/runtime/CMakeLists.txt
index 65fcdc2868f031..ff2605b23d25b0 100644
--- a/clang/runtime/CMakeLists.txt
+++ b/clang/runtime/CMakeLists.txt
@@ -122,7 +122,7 @@ if(LLVM_BUILD_EXTERNAL_COMPILER_RT AND EXISTS ${COMPILER_RT_SRC_ROOT}/)
COMPONENT compiler-rt)
# Add top-level targets that build specific compiler-rt runtimes.
- set(COMPILER_RT_RUNTIMES fuzzer asan builtins dfsan lsan msan profile tsan ubsan ubsan-minimal)
+ set(COMPILER_RT_RUNTIMES fuzzer asan builtins dfsan lsan msan profile tsan tysan ubsan ubsan-minimal)
foreach(runtime ${COMPILER_RT_RUNTIMES})
get_ext_project_build_command(build_runtime_cmd ${runtime})
add_custom_target(${runtime}
diff --git a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
index 416777171d2ca7..66ebee4392e1ff 100644
--- a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
+++ b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
@@ -67,6 +67,7 @@ set(ALL_PROFILE_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${PPC32} ${PPC
${RISCV32} ${RISCV64} ${LOONGARCH64})
set(ALL_TSAN_SUPPORTED_ARCH ${X86_64} ${MIPS64} ${ARM64} ${PPC64} ${S390X}
${LOONGARCH64} ${RISCV64})
+set(ALL_TYSAN_SUPPORTED_ARCH ${X86_64} ${ARM64})
set(ALL_UBSAN_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${RISCV64}
${MIPS32} ${MIPS64} ${PPC64} ${S390X} ${SPARC} ${SPARCV9} ${HEXAGON}
${LOONGARCH64})
diff --git a/compiler-rt/cmake/config-ix.cmake b/compiler-rt/cmake/config-ix.cmake
index 2dccd4954b2537..e448e71e5422a1 100644
--- a/compiler-rt/cmake/config-ix.cmake
+++ b/compiler-rt/cmake/config-ix.cmake
@@ -448,6 +448,7 @@ if(APPLE)
set(SANITIZER_COMMON_SUPPORTED_OS osx)
set(PROFILE_SUPPORTED_OS osx)
set(TSAN_SUPPORTED_OS osx)
+ set(TYSAN_SUPPORTED_OS osx)
set(XRAY_SUPPORTED_OS osx)
set(FUZZER_SUPPORTED_OS osx)
set(ORC_SUPPORTED_OS)
@@ -581,6 +582,7 @@ if(APPLE)
list(APPEND FUZZER_SUPPORTED_OS ${platform})
list(APPEND ORC_SUPPORTED_OS ${platform})
list(APPEND UBSAN_SUPPORTED_OS ${platform})
+ list(APPEND TYSAN_SUPPORTED_OS ${platform})
list(APPEND LSAN_SUPPORTED_OS ${platform})
list(APPEND STATS_SUPPORTED_OS ${platform})
endif()
@@ -630,6 +632,9 @@ if(APPLE)
list_intersect(PROFILE_SUPPORTED_ARCH
ALL_PROFILE_SUPPORTED_ARCH
SANITIZER_COMMON_SUPPORTED_ARCH)
+ list_intersect(TYSAN_SUPPORTED_ARCH
+ ALL_TYSAN_SUPPORTED_ARCH
+ SANITIZER_COMMON_SUPPORTED_ARCH)
list_intersect(TSAN_SUPPORTED_ARCH
ALL_TSAN_SUPPORTED_ARCH
SANITIZER_COMMON_SUPPORTED_ARCH)
@@ -676,6 +681,7 @@ else()
filter_available_targets(HWASAN_SUPPORTED_ARCH ${ALL_HWASAN_SUPPORTED_ARCH})
filter_available_targets(MEMPROF_SUPPORTED_ARCH ${ALL_MEMPROF_SUPPORTED_ARCH})
filter_available_targets(PROFILE_SUPPORTED_ARCH ${ALL_PROFILE_SUPPORTED_ARCH})
+ filter_available_targets(TYSAN_SUPPORTED_ARCH ${ALL_TYSAN_SUPPORTED_ARCH})
filter_available_targets(TSAN_SUPPORTED_ARCH ${ALL_TSAN_SUPPORTED_ARCH})
filter_available_targets(UBSAN_SUPPORTED_ARCH ${ALL_UBSAN_SUPPORTED_ARCH})
filter_available_targets(SAFESTACK_SUPPORTED_ARCH
@@ -720,7 +726,7 @@ if(COMPILER_RT_SUPPORTED_ARCH)
endif()
message(STATUS "Compiler-RT supported architectures: ${COMPILER_RT_SUPPORTED_ARCH}")
-set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
+set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;tysan,safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
set(COMPILER_RT_SANITIZERS_TO_BUILD all CACHE STRING
"sanitizers to build if supported on the target (all;${ALL_SANITIZERS})")
list_replace(COMPILER_RT_SANITIZERS_TO_BUILD all "${ALL_SANITIZERS}")
@@ -801,6 +807,13 @@ else()
set(COMPILER_RT_HAS_PROFILE FALSE)
endif()
+if (COMPILER_RT_HAS_SANITIZER_COMMON AND TYSAN_SUPPORTED_ARCH AND
+ OS_NAME MATCHES "Linux|Darwin")
+ set(COMPILER_RT_HAS_TYSAN TRUE)
+else()
+ set(COMPILER_RT_HAS_TYSAN FALSE)
+endif()
+
if (COMPILER_RT_HAS_SANITIZER_COMMON AND TSAN_SUPPORTED_ARCH)
if (OS_NAME MATCHES "Linux|Darwin|FreeBSD|NetBSD")
set(COMPILER_RT_HAS_TSAN TRUE)
diff --git a/compiler-rt/lib/tysan/CMakeLists.txt b/compiler-rt/lib/tysan/CMakeLists.txt
new file mode 100644
index 00000000000000..859b67928f004a
--- /dev/null
+++ b/compiler-rt/lib/tysan/CMakeLists.txt
@@ -0,0 +1,64 @@
+include_directories(..)
+
+# Runtime library sources and build flags.
+set(TYSAN_SOURCES
+ tysan.cpp
+ tysan_interceptors.cpp)
+set(TYSAN_COMMON_CFLAGS ${SANITIZER_COMMON_CFLAGS})
+append_rtti_flag(OFF TYSAN_COMMON_CFLAGS)
+# Prevent clang from generating libc calls.
+append_list_if(COMPILER_RT_HAS_FFREESTANDING_FLAG -ffreestanding TYSAN_COMMON_CFLAGS)
+
+add_compiler_rt_object_libraries(RTTysan_dynamic
+ OS ${SANITIZER_COMMON_SUPPORTED_OS}
+ ARCHS ${TYSAN_SUPPORTED_ARCH}
+ SOURCES ${TYSAN_SOURCES}
+ ADDITIONAL_HEADERS ${TYSAN_HEADERS}
+ CFLAGS ${TYSAN_DYNAMIC_CFLAGS}
+ DEFS ${TYSAN_DYNAMIC_DEFINITIONS})
+
+
+# Static runtime library.
+add_compiler_rt_component(tysan)
+
+
+if(APPLE)
+ add_weak_symbols("sanitizer_common" WEAK_SYMBOL_LINK_FLAGS)
+
+ add_compiler_rt_runtime(clang_rt.tysan
+ SHARED
+ OS ${SANITIZER_COMMON_SUPPORTED_OS}
+ ARCHS ${TYSAN_SUPPORTED_ARCH}
+ OBJECT_LIBS RTTysan_dynamic
+ RTInterception
+ RTSanitizerCommon
+ RTSanitizerCommonLibc
+ RTSanitizerCommonSymbolizer
+ CFLAGS ${TYSAN_DYNAMIC_CFLAGS}
+ LINK_FLAGS ${WEAK_SYMBOL_LINK_FLAGS}
+ DEFS ${TYSAN_DYNAMIC_DEFINITIONS}
+ PARENT_TARGET tysan)
+
+ add_compiler_rt_runtime(clang_rt.tysan_static
+ STATIC
+ ARCHS ${TYSAN_SUPPORTED_ARCH}
+ OBJECT_LIBS RTTysan_static
+ CFLAGS ${TYSAN_CFLAGS}
+ DEFS ${TYSAN_COMMON_DEFINITIONS}
+ PARENT_TARGET tysan)
+else()
+ foreach(arch ${TYSAN_SUPPORTED_ARCH})
+ set(TYSAN_CFLAGS ${TYSAN_COMMON_CFLAGS})
+ append_list_if(COMPILER_RT_HAS_FPIE_FLAG -fPIE TYSAN_CFLAGS)
+ add_compiler_rt_runtime(clang_rt.tysan
+ STATIC
+ ARCHS ${arch}
+ SOURCES ${TYSAN_SOURCES}
+ $<TARGET_OBJECTS:RTInterception.${arch}>
+ $<TARGET_OBJECTS:RTSanitizerCommon.${arch}>
+ $<TARGET_OBJECTS:RTSanitizerCommonLibc.${arch}>
+ $<TARGET_OBJECTS:RTSanitizerCommonSymbolizer.${arch}>
+ CFLAGS ${TYSAN_CFLAGS}
+ PARENT_TARGET tysan)
+ endforeach()
+endif()
diff --git a/compiler-rt/lib/tysan/lit.cfg b/compiler-rt/lib/tysan/lit.cfg
new file mode 100644
index 00000000000000..bd2bbe855529a7
--- /dev/null
+++ b/compiler-rt/lib/tysan/lit.cfg
@@ -0,0 +1,35 @@
+# -*- Python -*-
+
+import os
+
+# Setup config name.
+config.name = 'TypeSanitizer' + getattr(config, 'name_suffix', 'default')
+
+# Setup source root.
+config.test_source_root = os.path.dirname(__file__)
+
+# Setup default compiler flags used with -fsanitize=type option.
+clang_tysan_cflags = (["-fsanitize=type",
+ "-mno-omit-leaf-frame-pointer",
+ "-fno-omit-frame-pointer",
+ "-fno-optimize-sibling-calls"] +
+ [config.target_cflags] +
+ config.debug_info_flags)
+clang_tysan_cxxflags = config.cxx_mode_flags + clang_tysan_cflags
+
+def build_invocation(compile_flags):
+ return " " + " ".join([config.clang] + compile_flags) + " "
+
+config.substitutions.append( ("%clang_tysan ", build_invocation(clang_tysan_cflags)) )
+config.substitutions.append( ("%clangxx_tysan ", build_invocation(clang_tysan_cxxflags)) )
+
+# Default test suffixes.
+config.suffixes = ['.c', '.cc', '.cpp']
+
+# TypeSanitizer tests are currently supported on Linux only.
+if config.host_os not in ['Linux']:
+ config.unsupported = True
+
+if config.target_arch != 'aarch64':
+ config.available_features.add('stable-runtime')
+
diff --git a/compiler-rt/lib/tysan/lit.site.cfg.in b/compiler-rt/lib/tysan/lit.site.cfg.in
new file mode 100644
index 00000000000000..673d04e514379b
--- /dev/null
+++ b/compiler-rt/lib/tysan/lit.site.cfg.in
@@ -0,0 +1,12 @@
+@LIT_SITE_CFG_IN_HEADER@
+
+# Tool-specific config options.
+config.name_suffix = "@TYSAN_TEST_CONFIG_SUFFIX@"
+config.target_cflags = "@TYSAN_TEST_TARGET_CFLAGS@"
+config.target_arch = "@TYSAN_TEST_TARGET_ARCH@"
+
+# Load common config for all compiler-rt lit tests.
+lit_config.load_config(config, "@COMPILER_RT_BINARY_DIR@/test/lit.common.configured")
+
+# Load tool-specific config that would do the real work.
+lit_config.load_config(config, "@TYSAN_LIT_SOURCE_DIR@/lit.cfg")
diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
new file mode 100644
index 00000000000000..f627851d049e6a
--- /dev/null
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -0,0 +1,339 @@
+//===-- tysan.cpp ---------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of TypeSanitizer.
+//
+// TypeSanitizer runtime.
+//===----------------------------------------------------------------------===//
+
+#include "sanitizer_common/sanitizer_atomic.h"
+#include "sanitizer_common/sanitizer_common.h"
+#include "sanitizer_common/sanitizer_flag_parser.h"
+#include "sanitizer_common/sanitizer_flags.h"
+#include "sanitizer_common/sanitizer_libc.h"
+#include "sanitizer_common/sanitizer_report_decorator.h"
+#include "sanitizer_common/sanitizer_stacktrace.h"
+#include "sanitizer_common/sanitizer_symbolizer.h"
+
+#include "tysan/tysan.h"
+
+using namespace __sanitizer;
+using namespace __tysan;
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+tysan_set_type_unknown(const void *addr, uptr size) {
+ if (tysan_inited)
+ internal_memset(shadow_for(addr), 0, size * sizeof(uptr));
+}
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+tysan_copy_types(const void *daddr, const void *saddr, uptr size) {
+ if (tysan_inited)
+ internal_memmove(shadow_for(daddr), shadow_for(saddr), size * sizeof(uptr));
+}
+
+static const char *getDisplayName(const char *Name) {
+ if (Name[0] == '\0')
+ return "<anonymous type>";
+
+ // Clang generates tags for C++ types that demangle as typeinfo. Remove the
+ // prefix from the generated string.
+ const char TIPrefix[] = "typeinfo name for ";
+
+ const char *DName = Symbolizer::GetOrInit()->Demangle(Name);
+ if (!internal_strncmp(DName, TIPrefix, sizeof(TIPrefix) - 1))
+ DName += sizeof(TIPrefix) - 1;
+
+ return DName;
+}
+
+static void printTDName(tysan_type_descriptor *td) {
+ if (((sptr)td) <= 0) {
+ Printf("<unknown type>");
+ return;
+ }
+
+ switch (td->Tag) {
+ default:
+ DCHECK(0);
+ break;
+ case TYSAN_MEMBER_TD:
+ printTDName(td->Member.Access);
+ if (td->Member.Access != td->Member.Base) {
+ Printf(" (in ");
+ printTDName(td->Member.Base);
+ Printf(" at offset %zu)", td->Member.Offset);
+ }
+ break;
+ case TYSAN_STRUCT_TD:
+ Printf("%s", getDisplayName(
+ (char *)(td->Struct.Members + td->Struct.MemberCount)));
+ break;
+ }
+}
+
+static tysan_type_descriptor *getRootTD(tysan_type_descriptor *TD) {
+ tysan_type_descriptor *RootTD = TD;
+
+ do {
+ RootTD = TD;
+
+ if (TD->Tag == TYSAN_STRUCT_TD) {
+ if (TD->Struct.MemberCount > 0)
+ TD = TD->Struct.Members[0].Type;
+ else
+ TD = nullptr;
+ } else if (TD->Tag == TYSAN_MEMBER_TD) {
+ TD = TD->Member.Access;
+ } else {
+ DCHECK(0);
+ break;
+ }
+ } while (TD);
+
+ return RootTD;
+}
+
+static bool isAliasingLegalUp(tysan_type_descriptor *TDA,
+ tysan_type_descriptor *TDB) {
+ // Walk up the tree starting with TDA to see if we reach TDB.
+ uptr OffsetA = 0, OffsetB = 0;
+ if (TDB->Tag == TYSAN_MEMBER_TD) {
+ OffsetB = TDB->Member.Offset;
+ TDB = TDB->Member.Base;
+ }
+
+ if (TDA->Tag == TYSAN_MEMBER_TD) {
+ OffsetA = TDA->Member.Offset;
+ TDA = TDA->Member.Base;
+ }
+
+ do {
+ if (TDA == TDB)
+ return OffsetA == OffsetB;
+
+ if (TDA->Tag == TYSAN_STRUCT_TD) {
+ // Reached root type descriptor.
+ if (!TDA->Struct.MemberCount)
+ break;
+
+ uptr Idx = 0;
+ for (; Idx < TDA->Struct.MemberCount - 1; ++Idx) {
+ if (TDA->Struct.Members[Idx].Offset >= OffsetA)
+ break;
+ }
+
+ OffsetA -= TDA->Struct.Members[Idx].Offset;
+ TDA = TDA->Struct.Members[Idx].Type;
+ } else {
+ DCHECK(0);
+ break;
+ }
+ } while (TDA);
+
+ return false;
+}
+
+static bool isAliasingLegal(tysan_type_descriptor *TDA,
+ tysan_type_descriptor *TDB) {
+ if (TDA == TDB || !TDB || !TDA)
+ return true;
+
+ // Aliasing is legal is the two types have different root nodes.
+ if (getRootTD(TDA) != getRootTD(TDB))
+ return true;
+
+ return isAliasingLegalUp(TDA, TDB) || isAliasingLegalUp(TDB, TDA);
+}
+
+namespace __tysan {
+class Decorator : public __sanitizer::SanitizerCommonDecorator {
+public:
+ Decorator() : SanitizerCommonDecorator() {}
+ const char *Warning() { return Red(); }
+ const char *Name() { return Green(); }
+ const char *End() { return Default(); }
+};
+} // namespace __tysan
+
+ALWAYS_INLINE
+static void reportError(void *Addr, int Size, tysan_type_descriptor *TD,
+ tysan_type_descriptor *OldTD, const char *AccessStr,
+ const char *DescStr, int Offset, uptr pc, uptr bp,
+ uptr sp) {
+ Decorator d;
+ Printf("%s", d.Warning());
+ Report("ERROR: TypeSanitizer: type-aliasing-violation on address %p"
+ " (pc %p bp %p sp %p tid %llu)\n",
+ Addr, (void *)pc, (void *)bp, (void *)sp, GetTid());
+ Printf("%s", d.End());
+ Printf("%s of size %d at %p with type ", AccessStr, Size, Addr);
+
+ Printf("%s", d.Name());
+ printTDName(TD);
+ Printf("%s", d.End());
+
+ Printf(" %s of type ", DescStr);
+
+ Printf("%s", d.Name());
+ printTDName(OldTD);
+ Printf("%s", d.End());
+
+ if (Offset != 0)
+ Printf(" that starts at offset %d\n", Offset);
+ else
+ Printf("\n");
+
+ if (pc) {
+
+ bool request_fast = StackTrace::WillUseFastUnwind(true);
+ BufferedStackTrace ST;
+ ST.Unwind(kStackTraceMax, pc, bp, 0, 0, 0, request_fast);
+ ST.Print();
+ } else {
+ Printf("\n");
+ }
+}
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+__tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) {
+ GET_CALLER_PC_BP_SP;
+
+ bool IsRead = flags & 1;
+ bool IsWrite = flags & 2;
+ const char *AccessStr;
+ if (IsRead && !IsWrite)
+ AccessStr = "READ";
+ else if (!IsRead && IsWrite)
+ AccessStr = "WRITE";
+ else
+ AccessStr = "ATOMIC UPDATE";
+
+ tysan_type_descriptor **OldTDPtr = shadow_for(addr);
+ tysan_type_descriptor *OldTD = *OldTDPtr;
+ if (((sptr)OldTD) < 0) {
+ int i = -((sptr)OldTD);
+ OldTDPtr -= i;
+ OldTD = *OldTDPtr;
+
+ if (!isAliasingLegal(td, OldTD))
+ reportError(addr, size, td, OldTD, AccessStr,
+ "accesses part of an existing object", -i, pc, bp, sp);
+
+ return;
+ }
+
+ if (!isAliasingLegal(td, OldTD)) {
+ reportError(addr, size, td, OldTD, AccessStr, "accesses an existing object",
+ 0, pc, bp, sp);
+ return;
+ }
+
+ // These types are allowed to alias (or the stored type is unknown), report
+ // an error if we find an interior type.
+
+ for (int i = 0; i < size; ++i) {
+ OldTDPtr = shadow_for((void *)(((uptr)addr) + i));
+ OldTD = *OldTDPtr;
+ if (((sptr)OldTD) >= 0 && !isAliasingLegal(td, OldTD))
+ reportError(addr, size, td, OldTD, AccessStr,
+ "partially accesses an object", i, pc, bp, sp);
+ }
+}
+
+Flags __tysan::flags_data;
+
+SANITIZER_INTERFACE_ATTRIBUTE uptr __tysan_shadow_memory_address;
+SANITIZER_INTERFACE_ATTRIBUTE uptr __tysan_app_memory_mask;
+
+#ifdef TYSAN_RUNTIME_VMA
+// Runtime detected VMA size.
+int __tysan::vmaSize;
+#endif
+
+void Flags::SetDefaults() {
+#define TYSAN_FLAG(Type, Name, DefaultValue, Description) Name = DefaultValue;
+#include "tysan_flags.inc"
+#undef TYSAN_FLAG
+}
+
+static void RegisterTySanFlags(FlagParser *parser, Flags *f) {
+#define TYSAN_FLAG(Type, Name, DefaultValue, Description) \
+ RegisterFlag(parser, #Name, Description, &f->Name);
+#include "tysan_flags.inc"
+#undef TYSAN_FLAG
+}
+
+static void InitializeFlags() {
+ SetCommonFlagsDefaults();
+ {
+ CommonFlags cf;
+ cf.CopyFrom(*common_flags());
+ cf.external_symbolizer_path = GetEnv("TYSAN_SYMBOLIZER_PATH");
+ OverrideCommonFlags(cf);
+ }
+
+ flags().SetDefaults();
+
+ FlagParser parser;
+ RegisterCommonFlags(&parser);
+ RegisterTySanFlags(&parser, &flags());
+ parser.ParseString(GetEnv("TYSAN_OPTIONS"));
+ InitializeCommonFlags();
+ if (Verbosity())
+ ReportUnrecognizedFlags();
+ if (common_flags()->help)
+ parser.PrintFlagDescriptions();
+}
+
+static void TySanInitializePlatformEarly() {
+ AvoidCVE_2016_2143();
+#ifdef TYSAN_RUNTIME_VMA
+ vmaSize = (MostSignificantSetBitIndex(GET_CURRENT_FRAME()) + 1);
+#if defined(__aarch64__) && !SANITIZER_APPLE
+ if (vmaSize != 39 && vmaSize != 42 && vmaSize != 48) {
+ Printf("FATAL: TypeSanitizer: unsupported VMA range\n");
+ Printf("FATAL: Found %d - Supported 39, 42 and 48\n", vmaSize);
+ Die();
+ }
+#endif
+#endif
+
+ __sanitizer::InitializePlatformEarly();
+
+ __tysan_shadow_memory_address = ShadowAddr();
+ __tysan_app_memory_mask = AppMask();
+}
+
+namespace __tysan {
+bool tysan_inited = false;
+bool tysan_init_is_running;
+} // namespace __tysan
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __tysan_init() {
+ CHECK(!tysan_init_is_running);
+ if (tysan_inited)
+ return;
+ tysan_init_is_running = true;
+
+ InitializeFlags();
+ TySanInitializePlatformEarly();
+
+ InitializeInterceptors();
+
+ if (!MmapFixedNoReserve(ShadowAddr(), AppAddr() - ShadowAddr()))
+ Die();
+
+ tysan_init_is_running = false;
+ tysan_inited = true;
+}
+
+#if SANITIZER_CAN_USE_PREINIT_ARRAY
+__attribute__((section(".preinit_array"),
+ used)) static void (*tysan_init_ptr)() = __tysan_init;
+#endif
diff --git a/compiler-rt/lib/tysan/tysan.h b/compiler-rt/lib/tysan/tysan.h
new file mode 100644
index 00000000000000..ec6f9587e9ce58
--- /dev/null
+++ b/compiler-rt/lib/tysan/tysan.h
@@ -0,0 +1,79 @@
+//===-- tysan.h -------------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of TypeSanitizer.
+//
+// Private TySan header.
+//===----------------------------------------------------------------------===//
+
+#ifndef TYSAN_H
+#define TYSAN_H
+
+#include "sanitizer_common/sanitizer_internal_defs.h"
+
+using __sanitizer::sptr;
+using __sanitizer::u16;
+using __sanitizer::uptr;
+
+#include "tysan_platform.h"
+
+extern "C" {
+void tysan_set_type_unknown(const void *addr, uptr size);
+void tysan_copy_types(const void *daddr, const void *saddr, uptr size);
+}
+
+namespace __tysan {
+extern bool tysan_inited;
+extern bool tysan_init_is_running;
+
+void InitializeInterceptors();
+
+enum { TYSAN_MEMBER_TD = 1, TYSAN_STRUCT_TD = 2 };
+
+struct tysan_member_type_descriptor {
+ struct tysan_type_descriptor *Base;
+ struct tysan_type_descriptor *Access;
+ uptr Offset;
+};
+
+struct tysan_struct_type_descriptor {
+ uptr MemberCount;
+ struct {
+ struct tysan_type_descriptor *Type;
+ uptr Offset;
+ } Members[1]; // Tail allocated.
+ // char Name[]; // Tail allocated.
+};
+
+struct tysan_type_descriptor {
+ uptr Tag;
+ union {
+ tysan_member_type_descriptor Member;
+ tysan_struct_type_descriptor Struct;
+ };
+};
+
+inline tysan_type_descriptor **shadow_for(const void *ptr) {
+ return (tysan_type_descriptor **)((((uptr)ptr) & AppMask()) * sizeof(ptr) +
+ ShadowAddr());
+}
+
+struct Flags {
+#define TYSAN_FLAG(Type, Name, Defaul...
[truncated]
|
✅ With the latest revision this PR passed the Python code formatter. |
compiler-rt/cmake/config-ix.cmake
Outdated
@@ -720,7 +726,7 @@ if(COMPILER_RT_SUPPORTED_ARCH) | |||
endif() | |||
message(STATUS "Compiler-RT supported architectures: ${COMPILER_RT_SUPPORTED_ARCH}") | |||
|
|||
set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi) | |||
set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;tysan,safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ tysan;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated!
Added compiler-rt tests for various strict-aliasing violations from the bug tracker I found. |
ee7ed21
to
fb7da09
Compare
e2caf4e
to
733b3ed
Compare
I also needed diff --git a/compiler-rt/cmake/config-ix.cmake b/compiler-rt/cmake/config-ix.cmake
index ab13d8c03683..f480083231a1 100644
--- a/compiler-rt/cmake/config-ix.cmake
+++ b/compiler-rt/cmake/config-ix.cmake
@@ -677,6 +677,7 @@ else()
filter_available_targets(PROFILE_SUPPORTED_ARCH ${ALL_PROFILE_SUPPORTED_ARCH})
filter_available_targets(CTX_PROFILE_SUPPORTED_ARCH ${ALL_CTX_PROFILE_SUPPORTED_ARCH})
filter_available_targets(TSAN_SUPPORTED_ARCH ${ALL_TSAN_SUPPORTED_ARCH})
+ filter_available_targets(TYSAN_SUPPORTED_ARCH ${ALL_TYSAN_SUPPORTED_ARCH})
filter_available_targets(UBSAN_SUPPORTED_ARCH ${ALL_UBSAN_SUPPORTED_ARCH})
filter_available_targets(SAFESTACK_SUPPORTED_ARCH
${ALL_SAFESTACK_SUPPORTED_ARCH}) to build the runtime on LInux. |
fb7da09
to
9cbb116
Compare
733b3ed
to
8e6e62d
Compare
The latest version also includes a new test case |
The Clang and LLVM parts are now ready to land, with just the compiler-rt bits pending approval. There are a few test failures on Linux which don't fail on macOS, will look at them today probably |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Test failures on linux should be fixed now, they were due to debug paths using full paths on Linux and in one case a C++ struct type wasnt' de-mangled on Linux |
ping :) |
compiler-rt/lib/tysan/tysan.cpp
Outdated
|
||
// Clang generates tags for C++ types that demangle as typeinfo. Remove the | ||
// prefix from the generated string. | ||
const char TIPrefix[] = "typeinfo name for "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we make this a const char*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done together with using strlen
thanks
compiler-rt/lib/tysan/tysan.cpp
Outdated
|
||
const char *DName = Symbolizer::GetOrInit()->Demangle(Name); | ||
if (!internal_strncmp(DName, TIPrefix, sizeof(TIPrefix) - 1)) | ||
DName += sizeof(TIPrefix) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strlen? the compiler should be able to optimize that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
compiler-rt/lib/tysan/lit.cfg
Outdated
"-mno-omit-leaf-frame-pointer", | ||
"-fno-omit-frame-pointer", | ||
"-fno-optimize-sibling-calls"] + | ||
[config.target_cflags] + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong, target_cflags
sounds like a list
compiler-rt/lib/tysan/tysan.cpp
Outdated
} else if (TD->Tag == TYSAN_MEMBER_TD) { | ||
TD = TD->Member.Access; | ||
} else { | ||
DCHECK(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DCHECK(false && "invalid enum value")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
compiler-rt/test/tysan/lit.cfg.py
Outdated
import shlex | ||
|
||
sh_quote = shlex.quote | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except ImportError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks
compiler-rt/test/tysan/lit.cfg.py
Outdated
|
||
def get_required_attr(config, attr_name): | ||
attr_value = getattr(config, attr_name, None) | ||
if attr_value == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to check if attr_value
and return value early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just confirming it is expected that empty string, or false, or something like that will now fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adjusted to return if not None, thanks
This patch introduces the LLVM components of a type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32198. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. When the sanitizer is active, we disable actually using the TBAA metadata for AA. This way we're less likely to use TBAA to remove memory accesses that we'd like to verify. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. It goes together with the corresponding clang changes (#76260) and compiler-rt changes (#76261) PR: #76259
This patch introduces the Clang components of type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32198. The Clang changes are mostly formulaic, the one specific change being that when the TBAA sanitizer is enabled, TBAA is always generated, even at -O0. It goes together with the corresponding LLVM changes (#76259) and compiler-rt changes (#76261) PR: #76260
…ype-sanitizer-runtime-library
@fhahn I'm getting:
while compiling |
AFAICT this is happening at each place that includes the file, e.g. also if you build AddressSanitizer. Not sure what the best way forward here is as this is not related to TypeSanitizer, but seems to impact all sanitizers? |
I just posted #120314. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there are no release notes or manual b/c this is not ready to be used by general users on the command line and that will come via another PR?
|
||
char *mem = (char *)malloc(4 * sizeof(short)); | ||
memset(mem, 0, 4 * sizeof(short)); | ||
*(short *)(mem + 2) = x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I am reading the diagnostics correctly it looks like writes to malloc'ed memory does implicit object creation and subsequent read/write will be flagged based on that?
Some more explanatory comments in this test like you have in other tests would be super helpful.
u.i = 42; | ||
u.s = 1; | ||
|
||
printf("%d\n", u.i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C++ this would be a aliasing violation, can we catch that in C++?
I added a release note as part of the Clang patch to the Are there sanitizer-specific release notes where this should also be added? |
This patch introduces the runtime components for type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32197. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. This includes build fixes for Linux from Mingjie Xu. Depends on #76260 (Clang support), #76259 (LLVM support) PR: llvm/llvm-project#76261
@fhahn can TySan be made to trap when it detects a problem? I tried
|
If we want TySan to stop and abort when it detects first type-based aliasing violation, I would suggest implement |
This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.
C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.
For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.
The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.
The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.
The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.
Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.
As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.
(This includes build fixes for Linux from Mingjie Xu)
Based on https://reviews.llvm.org/D32197.
Depends on #76260 (Clang support), #76259 (LLVM support)