Skip to content

[TySan] A Type Sanitizer (Runtime Library) #76261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Dec 17, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Dec 22, 2023

This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.

As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.

(This includes build fixes for Linux from Mingjie Xu)

Based on https://reviews.llvm.org/D32197.

Depends on #76260 (Clang support), #76259 (LLVM support)

@llvmbot llvmbot added clang Clang issues not falling into any other category compiler-rt compiler-rt:sanitizer labels Dec 22, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 22, 2023

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-compiler-rt-sanitizer

@llvm/pr-subscribers-clang

Author: Florian Hahn (fhahn)

Changes

This patch introduces the runtime components of a type sanitizer: a sanitizer for type-based aliasing violations.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer.

As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work.

(This includes build fixes for Linux from Mingjie Xu)

Based on https://reviews.llvm.org/D32197.


Patch is 53.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76261.diff

28 Files Affected:

  • (modified) clang/runtime/CMakeLists.txt (+1-1)
  • (modified) compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake (+1)
  • (modified) compiler-rt/cmake/config-ix.cmake (+14-1)
  • (added) compiler-rt/lib/tysan/CMakeLists.txt (+64)
  • (added) compiler-rt/lib/tysan/lit.cfg (+35)
  • (added) compiler-rt/lib/tysan/lit.site.cfg.in (+12)
  • (added) compiler-rt/lib/tysan/tysan.cpp (+339)
  • (added) compiler-rt/lib/tysan/tysan.h (+79)
  • (added) compiler-rt/lib/tysan/tysan.syms.extra (+2)
  • (added) compiler-rt/lib/tysan/tysan_flags.inc (+17)
  • (added) compiler-rt/lib/tysan/tysan_interceptors.cpp (+250)
  • (added) compiler-rt/lib/tysan/tysan_platform.h (+93)
  • (added) compiler-rt/lib/tysan/weak_symbols.txt ()
  • (added) compiler-rt/test/tysan/CMakeLists.txt (+32)
  • (added) compiler-rt/test/tysan/anon-ns.cpp (+41)
  • (added) compiler-rt/test/tysan/anon-same-struct.c (+26)
  • (added) compiler-rt/test/tysan/anon-struct.c (+27)
  • (added) compiler-rt/test/tysan/basic.c (+65)
  • (added) compiler-rt/test/tysan/char-memcpy.c (+45)
  • (added) compiler-rt/test/tysan/global.c (+31)
  • (added) compiler-rt/test/tysan/int-long.c (+21)
  • (added) compiler-rt/test/tysan/lit.cfg.py (+139)
  • (added) compiler-rt/test/tysan/lit.site.cfg.py.in (+17)
  • (added) compiler-rt/test/tysan/ptr-float.c (+19)
  • (added) compiler-rt/test/tysan/struct-offset-multiple-compilation-units.cpp (+51)
  • (added) compiler-rt/test/tysan/struct-offset.c (+26)
  • (added) compiler-rt/test/tysan/struct.c (+39)
  • (added) compiler-rt/test/tysan/union-wr-wr.c (+18)
diff --git a/clang/runtime/CMakeLists.txt b/clang/runtime/CMakeLists.txt
index 65fcdc2868f031..ff2605b23d25b0 100644
--- a/clang/runtime/CMakeLists.txt
+++ b/clang/runtime/CMakeLists.txt
@@ -122,7 +122,7 @@ if(LLVM_BUILD_EXTERNAL_COMPILER_RT AND EXISTS ${COMPILER_RT_SRC_ROOT}/)
                            COMPONENT compiler-rt)
 
   # Add top-level targets that build specific compiler-rt runtimes.
-  set(COMPILER_RT_RUNTIMES fuzzer asan builtins dfsan lsan msan profile tsan ubsan ubsan-minimal)
+  set(COMPILER_RT_RUNTIMES fuzzer asan builtins dfsan lsan msan profile tsan tysan ubsan ubsan-minimal)
   foreach(runtime ${COMPILER_RT_RUNTIMES})
     get_ext_project_build_command(build_runtime_cmd ${runtime})
     add_custom_target(${runtime}
diff --git a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
index 416777171d2ca7..66ebee4392e1ff 100644
--- a/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
+++ b/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake
@@ -67,6 +67,7 @@ set(ALL_PROFILE_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${PPC32} ${PPC
     ${RISCV32} ${RISCV64} ${LOONGARCH64})
 set(ALL_TSAN_SUPPORTED_ARCH ${X86_64} ${MIPS64} ${ARM64} ${PPC64} ${S390X}
     ${LOONGARCH64} ${RISCV64})
+set(ALL_TYSAN_SUPPORTED_ARCH ${X86_64} ${ARM64})
 set(ALL_UBSAN_SUPPORTED_ARCH ${X86} ${X86_64} ${ARM32} ${ARM64} ${RISCV64}
     ${MIPS32} ${MIPS64} ${PPC64} ${S390X} ${SPARC} ${SPARCV9} ${HEXAGON}
     ${LOONGARCH64})
diff --git a/compiler-rt/cmake/config-ix.cmake b/compiler-rt/cmake/config-ix.cmake
index 2dccd4954b2537..e448e71e5422a1 100644
--- a/compiler-rt/cmake/config-ix.cmake
+++ b/compiler-rt/cmake/config-ix.cmake
@@ -448,6 +448,7 @@ if(APPLE)
   set(SANITIZER_COMMON_SUPPORTED_OS osx)
   set(PROFILE_SUPPORTED_OS osx)
   set(TSAN_SUPPORTED_OS osx)
+  set(TYSAN_SUPPORTED_OS osx)
   set(XRAY_SUPPORTED_OS osx)
   set(FUZZER_SUPPORTED_OS osx)
   set(ORC_SUPPORTED_OS)
@@ -581,6 +582,7 @@ if(APPLE)
           list(APPEND FUZZER_SUPPORTED_OS ${platform})
           list(APPEND ORC_SUPPORTED_OS ${platform})
           list(APPEND UBSAN_SUPPORTED_OS ${platform})
+          list(APPEND TYSAN_SUPPORTED_OS ${platform})
           list(APPEND LSAN_SUPPORTED_OS ${platform})
           list(APPEND STATS_SUPPORTED_OS ${platform})
         endif()
@@ -630,6 +632,9 @@ if(APPLE)
   list_intersect(PROFILE_SUPPORTED_ARCH
     ALL_PROFILE_SUPPORTED_ARCH
     SANITIZER_COMMON_SUPPORTED_ARCH)
+  list_intersect(TYSAN_SUPPORTED_ARCH
+    ALL_TYSAN_SUPPORTED_ARCH
+    SANITIZER_COMMON_SUPPORTED_ARCH)
   list_intersect(TSAN_SUPPORTED_ARCH
     ALL_TSAN_SUPPORTED_ARCH
     SANITIZER_COMMON_SUPPORTED_ARCH)
@@ -676,6 +681,7 @@ else()
   filter_available_targets(HWASAN_SUPPORTED_ARCH ${ALL_HWASAN_SUPPORTED_ARCH})
   filter_available_targets(MEMPROF_SUPPORTED_ARCH ${ALL_MEMPROF_SUPPORTED_ARCH})
   filter_available_targets(PROFILE_SUPPORTED_ARCH ${ALL_PROFILE_SUPPORTED_ARCH})
+  filter_available_targets(TYSAN_SUPPORTED_ARCH ${ALL_TYSAN_SUPPORTED_ARCH})
   filter_available_targets(TSAN_SUPPORTED_ARCH ${ALL_TSAN_SUPPORTED_ARCH})
   filter_available_targets(UBSAN_SUPPORTED_ARCH ${ALL_UBSAN_SUPPORTED_ARCH})
   filter_available_targets(SAFESTACK_SUPPORTED_ARCH
@@ -720,7 +726,7 @@ if(COMPILER_RT_SUPPORTED_ARCH)
 endif()
 message(STATUS "Compiler-RT supported architectures: ${COMPILER_RT_SUPPORTED_ARCH}")
 
-set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
+set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;tysan,safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
 set(COMPILER_RT_SANITIZERS_TO_BUILD all CACHE STRING
     "sanitizers to build if supported on the target (all;${ALL_SANITIZERS})")
 list_replace(COMPILER_RT_SANITIZERS_TO_BUILD all "${ALL_SANITIZERS}")
@@ -801,6 +807,13 @@ else()
   set(COMPILER_RT_HAS_PROFILE FALSE)
 endif()
 
+if (COMPILER_RT_HAS_SANITIZER_COMMON AND TYSAN_SUPPORTED_ARCH AND
+        OS_NAME MATCHES "Linux|Darwin")
+  set(COMPILER_RT_HAS_TYSAN TRUE)
+else()
+  set(COMPILER_RT_HAS_TYSAN FALSE)
+endif()
+
 if (COMPILER_RT_HAS_SANITIZER_COMMON AND TSAN_SUPPORTED_ARCH)
   if (OS_NAME MATCHES "Linux|Darwin|FreeBSD|NetBSD")
     set(COMPILER_RT_HAS_TSAN TRUE)
diff --git a/compiler-rt/lib/tysan/CMakeLists.txt b/compiler-rt/lib/tysan/CMakeLists.txt
new file mode 100644
index 00000000000000..859b67928f004a
--- /dev/null
+++ b/compiler-rt/lib/tysan/CMakeLists.txt
@@ -0,0 +1,64 @@
+include_directories(..)
+
+# Runtime library sources and build flags.
+set(TYSAN_SOURCES
+  tysan.cpp
+  tysan_interceptors.cpp)
+set(TYSAN_COMMON_CFLAGS ${SANITIZER_COMMON_CFLAGS})
+append_rtti_flag(OFF TYSAN_COMMON_CFLAGS)
+# Prevent clang from generating libc calls.
+append_list_if(COMPILER_RT_HAS_FFREESTANDING_FLAG -ffreestanding TYSAN_COMMON_CFLAGS)
+
+add_compiler_rt_object_libraries(RTTysan_dynamic
+  OS ${SANITIZER_COMMON_SUPPORTED_OS}
+  ARCHS ${TYSAN_SUPPORTED_ARCH}
+  SOURCES ${TYSAN_SOURCES}
+  ADDITIONAL_HEADERS ${TYSAN_HEADERS}
+  CFLAGS ${TYSAN_DYNAMIC_CFLAGS}
+  DEFS ${TYSAN_DYNAMIC_DEFINITIONS})
+
+
+# Static runtime library.
+add_compiler_rt_component(tysan)
+
+
+if(APPLE)
+  add_weak_symbols("sanitizer_common" WEAK_SYMBOL_LINK_FLAGS)
+
+  add_compiler_rt_runtime(clang_rt.tysan
+    SHARED
+    OS ${SANITIZER_COMMON_SUPPORTED_OS}
+    ARCHS ${TYSAN_SUPPORTED_ARCH}
+    OBJECT_LIBS RTTysan_dynamic
+                RTInterception
+                RTSanitizerCommon
+                RTSanitizerCommonLibc
+                RTSanitizerCommonSymbolizer
+    CFLAGS ${TYSAN_DYNAMIC_CFLAGS}
+    LINK_FLAGS ${WEAK_SYMBOL_LINK_FLAGS}
+    DEFS ${TYSAN_DYNAMIC_DEFINITIONS}
+    PARENT_TARGET tysan)
+
+  add_compiler_rt_runtime(clang_rt.tysan_static
+    STATIC
+    ARCHS ${TYSAN_SUPPORTED_ARCH}
+    OBJECT_LIBS RTTysan_static
+    CFLAGS ${TYSAN_CFLAGS}
+    DEFS ${TYSAN_COMMON_DEFINITIONS}
+    PARENT_TARGET tysan)
+else()
+  foreach(arch ${TYSAN_SUPPORTED_ARCH})
+    set(TYSAN_CFLAGS ${TYSAN_COMMON_CFLAGS})
+    append_list_if(COMPILER_RT_HAS_FPIE_FLAG -fPIE TYSAN_CFLAGS)
+    add_compiler_rt_runtime(clang_rt.tysan
+      STATIC
+      ARCHS ${arch}
+      SOURCES ${TYSAN_SOURCES}
+              $<TARGET_OBJECTS:RTInterception.${arch}>
+              $<TARGET_OBJECTS:RTSanitizerCommon.${arch}>
+              $<TARGET_OBJECTS:RTSanitizerCommonLibc.${arch}>
+              $<TARGET_OBJECTS:RTSanitizerCommonSymbolizer.${arch}>
+      CFLAGS ${TYSAN_CFLAGS}
+      PARENT_TARGET tysan)
+  endforeach()
+endif()
diff --git a/compiler-rt/lib/tysan/lit.cfg b/compiler-rt/lib/tysan/lit.cfg
new file mode 100644
index 00000000000000..bd2bbe855529a7
--- /dev/null
+++ b/compiler-rt/lib/tysan/lit.cfg
@@ -0,0 +1,35 @@
+# -*- Python -*-
+
+import os
+
+# Setup config name.
+config.name = 'TypeSanitizer' + getattr(config, 'name_suffix', 'default')
+
+# Setup source root.
+config.test_source_root = os.path.dirname(__file__)
+
+# Setup default compiler flags used with -fsanitize=type option.
+clang_tysan_cflags = (["-fsanitize=type",
+                      "-mno-omit-leaf-frame-pointer",
+                      "-fno-omit-frame-pointer",
+                      "-fno-optimize-sibling-calls"] +
+                      [config.target_cflags] +
+                      config.debug_info_flags)
+clang_tysan_cxxflags = config.cxx_mode_flags + clang_tysan_cflags
+
+def build_invocation(compile_flags):
+  return " " + " ".join([config.clang] + compile_flags) + " "
+
+config.substitutions.append( ("%clang_tysan ", build_invocation(clang_tysan_cflags)) )
+config.substitutions.append( ("%clangxx_tysan ", build_invocation(clang_tysan_cxxflags)) )
+
+# Default test suffixes.
+config.suffixes = ['.c', '.cc', '.cpp']
+
+# TypeSanitizer tests are currently supported on Linux only.
+if config.host_os not in ['Linux']:
+  config.unsupported = True
+
+if config.target_arch != 'aarch64':
+  config.available_features.add('stable-runtime')
+
diff --git a/compiler-rt/lib/tysan/lit.site.cfg.in b/compiler-rt/lib/tysan/lit.site.cfg.in
new file mode 100644
index 00000000000000..673d04e514379b
--- /dev/null
+++ b/compiler-rt/lib/tysan/lit.site.cfg.in
@@ -0,0 +1,12 @@
+@LIT_SITE_CFG_IN_HEADER@
+
+# Tool-specific config options.
+config.name_suffix = "@TYSAN_TEST_CONFIG_SUFFIX@"
+config.target_cflags = "@TYSAN_TEST_TARGET_CFLAGS@"
+config.target_arch = "@TYSAN_TEST_TARGET_ARCH@"
+
+# Load common config for all compiler-rt lit tests.
+lit_config.load_config(config, "@COMPILER_RT_BINARY_DIR@/test/lit.common.configured")
+
+# Load tool-specific config that would do the real work.
+lit_config.load_config(config, "@TYSAN_LIT_SOURCE_DIR@/lit.cfg")
diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
new file mode 100644
index 00000000000000..f627851d049e6a
--- /dev/null
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -0,0 +1,339 @@
+//===-- tysan.cpp ---------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of TypeSanitizer.
+//
+// TypeSanitizer runtime.
+//===----------------------------------------------------------------------===//
+
+#include "sanitizer_common/sanitizer_atomic.h"
+#include "sanitizer_common/sanitizer_common.h"
+#include "sanitizer_common/sanitizer_flag_parser.h"
+#include "sanitizer_common/sanitizer_flags.h"
+#include "sanitizer_common/sanitizer_libc.h"
+#include "sanitizer_common/sanitizer_report_decorator.h"
+#include "sanitizer_common/sanitizer_stacktrace.h"
+#include "sanitizer_common/sanitizer_symbolizer.h"
+
+#include "tysan/tysan.h"
+
+using namespace __sanitizer;
+using namespace __tysan;
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+tysan_set_type_unknown(const void *addr, uptr size) {
+  if (tysan_inited)
+    internal_memset(shadow_for(addr), 0, size * sizeof(uptr));
+}
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+tysan_copy_types(const void *daddr, const void *saddr, uptr size) {
+  if (tysan_inited)
+    internal_memmove(shadow_for(daddr), shadow_for(saddr), size * sizeof(uptr));
+}
+
+static const char *getDisplayName(const char *Name) {
+  if (Name[0] == '\0')
+    return "<anonymous type>";
+
+  // Clang generates tags for C++ types that demangle as typeinfo. Remove the
+  // prefix from the generated string.
+  const char TIPrefix[] = "typeinfo name for ";
+
+  const char *DName = Symbolizer::GetOrInit()->Demangle(Name);
+  if (!internal_strncmp(DName, TIPrefix, sizeof(TIPrefix) - 1))
+    DName += sizeof(TIPrefix) - 1;
+
+  return DName;
+}
+
+static void printTDName(tysan_type_descriptor *td) {
+  if (((sptr)td) <= 0) {
+    Printf("<unknown type>");
+    return;
+  }
+
+  switch (td->Tag) {
+  default:
+    DCHECK(0);
+    break;
+  case TYSAN_MEMBER_TD:
+    printTDName(td->Member.Access);
+    if (td->Member.Access != td->Member.Base) {
+      Printf(" (in ");
+      printTDName(td->Member.Base);
+      Printf(" at offset %zu)", td->Member.Offset);
+    }
+    break;
+  case TYSAN_STRUCT_TD:
+    Printf("%s", getDisplayName(
+                     (char *)(td->Struct.Members + td->Struct.MemberCount)));
+    break;
+  }
+}
+
+static tysan_type_descriptor *getRootTD(tysan_type_descriptor *TD) {
+  tysan_type_descriptor *RootTD = TD;
+
+  do {
+    RootTD = TD;
+
+    if (TD->Tag == TYSAN_STRUCT_TD) {
+      if (TD->Struct.MemberCount > 0)
+        TD = TD->Struct.Members[0].Type;
+      else
+        TD = nullptr;
+    } else if (TD->Tag == TYSAN_MEMBER_TD) {
+      TD = TD->Member.Access;
+    } else {
+      DCHECK(0);
+      break;
+    }
+  } while (TD);
+
+  return RootTD;
+}
+
+static bool isAliasingLegalUp(tysan_type_descriptor *TDA,
+                              tysan_type_descriptor *TDB) {
+  // Walk up the tree starting with TDA to see if we reach TDB.
+  uptr OffsetA = 0, OffsetB = 0;
+  if (TDB->Tag == TYSAN_MEMBER_TD) {
+    OffsetB = TDB->Member.Offset;
+    TDB = TDB->Member.Base;
+  }
+
+  if (TDA->Tag == TYSAN_MEMBER_TD) {
+    OffsetA = TDA->Member.Offset;
+    TDA = TDA->Member.Base;
+  }
+
+  do {
+    if (TDA == TDB)
+      return OffsetA == OffsetB;
+
+    if (TDA->Tag == TYSAN_STRUCT_TD) {
+      // Reached root type descriptor.
+      if (!TDA->Struct.MemberCount)
+        break;
+
+      uptr Idx = 0;
+      for (; Idx < TDA->Struct.MemberCount - 1; ++Idx) {
+        if (TDA->Struct.Members[Idx].Offset >= OffsetA)
+          break;
+      }
+
+      OffsetA -= TDA->Struct.Members[Idx].Offset;
+      TDA = TDA->Struct.Members[Idx].Type;
+    } else {
+      DCHECK(0);
+      break;
+    }
+  } while (TDA);
+
+  return false;
+}
+
+static bool isAliasingLegal(tysan_type_descriptor *TDA,
+                            tysan_type_descriptor *TDB) {
+  if (TDA == TDB || !TDB || !TDA)
+    return true;
+
+  // Aliasing is legal is the two types have different root nodes.
+  if (getRootTD(TDA) != getRootTD(TDB))
+    return true;
+
+  return isAliasingLegalUp(TDA, TDB) || isAliasingLegalUp(TDB, TDA);
+}
+
+namespace __tysan {
+class Decorator : public __sanitizer::SanitizerCommonDecorator {
+public:
+  Decorator() : SanitizerCommonDecorator() {}
+  const char *Warning() { return Red(); }
+  const char *Name() { return Green(); }
+  const char *End() { return Default(); }
+};
+} // namespace __tysan
+
+ALWAYS_INLINE
+static void reportError(void *Addr, int Size, tysan_type_descriptor *TD,
+                        tysan_type_descriptor *OldTD, const char *AccessStr,
+                        const char *DescStr, int Offset, uptr pc, uptr bp,
+                        uptr sp) {
+  Decorator d;
+  Printf("%s", d.Warning());
+  Report("ERROR: TypeSanitizer: type-aliasing-violation on address %p"
+         " (pc %p bp %p sp %p tid %llu)\n",
+         Addr, (void *)pc, (void *)bp, (void *)sp, GetTid());
+  Printf("%s", d.End());
+  Printf("%s of size %d at %p with type ", AccessStr, Size, Addr);
+
+  Printf("%s", d.Name());
+  printTDName(TD);
+  Printf("%s", d.End());
+
+  Printf(" %s of type ", DescStr);
+
+  Printf("%s", d.Name());
+  printTDName(OldTD);
+  Printf("%s", d.End());
+
+  if (Offset != 0)
+    Printf(" that starts at offset %d\n", Offset);
+  else
+    Printf("\n");
+
+  if (pc) {
+
+    bool request_fast = StackTrace::WillUseFastUnwind(true);
+    BufferedStackTrace ST;
+    ST.Unwind(kStackTraceMax, pc, bp, 0, 0, 0, request_fast);
+    ST.Print();
+  } else {
+    Printf("\n");
+  }
+}
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+__tysan_check(void *addr, int size, tysan_type_descriptor *td, int flags) {
+  GET_CALLER_PC_BP_SP;
+
+  bool IsRead = flags & 1;
+  bool IsWrite = flags & 2;
+  const char *AccessStr;
+  if (IsRead && !IsWrite)
+    AccessStr = "READ";
+  else if (!IsRead && IsWrite)
+    AccessStr = "WRITE";
+  else
+    AccessStr = "ATOMIC UPDATE";
+
+  tysan_type_descriptor **OldTDPtr = shadow_for(addr);
+  tysan_type_descriptor *OldTD = *OldTDPtr;
+  if (((sptr)OldTD) < 0) {
+    int i = -((sptr)OldTD);
+    OldTDPtr -= i;
+    OldTD = *OldTDPtr;
+
+    if (!isAliasingLegal(td, OldTD))
+      reportError(addr, size, td, OldTD, AccessStr,
+                  "accesses part of an existing object", -i, pc, bp, sp);
+
+    return;
+  }
+
+  if (!isAliasingLegal(td, OldTD)) {
+    reportError(addr, size, td, OldTD, AccessStr, "accesses an existing object",
+                0, pc, bp, sp);
+    return;
+  }
+
+  // These types are allowed to alias (or the stored type is unknown), report
+  // an error if we find an interior type.
+
+  for (int i = 0; i < size; ++i) {
+    OldTDPtr = shadow_for((void *)(((uptr)addr) + i));
+    OldTD = *OldTDPtr;
+    if (((sptr)OldTD) >= 0 && !isAliasingLegal(td, OldTD))
+      reportError(addr, size, td, OldTD, AccessStr,
+                  "partially accesses an object", i, pc, bp, sp);
+  }
+}
+
+Flags __tysan::flags_data;
+
+SANITIZER_INTERFACE_ATTRIBUTE uptr __tysan_shadow_memory_address;
+SANITIZER_INTERFACE_ATTRIBUTE uptr __tysan_app_memory_mask;
+
+#ifdef TYSAN_RUNTIME_VMA
+// Runtime detected VMA size.
+int __tysan::vmaSize;
+#endif
+
+void Flags::SetDefaults() {
+#define TYSAN_FLAG(Type, Name, DefaultValue, Description) Name = DefaultValue;
+#include "tysan_flags.inc"
+#undef TYSAN_FLAG
+}
+
+static void RegisterTySanFlags(FlagParser *parser, Flags *f) {
+#define TYSAN_FLAG(Type, Name, DefaultValue, Description)                      \
+  RegisterFlag(parser, #Name, Description, &f->Name);
+#include "tysan_flags.inc"
+#undef TYSAN_FLAG
+}
+
+static void InitializeFlags() {
+  SetCommonFlagsDefaults();
+  {
+    CommonFlags cf;
+    cf.CopyFrom(*common_flags());
+    cf.external_symbolizer_path = GetEnv("TYSAN_SYMBOLIZER_PATH");
+    OverrideCommonFlags(cf);
+  }
+
+  flags().SetDefaults();
+
+  FlagParser parser;
+  RegisterCommonFlags(&parser);
+  RegisterTySanFlags(&parser, &flags());
+  parser.ParseString(GetEnv("TYSAN_OPTIONS"));
+  InitializeCommonFlags();
+  if (Verbosity())
+    ReportUnrecognizedFlags();
+  if (common_flags()->help)
+    parser.PrintFlagDescriptions();
+}
+
+static void TySanInitializePlatformEarly() {
+  AvoidCVE_2016_2143();
+#ifdef TYSAN_RUNTIME_VMA
+  vmaSize = (MostSignificantSetBitIndex(GET_CURRENT_FRAME()) + 1);
+#if defined(__aarch64__) && !SANITIZER_APPLE
+  if (vmaSize != 39 && vmaSize != 42 && vmaSize != 48) {
+    Printf("FATAL: TypeSanitizer: unsupported VMA range\n");
+    Printf("FATAL: Found %d - Supported 39, 42 and 48\n", vmaSize);
+    Die();
+  }
+#endif
+#endif
+
+  __sanitizer::InitializePlatformEarly();
+
+  __tysan_shadow_memory_address = ShadowAddr();
+  __tysan_app_memory_mask = AppMask();
+}
+
+namespace __tysan {
+bool tysan_inited = false;
+bool tysan_init_is_running;
+} // namespace __tysan
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __tysan_init() {
+  CHECK(!tysan_init_is_running);
+  if (tysan_inited)
+    return;
+  tysan_init_is_running = true;
+
+  InitializeFlags();
+  TySanInitializePlatformEarly();
+
+  InitializeInterceptors();
+
+  if (!MmapFixedNoReserve(ShadowAddr(), AppAddr() - ShadowAddr()))
+    Die();
+
+  tysan_init_is_running = false;
+  tysan_inited = true;
+}
+
+#if SANITIZER_CAN_USE_PREINIT_ARRAY
+__attribute__((section(".preinit_array"),
+               used)) static void (*tysan_init_ptr)() = __tysan_init;
+#endif
diff --git a/compiler-rt/lib/tysan/tysan.h b/compiler-rt/lib/tysan/tysan.h
new file mode 100644
index 00000000000000..ec6f9587e9ce58
--- /dev/null
+++ b/compiler-rt/lib/tysan/tysan.h
@@ -0,0 +1,79 @@
+//===-- tysan.h -------------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of TypeSanitizer.
+//
+// Private TySan header.
+//===----------------------------------------------------------------------===//
+
+#ifndef TYSAN_H
+#define TYSAN_H
+
+#include "sanitizer_common/sanitizer_internal_defs.h"
+
+using __sanitizer::sptr;
+using __sanitizer::u16;
+using __sanitizer::uptr;
+
+#include "tysan_platform.h"
+
+extern "C" {
+void tysan_set_type_unknown(const void *addr, uptr size);
+void tysan_copy_types(const void *daddr, const void *saddr, uptr size);
+}
+
+namespace __tysan {
+extern bool tysan_inited;
+extern bool tysan_init_is_running;
+
+void InitializeInterceptors();
+
+enum { TYSAN_MEMBER_TD = 1, TYSAN_STRUCT_TD = 2 };
+
+struct tysan_member_type_descriptor {
+  struct tysan_type_descriptor *Base;
+  struct tysan_type_descriptor *Access;
+  uptr Offset;
+};
+
+struct tysan_struct_type_descriptor {
+  uptr MemberCount;
+  struct {
+    struct tysan_type_descriptor *Type;
+    uptr Offset;
+  } Members[1]; // Tail allocated.
+  // char Name[]; // Tail allocated.
+};
+
+struct tysan_type_descriptor {
+  uptr Tag;
+  union {
+    tysan_member_type_descriptor Member;
+    tysan_struct_type_descriptor Struct;
+  };
+};
+
+inline tysan_type_descriptor **shadow_for(const void *ptr) {
+  return (tysan_type_descriptor **)((((uptr)ptr) & AppMask()) * sizeof(ptr) +
+                                    ShadowAddr());
+}
+
+struct Flags {
+#define TYSAN_FLAG(Type, Name, Defaul...
[truncated]

Copy link

github-actions bot commented Dec 22, 2023

✅ With the latest revision this PR passed the Python code formatter.

@@ -720,7 +726,7 @@ if(COMPILER_RT_SUPPORTED_ARCH)
endif()
message(STATUS "Compiler-RT supported architectures: ${COMPILER_RT_SUPPORTED_ARCH}")

set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;tysan,safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan;asan_abi)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ tysan;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated!

@fhahn fhahn requested a review from Endilll as a code owner April 19, 2024 11:27
@fhahn fhahn changed the base branch from users/fhahn/main.tysan-a-type-sanitizer-runtime-library to users/fhahn/tysan-a-type-sanitizer-clang April 19, 2024 11:29
@Endilll Endilll removed their request for review April 19, 2024 11:54
@llvmbot llvmbot added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Apr 22, 2024
@fhahn
Copy link
Contributor Author

fhahn commented Apr 22, 2024

Added compiler-rt tests for various strict-aliasing violations from the bug tracker I found.

@fhahn fhahn force-pushed the users/fhahn/tysan-a-type-sanitizer-clang branch from ee7ed21 to fb7da09 Compare June 27, 2024 16:09
@fhahn fhahn force-pushed the users/fhahn/tysan-a-type-sanitizer-runtime-library branch from e2caf4e to 733b3ed Compare June 27, 2024 16:10
@tavianator
Copy link
Contributor

I also needed

diff --git a/compiler-rt/cmake/config-ix.cmake b/compiler-rt/cmake/config-ix.cmake
index ab13d8c03683..f480083231a1 100644
--- a/compiler-rt/cmake/config-ix.cmake
+++ b/compiler-rt/cmake/config-ix.cmake
@@ -677,6 +677,7 @@ else()
   filter_available_targets(PROFILE_SUPPORTED_ARCH ${ALL_PROFILE_SUPPORTED_ARCH})
   filter_available_targets(CTX_PROFILE_SUPPORTED_ARCH ${ALL_CTX_PROFILE_SUPPORTED_ARCH})
   filter_available_targets(TSAN_SUPPORTED_ARCH ${ALL_TSAN_SUPPORTED_ARCH})
+  filter_available_targets(TYSAN_SUPPORTED_ARCH ${ALL_TYSAN_SUPPORTED_ARCH})
   filter_available_targets(UBSAN_SUPPORTED_ARCH ${ALL_UBSAN_SUPPORTED_ARCH})
   filter_available_targets(SAFESTACK_SUPPORTED_ARCH
     ${ALL_SAFESTACK_SUPPORTED_ARCH})

to build the runtime on LInux.

@fhahn
Copy link
Contributor Author

fhahn commented Nov 22, 2024

The latest version also includes a new test case compiler-rt/test/tysan/constexpr-subobject.cpp which had a false positive (IIUC the code should have no strict aliasing violation) + a fix

@fhahn
Copy link
Contributor Author

fhahn commented Dec 12, 2024

The Clang and LLVM parts are now ready to land, with just the compiler-rt bits pending approval.

There are a few test failures on Linux which don't fail on macOS, will look at them today probably

Copy link

github-actions bot commented Dec 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn
Copy link
Contributor Author

fhahn commented Dec 12, 2024

Test failures on linux should be fixed now, they were due to debug paths using full paths on Linux and in one case a C++ struct type wasnt' de-mangled on Linux

@fhahn
Copy link
Contributor Author

fhahn commented Dec 16, 2024

ping :)


// Clang generates tags for C++ types that demangle as typeinfo. Remove the
// prefix from the generated string.
const char TIPrefix[] = "typeinfo name for ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we make this a const char*?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done together with using strlen thanks


const char *DName = Symbolizer::GetOrInit()->Demangle(Name);
if (!internal_strncmp(DName, TIPrefix, sizeof(TIPrefix) - 1))
DName += sizeof(TIPrefix) - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strlen? the compiler should be able to optimize that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

"-mno-omit-leaf-frame-pointer",
"-fno-omit-frame-pointer",
"-fno-optimize-sibling-calls"] +
[config.target_cflags] +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong, target_cflags sounds like a list

} else if (TD->Tag == TYSAN_MEMBER_TD) {
TD = TD->Member.Access;
} else {
DCHECK(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DCHECK(false && "invalid enum value")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

import shlex

sh_quote = shlex.quote
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except ImportError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks


def get_required_attr(config, attr_name):
attr_value = getattr(config, attr_name, None)
if attr_value == None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to check if attr_value and return value early.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just confirming it is expected that empty string, or false, or something like that will now fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted to return if not None, thanks

fhahn added a commit that referenced this pull request Dec 17, 2024
This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the byes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.

When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.

As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.

It goes together with the corresponding clang changes
(#76260) and compiler-rt
changes (#76261)

PR: #76259
fhahn added a commit that referenced this pull request Dec 17, 2024
This patch introduces the Clang components of type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

The Clang changes are mostly formulaic, the one specific change being
that when the TBAA sanitizer is enabled, TBAA is always generated, even
at -O0.

It goes together with the corresponding LLVM changes
(#76259) and compiler-rt
changes (#76261)

PR: #76260
Base automatically changed from users/fhahn/tysan-a-type-sanitizer-clang to main December 17, 2024 15:13
@fhahn fhahn merged commit 641fbf1 into main Dec 17, 2024
6 of 7 checks passed
@fhahn fhahn deleted the users/fhahn/tysan-a-type-sanitizer-runtime-library branch December 17, 2024 18:49
@fhahn fhahn restored the users/fhahn/tysan-a-type-sanitizer-runtime-library branch December 17, 2024 18:53
@fhahn fhahn deleted the users/fhahn/tysan-a-type-sanitizer-runtime-library branch December 17, 2024 18:56
@kazutakahirata
Copy link
Contributor

@fhahn I'm getting:

compiler-rt/lib/tysan/../sanitizer_common/sanitizer_platform_limits_posix.h:604:3: error: anonymous structs are a GNU extension [-Werror,-Wgnu-anonymous-struct]
  struct {
  ^
1 error generated.

while compiling tyscan.cpp. Is there any way you could take a look? Thanks!

@fhahn
Copy link
Contributor Author

fhahn commented Dec 17, 2024

AFAICT this is happening at each place that includes the file, e.g. also if you build AddressSanitizer. Not sure what the best way forward here is as this is not related to TypeSanitizer, but seems to impact all sanitizers?

@kazutakahirata
Copy link
Contributor

AFAICT this is happening at each place that includes the file, e.g. also if you build AddressSanitizer. Not sure what the best way forward here is as this is not related to TypeSanitizer, but seems to impact all sanitizers?

I just posted #120314.

Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there are no release notes or manual b/c this is not ready to be used by general users on the command line and that will come via another PR?


char *mem = (char *)malloc(4 * sizeof(short));
memset(mem, 0, 4 * sizeof(short));
*(short *)(mem + 2) = x;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I am reading the diagnostics correctly it looks like writes to malloc'ed memory does implicit object creation and subsequent read/write will be flagged based on that?

Some more explanatory comments in this test like you have in other tests would be super helpful.

u.i = 42;
u.s = 1;

printf("%d\n", u.i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++ this would be a aliasing violation, can we catch that in C++?

@fhahn
Copy link
Contributor Author

fhahn commented Dec 18, 2024

I guess there are no release notes or manual b/c this is not ready to be used by general users on the command line and that will come via another PR?

I added a release note as part of the Clang patch to the Sanitizers section: https://clang.llvm.org/docs/ReleaseNotes.html#sanitizers

Are there sanitizer-specific release notes where this should also be added?

github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 9, 2025
This patch introduces the runtime components for type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32197.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the byes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.

As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.

This includes build fixes for Linux from Mingjie Xu.

Depends on #76260 (Clang support), #76259 (LLVM support)

PR: llvm/llvm-project#76261
@seanm
Copy link

seanm commented Jan 9, 2025

@fhahn can TySan be made to trap when it detects a problem? I tried -fsanitize-trap=type but got:

clang++: error: unsupported argument 'type' to option '-fsanitize-trap='

-fsanitize-trap=all does not seem to result in TySan trapping, only logging...

@Enna1
Copy link
Contributor

Enna1 commented Jan 10, 2025

@fhahn can TySan be made to trap when it detects a problem? I tried -fsanitize-trap=type but got:

clang++: error: unsupported argument 'type' to option '-fsanitize-trap='

-fsanitize-trap=all does not seem to result in TySan trapping, only logging...

-fsanitize-trap only supports UBSan and CFI, see https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/SanitizerArgs.cpp#L66.

If we want TySan to stop and abort when it detects first type-based aliasing violation, I would suggest implement halt_on_error support for TySan and default halt_on_error to false. For reference, TSan defaults halt_on_error to false, see https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/tsan/rtl/tsan_flags.inc#L45.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category compiler-rt:sanitizer compiler-rt TBAA Type-Based Alias Analysis / Strict Aliasing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants