Skip to content

[TBAA] Emit distinct TBAA tags for pointers with different depths,types. #76612

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -425,6 +425,10 @@ Non-comprehensive list of changes in this release
- Added support for ``TypeLoc::dump()`` for easier debugging, and improved
textual and JSON dumping for various ``TypeLoc``-related nodes.

- Clang can now emit distinct type-based alias analysis tags for incompatible
pointers, enabling more powerful alias analysis when accessing pointer types.
The new behavior can be enabled using ``-fpointer-tbaa``.

New Compiler Flags
------------------
- ``-fsanitize=implicit-bitfield-conversion`` checks implicit truncation and
Expand Down Expand Up @@ -467,6 +471,9 @@ New Compiler Flags
- For the ARM target, added ``-Warm-interrupt-vfp-clobber`` that will emit a
diagnostic when an interrupt handler is declared and VFP is enabled.

- ``-fpointer-tbaa`` enables emission of distinct type-based alias
analysis tags for incompatible pointers.

Deprecated Compiler Flags
-------------------------

Expand Down
1 change: 1 addition & 0 deletions clang/include/clang/Basic/CodeGenOptions.def
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,7 @@ ENUM_CODEGENOPT(StructReturnConvention, StructReturnConventionKind, 2, SRCK_Defa

CODEGENOPT(RelaxAll , 1, 0) ///< Relax all machine code instructions.
CODEGENOPT(RelaxedAliasing , 1, 0) ///< Set when -fno-strict-aliasing is enabled.
CODEGENOPT(PointerTBAA, 1, 0) ///< Whether or not to use distinct TBAA tags for pointers.
CODEGENOPT(StructPathTBAA , 1, 0) ///< Whether or not to use struct-path TBAA.
CODEGENOPT(NewStructPathTBAA , 1, 0) ///< Whether or not to use enhanced struct-path TBAA.
CODEGENOPT(SaveTempLabels , 1, 0) ///< Save temporary labels.
Expand Down
5 changes: 5 additions & 0 deletions clang/include/clang/Driver/Options.td
Original file line number Diff line number Diff line change
Expand Up @@ -3379,6 +3379,7 @@ def fstruct_path_tbaa : Flag<["-"], "fstruct-path-tbaa">, Group<f_Group>;
def fno_struct_path_tbaa : Flag<["-"], "fno-struct-path-tbaa">, Group<f_Group>;
def fno_strict_enums : Flag<["-"], "fno-strict-enums">, Group<f_Group>;
def fno_strict_overflow : Flag<["-"], "fno-strict-overflow">, Group<f_Group>;
def fno_pointer_tbaa : Flag<["-"], "fno-pointer-tbaa">, Group<f_Group>;
def fno_temp_file : Flag<["-"], "fno-temp-file">, Group<f_Group>,
Visibility<[ClangOption, CC1Option, CLOption, DXCOption]>, HelpText<
"Directly create compilation output files. This may lead to incorrect incremental builds if the compiler crashes">,
Expand Down Expand Up @@ -3883,6 +3884,7 @@ defm strict_vtable_pointers : BoolFOption<"strict-vtable-pointers",
" overwriting polymorphic C++ objects">,
NegFlag<SetFalse>>;
def fstrict_overflow : Flag<["-"], "fstrict-overflow">, Group<f_Group>;
def fpointer_tbaa : Flag<["-"], "fpointer-tbaa">, Group<f_Group>;
def fdriver_only : Flag<["-"], "fdriver-only">, Flags<[NoXarchOption]>,
Visibility<[ClangOption, CLOption, DXCOption]>,
Group<Action_Group>, HelpText<"Only run the driver.">;
Expand Down Expand Up @@ -7146,6 +7148,9 @@ def fuse_register_sized_bitfield_access: Flag<["-"], "fuse-register-sized-bitfie
def relaxed_aliasing : Flag<["-"], "relaxed-aliasing">,
HelpText<"Turn off Type Based Alias Analysis">,
MarshallingInfoFlag<CodeGenOpts<"RelaxedAliasing">>;
def pointer_tbaa: Flag<["-"], "pointer-tbaa">,
HelpText<"Turn on Type Based Alias Analysis for pointer accesses">,
MarshallingInfoFlag<CodeGenOpts<"PointerTBAA">>;
def no_struct_path_tbaa : Flag<["-"], "no-struct-path-tbaa">,
HelpText<"Turn off struct-path aware Type Based Alias Analysis">,
MarshallingInfoNegativeFlag<CodeGenOpts<"StructPathTBAA">>;
Expand Down
54 changes: 50 additions & 4 deletions clang/lib/CodeGen/CodeGenTBAA.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -185,10 +185,56 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
return getChar();

// Handle pointers and references.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth putting standard citations here:

Suggested change
// Handle pointers and references.
// Handle pointers and references.
//
// C has a very strict rule for pointer aliasing. C23 6.7.6.1p2:
// For two pointer types to be compatible, both shall be identically
// qualified and both shall be pointers to compatible types.
//
// This rule is impractically strict; we want to at least ignore CVR
// qualifiers. Distinguishing by CVR qualifiers would make it UB to
// e.g. cast a `char **` to `const char * const *` and dereference it,
// which is too common and useful to invalidate. C++'s similar types
// rule permits qualifier differences in these nested positions; in fact,
// C++ even allows that cast as an implicit conversion.
//
// Other qualifiers could theoretically be distinguished, especially if
// they involve a significant representation difference. We don't
// currently do so, however.
//
// Computing the pointee type string recursively is implicitly more
// forgiving than the standards require. Effectively, we are turning
// the question "are these types compatible/similar" into "are
// accesses to these types allowed to alias". In both C and C++,
// the latter question has special carve-outs for signedness
// mismatches that only apply at the top level. As a result, we are
// allowing e.g. `int *` l-values to access `unsigned *` objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thank you very much!

// TODO: Implement C++'s type "similarity" and consider dis-"similar"
// pointers distinct.
if (Ty->isPointerType() || Ty->isReferenceType())
return createScalarTypeNode("any pointer", getChar(), Size);
//
// C has a very strict rule for pointer aliasing. C23 6.7.6.1p2:
// For two pointer types to be compatible, both shall be identically
// qualified and both shall be pointers to compatible types.
//
// This rule is impractically strict; we want to at least ignore CVR
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this for the default behavior, but I think we should still allow opting into the strict C conformance behavior so that users can sanitize the code for portability to stricter compilers (also, it may allow for different optimization impacts that users might care about... maybe). This can be follow-up work though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

// qualifiers. Distinguishing by CVR qualifiers would make it UB to
// e.g. cast a `char **` to `const char * const *` and dereference it,
// which is too common and useful to invalidate. C++'s similar types
// rule permits qualifier differences in these nested positions; in fact,
// C++ even allows that cast as an implicit conversion.
//
// Other qualifiers could theoretically be distinguished, especially if
// they involve a significant representation difference. We don't
// currently do so, however.
//
// Computing the pointee type string recursively is implicitly more
// forgiving than the standards require. Effectively, we are turning
// the question "are these types compatible/similar" into "are
// accesses to these types allowed to alias". In both C and C++,
// the latter question has special carve-outs for signedness
// mismatches that only apply at the top level. As a result, we are
// allowing e.g. `int *` l-values to access `unsigned *` objects.
if (Ty->isPointerType() || Ty->isReferenceType()) {
llvm::MDNode *AnyPtr = createScalarTypeNode("any pointer", getChar(), Size);
if (!CodeGenOpts.PointerTBAA)
return AnyPtr;
// Compute the depth of the pointer and generate a tag of the form "p<depth>
// <base type tag>".
unsigned PtrDepth = 0;
do {
PtrDepth++;
Ty = Ty->getPointeeType().getTypePtr();
} while (Ty->isPointerType());
// TODO: Implement C++'s type "similarity" and consider dis-"similar"
// pointers distinct for non-builtin types.
if (isa<BuiltinType>(Ty)) {
llvm::MDNode *ScalarMD = getTypeInfoHelper(Ty);
StringRef Name =
cast<llvm::MDString>(
ScalarMD->getOperand(CodeGenOpts.NewStructPathTBAA ? 2 : 0))
->getString();
SmallString<256> OutName("p");
OutName += std::to_string(PtrDepth);
OutName += " ";
OutName += Name;
return createScalarTypeNode(OutName, AnyPtr, Size);
}
return AnyPtr;
}

// Accesses to arrays are accesses to objects of their element types.
if (CodeGenOpts.NewStructPathTBAA && Ty->isArrayType())
Expand Down
3 changes: 3 additions & 0 deletions clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5721,6 +5721,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
if (!Args.hasFlag(options::OPT_fstrict_aliasing, StrictAliasingAliasOption,
options::OPT_fno_strict_aliasing, !IsWindowsMSVC))
CmdArgs.push_back("-relaxed-aliasing");
if (Args.hasFlag(options::OPT_fpointer_tbaa, options::OPT_fno_pointer_tbaa,
false))
CmdArgs.push_back("-pointer-tbaa");
if (!Args.hasFlag(options::OPT_fstruct_path_tbaa,
options::OPT_fno_struct_path_tbaa, true))
CmdArgs.push_back("-no-struct-path-tbaa");
Expand Down
46 changes: 16 additions & 30 deletions clang/test/CodeGen/sanitize-metadata-nosanitize.c
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
//.
// CHECK: Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(write, argmem: none, inaccessiblemem: none)
// CHECK-LABEL: define dso_local void @escape
// CHECK-SAME: (ptr noundef [[P:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] !pcsections !2 {
// CHECK-SAME: (ptr noundef [[P:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] !pcsections [[META2:![0-9]+]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: ret void
//
Expand All @@ -23,13 +23,13 @@ __attribute__((noinline, not_tail_called)) void escape(const volatile void *p) {

// CHECK: Function Attrs: mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none)
// CHECK-LABEL: define dso_local i32 @normal_function
// CHECK-SAME: (ptr noundef [[X:%.*]], ptr nocapture noundef readonly [[Y:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] !pcsections !4 {
// CHECK-SAME: (ptr noundef [[X:%.*]], ptr nocapture noundef readonly [[Y:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] !pcsections [[META4:![0-9]+]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[X_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: store ptr [[X]], ptr [[X_ADDR]], align 8, !tbaa [[TBAA6:![0-9]+]]
// CHECK-NEXT: store atomic i32 1, ptr [[X]] monotonic, align 4, !pcsections !10
// CHECK-NEXT: store atomic i32 1, ptr [[X]] monotonic, align 4, !pcsections [[META11:![0-9]+]]
// CHECK-NEXT: notail call void @escape(ptr noundef nonnull [[X_ADDR]])
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Y]], align 4, !tbaa [[TBAA11:![0-9]+]]
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Y]], align 4, !tbaa [[TBAA12:![0-9]+]]
// CHECK-NEXT: ret i32 [[TMP0]]
//
int normal_function(int *x, int *y) {
Expand All @@ -46,7 +46,7 @@ int normal_function(int *x, int *y) {
// CHECK-NEXT: store ptr [[X]], ptr [[X_ADDR]], align 8, !tbaa [[TBAA6]]
// CHECK-NEXT: store atomic i32 1, ptr [[X]] monotonic, align 4
// CHECK-NEXT: notail call void @escape(ptr noundef nonnull [[X_ADDR]])
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Y]], align 4, !tbaa [[TBAA11]]
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Y]], align 4, !tbaa [[TBAA12]]
// CHECK-NEXT: ret i32 [[TMP0]]
//
__attribute__((disable_sanitizer_instrumentation)) int test_disable_sanitize_instrumentation(int *x, int *y) {
Expand All @@ -57,13 +57,13 @@ __attribute__((disable_sanitizer_instrumentation)) int test_disable_sanitize_ins

// CHECK: Function Attrs: mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none)
// CHECK-LABEL: define dso_local i32 @test_no_sanitize_thread
// CHECK-SAME: (ptr noundef [[X:%.*]], ptr nocapture noundef readonly [[Y:%.*]]) local_unnamed_addr #[[ATTR3:[0-9]+]] !pcsections !13 {
// CHECK-SAME: (ptr noundef [[X:%.*]], ptr nocapture noundef readonly [[Y:%.*]]) local_unnamed_addr #[[ATTR3:[0-9]+]] !pcsections [[META14:![0-9]+]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[X_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: store ptr [[X]], ptr [[X_ADDR]], align 8, !tbaa [[TBAA6]]
// CHECK-NEXT: store atomic i32 1, ptr [[X]] monotonic, align 4, !pcsections !10
// CHECK-NEXT: store atomic i32 1, ptr [[X]] monotonic, align 4, !pcsections [[META11]]
// CHECK-NEXT: notail call void @escape(ptr noundef nonnull [[X_ADDR]])
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Y]], align 4, !tbaa [[TBAA11]]
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Y]], align 4, !tbaa [[TBAA12]]
// CHECK-NEXT: ret i32 [[TMP0]]
//
__attribute__((no_sanitize("thread"))) int test_no_sanitize_thread(int *x, int *y) {
Expand All @@ -74,13 +74,13 @@ __attribute__((no_sanitize("thread"))) int test_no_sanitize_thread(int *x, int *

// CHECK: Function Attrs: mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none)
// CHECK-LABEL: define dso_local i32 @test_no_sanitize_all
// CHECK-SAME: (ptr noundef [[X:%.*]], ptr nocapture noundef readonly [[Y:%.*]]) local_unnamed_addr #[[ATTR3]] !pcsections !13 {
// CHECK-SAME: (ptr noundef [[X:%.*]], ptr nocapture noundef readonly [[Y:%.*]]) local_unnamed_addr #[[ATTR3]] !pcsections [[META14]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[X_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: store ptr [[X]], ptr [[X_ADDR]], align 8, !tbaa [[TBAA6]]
// CHECK-NEXT: store atomic i32 1, ptr [[X]] monotonic, align 4, !pcsections !10
// CHECK-NEXT: store atomic i32 1, ptr [[X]] monotonic, align 4, !pcsections [[META11]]
// CHECK-NEXT: notail call void @escape(ptr noundef nonnull [[X_ADDR]])
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Y]], align 4, !tbaa [[TBAA11]]
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[Y]], align 4, !tbaa [[TBAA12]]
// CHECK-NEXT: ret i32 [[TMP0]]
//
__attribute__((no_sanitize("all"))) int test_no_sanitize_all(int *x, int *y) {
Expand All @@ -89,23 +89,9 @@ __attribute__((no_sanitize("all"))) int test_no_sanitize_all(int *x, int *y) {
return *y;
}
//.
// CHECK: attributes #0 = { mustprogress nofree noinline norecurse nosync nounwind willreturn memory(write, argmem: none, inaccessiblemem: none) "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
// CHECK: attributes #1 = { mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none) "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
// CHECK: attributes #2 = { disable_sanitizer_instrumentation mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none) "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
// CHECK: attributes #3 = { mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none) "min-legal-vector-width"="0" "no-trapping-math"="true" "no_sanitize_thread" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
// CHECK: attributes #4 = { nounwind "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
//.
// CHECK: !2 = !{!"sanmd_covered2!C", !3}
// CHECK: !3 = !{i64 0}
// CHECK: !4 = !{!"sanmd_covered2!C", !5}
// CHECK: !5 = !{i64 3}
// CHECK: !6 = !{!7, !7, i64 0}
// CHECK: !7 = !{!"any pointer", !8, i64 0}
// CHECK: !8 = !{!"omnipotent char", !9, i64 0}
// CHECK: !9 = !{!"Simple C/C++ TBAA"}
// CHECK: !10 = !{!"sanmd_atomics2!C"}
// CHECK: !11 = !{!12, !12, i64 0}
// CHECK: !12 = !{!"int", !8, i64 0}
// CHECK: !13 = !{!"sanmd_covered2!C", !14}
// CHECK: !14 = !{i64 2}
// CHECK: attributes #[[ATTR0]] = { mustprogress nofree noinline norecurse nosync nounwind willreturn memory(write, argmem: none, inaccessiblemem: none) "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
// CHECK: attributes #[[ATTR1]] = { mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none) "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
// CHECK: attributes #[[ATTR2]] = { disable_sanitizer_instrumentation mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none) "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
// CHECK: attributes #[[ATTR3]] = { mustprogress nofree norecurse nounwind willreturn memory(write, argmem: readwrite, inaccessiblemem: none) "min-legal-vector-width"="0" "no-trapping-math"="true" "no_sanitize_thread" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
// CHECK: attributes #[[ATTR4:[0-9]+]] = { nounwind "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
//.
Loading
Loading