Skip to content

[TySan] Fix struct access with different bases #120412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gbMattN
Copy link
Contributor

@gbMattN gbMattN commented Dec 18, 2024

Original pull request here
Fixes issue #105960

If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct.

The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows.

To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives.

@llvmbot
Copy link
Member

llvmbot commented Dec 18, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (gbMattN)

Changes

Original pull request here
Fixes issue #105960

If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct.

The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows.

To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives.


Full diff: https://github.com/llvm/llvm-project/pull/120412.diff

2 Files Affected:

  • (modified) compiler-rt/lib/tysan/tysan.cpp (+8)
  • (added) compiler-rt/test/tysan/struct-offset-different-base.cpp (+31)
diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
index 39d78e7c95e0cd..94498287f17d19 100644
--- a/compiler-rt/lib/tysan/tysan.cpp
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -131,6 +131,14 @@ static bool isAliasingLegalUp(tysan_type_descriptor *TDA,
           break;
       }
 
+      // This offset can't be negative. Therefore we must be accessing something
+      // partially inside the last type
+      // We shouldn't check this if we are on the first member, Idx will
+      // underflow The first member can be offset in rare cases such as
+      // llvm::cl::Option
+      if (TDA->Struct.Members[Idx].Offset > OffsetA && Idx > 0)
+        Idx -= 1;
+
       OffsetA -= TDA->Struct.Members[Idx].Offset;
       TDA = TDA->Struct.Members[Idx].Type;
     } else {
diff --git a/compiler-rt/test/tysan/struct-offset-different-base.cpp b/compiler-rt/test/tysan/struct-offset-different-base.cpp
new file mode 100644
index 00000000000000..da0efd2cb6503c
--- /dev/null
+++ b/compiler-rt/test/tysan/struct-offset-different-base.cpp
@@ -0,0 +1,31 @@
+// RUN: %clangxx_tysan -O0 %s -o %t && %run %t >%t.out 2>&1
+// RUN: FileCheck %s --implicit-check-not ERROR < %t.out
+
+// Modified reproducer from https://github.com/llvm/llvm-project/issues/105960
+
+#include <stdio.h>
+
+struct inner {
+  char buffer;
+  int i;
+};
+
+void init_inner(inner *iPtr) { iPtr->i = 200; }
+
+struct outer {
+  inner foo;
+  char buffer;
+};
+
+int main(void) {
+  outer *l = new outer();
+
+  init_inner(&l->foo);
+
+  int access_offsets_with_different_base = l->foo.i;
+  printf("Accessed value is %d\n", access_offsets_with_different_base);
+
+  return 0;
+}
+
+// CHECK: Accessed value is 200

@gbMattN
Copy link
Contributor Author

gbMattN commented Dec 18, 2024

@fhahn ready for review 👍

// This offset can't be negative. Therefore we must be accessing something
// partially inside the last type
// We shouldn't check this if we are on the first member, Idx will
// underflow The first member can be offset in rare cases such as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// underflow The first member can be offset in rare cases such as
// underflow. The first member can be offset in rare cases such as

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it Idx that underflows or OffsetA? Idx will only get incremented AFAICT. Would this be cleared if combined with the search loop above, maybe with a flag indicating that we took the exit due to the offset of the member begin larger than OffsetA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idx would underflow because the next line subtracts one from it. This is so we check the previous member, which we are partially inside. We could merge it into the above loop like this if you think its better?

for (; Idx < TDA->Struct.MemberCount - 1; ++Idx) {
        auto MemberOffset = TDA->Struct.Members[Idx].Offset;
        if (MemberOffset >= OffsetA){
          // Explanitory comment
          if (MemberOffset > OffsetA && Idx)
            Idx -= 1;
          break;
        }
      }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with that we are missing the case where we are accessing the last field (I think the only case covered by the test), so we should probably stick with the current check.

It would be good to add a test cases where we need to adjust Idx for other indices.

(also comments still need adjusting/reflowing

// partially inside the last type
// We shouldn't check this if we are on the first member, Idx will
// underflow The first member can be offset in rare cases such as
// llvm::cl::Option
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// llvm::cl::Option
// llvm::cl::Option.

@seanm
Copy link

seanm commented Jan 10, 2025

@gbMattN This indeed sounds like some of the mystery issues TySan is flagging to me. Structs within structs are involved. Could you rebase this against latest? Then I'll check out and try this patch.

Copy link

github-actions bot commented Jan 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@gbMattN gbMattN force-pushed the users/nagym/fix-member-access-from-different-bases branch from e8b5ab2 to a7f05f6 Compare January 10, 2025 16:13
@seanm
Copy link

seanm commented Jan 10, 2025

@gbMattN thanks! I've got the patch locally and built a working clang. Running the test suite of a project I'm testing, with your patch, I went from 827 occurrences of "ERROR: TypeSanitizer" down to... 827. :( So I guess my issue is something else...

Comment on lines 134 to 138
// This offset can't be negative. Therefore we must be accessing something
// partially inside the last type
// We shouldn't check this if we are on the first member, Idx will
// underflow, The first member can be offset in rare cases such as
// llvm::cl::Option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment still needs fixing as mentioned earlier.

@gbMattN gbMattN force-pushed the users/nagym/fix-member-access-from-different-bases branch from 043ebd5 to a42ab69 Compare January 13, 2025 12:35
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Left some more suggestions for the comments.

@gbMattN gbMattN force-pushed the users/nagym/fix-member-access-from-different-bases branch from 56ea390 to 6682db9 Compare January 13, 2025 13:48
@goussepi goussepi merged commit 09a8b7c into llvm:main Jan 13, 2025
7 checks passed
kazutakahirata pushed a commit to kazutakahirata/llvm-project that referenced this pull request Jan 13, 2025
Original pull request
[here](llvm#108385)
Fixes issue llvm#105960

If a member in a struct is also a struct, accessing a member partway
through this inner struct currently causes a false positive. This is
because when checking aliasing, the access offset is seen as greater
than the starting offset of the inner struct, so the loop continues one
iteration, and believes we are accessing the member after the inner
struct.

The next member's offset is greater than the offset we are looking for,
so when we subtract the next member's offset from what we are looking
for, the offset underflows.

To fix this, we check if the member we think we are accessing has a
greater offset than the offset we are looking for. If so, we take a step
back. We cannot do this in the loop, since the loop does not check the
final member. This means the penultimate member would still cause false
positives.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants