-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[TySan] Fix struct access with different bases #108385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TySan] Fix struct access with different bases #108385
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-compiler-rt-sanitizer Author: None (gbMattN) ChangesFixes issue #105960 If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct. The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows. To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives. Full diff: https://github.com/llvm/llvm-project/pull/108385.diff 2 Files Affected:
diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
index f627851d049e6a..f2cb6faddf45ac 100644
--- a/compiler-rt/lib/tysan/tysan.cpp
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -128,6 +128,10 @@ static bool isAliasingLegalUp(tysan_type_descriptor *TDA,
break;
}
+ //You can't have negative offset, you must be partially inside the last type
+ if (TDA->Struct.Members[Idx].Offset > OffsetA)
+ Idx -=1;
+
OffsetA -= TDA->Struct.Members[Idx].Offset;
TDA = TDA->Struct.Members[Idx].Type;
} else {
diff --git a/compiler-rt/test/tysan/struct-offset-different-base.cpp b/compiler-rt/test/tysan/struct-offset-different-base.cpp
new file mode 100644
index 00000000000000..c1ef5f8669c280
--- /dev/null
+++ b/compiler-rt/test/tysan/struct-offset-different-base.cpp
@@ -0,0 +1,31 @@
+// RUN: %clangxx_tysan -O0 %s -o %t && %run %t >%t.out 2>&1
+// RUN: FileCheck %s < %t.out
+
+#include <stdio.h>
+
+struct inner {
+ char buffer;
+ int i;
+};
+
+void init_inner(inner *list) {
+ list->i = 0;
+}
+
+struct outer {
+ inner foo;
+ char buffer;
+};
+
+int main(void) {
+ outer *l = new outer();
+
+ init_inner(&l->foo);
+
+ int access_offsets_with_different_base = l->foo.i;
+ printf("%d\n", access_offsets_with_different_base);
+
+ return 0;
+}
+
+// CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation
|
2dffe46
to
d312bd9
Compare
(Manually pinging potential reviewers) @tavianator @fhahn |
91f560d
to
65f1e1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes my reduced testcase but not the unreduced one. I'll try to make a new reduction.
compiler-rt/lib/tysan/tysan.cpp
Outdated
Idx -=1; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idx -=1; | |
Idx -= 1; | |
Here's the new testcase. Not sure if this bug is related or not. It has to do with struct node {
struct node *next;
};
struct list {
struct node *head, **tail;
};
int main(void) {
struct list *list = __builtin_malloc(sizeof(*list));
list->head = 0;
list->tail = &list->head;
struct node *node = __builtin_malloc(sizeof(*node));
node->next = 0;
*list->tail = node;
list->tail = &node->next;
while (list->head) {
struct node *node = list->head;
// list->head = node->next;
__builtin_memcpy(&list->head, &node->next, sizeof(list->head));
node->next = 0;
}
return 0;
} tavianator@tachyon $ ~/code/llvm/llvm-project/build/bin/clang -Wall -g -fsanitize=type foo.c -o foo
tavianator@tachyon $ ./foo
==5885==ERROR: TypeSanitizer: type-aliasing-violation on address 0x55af02a8c2a0 (pc 0x55aef600fb36 bp 0x7ffcbf810cf0 sp 0x7ffcbf810c90 tid 5885)
READ of size 8 at 0x55af02a8c2a0 with type any pointer (in list at offset 0) accesses an existing object of type any pointer (in node at offset 0)
#0 0x55aef600fb35 in main /home/tavianator/code/bfs/foo.c:20:15
|
I guess the bug there is that the memcpy() interceptor literally copies the dynamic type from |
Documenting this here as its part of the same issue: the following reproducer can be made (see the pull request above)
If memcpy is replaced by the commented code, no error is detected. Both code runs the same checking function, but they are inserted at different places in the Transformation pass. This implies that the wrong checks are being inserted for memcpy calls. |
I have consulted with an expert in the strict aliasing rules and we came to the horrifying (to me) conclusion that TySan is actually correct in this case, at least according to the C standard. |
! Oh wow! ... Should the commented out line cause a type violation too? Or is everything in the case above (commented line fine, memcpy not) really the correct behaviour? |
No, But There are more details in this paper, for example: https://web.archive.org/web/20190219170809/https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.pdf |
65f1e1f
to
4bde5de
Compare
This avoids what might be a strict aliasing violation in some models. Link: llvm/llvm-project#108385 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Minor style and test feedbacks)
compiler-rt/lib/tysan/tysan.cpp
Outdated
@@ -128,6 +128,10 @@ static bool isAliasingLegalUp(tysan_type_descriptor *TDA, | |||
break; | |||
} | |||
|
|||
//You can't have negative offset, you must be partially inside the last type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space at start of comment; better to write the comment in a passive tone, "The offset can't be negative, thus we must be..."
return 0; | ||
} | ||
|
||
// CHECK-NOT: ERROR: TypeSanitizer: type-aliasing-violation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please also include a test for /something/ in the output to ensure that the test is positively checking for something. Otherwise we could replace %clangxx_tysan with any program that succeeds , such as true
, and this test would pass.
4bde5de
to
82e7bec
Compare
Thanks for putting up the patch! I just rebased the patches, hopefully we can get them in soon so it is easier to submit bug-fixes iteratively. I noticed that with this patch, I am seeing segfaults when running |
733b3ed
to
8e6e62d
Compare
d82da2a
to
6f79545
Compare
You can test this locally with the following command:git-clang-format --diff 8e6e62d0dee48a696afd0c7d53d74eaccef97b5e 9a5a609f917762dd864e1b261fc292286a9185a0 --extensions cpp -- compiler-rt/test/tysan/struct-offset-different-base.cpp compiler-rt/lib/tysan/tysan.cpp View the diff from clang-format here.diff --git a/compiler-rt/lib/tysan/tysan.cpp b/compiler-rt/lib/tysan/tysan.cpp
index c89692f6f2..cfa41f0e98 100644
--- a/compiler-rt/lib/tysan/tysan.cpp
+++ b/compiler-rt/lib/tysan/tysan.cpp
@@ -131,8 +131,9 @@ static bool isAliasingLegalUp(tysan_type_descriptor *TDA,
// This offset can't be negative. Therefore we must be accessing something
// partially inside the last type
- // We shouldn't check this if we are on the first member, Idx will underflow
- // The first member can be offset in rare cases such as llvm::cl::Option
+ // We shouldn't check this if we are on the first member, Idx will
+ // underflow The first member can be offset in rare cases such as
+ // llvm::cl::Option
if (TDA->Struct.Members[Idx].Offset > OffsetA && Idx > 0)
Idx -= 1;
|
2d434d0
to
3e24991
Compare
f953d3d
to
9a5a609
Compare
5c0e520
to
6f48a28
Compare
Sorry I didn't mean to close this PR, looks like it happened automatically once I deleted the |
Original pull request [here](#108385) Fixes issue #105960 If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct. The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows. To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives.
Original pull request [here](llvm/llvm-project#108385) Fixes issue llvm/llvm-project#105960 If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct. The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows. To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives.
Original pull request [here](llvm#108385) Fixes issue llvm#105960 If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct. The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows. To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives.
Fixes issue #105960
If a member in a struct is also a struct, accessing a member partway through this inner struct currently causes a false positive. This is because when checking aliasing, the access offset is seen as greater than the starting offset of the inner struct, so the loop continues one iteration, and believes we are accessing the member after the inner struct.
The next member's offset is greater than the offset we are looking for, so when we subtract the next member's offset from what we are looking for, the offset underflows.
To fix this, we check if the member we think we are accessing has a greater offset than the offset we are looking for. If so, we take a step back. We cannot do this in the loop, since the loop does not check the final member. This means the penultimate member would still cause false positives.