-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[TBAA] Don't emit pointer-tbaa for void pointers. #122116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang Author: Florian Hahn (fhahn) ChangesWhile there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers:
https://clang.godbolt.org/z/8dTv51v8W An example in the wild is from This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now. Full diff: https://github.com/llvm/llvm-project/pull/122116.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 75e66bae79afdc..3f1a24791ddd81 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -226,6 +226,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
PtrDepth++;
Ty = Ty->getPointeeType()->getBaseElementTypeUnsafe();
} while (Ty->isPointerType());
+
+ // While there are no special rules in the standards regarding void pointers
+ // and strict aliasing, emitting distinct tags for void pointers break some
+ // common idioms and there is no good alternative to re-write the code
+ // without strict-aliasing violations.
+ if (Ty->isVoidType())
+ return AnyPtr;
+
assert(!isa<VariableArrayType>(Ty));
// When the underlying type is a builtin type, we compute the pointee type
// string recursively, which is implicitly more forgiving than the standards
diff --git a/clang/test/CodeGen/tbaa-pointers.c b/clang/test/CodeGen/tbaa-pointers.c
index 4aae2552f107a3..48adac503357f0 100644
--- a/clang/test/CodeGen/tbaa-pointers.c
+++ b/clang/test/CodeGen/tbaa-pointers.c
@@ -208,12 +208,9 @@ int void_ptrs(void **ptr) {
// COMMON-LABEL: define i32 @void_ptrs(
// COMMON-SAME: ptr noundef [[PTRA:%.+]])
// COMMON: [[PTR_ADDR:%.+]] = alloca ptr, align 8
-// DISABLE-NEXT: store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
-// DISABLE-NEXT: [[L0:%.+]] = load ptr, ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
-// DISABLE-NEXT: [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa [[ANYPTR]]
-// DEFAULT-NEXT: store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[P2VOID:!.+]]
-// DEFAULT-NEXT: [[L0:%.+]] = load ptr, ptr [[PTR_ADDR]], align 8, !tbaa [[P2VOID]]
-// DEFAULT-NEXT: [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa [[P1VOID:!.+]]
+// COMMON-NEXT: store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
+// COMMON-NEXT: [[L0:%.+]] = load ptr, ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
+// COMMON-NEXT: [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa [[ANYPTR]]
// COMMON-NEXT: [[BOOL:%.+]] = icmp ne ptr [[L1]], null
// COMMON-NEXT: [[BOOL_EXT:%.+]] = zext i1 [[BOOL]] to i64
// COMMON-NEXT: [[COND:%.+]] = select i1 [[BOOL]], i32 0, i32 1
@@ -254,7 +251,3 @@ int void_ptrs(void **ptr) {
// COMMON: [[INT_TAG]] = !{[[INT_TY:!.+]], [[INT_TY]], i64 0}
// COMMON: [[INT_TY]] = !{!"int", [[CHAR]], i64 0}
// DEFAULT: [[ANYPTR]] = !{[[ANY_POINTER]], [[ANY_POINTER]], i64 0}
-// DEFAULT: [[P2VOID]] = !{[[P2VOID_TY:!.+]], [[P2VOID_TY]], i64 0}
-// DEFAULT: [[P2VOID_TY]] = !{!"p2 void", [[ANY_POINTER]], i64 0}
-// DEFAULT: [[P1VOID]] = !{[[P1VOID_TY:!.+]], [[P1VOID_TY]], i64 0}
-// DEFAULT: [[P1VOID_TY]] = !{!"p1 void", [[ANY_POINTER]], i64 0}
|
Note GCC already treats
|
Note GCC's testcase testsuite/gcc.dg/alias-14.c is testcase for this extension: "as an extension we consider void * universal. Writes to it should alias." |
@pinskia thanks for sharing the GCC context! |
0144b06
to
471ed86
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
I agree that allowing @pinskia, does GCC apply this recursively — e.g. are |
What would be a good place to document this? |
It looks like GCC is handling this as recusive in that a store to a That is:
The conditional will not be optimized away. |
Same behavior in Clang with this patch |
471ed86
to
3dcf901
Compare
Okay, so if the ultimate pointee type is
Hmm, I was hoping that we would have a section in the manual already about aliasing. That seems like a good thing to add, especially now that we've got a small family of options controlling it. Are you interested in working on that? It should probably go on the main manual page right before the section on PGO. |
3dcf901
to
0446809
Compare
Thanks for the suggestion, I added a section to the manual. I hope that's along the lines of what you had in mind? |
clang/docs/UsersManual.rst
Outdated
Strict Aliasing | ||
--------------- | ||
|
||
Clang by default applies C/C++'s strict aliasing rules during optimizations. In |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this isn't the case for clang-cl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It looks good now.
I updated the PR to not depend on #123595, if it lands before, I'll add back the reference to the TypeSanitizer doc |
0979a21
to
0e42112
Compare
clang/docs/UsersManual.rst
Outdated
Strict aliasing violations in the source may change program behavior and | ||
``-fno-strict-aliasing`` disables use of the strict aliasing rules. There also | ||
is an experimental :ref:`TypeSanitizer <TypeSanitizer>` to detect strict | ||
aliasing violations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start. I've reworded it to be a little more user-centric; please let me know what you think:
The C and C++ standards require accesses to objects in memory to use l-values of an appropriate type for the object. This is called *strict aliasing* or *type-based alias analysis*. Strict aliasing enhances a variety of powerful memory optimizations, including reordering, combining, and eliminating memory accesses. These optimizations can lead to unexpected behavior in code that violates the strict aliasing rules. For example:
::
void advance(size_t *index, double *data) {
double value = data[*index];
/* Clang may assume that this store does not change the contents of `data`. */
*index += 1;
/* Clang may assume that this store does not change the contents of `index`. */
data[*index] = value;
/* Either of these facts may create significant optimization opportunities
if Clang is able to inline this function. */
}
Strict aliasing can be explicitly enabled with ``-fstrict-aliasing`` and disabled with ``-fno-strict-aliasing``. ``clang-cl`` defaults to ``-fno-strict-aliasing``; see :ref:`Strict aliasing in clang-cl. <clang_cl_strict_aliasing>`. Otherwise, Clang defaults to ``-fstrict-aliasing``.
C and C++ specify slightly different rules for strict aliasing. To improve language interoperability, Clang allows two types to alias if either language would permit it. This includes applying the C++ similar types rule to C, allowing ``int **`` to alias ``int const * const *``. Clang also relaxes the standard aliasing rules in the following ways:
- All integer types of the same size are permitted to alias each other, including signed and unsigned types.
- ``void*`` is permitted to alias any pointer type, ``void**`` is permitted to alias any pointer to pointer type, and so on.
Code which violates strict aliasing has undefined behavior. A program that works in one version of Clang may not work in another because of changes to the optimizer. Clang provides a `:ref:TypeSanitizer <TypeSanitizer>` to help detect violations of the strict aliasing rules, but it is currently still experimental. Code that is known to violate strict aliasing should generally be built with ``-fno-strict-aliasing`` if the violation cannot be fixed.
Clang supports several ways to fix a violation of strict aliasing:
* L-values of the character types ``char`` and ``unsigned char`` (as well as other types, depending on the standard) are permitted to access objects of any type.
* Library functions such as ``memcpy`` and ``memset`` are specified as treating memory as characters and therefore are not limited by strict aliasing. If a value of one type must be reinterpreted as another (e.g. to read the bits of a floating-point number), use ``memcpy`` to copy the representation to an object of the destination type. This has no overhead over a direct l-value access because Clang should reliably optimize calls to these functions to use simple loads and stores when they are used with small constant sizes.
* The attribute ``may_alias`` can be added to a ``typedef`` to give l-values of that type the same aliasing power as the character types.
Clang makes a best effort to avoid obvious miscompilations from strict aliasing by only considering type information when it cannot prove that two accesses must refer to the same memory. However, it is not recommended that programmers intentionally rely on this instead of using one of the solutions above because it is too easy for the compiler's analysis to be blocked in surprising ways.
In Clang 20, Clang strengthened its implementation of strict aliasing for accesses of pointer type. Previously, all accesses of pointer type were permitted to alias each other, but Clang now distinguishes different pointers by their pointee type, except as limited by the relaxations around qualifiers and `void*` described above. The previous behavior of treating all pointers as aliasing can be restored using ``-fno-pointer-tbaa``.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's much better, updated thanks!
clang/docs/UsersManual.rst
Outdated
@@ -5272,6 +5296,8 @@ The Visual C++ Toolset has a slightly more elaborate mechanism for detection. | |||
Restrictions and Limitations compared to Clang | |||
---------------------------------------------- | |||
|
|||
.. _clang_cl_strict_aliasing: | |||
|
|||
Strict Aliasing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strict Aliasing | |
Strict Aliasing in ``clang-cl`` |
But consider just merging this into the new section above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I shortened the section here, linking back to the main section. I kept a sentence here to call out the difference. WDYT?
0e42112
to
7645e97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last minute re-read caught a couple slip-ups, and yeah, as mentioned, please feel free to augment this with other relevant information for users. Also, I'm definitely not sure the separate clang-cl section is worth keeping; it's pretty short, and any extra information in it should probably just be rolled into here.
While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers: int count_elements(void * values) { void **seq = values; int count; for (count = 0; seq && seq[count]; count++); return count; } https://clang.godbolt.org/z/8dTv51v8W An example in the wild is from llvm#119099 This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now.
7645e97
to
cce997e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers: int count_elements(void * values) { void **seq = values; int count; for (count = 0; seq && seq[count]; count++); return count; } https://clang.godbolt.org/z/8dTv51v8W An example in the wild is from llvm/llvm-project#119099 This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now. Fixes llvm/llvm-project#119099. Fixes llvm/llvm-project#122537. PR: llvm/llvm-project#122116
I read about this commit in LLVM weekly. Thanks @fhahn and @rjmccall for adding documentation about strict aliasing! This is always a difficult topic for our users. One remark though. The documentation states:
This is not what was implemented here? Both |
We're documenting a weaker guarantee than we implement, yes. |
OK, thanks. I guess the wording "in Clang 20, [...] Clang now distinguishes different pointers by their pointee type, except as limited by the relaxations around qualifiers and Anyway, and more importantly, does the choice to document a weaker guarantee mean that you expect that the TBAA implementation can be further refined in this respect? (Which does not seem hard.) Or is it too dangerous for Clang to have stricter aliasing rules than GCC? GCC does seem to consider |
I think we can continue to refine our implementation, yeah. And the manual needs to document our guarantees, not our current implementation. We definitely don't consider ourselves bound to be everywhere at least as conservative as GCC. That GCC made alias analysis more conservative around |
Thanks. FWIW, I think this can be implemented as laid out in #126047. |
While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers: int count_elements(void * values) { void **seq = values; int count; for (count = 0; seq && seq[count]; count++); return count; } https://clang.godbolt.org/z/8dTv51v8W An example in the wild is from llvm#119099 This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now. Fixes llvm#119099. Fixes llvm#122537. PR: llvm#122116 (cherry picked from commit 77d3f8a)
While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers: int count_elements(void * values) { void **seq = values; int count; for (count = 0; seq && seq[count]; count++); return count; } https://clang.godbolt.org/z/8dTv51v8W An example in the wild is from llvm/llvm-project#119099 This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now. Fixes llvm/llvm-project#119099. Fixes llvm/llvm-project#122537. PR: llvm/llvm-project#122116 (cherry picked from commit 77d3f8a)
While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers:
https://clang.godbolt.org/z/8dTv51v8W
An example in the wild is from
#119099
This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now.
Fixes #119099.
Fixes #122537.