Skip to content

[TBAA] Don't emit pointer-tbaa for void pointers. #122116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 31, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jan 8, 2025

While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers:

int count_elements(void * values) {
  void **seq = values;
  int count;
  for (count = 0; seq && seq[count]; count++);
  return count;
}

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
#119099

This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now.

Fixes #119099.
Fixes #122537.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jan 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Florian Hahn (fhahn)

Changes

While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers break some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers:

int count_elements(void * values) {
  void **seq = values;
  int count;
  for (count = 0; seq && seq[count]; count++);
  return count;
}

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
#119099

This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now.


Full diff: https://github.com/llvm/llvm-project/pull/122116.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+8)
  • (modified) clang/test/CodeGen/tbaa-pointers.c (+3-10)
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 75e66bae79afdc..3f1a24791ddd81 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -226,6 +226,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
       PtrDepth++;
       Ty = Ty->getPointeeType()->getBaseElementTypeUnsafe();
     } while (Ty->isPointerType());
+
+    // While there are no special rules in the standards regarding void pointers
+    // and strict aliasing, emitting distinct tags for void pointers break some
+    // common idioms and there is no good alternative to re-write the code
+    // without strict-aliasing violations.
+    if (Ty->isVoidType())
+      return AnyPtr;
+
     assert(!isa<VariableArrayType>(Ty));
     // When the underlying type is a builtin type, we compute the pointee type
     // string recursively, which is implicitly more forgiving than the standards
diff --git a/clang/test/CodeGen/tbaa-pointers.c b/clang/test/CodeGen/tbaa-pointers.c
index 4aae2552f107a3..48adac503357f0 100644
--- a/clang/test/CodeGen/tbaa-pointers.c
+++ b/clang/test/CodeGen/tbaa-pointers.c
@@ -208,12 +208,9 @@ int void_ptrs(void **ptr) {
 // COMMON-LABEL: define i32 @void_ptrs(
 // COMMON-SAME: ptr noundef [[PTRA:%.+]])
 // COMMON:        [[PTR_ADDR:%.+]]  = alloca ptr, align 8
-// DISABLE-NEXT:  store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
-// DISABLE-NEXT:  [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[ANYPTR]]
-// DISABLE-NEXT:  [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[ANYPTR]]
-// DEFAULT-NEXT:  store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[P2VOID:!.+]]
-// DEFAULT-NEXT:  [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[P2VOID]]
-// DEFAULT-NEXT:  [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[P1VOID:!.+]]
+// COMMON-NEXT:   store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
+// COMMON-NEXT:   [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[ANYPTR]]
+// COMMON-NEXT:   [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[ANYPTR]]
 // COMMON-NEXT:   [[BOOL:%.+]] = icmp ne ptr [[L1]], null
 // COMMON-NEXT:   [[BOOL_EXT:%.+]] = zext i1 [[BOOL]] to i64
 // COMMON-NEXT:   [[COND:%.+]] = select i1 [[BOOL]], i32 0, i32 1
@@ -254,7 +251,3 @@ int void_ptrs(void **ptr) {
 // COMMON:  [[INT_TAG]] = !{[[INT_TY:!.+]], [[INT_TY]], i64 0}
 // COMMON:  [[INT_TY]] = !{!"int", [[CHAR]], i64 0}
 // DEFAULT: [[ANYPTR]] = !{[[ANY_POINTER]],  [[ANY_POINTER]], i64 0}
-// DEFAULT: [[P2VOID]] = !{[[P2VOID_TY:!.+]], [[P2VOID_TY]], i64 0}
-// DEFAULT: [[P2VOID_TY]] = !{!"p2 void", [[ANY_POINTER]], i64 0}
-// DEFAULT: [[P1VOID]] = !{[[P1VOID_TY:!.+]], [[P1VOID_TY]], i64 0}
-// DEFAULT: [[P1VOID_TY]] = !{!"p1 void", [[ANY_POINTER]], i64 0}

@pinskia
Copy link

pinskia commented Jan 9, 2025

Note GCC already treats void* as being compatiable with all other pointers and has since GCC 6 (when it started to disambiguates accesses to different pointers). I don't think it is documented though.
Another example where clang/LLVM handles void pointers differently from GCC:

extern void abort(void);
int f(void *a, int **b, void *c)
{
  void **e = a;
  int d = **b;
  *e = c;
  return d + **b;
}
int main()
{
  int d = 1;
  int ff = 0;
  int *e = &d;
  if (f(&e, &e, &ff) != 1)
    abort();
}

@pinskia
Copy link

pinskia commented Jan 9, 2025

Note GCC's testcase testsuite/gcc.dg/alias-14.c is testcase for this extension: "as an extension we consider void * universal. Writes to it should alias."

@fhahn
Copy link
Contributor Author

fhahn commented Jan 9, 2025

@pinskia thanks for sharing the GCC context!

@fhahn fhahn force-pushed the clang-pointer-tbaa-void branch from 0144b06 to 471ed86 Compare January 9, 2025 11:50
Copy link

github-actions bot commented Jan 9, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@rjmccall
Copy link
Contributor

rjmccall commented Jan 9, 2025

I agree that allowing void* l-values to alias arbitrary pointer objects is probably the right pragmatic choice. We should document it, though.

@pinskia, does GCC apply this recursively — e.g. are void** l-values treated specially in any way, or are they basically just char** for aliasing purposes?

@fhahn
Copy link
Contributor Author

fhahn commented Jan 9, 2025

What would be a good place to document this?

@pinskia
Copy link

pinskia commented Jan 9, 2025

@pinskia, does GCC apply this recursively — e.g. are void** l-values treated specially in any way, or are they basically just char** for aliasing purposes?

It looks like GCC is handling this as recusive in that a store to a void** location will conflict with a load from a other* location.

That is:

void f(void ***a, int **b, void **d)
{
  int *c = *b;
  *a = d;
  if (*b != c)
    __builtin_abort();
}

The conditional will not be optimized away.

@fhahn
Copy link
Contributor Author

fhahn commented Jan 9, 2025

@pinskia, does GCC apply this recursively — e.g. are void** l-values treated specially in any way, or are they basically just char** for aliasing purposes?

It looks like GCC is handling this as recusive in that a store to a void** location will conflict with a load from a other* location.

That is:

void f(void ***a, int **b, void **d)
{
  int *c = *b;
  *a = d;
  if (*b != c)
    __builtin_abort();
}

The conditional will not be optimized away.

Same behavior in Clang with this patch

@fhahn fhahn force-pushed the clang-pointer-tbaa-void branch from 471ed86 to 3dcf901 Compare January 9, 2025 19:47
@rjmccall
Copy link
Contributor

rjmccall commented Jan 9, 2025

Okay, so if the ultimate pointee type is void, we're basically treating that as a generic pointer, no matter what the pointer depth is? I guess that makes sense.

What would be a good place to document this?

Hmm, I was hoping that we would have a section in the manual already about aliasing. That seems like a good thing to add, especially now that we've got a small family of options controlling it. Are you interested in working on that? It should probably go on the main manual page right before the section on PGO.

@fhahn
Copy link
Contributor Author

fhahn commented Jan 28, 2025

Okay, so if the ultimate pointee type is void, we're basically treating that as a generic pointer, no matter what the pointer depth is? I guess that makes sense.

What would be a good place to document this?

Hmm, I was hoping that we would have a section in the manual already about aliasing. That seems like a good thing to add, especially now that we've got a small family of options controlling it. Are you interested in working on that? It should probably go on the main manual page right before the section on PGO.

Thanks for the suggestion, I added a section to the manual. I hope that's along the lines of what you had in mind?

Strict Aliasing
---------------

Clang by default applies C/C++'s strict aliasing rules during optimizations. In
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this isn't the case for clang-cl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It looks good now.

@fhahn
Copy link
Contributor Author

fhahn commented Jan 28, 2025

I updated the PR to not depend on #123595, if it lands before, I'll add back the reference to the TypeSanitizer doc

@fhahn fhahn force-pushed the clang-pointer-tbaa-void branch from 0979a21 to 0e42112 Compare January 28, 2025 20:44
Strict aliasing violations in the source may change program behavior and
``-fno-strict-aliasing`` disables use of the strict aliasing rules. There also
is an experimental :ref:`TypeSanitizer <TypeSanitizer>` to detect strict
aliasing violations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start. I've reworded it to be a little more user-centric; please let me know what you think:

The C and C++ standards require accesses to objects in memory to use l-values of an appropriate type for the object. This is called *strict aliasing* or *type-based alias analysis*. Strict aliasing enhances a variety of powerful memory optimizations, including reordering, combining, and eliminating memory accesses. These optimizations can lead to unexpected behavior in code that violates the strict aliasing rules. For example:

::
  void advance(size_t *index, double *data) {
    double value = data[*index];
    /* Clang may assume that this store does not change the contents of `data`. */
    *index += 1;
    /* Clang may assume that this store does not change the contents of `index`. */
    data[*index] = value;
    /* Either of these facts may create significant optimization opportunities
       if Clang is able to inline this function. */
  }

Strict aliasing can be explicitly enabled with ``-fstrict-aliasing`` and disabled with ``-fno-strict-aliasing``. ``clang-cl`` defaults to ``-fno-strict-aliasing``; see :ref:`Strict aliasing in clang-cl. <clang_cl_strict_aliasing>`. Otherwise, Clang defaults to ``-fstrict-aliasing``.

C and C++ specify slightly different rules for strict aliasing. To improve language interoperability, Clang allows two types to alias if either language would permit it. This includes applying the C++ similar types rule to C, allowing ``int **`` to alias ``int const * const *``. Clang also relaxes the standard aliasing rules in the following ways:

- All integer types of the same size are permitted to alias each other, including signed and unsigned types.
- ``void*`` is permitted to alias any pointer type, ``void**`` is permitted to alias any pointer to pointer type, and so on.

Code which violates strict aliasing has undefined behavior. A program that works in one version of Clang may not work in another because of changes to the optimizer. Clang provides a `:ref:TypeSanitizer <TypeSanitizer>` to help detect violations of the strict aliasing rules, but it is currently still experimental. Code that is known to violate strict aliasing should generally be built with ``-fno-strict-aliasing`` if the violation cannot be fixed.

Clang supports several ways to fix a violation of strict aliasing:

* L-values of the character types ``char`` and ``unsigned char`` (as well as other types, depending on the standard) are permitted to access objects of any type.

* Library functions such as ``memcpy`` and ``memset`` are specified as treating memory as characters and therefore are not limited by strict aliasing. If a value of one type must be reinterpreted as another (e.g. to read the bits of a floating-point number), use ``memcpy`` to copy the representation to an object of the destination type. This has no overhead over a direct l-value access because Clang should reliably optimize calls to these functions to use simple loads and stores when they are used with small constant sizes.

* The attribute ``may_alias`` can be added to a ``typedef`` to give l-values of that type the same aliasing power as the character types.

Clang makes a best effort to avoid obvious miscompilations from strict aliasing by only considering type information when it cannot prove that two accesses must refer to the same memory. However, it is not recommended that programmers intentionally rely on this instead of using one of the solutions above because it is too easy for the compiler's analysis to be blocked in surprising ways.

In Clang 20, Clang strengthened its implementation of strict aliasing for accesses of pointer type. Previously, all accesses of pointer type were permitted to alias each other, but Clang now distinguishes different pointers by their pointee type, except as limited by the relaxations around qualifiers and `void*` described above. The previous behavior of treating all pointers as aliasing can be restored using ``-fno-pointer-tbaa``.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's much better, updated thanks!

@@ -5272,6 +5296,8 @@ The Visual C++ Toolset has a slightly more elaborate mechanism for detection.
Restrictions and Limitations compared to Clang
----------------------------------------------

.. _clang_cl_strict_aliasing:

Strict Aliasing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Strict Aliasing
Strict Aliasing in ``clang-cl``

But consider just merging this into the new section above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shortened the section here, linking back to the main section. I kept a sentence here to call out the difference. WDYT?

@fhahn fhahn force-pushed the clang-pointer-tbaa-void branch from 0e42112 to 7645e97 Compare January 29, 2025 12:52
Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last minute re-read caught a couple slip-ups, and yeah, as mentioned, please feel free to augment this with other relevant information for users. Also, I'm definitely not sure the separate clang-cl section is worth keeping; it's pretty short, and any extra information in it should probably just be rolled into here.

fhahn added 8 commits January 30, 2025 12:29
While there are no special rules in the standards regarding
void pointers and strict aliasing, emitting distinct tags for void
pointers break some common idioms and there is no good alternative to
re-write the code without strict-aliasing violations. An example is
to count the entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
llvm#119099

This patch avoids emitting distinct tags for void pointers, to avoid
those idioms causing mis-compiles for now.
@fhahn fhahn force-pushed the clang-pointer-tbaa-void branch from 7645e97 to cce997e Compare January 30, 2025 18:15
Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@fhahn fhahn merged commit 77d3f8a into llvm:main Jan 31, 2025
9 checks passed
@fhahn fhahn deleted the clang-pointer-tbaa-void branch January 31, 2025 11:38
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 31, 2025
While there are no special rules in the standards regarding void
pointers and strict aliasing, emitting distinct tags for void pointers
break some common idioms and there is no good alternative to re-write
the code without strict-aliasing violations. An example is to count the
entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
llvm/llvm-project#119099

This patch avoids emitting distinct tags for void pointers, to avoid
those idioms causing mis-compiles for now.

Fixes llvm/llvm-project#119099.
Fixes llvm/llvm-project#122537.

PR: llvm/llvm-project#122116
@brunodf-snps
Copy link
Contributor

I read about this commit in LLVM weekly. Thanks @fhahn and @rjmccall for adding documentation about strict aliasing! This is always a difficult topic for our users.

One remark though. The documentation states:

void* is permitted to alias any pointer type, void** is permitted to alias any pointer to pointer type, and so on.

This is not what was implemented here? Both void* and void** get the same TBAA tag:
https://clang.godbolt.org/z/b1ob8vnjc
If I understand correctly, current TBAA will not conclude NoAlias between a void** and an int* access, while the documentation suggests that it will. (Without knowing too much about the background of this change, I'd find the approach of the documentation sensible.)

@rjmccall
Copy link
Contributor

rjmccall commented Feb 4, 2025

We're documenting a weaker guarantee than we implement, yes.

@brunodf-snps
Copy link
Contributor

We're documenting a weaker guarantee than we implement, yes.

OK, thanks. I guess the wording "in Clang 20, [...] Clang now distinguishes different pointers by their pointee type, except as limited by the relaxations around qualifiers and void* described above" led me to believe this was intended as a description of what was currently implemented.

Anyway, and more importantly, does the choice to document a weaker guarantee mean that you expect that the TBAA implementation can be further refined in this respect? (Which does not seem hard.) Or is it too dangerous for Clang to have stricter aliasing rules than GCC? GCC does seem to consider int* and void** to alias, or so I conclude from the extra load of *p1 in test1 here: https://godbolt.org/z/5jjh8MrP8

@rjmccall
Copy link
Contributor

rjmccall commented Feb 5, 2025

I think we can continue to refine our implementation, yeah. And the manual needs to document our guarantees, not our current implementation.

We definitely don't consider ourselves bound to be everywhere at least as conservative as GCC. That GCC made alias analysis more conservative around void* is useful to know, but we want to consider their intent and not the compiler's bug-for-bug exact behavior. The supertype-ish reasoning I laid out above seems like the right idea. Programmers frequently find it useful to write code that works with any kind of pointer. C appears to give them a tool for doing so, void*, but then the strict aliasing rules make it illegal if you need to work with an array of pointers. We're just trying to undo that decision and make reasonable code that's "polymorphic" over pointer types work as intended.

@brunodf-snps
Copy link
Contributor

I think we can continue to refine our implementation, yeah. [...] The supertype-ish reasoning I laid out above seems like the right idea.

Thanks. FWIW, I think this can be implemented as laid out in #126047.

swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Feb 18, 2025
While there are no special rules in the standards regarding void
pointers and strict aliasing, emitting distinct tags for void pointers
break some common idioms and there is no good alternative to re-write
the code without strict-aliasing violations. An example is to count the
entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
llvm#119099

This patch avoids emitting distinct tags for void pointers, to avoid
those idioms causing mis-compiles for now.

Fixes llvm#119099.
Fixes llvm#122537.

PR: llvm#122116
(cherry picked from commit 77d3f8a)
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 18, 2025
While there are no special rules in the standards regarding void
pointers and strict aliasing, emitting distinct tags for void pointers
break some common idioms and there is no good alternative to re-write
the code without strict-aliasing violations. An example is to count the
entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
llvm/llvm-project#119099

This patch avoids emitting distinct tags for void pointers, to avoid
those idioms causing mis-compiles for now.

Fixes llvm/llvm-project#119099.
Fixes llvm/llvm-project#122537.

PR: llvm/llvm-project#122116
(cherry picked from commit 77d3f8a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
6 participants