Skip to content

[clang] Fix nondeterminism in MemberPointerType #137910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

emaxx-google
Copy link
Contributor

@emaxx-google emaxx-google commented Apr 30, 2025

This commit fixes the nondeterminism issue in C++ header module enabled builds which were observed after
#132401.

The issue was related to the fact that the hash set operation in MemberPointerType::Profile() was triggering getMostRecentDecl(). As the latter may trigger the loading of new entities from the external AST source, this
was presumably causing reentrant modification of the hash set (with some probability, leading to creating a
different PCM output - likely after some internal data structure corruption).

The change should otherwise be a no-op, because whether we take a "most recent" or "any" Decl shouldn't
matter since getCanonicalDecl() is called on it anyway inside MemberPointerType::Profile().

We haven't been able to come up with a deterministic regression test for this fix.

This commit fixes the nondeterminism issue in C++ header module enabled
builds which were observed after
llvm#132401.

The issue was related to the fact that the hash set operation in
MemberPointerType::Profile() was triggering getMostRecentDecl(). The
root cause seems to be that the latter was leading to the reentrant
modification of the hash set, with some probability (likely depending
on the actual values of hashes).

We haven't been able to come up with a deterministic regression test for
this fix.
@emaxx-google emaxx-google marked this pull request as ready for review April 30, 2025 01:36
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 30, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 30, 2025

@llvm/pr-subscribers-clang

Author: Maksim Ivanov (emaxx-google)

Changes

This commit fixes the nondeterminism issue in C++ header module enabled builds which were observed after
#132401.

The issue was related to the fact that the hash set operation in MemberPointerType::Profile() was triggering getMostRecentDecl(). The root cause seems to be that the latter was leading to the reentrant modification of the hash set, with some probability (likely depending on the actual values of hashes).

We haven't been able to come up with a deterministic regression test for this fix.


Full diff: https://github.com/llvm/llvm-project/pull/137910.diff

2 Files Affected:

  • (modified) clang/include/clang/AST/Type.h (+2-1)
  • (modified) clang/lib/AST/Type.cpp (+7-3)
diff --git a/clang/include/clang/AST/Type.h b/clang/include/clang/AST/Type.h
index 3e1fb05ad537c..c93cbae767db8 100644
--- a/clang/include/clang/AST/Type.h
+++ b/clang/include/clang/AST/Type.h
@@ -3602,6 +3602,7 @@ class MemberPointerType : public Type, public llvm::FoldingSetNode {
   }
 
   NestedNameSpecifier *getQualifier() const { return Qualifier; }
+  CXXRecordDecl *getCXXRecordDecl() const;
   CXXRecordDecl *getMostRecentCXXRecordDecl() const;
 
   bool isSugared() const;
@@ -3610,7 +3611,7 @@ class MemberPointerType : public Type, public llvm::FoldingSetNode {
   }
 
   void Profile(llvm::FoldingSetNodeID &ID) {
-    Profile(ID, getPointeeType(), getQualifier(), getMostRecentCXXRecordDecl());
+    Profile(ID, getPointeeType(), getQualifier(), getCXXRecordDecl());
   }
 
   static void Profile(llvm::FoldingSetNodeID &ID, QualType Pointee,
diff --git a/clang/lib/AST/Type.cpp b/clang/lib/AST/Type.cpp
index 111a642173418..a149eac7e4555 100644
--- a/clang/lib/AST/Type.cpp
+++ b/clang/lib/AST/Type.cpp
@@ -5275,10 +5275,14 @@ void MemberPointerType::Profile(llvm::FoldingSetNodeID &ID, QualType Pointee,
     ID.AddPointer(Cls->getCanonicalDecl());
 }
 
+CXXRecordDecl *MemberPointerType::getCXXRecordDecl() const {
+  return dyn_cast<MemberPointerType>(getCanonicalTypeInternal())
+      ->getQualifier()
+      ->getAsRecordDecl();
+}
+
 CXXRecordDecl *MemberPointerType::getMostRecentCXXRecordDecl() const {
-  auto *RD = dyn_cast<MemberPointerType>(getCanonicalTypeInternal())
-                 ->getQualifier()
-                 ->getAsRecordDecl();
+  auto *RD = getCXXRecordDecl();
   if (!RD)
     return nullptr;
   return RD->getMostRecentNonInjectedDecl();

Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me why this is a specifically a problem when called from Profile(...) can you elaborate a bit more in the summary.

Does getMostRecentCXXRecordDecl(...) require a comment that documents this behavior so that future users do not fall into the same issue?

@@ -3610,7 +3611,7 @@ class MemberPointerType : public Type, public llvm::FoldingSetNode {
}

void Profile(llvm::FoldingSetNodeID &ID) {
Profile(ID, getPointeeType(), getQualifier(), getMostRecentCXXRecordDecl());
Profile(ID, getPointeeType(), getQualifier(), getCXXRecordDecl());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the problem that we run this profile in the 'middle' of a TU and the most recent decl changes by the end? If so, this makes sense.

That said, I'd expect this to be the above suggestion in case this member pointer type has a redeclaration that perhaps changes what the actual referenced decl is.

Suggested change
Profile(ID, getPointeeType(), getQualifier(), getCXXRecordDecl());
Profile(ID, getPointeeType(), getQualifier(), getCXXRecordDecl()->getCanonicalDecl());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this suggestion seems not needed because this getCanonicalDecl() call is already made inside the Profile() method:

ID.AddPointer(Cls->getCanonicalDecl());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... then I'm surprised that this patch is needed at all for the purposes of non-determinism. getMostRecentCXXRecordDecl()->getCanonicalDecl() will get the same declaration as getCXXRecordDecl()->getCanonicalDecl().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is transient - the reproducibility rate on the reproducer (https://pastebin.com/6aL6rmBe) is only 1%. Also it's not the pointer we have here that's problematic, but rather the fact of extra AST node loading inside the FoldingSet hash calculation operation.

@emaxx-google
Copy link
Contributor Author

emaxx-google commented Apr 30, 2025

It is not clear to me why this is a specifically a problem when called from Profile(...) can you elaborate a bit more in the summary.

I've added a bit more details - it's basically the fact that the "get most recent" operation triggered the load of new nodes from the PCM file. If all this happens while a hash for another node is calculated in Profile(), it means we're modifying the hash map while doing another operation with it.

At least that's the current theory, based on debugging the "deviating" executions of the compiler and seeing the getMostRecentNonInjectedDecl() call and the deserialization - all during the FoldingSet::NodeEquals() function call.

Does getMostRecentCXXRecordDecl(...) require a comment that documents this behavior so that future users do not fall into the same issue?

Sure, I've added a comment.

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure I understand the problem this is solving, but I also don't see any harm in it, so I'm leaning towards accept here. Aaron can be the final approval though.

Copy link
Contributor

@mizvekov mizvekov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't have time to review this closely until next week.

I think the 'inside profile' is a red herring here, this looks more like a problem from calling it from within deserialization.

We could try getting to the bottom of it and see if we can assert in the problematic cases, to avoid future repeats of the problem.

But the fix here is fine on its own, we don't need the most recent decl in most cases, and we kept one getter only for simplicity.

Though I think the name of the new getter needs more thought, it needs to avoid confusion with similar methods from the base type class.

Though I won't have time to give input on that until the weekend at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants