Reduce memory usage in AST parent map generation by lazily checking if nodes have been seen #129934


Conversation

higher-performance
Contributor

@higher-performance higher-performance commented Mar 5, 2025

This mitigates a regression introduced in #87824.

The mitigation here is two-fold:

  • We only store pointers to the vector elements, rather than copies of the elements themselves. This saves a lot of memory because the elements are ~5 times larger than pointers.

  • We populate the "seen" set lazily, whenever it is queried. This further reduces memory usage and speeds up queries.

Fixes #129808.
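
To make the shape of that concrete, here is a minimal sketch of the two ideas combined: an identity-pointer set plus a lazy catch-up loop. This is illustrative only -- the class and member names are invented, and the actual diff below is the authoritative version.

  #include "clang/AST/ASTTypeTraits.h"
  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/ADT/SmallVector.h"
  #include <cassert>
  #include <cstddef>

  using clang::DynTypedNode;

  class ParentVectorSketch {
  public:
    void push_back(const DynTypedNode &Value) {
      // Nodes without memoization data are never dedup-checked; append
      // them unconditionally (this mirrors the pre-existing behavior).
      if (!Value.getMemoizationData() || !contains(Value))
        Items.push_back(Value);
    }

    bool contains(const DynTypedNode &Value) {
      assert(Value.getMemoizationData() &&
             "only identity-bearing nodes are dedup-checked");
      // Lazily catch the set up with the vector, so set insertions are
      // only paid for once a containment query actually arrives.
      while (NumProcessed < Items.size()) {
        if (const void *Id = Items[NumProcessed].getMemoizationData())
          Seen.insert(Id); // an 8-byte pointer, not a ~40-byte node copy
        ++NumProcessed;
      }
      return Seen.contains(Value.getMemoizationData());
    }

    llvm::ArrayRef<DynTypedNode> view() const { return Items; }

  private:
    llvm::SmallVector<DynTypedNode, 2> Items;
    llvm::SmallPtrSet<const void *, 2> Seen; // identity pointers only
    size_t NumProcessed = 0; // how many of Items are reflected in Seen
  };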

@llvmbot llvmbot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels Mar 5, 2025
@llvmbot
Member

llvmbot commented Mar 5, 2025

@llvm/pr-subscribers-clang

Author: None (higher-performance)

Changes

This mitigates a regression introduced in #87824.

With this change, the use of parent maps (hasParent(), hasAncestor(), etc.) in Clang AST matchers is no longer guaranteed to avoid quadratic slowdown, but in practice it should do so more frequently. I tested against a translation unit that had been slow in the past, and it worked fine on that. If we hit those pathological cases in the future, we can evaluate other approaches.

Fixes #129808


Full diff: https://github.com/llvm/llvm-project/pull/129934.diff

1 file affected:

  • (modified) clang/lib/AST/ParentMapContext.cpp (+65-8)
diff --git a/clang/lib/AST/ParentMapContext.cpp b/clang/lib/AST/ParentMapContext.cpp
index e9387ec79c373..03e5236eeb1db 100644
--- a/clang/lib/AST/ParentMapContext.cpp
+++ b/clang/lib/AST/ParentMapContext.cpp
@@ -65,21 +65,78 @@ class ParentMapContext::ParentMap {
   public:
     ParentVector() = default;
     explicit ParentVector(size_t N, const DynTypedNode &Value) {
-      Items.reserve(N);
+      SortedAndUnsortedItems.reserve(N);
       for (; N > 0; --N)
         push_back(Value);
     }
     bool contains(const DynTypedNode &Value) {
-      return Seen.contains(Value);
+      const auto SortBoundary = SortedAndUnsortedItems.begin() + NumSorted;
+      bool Found = std::binary_search(SortedAndUnsortedItems.begin(),
+                                      SortBoundary, Value);
+      Budget += llvm::bit_width(
+          static_cast<size_t>(SortBoundary - SortedAndUnsortedItems.begin()));
+      if (!Found) {
+        auto FoundIt =
+            std::find(SortBoundary, SortedAndUnsortedItems.end(), Value);
+        Budget += FoundIt - SortBoundary;
+        Found |= FoundIt != SortedAndUnsortedItems.end();
+      }
+      SortIfWorthwhile();
+      return Found;
     }
     void push_back(const DynTypedNode &Value) {
-      if (!Value.getMemoizationData() || Seen.insert(Value).second)
-        Items.push_back(Value);
+      ++Budget;
+      if (!Value.getMemoizationData() || !contains(Value)) {
+        SortedAndUnsortedItems.push_back(Value);
+        if (SortedAndUnsortedItems.back() < SortedAndUnsortedItems[NumSorted]) {
+          // Keep the minimum element in the middle to quickly tell us if
+          // merging will be necessary
+          using std::swap;
+          swap(SortedAndUnsortedItems.back(),
+               SortedAndUnsortedItems[NumSorted]);
+        }
+      }
+      VerifyInvariant();
     }
-    llvm::ArrayRef<DynTypedNode> view() const { return Items; }
+    llvm::ArrayRef<DynTypedNode> view() {
+      ++Budget;
+      return SortedAndUnsortedItems;
+    }
+
   private:
-    llvm::SmallVector<DynTypedNode, 2> Items;
-    llvm::SmallDenseSet<DynTypedNode, 2> Seen;
+    void SortIfWorthwhile() {
+      VerifyInvariant();
+      auto SortBoundary = SortedAndUnsortedItems.begin() + NumSorted;
+      if (SortBoundary != SortedAndUnsortedItems.end()) {
+        const size_t NumUnsorted = SortedAndUnsortedItems.end() - SortBoundary;
+        const size_t SortingCost = NumUnsorted * llvm::bit_width(NumUnsorted);
+        const bool NeedMerge = SortBoundary != SortedAndUnsortedItems.begin();
+        // Assume that the naive implementation would copy these elements.
+        // This is just an estimate; it's OK if it's wrong.
+        const size_t MergeCost = SortedAndUnsortedItems.size() + NumUnsorted;
+        if (Budget >= (NeedMerge ? MergeCost : 0) + SortingCost) {
+          std::sort(SortBoundary, SortedAndUnsortedItems.end());
+          if (NeedMerge) {
+            std::inplace_merge(SortedAndUnsortedItems.begin(), SortBoundary,
+                               SortedAndUnsortedItems.end());
+          }
+          Budget = 0;
+          NumSorted = SortedAndUnsortedItems.size();
+        }
+      }
+    }
+
+    void VerifyInvariant() const {
+      assert(
+          !(NumSorted < SortedAndUnsortedItems.size() &&
+            SortedAndUnsortedItems.back() <
+                SortedAndUnsortedItems[NumSorted]) &&
+          "the boundary item must always be the minimum of the unsorted items");
+    }
+
+    llvm::SmallVector<DynTypedNode, 2> SortedAndUnsortedItems;
+    size_t NumSorted = 0;
+    int64_t Budget = 0;
   };
 
   /// Maps from a node to its parents. This is used for nodes that have
@@ -117,7 +174,7 @@ class ParentMapContext::ParentMap {
     if (I == Map.end()) {
       return llvm::ArrayRef<DynTypedNode>();
     }
-    if (const auto *V = dyn_cast<ParentVector *>(I->second)) {
+    if (auto *V = dyn_cast<ParentVector *>(I->second)) {
       return V->view();
     }
     return getSingleDynTypedNodeFromParentMap(I->second);

@higher-performance higher-performance marked this pull request as draft March 5, 2025 21:05
@higher-performance higher-performance force-pushed the higher-performance-parent-map-perf-fix branch from 0e3c685 to e99bb26 Compare March 5, 2025 21:59
@higher-performance higher-performance marked this pull request as ready for review March 5, 2025 22:07
@higher-performance higher-performance force-pushed the higher-performance-parent-map-perf-fix branch from e99bb26 to 9f12c58 Compare March 5, 2025 22:09
@higher-performance higher-performance changed the title Reduce memory usage in AST parent map generation by partially reverting quadratic slowdown mitigation Reduce memory usage in AST parent map generation by lazily checking if nodes have been seen Mar 5, 2025
Collaborator

@erichkeane erichkeane left a comment


@AaronBallman should take a look at this as well. We might take a look at SetVector and see if there are any learnings we can take from that. It has a set and a vector together, but might have some interesting things.

Finally... I wonder whether we could just store this as a sorted vector of DynTypedNode + insert order, then do a copy & sort on access of view().

Can we do some benchmarking to figure out which things are done the 'most often' with what values of N? It might better guide this.

     }
     void push_back(const DynTypedNode &Value) {
-      if (!Value.getMemoizationData() || Seen.insert(Value).second)
+      if (!Value.getMemoizationData() || !contains(Value)) {
Collaborator


It seems unfortunate to potentially re-calculate in contains here, just to potentially re-calculate it again immediately. I wonder if there is value in special-casing ItemsProcessed == 0 && Items.capacity() == Items.size() to just do a linear search?

Also-also--- I wonder if it makes sense to only do the cache when size() is significant enough? It would be interesting to see if there is a value of N under which the set doesn't really make sense.

Contributor Author


I could definitely optimize these further in the ways you mention, but those are just (very small) constant-factor savings in time, at a significant readability and fragility cost. I didn't get the feeling it's worth it, to be honest -- do you? Note that the performance graphs almost coincide with Clang 17's right now.

@higher-performance
Contributor Author

higher-performance commented Mar 5, 2025

Oh shoot, it looks like my change may be buggy. One of the tests is breaking... I will take a look.

Edit: Fixed below. The fix didn't affect performance.

@higher-performance higher-performance force-pushed the higher-performance-parent-map-perf-fix branch 4 times, most recently from eaefea5 to bcdf883 Compare March 5, 2025 23:25
@higher-performance
Contributor Author

higher-performance commented Mar 6, 2025

Ah sorry I forgot to reply to your review.

> We might take a look at SetVector and see if there are any learnings we can take from that. It has a set and a vector together, but might have some interesting things.

I did take a look at this actually, but it looked equivalent to what I was doing before, particularly with respect to storing the expensive keys (DynTypedNode) twice. It just uses SmallVector<T, ...> and DenseSet<T> underneath, whereas I just happened to use SmallDenseSet<T, ...> instead of DenseSet<T>.

> Finally... I wonder whether we could just store this as a sorted vector of DynTypedNode + insert order, then do a copy & sort on access of view().

I don't think that works. You don't want to do a ton of work on every view() -- it's assumed to be cheap, and called often. Also, maintaining sort order gets expensive the more nodes you add; you don't want to do it every insertion.

Furthermore, not only are elements without memoization data not comparable, they also still need to appear in the view in the same order as they were inserted. This means either they need to be in the same vector as the comparable elements (which makes it impossible to sort the comparable portion), or we need to create a façade over two completely disjoint containers.

The implementation in this PR addresses all of those constraints without any drawbacks, I think. There's less code compared to writing a facade, and performance-wise it feels just about optimal.

> Can we do some benchmarking to figure out which things are done the 'most often' with what values of N? It might better guide this.

Re: benchmarking -- note that I did a general benchmark both with the test case that was provided in the associated issue, as well as with an expensive TU that we had internally. It had great performance in both cases. Regarding N -- I assume N is referring to the total container sizes (per the other comment)? The thing is, I don't get the impression there's much room for improvement here. The memory usage was just high by a constant (10x) factor. Some ~5x of that almost certainly came from the data type itself (storing DynTypedNode in the cache which is 40 bytes, instead of DynTypedNode*, which is 8 bytes). Adding laziness on top of that has shrunk the difference so much that the performance plots are already very close to coinciding.

@erichkeane
Collaborator

> Ah sorry I forgot to reply to your review.
>
>> We might take a look at SetVector and see if there are any learnings we can take from that. It has a set and a vector together, but might have some interesting things.
>
> I did take a look at this actually, but it looked equivalent to what I was doing before, particularly with respect to storing the expensive keys (DynTypedNode) twice. It just uses SmallVector<T, ...> and DenseSet<T> underneath, whereas I just happened to use SmallDenseSet<T, ...> instead of DenseSet<T>.
>
>> Finally... I wonder whether we could just store this as a sorted vector of DynTypedNode + insert order, then do a copy & sort on access of view().
>
> I don't think that works. You don't want to do a ton of work on every view() -- it's assumed to be cheap, and called often. Also, maintaining sort order gets expensive the more nodes you add; you don't want to do it every insertion.

I see. Yes, the problem is that I don't have a good idea of the uses here; this was a first pass. If view is the 'frequent' operation, then my suggestion isn't a good one, at least :)

> Furthermore, not only are elements without memoization data not comparable, they also still need to appear in the view in the same order as they were inserted. This means either they need to be in the same vector as the comparable elements (which makes it impossible to sort the comparable portion), or we need to create a façade over two completely disjoint containers.
>
> The implementation in this PR addresses all of those constraints without any drawbacks, I think. There's less code compared to writing a facade, and performance-wise it feels just about optimal.
>
>> Can we do some benchmarking to figure out which things are done the 'most often' with what values of N? It might better guide this.
>
> Re: benchmarking -- note that I did a general benchmark both with the test case that was provided in the associated issue, as well as with an expensive TU that we had internally. It had great performance in both cases. Regarding N -- I assume N is referring to the total container sizes (per the other comment)? The thing is, I don't get the impression there's much room for improvement here. The memory usage was just high by a constant (10x) factor. Some ~5x of that almost certainly came from the data type itself (storing DynTypedNode in the cache which is 40 bytes, instead of DynTypedNode*, which is 8 bytes). Adding laziness on top of that has shrunk the difference so much that the performance plots are already very close to coinciding.

Ok, I think I'm ok with all that then. BUT as I said, I'm not the best person to be reviewing this, I've pinged Aaron and hope he'll come along.

Comment on lines 109 to 106
      const size_t OldCapacity = Items.capacity();
      Items.push_back(Value);
      if (OldCapacity != Items.capacity()) {
Contributor


This is quite terrifying.
Would we impact performance significantly if we used a map<pointer, index>, or a vector of pointers that has the same order as the vector of values?

Contributor Author


That wouldn't compile without significant refactoring across several files. We need the Items vector to be convertible to llvm::ArrayRef<DynTypedNode>, because that's what view() returns.

Collaborator

@AaronBallman AaronBallman left a comment


Just some minor nits from me so far.

I think there may be some additional performance benefits we could eke out, but I don't think they need to be done in this patch (or done at all; it's speculative). One is that I noticed ASTContext::getTraversalScope() is returning a vector by copy and that doesn't seem necessary. Another is that perhaps we want to use a Bloom filter here instead, on the assumption that the parent map will always be fairly large. But again, this is all speculation.

@erichkeane
Collaborator

> Just some minor nits from me so far.
>
> I think there may be some additional performance benefits we could eke out, but I don't think they need to be done in this patch (or done at all; it's speculative). One is that I noticed ASTContext::getTraversalScope() is returning a vector by copy and that doesn't seem necessary. Another is that perhaps we want to use a Bloom filter here instead, on the assumption that the parent map will always be fairly large. But again, this is all speculation.

Hmm... would SOME duplicates be acceptable? I THINK the original patch removed the duplicates, so they were presumably OK before then? IF that is acceptable (and uses are duplicate-tolerant), a Bloom filter that can reduce the number of duplicates to sub-1% would still be a massive win, right? We could perhaps do some tuning on the size of the filter to get that reasonable.

According to the intro of the Wikipedia article on Bloom filters, 10 bits per element is all that is necessary for a 1% false-positive rate, but the details of the article and its references get really "LaTeX-math-heavy" for me, so I don't have a good idea of what it would take to get the false-positive rate down lower.

BUT since the idea is to just reduce/eliminate duplicates without high memory overhead, it seems like it would be a great solution here.
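
To make that sizing rule concrete, here is a self-contained toy (not proposed for the patch; the constants come straight from the 10-bits-per-element figure quoted above, with k ≈ 0.69 × (bits per element) ≈ 7 probes):

  #include <bitset>
  #include <cstddef>
  #include <cstdint>

  class ToyBloomFilter {
    static constexpr std::size_t Bits = 1 << 16; // ~10 bits/element => ~6500 elements
    std::bitset<Bits> Table;

    static std::uint64_t mix(std::uint64_t X, std::uint64_t Seed) {
      // splitmix64-style mixer; any decent 64-bit mixer works here.
      X += Seed + 0x9E3779B97F4A7C15ULL;
      X = (X ^ (X >> 30)) * 0xBF58476D1CE4E5B9ULL;
      X = (X ^ (X >> 27)) * 0x94D049BB133111EBULL;
      return X ^ (X >> 31);
    }

  public:
    void insert(const void *P) {
      for (std::uint64_t I = 0; I < 7; ++I) // k = 7 probes
        Table.set(mix(reinterpret_cast<std::uintptr_t>(P), I) % Bits);
    }
    bool mayContain(const void *P) const {
      for (std::uint64_t I = 0; I < 7; ++I)
        if (!Table.test(mix(reinterpret_cast<std::uintptr_t>(P), I) % Bits))
          return false; // definitely never inserted
      return true; // probably inserted (~1% false positives near capacity)
    }
  };

One caveat for this use case: a false positive would make a genuinely new parent look like a duplicate, so the filter could only serve as a prefilter in front of an exact check unless dropping ~1% of parents really is tolerable.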

@erichkeane
Collaborator

> One is that I noticed ASTContext::getTraversalScope() is returning a vector by copy and that doesn't seem necessary.

I noticed this too! I'm actually working on validating a RAC fix for this right now, just to have it return an ArrayRef instead. Both uses of that function just do observation, so it seems reasonable enough.

@cor3ntin
Contributor

cor3ntin commented Mar 6, 2025

I think we are a bit confused, so can someone tell me if I'm wrong:

  • N is always going to be fairly small (and always 1 in the non-template case)
  • We care about the order of parents
  • We are just trying to avoid duplicates
  • Intrusive linked list

Here are some random ideas

  • Modify ParentStack so that it stores pointers such that we never make copies
  • Go back to a simple vector if we are only storing pointers
  • Play with something like a multimap for ParentMap - because ultimately, that's what we are trying to do, and maybe that would avoid the overhead of many SmallVector

@higher-performance higher-performance force-pushed the higher-performance-parent-map-perf-fix branch from bcdf883 to 1bfd371 Compare March 6, 2025 18:23
@higher-performance
Contributor Author

higher-performance commented Mar 6, 2025

@cor3ntin We need to satisfy this constraint, which is not trivial:

    llvm::ArrayRef<DynTypedNode> view() const { return Items; }
  private:
    llvm::SmallVector<DynTypedNode, 2> Items;

i.e. if we stored maps or pointers then we could no longer return an llvm::ArrayRef.

@higher-performance
Contributor Author

Checking in - has everybody had a chance to review this? Can we merge?

@AaronBallman
Collaborator

> Checking in - has everybody had a chance to review this? Can we merge?

Was @cor3ntin correct here?

> I think we are a bit confused, so can someone tell me if I'm wrong:
>
>   • N is always going to be fairly small (and always 1 in the non-template case)
>   • We care about the order of parents
>   • We are just trying to avoid duplicates
>   • Intrusive linked list

@higher-performance
Contributor Author

>> Checking in - has everybody had a chance to review this? Can we merge?
>
> Was @cor3ntin correct here?
>
>> I think we are a bit confused, so can someone tell me if I'm wrong:
>>
>>   • N is always going to be fairly small (and always 1 in the non-template case)
>>   • We care about the order of parents
>>   • We are just trying to avoid duplicates
>>   • Intrusive linked list

Depends which part you mean, but I don't believe so - I tried to respond to it here. Ultimately I was saying we can't have an intrusive linked list because of that constraint. The part about "just trying" to avoid duplicates is somewhat ambiguous; the code only seems to want to avoid duplicate nodes that carry memoization data, but not other nodes. It may just be an optimization, but I'm not confident enough to say whether duplicating those would cause a problem. The part about caring about parent order is true, as I've mentioned earlier, and the part about N being small I just don't know -- it probably depends on the codebase and how many parents/ancestors a class has.

Collaborator

@AaronBallman AaronBallman left a comment


LGTM, thanks for the fix!

@higher-performance
Contributor Author

I just realized there is another optimization we could do in push_back (similar motivation as what @erichkeane mentioned here, but different):

We could avoid the duplication check in push_back, and defer it to the contains()/view() accessors, thus making push_back simply do if (!Value.getMemoizationData()) { ... }. If we do such a thing, then we'd probably want to amortize it so that the number of unprocessed entries doesn't grow unboundedly. We could keep the number of unprocessed elements within (say) 25% of the size of the processed elements.

Thoughts on this? I'm inclined to give it a try to see if it's worth the code complexity.

@erichkeane
Collaborator

> I just realized there is another optimization we could do in push_back (similar motivation as what @erichkeane mentioned here, but different):
>
> We could avoid the duplication check in push_back, and defer it to the contains()/view() accessors, thus making push_back simply do if (!Value.getMemoizationData()) { ... }. If we do such a thing, then we'd probably want to amortize it so that the number of unprocessed entries doesn't grow unboundedly. We could keep the number of unprocessed elements within (say) 25% of the size of the processed elements.
>
> Thoughts on this? I'm inclined to give it a try to see if it's worth the code complexity.

It sounds worth a try, and perhaps could help readability. I think a 'remove dupes' step (aka, std::unique/erase) is much more readable than what is being attempted here.

As far as the unprocessed elements go, it would be interesting to do some benchmarking to see how we'd choose a percentage. What MIGHT be useful instead is, in push_back, doing the 'remove dupes' step when capacity == size. That would keep us naturally bounded to a handful of powers of 2, but would also catch us before a particularly expensive step. A sketch of that variant follows below.
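
For illustration, here is a sketch of that "remove dupes right before we grow" variant (hypothetical -- it was never committed). Note that std::unique only removes adjacent duplicates and would require sorting first, which breaks the insertion order view() must preserve, so this version keeps first occurrences stably:

  #include "clang/AST/ASTTypeTraits.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/ADT/SmallVector.h"

  using clang::DynTypedNode;

  struct DeferredDedupVector { // illustrative name, not from the patch
    llvm::SmallVector<DynTypedNode, 2> Items;

    void push_back(const DynTypedNode &Value) {
      // Dedup only when the vector is about to reallocate, so the pass
      // runs O(log N) times and insertion stays O(1) amortized.
      if (Items.size() == Items.capacity())
        removeDuplicates();
      Items.push_back(Value);
    }

    void removeDuplicates() {
      llvm::SmallPtrSet<const void *, 8> SeenIds;
      DynTypedNode *Out = Items.begin();
      for (DynTypedNode &N : Items) {
        // Keep the first occurrence; nodes without memoization data are
        // never dedup-checked, so they are always kept.
        if (!N.getMemoizationData() ||
            SeenIds.insert(N.getMemoizationData()).second) {
          if (Out != &N)
            *Out = N;
          ++Out;
        }
      }
      Items.erase(Out, Items.end());
    }
  };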

@higher-performance
Contributor Author

Okay thanks, I'll give it a shot. It might take me a bit to get to it. If in the meantime anyone needs this PR urgently let me know and we can probably merge this and do that in a follow-up.

@higher-performance higher-performance force-pushed the higher-performance-parent-map-perf-fix branch 2 times, most recently from 0f2ccf1 to a14050e Compare March 13, 2025 01:41

github-actions bot commented Mar 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@higher-performance higher-performance force-pushed the higher-performance-parent-map-perf-fix branch from a14050e to 6966e56 Compare March 13, 2025 01:47
@higher-performance
Contributor Author

So, this is going to sound very silly, but it seems I may have missed the obvious, and the solution may have been extremely simple: llvm::SmallPtrSet, keyed on the return value of clang::DynTypedNode::getMemoizationData().

The reason is that the original code only ever performed the duplicate check when getMemoizationData() returned non-null:

    bool Found = ParentStack.back().getMemoizationData() &&
                 llvm::is_contained(*Vector, ParentStack.back());
    if (!Found)
      Vector->push_back(ParentStack.back());

Specifically, llvm::is_contained(*Vector, node) (and thus Vector->contains(node)) is only ever called when node.getMemoizationData() returns non-null. There was no containment check in the case where memoization data was absent, so we can just use pointer identity on getMemoizationData().

This drastically simplifies everything - please take another look and let me know if you see any issues!
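
In code, the simplification looks roughly like this (a sketch reconstructed from the description above, with an invented class name; the committed revision is authoritative):

  #include "clang/AST/ASTTypeTraits.h"
  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/ADT/SmallVector.h"

  using clang::DynTypedNode;

  class ParentVectorSimplified {
  public:
    void push_back(const DynTypedNode &Value) {
      // Only identity-bearing nodes were ever dedup-checked by the
      // original code quoted above, so pointer identity is sufficient.
      const void *Identity = Value.getMemoizationData();
      if (!Identity || Dedup.insert(Identity).second)
        Items.push_back(Value);
    }
    llvm::ArrayRef<DynTypedNode> view() const { return Items; }

  private:
    llvm::SmallVector<DynTypedNode, 1> Items; // insertion order preserved
    llvm::SmallPtrSet<const void *, 2> Dedup; // 8-byte identities, not node copies
  };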

Contributor

@cor3ntin cor3ntin left a comment


This looks reasonable; I just have one question.

Collaborator

@AaronBallman AaronBallman left a comment


LGTM with the vector changed back to a SmallVector. This is a much cleaner change, thank you for spotting it!

Collaborator

@erichkeane erichkeane left a comment


Interesting... this ends up just being a set-vector with a different identity being stored for the set. So this ends up being ~4 ptr-sizes (IIRC you said that DynTypedNode is 3x the size of a pointer) per-record (though the extra 1 is only for those with memoization data).

Did you run the space benchmark here from the original bug? I definitely believe that this is an improvement (changes from 6x-ptr/element to < 4x-ptr/element), and perhaps the deduplication makes up for the extra data (since it only has to remove ONE duplicate to make up for 3 elements).

AS FAR AS the SmallVector/SmallDenseSet vs std::vector/std::set, the decision here is interesting. SmallVector has a pretty massive stack size in exchange for saving on an allocation (it isn't SSO-like; it is more "size-of-vector + N", so that our allocation is on-stack unless size > N). So it kinda depends on whether we think MOST uses of this are going to be '1' element, vs '0' or '1+' (that is, for the small-vector suggested by @cor3ntin of size 1). If it is almost always going to be 1, I agree with him. If it is almost always going to be 0 or more than 1, std::vector is probably correct (as in the 0 case, std::vector is smaller, and in the 1+ case, the allocation is happening anyway, and we just have dead stack space again).

@higher-performance higher-performance force-pushed the higher-performance-parent-map-perf-fix branch from 6966e56 to d3923bd Compare March 13, 2025 16:59
@higher-performance
Contributor Author

> Interesting... this ends up just being a set-vector with a different identity being stored for the set. So this ends up being ~4 ptr-sizes (IIRC you said that DynTypedNode is 3x the size of a pointer) per-record (though the extra 1 is only for those with memoization data).

Originally 5x (40 bytes vs. 8 bytes), not 3x. But it might be even higher now, since that was using SmallDenseSet and this one is using SmallPtrSet -- I'm not sure how each one represents a slot in the table, but I imagine the pointer one is better optimized for pointers.

> Did you run the space benchmark here from the original bug?

Yup. With the SmallVector<1> I just changed it back to, it's somewhere between 0.6× and 1.3×, depending on the number of elements. It was something like 0.9× to 1.1× with std::vector.

> AS FAR AS the SmallVector/SmallDenseSet vs std::vector/std::set, the decision here is interesting. SmallVector has a pretty massive stack size in exchange for saving on an allocation (it isn't SSO-like; it is more "size-of-vector + N", so that our allocation is on-stack unless size > N). So it kinda depends on whether we think MOST uses of this are going to be '1' element, vs '0' or '1+' (that is, for the small-vector suggested by @cor3ntin of size 1). If it is almost always going to be 1, I agree with him. If it is almost always going to be 0 or more than 1, std::vector is probably correct (as in the 0 case, std::vector is smaller, and in the 1+ case, the allocation is happening anyway, and we just have dead stack space again).

Yeah I don't know what to expect from user code here, but it seems the difference is small either way -- I'd say let's just go with SmallVector<1> now.
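
For reference, the footprint trade-off being weighed here can be eyeballed with a tiny illustrative check (the exact numbers are target- and ABI-dependent; on a typical 64-bit build, SmallVector<DynTypedNode, 1> adds one inline ~40-byte slot on top of the vector header, while std::vector stays three pointers regardless of use):

  #include "clang/AST/ASTTypeTraits.h"
  #include "llvm/ADT/SmallVector.h"
  #include <cstdio>
  #include <vector>

  int main() {
    std::printf("sizeof(SmallVector<DynTypedNode, 1>) = %zu\n",
                sizeof(llvm::SmallVector<clang::DynTypedNode, 1>));
    std::printf("sizeof(std::vector<DynTypedNode>)    = %zu\n",
                sizeof(std::vector<clang::DynTypedNode>));
  }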

@higher-performance higher-performance merged commit 8c7f0ea into llvm:main Mar 13, 2025
9 of 10 checks passed
@higher-performance higher-performance deleted the higher-performance-parent-map-perf-fix branch March 13, 2025 20:02
@higher-performance
Contributor Author

Just merged this -- thanks, everyone, for the reviews! If I hadn't gotten the pushback earlier, I might never have spotted the simplification and optimization.

@higher-performance
Contributor Author

/cherry-pick 8c7f0ea

@llvmbot
Member

llvmbot commented Mar 13, 2025

> /cherry-pick 8c7f0ea

Error: Command failed due to missing milestone.

@higher-performance
Contributor Author

/cherry-pick 8c7f0ea

@llvmbot
Member

llvmbot commented Mar 13, 2025

/pull-request #131209

@llvmbot llvmbot moved this from Needs Triage to Done in LLVM Release Status Mar 13, 2025
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Mar 17, 2025
…f nodes have been seen (llvm#129934)

This mitigates a regression introduced in llvm#87824.

The mitigation here is to store pointers to the deduplicated AST nodes, rather than copies of the nodes themselves. This allows a pointer-optimized set to be used and saves a lot of memory, because `clang::DynTypedNode` is ~5 times larger than a pointer.

Fixes llvm#129808.

(cherry picked from commit 8c7f0ea)
frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025
…f nodes have been seen (llvm#129934)

This mitigates a regression introduced in llvm#87824.

The mitigation here is to store pointers to the deduplicated AST nodes, rather than copies of the nodes themselves. This allows a pointer-optimized set to be used and saves a lot of memory, because `clang::DynTypedNode` is ~5 times larger than a pointer.

Fixes llvm#129808.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment