-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Remove WeakMap entries whose key is only reachable through the entry value #10932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ping @dstogov @iluuu1994 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks strange to me, but it seems work.
I can't analyse all the possible consequences.
I think we may accept this and revert in case we get any related problems.
Thank you for looking at this. I wasn't able to find papers or implementations of cycle GCs that solved this problem, so I've had to come up with this. It's possibly a bit strange, but maybe I can clarify it or change something? |
The weird thing, that while mark-and-sweep GC stops tracing weak references our backup GC starts do this. I think I understood your changes to original backup GC, because I knew the original algorithm well, and because I have to analyse only the added changes. Once this is committed, it's going to be hard to reconstruct the ideas behind this patch. It would be great to make a formal definition of the algorithm in some README or wiki.php.net page. I tried to cheat the patch, but without success :) |
I'll look at this in detail today, which might take a while because I don't have any knowledge on the GC. However, note that this is a behavioral change because elements only referenced from |
Agreed. I will write something after this is merged. In the meantime, here is a draft, as well as an attempt at a better explanation of what I did in this PR. I will start by reminding us how the cycle GC works. But before that, I think that some context helps to understand it:
The cycle collector algorithmThe cycle collector algorithm is based on the synchronous variant described in "Concurrent Cycle Collection in Reference Counted Systems (David F. Bacon and V.T. Rajan)". The paper makes these two fundamental observations:
Together, these observations imply that we do not need to scan all nodes in order to find garbage cycles. We can discover a garbage cycle by only scanning the nodes reachable from any of its member. Furthermore, not all nodes are a possible member of a garbage cycle: Only the ones whose refcount was decremented to a non-zero value may be a member of a garbage cycle. The discovery is done by trial deletion: If we subtract the reference counts contributed by internal references, the remaining nodes with a non-zero reference count are the only ones with an external reference. If we revert this process recursively from these nodes, the remaining nodes with a zero reference count are garbage. Bellow is the pseudo code, with explanations. Nodes are any refcounted value (objects, arrays, strings, ...). Colors have the following meaning:
Decrement() and Increment()These routines are used to decrement and increment the reference count of nodes during program execution. We know that a node whose reference count was decremented to a non-zero value is possibly a member of a garbage cycle. Such nodes are added to
This is implemented as CollectCycles()
The pseudo code and explanations of subroutines are given bellow. Note: I'm omitting destructors handling for brevity.
This is implemented in MarkRoots()
Before After With R the set of nodes reachable from
At this point, reference counts represent references external to R. Nodes with a non-zero reference count are referenced by a node outside of R.
This is implemented in ScanRoots()ScanRoots reverts the process of MarkRoots() for externally reachable nodes. This is done by scanning the graph starting at After
This is implemented in CollectRoots()
After
This is implemented in FreeGarbageIn
Example runBellow is an example run with the following initial state: After MarkRoots(): We can see that node After ScanRoots(): The refcount of After CollectWhite(): All reference counts have been restored. Not shown on the visualisation is that During FreeGarbage, A is freed first, which frees and removes B and C from Buffer. WeakMapsQuoting the WeakMaps RFC:
The implementation works as follows:
This achieves the goal of a WeakMap: We can map objects to arbitrary values without preventing the objects from being collected, and if an object is collected, the corresponding value is removed from the map. However, if an object used as key is referenced from the value, it will never be automatically collected because its reference count will never reach zero. This PR tries to improve that. Consider a WeakMap initialized like this: <?php
$m = new WeakMap();
$k = new stdClass;
$v = new stdClass;
$v->k = $k;
$m[$k] = $v;
unset($k, $v); If we represent the graph of references of this program as seen by the cycle collector after unset(), we can see that
To allow
Both If we recognize this, we can see that However, while deleting either of Based on that, we can adjust the algorithm as follows:
Bellow is the adjusted pseudo code, with explanations. The set of weakmaps in which a node is used as key is denoted MarkRoots()As before, MarkRoots() cancels the contribution of internal references to refcounts. We follow the reference from a weakmap to its values (as before) as well as the virtual reference from a node to the values mapped to it through all weakmaps. We decrement the refcount only once if we follow both references.
Implementation details:
ScanRoots()As before, In the subroutine We achieve this with the help of the
If either of the
Implementation details: The implementation is non-recusive, so we can not just call Instead, we could add We can reduce the number of nodes added to Here is the reasoning: Given any weakmap M and key K:
Note: We could decide when to increment the refcount of V by looking at the color or M and K only (without using the flags), but not in the non-recursive implementation:
CollectRootS()As before,
Conveniently, after
ComplexityThe algorithm has a complexity of Prior workI was not able to find prior work solving this problem with reference counting GCs. "Ephemerons: a new finalization mechanism" (Barry Hayes) solves it with a tracing GC. Ephemerons are an abstraction for a WeakMap entry. "Eliminating Cycles in Weak Tables (Alexandra Barros, Roberto Ierusalimschy)" builds upon this in the context of Lua, with an other flavor of a tracing GC. In the ephemerons paper, the problem is solved by adding an additional tracing phase: When an ephemeron is encountered during first phase of a trace, the collector does not immediately trace either of the key or the value, but puts it on a queue. In a second phase, the value of queued ephemerons whose key was reached are scanned. This may reach the key of other ephemerons, so the queued ephemerons are scanned again until the queue contains only ephemerons whose key has not been reached. Whilst it seems odd at first to scan WeakMap values, it's actually necessary in all tracing flavors: A full tracing GC needs to scan values whose key is reachable, and a cycle collector needs to scan values whose key may be unreachable. Edit: My original approach was flawed, as it was dependent on the traversing order. This flaw has been fixed and this comment reflects the current state of the PR. |
Values from the keys, or keys from the values? |
Key from the value, thank you. I'll update the comment |
The normal code path will fetch `obj->handlers->get_gc`, so it's much cheaper to identify weakmaps by looking at `obj->handlers->get_gc`.
* up/master: (571 commits) Expose time spent collecting cycles in gc_status() (php#11523) Warn when fpm socket was not registered on the expected path Implement DOMElement::id Fix ? Implement DOMParentNode::replaceChildren() Implement DOMElement::className RFC: Deprecate remains of string evaluated code assertions (php#11671) Prevent decimal int precision loss in number_format() Implement DOMNode::getRootNode() Implement DOMElement::getAttributeNames() Refactor dom_node_node_name_read() to avoid double allocation Handle fragments consisting out of multiple children without a single root correctly Avoid allocations in DOMElement::getAttribute() Avoid string allocation in dom_get_dom1_attribute() for as long as possible Fix use-of-uninitialized-value when calling php_posix_stream_get_fd (php#11694) Reorder list construction in the function php_intpow10 (php#11683) proc_open: Use posix_spawn(3) interface on systems where it is profitable zend_gdb disable gdb detection for FreeBSD < 11. Fix iface const visibility variance check Fix missing iface class const inheritance type check ...
* Actually skip non-cyclic objects in GC. * Add MAY_BE_CYCLIC to more internal classes * Fix dynamic property creation trigger And also breaking phpGH-10932 for now. :) I will try to fix this later.
Quoting the WeakMaps RFC:
Unfortunately, as described in #10043, keys that are referenced by a WeakMap value will never be automatically removed. This causes memory leaks, and can be difficult to avoid in a sufficiently complex application. Since the goal of WeakMaps is to avoid memory leaks and manual memory management, it would be beneficial to improve this.
This PR changes the behavior of WeakMaps such that if a key is only reachable through the corresponding value in a WeakMap, it's considered unreachable and the entry can be removed.
Consider the following example:
After the last statement,
$key
is only reachable through a WeakMap value, but$map[$key]
is not automatically removed.This PR updates the cycle collector to detect this, so that
$key
can be removed in this case. The exact changes are documented in a comment bellow: #10932 (comment).Caveats and breaking changes
Predictability
Unreachable WeakMap entries are removed at an unpredictable time. This affects entries whose key is part of a garbage cycle (this is existing behavior), as well as entries concerned by this PR (this is new behavior, since these was not removed at all before).
Consider the following example:
$map[$key]
is never removed automatically because the reference count of $key is non-zero, and the cycle is not detected by the GC$map[$key]
is removed during the next GC run (at an unpredictable time)Note that this behavior is not entirely new. In the example bellow, the behavior is the same before and after this PR:
$map[$key]
is removed at an unpredictable time, during the next GC run:Entries whose key is not part of a cycle are unaffected. In the example bellow,
$map[$key]
is removed immediately during the execution of theunset($key)
:Iteration
All keys are in fact reachable by iterating a WeakMap. This PR considers that this reachability is weak, so that it does not retain keys that are only reachable through iteration.
Applications that iterate over WeakMap may break due to the changes in this PR, since keys that are only reachable like this may be removed from the map.
Removal during iteration
Currently, a WeakMap entry may be removed during iteration if one of the keys is collected. Due to existing iteration behaviors, this may cause some entries to be skipped:
This is not new behavior, but this may now also happen when collecting a key that is only reachable through a WeakMap value:
Previous version of this PR
The original approach used in this PR had flaws, which have now be fixed. This description represents the current state of the PR.