Use HT for recursion protection in JSON encode #7589
The instruction count increases by about 40% for a tight loop, from 350 to 493 instructions per element. GC_PROTECTED is fast and will be hard to beat, but 40% seems like a lot.
Yeah, the overhead here is non-trivial, and I don't really see how to improve on it. I think the two alternatives would be: a) simply ignore any weird interactions between different recursion protections, since they rarely matter; or b) switch to a two-level recursion protection, where the first user can make use of the GC_PROTECTED flag, while a nested second user falls back to an HT instead. Option b) would ensure both that there is little performance impact in the average case and that there is no interference between different recursion guards. It would be a more intrusive change, though, since it needs to cover the other users of GC_PROTECTED as well.
Object handles could be used as an index into an array of bits: you could realloc a global array as more handles appear. The relative memory overhead would only be 1/8/sizeof(zend_object) = 0.2%. This doesn't help with arrays, and there's that TODO in the code about removing zend_object.handle, but maybe it could at least solve the bug.
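To make the bit-array idea above concrete, here is a minimal standalone sketch of a growable bitset keyed by an integer handle. Everything in it (the `handle_bitset` type and its functions) is hypothetical and not part of PHP; it only assumes that handles are small non-negative integers, as the comment suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bitset with one bit per object handle. */
typedef struct {
    uint8_t *bits;   /* bit i is set while handle i is being visited */
    size_t   nbytes; /* allocated size of bits, in bytes */
} handle_bitset;

/* Grow the array so that `handle` has a bit. Returns 0 on success, -1 on OOM. */
static int handle_bitset_reserve(handle_bitset *s, uint32_t handle)
{
    size_t need = (size_t)handle / 8 + 1;
    if (need <= s->nbytes) {
        return 0;
    }
    size_t cap = s->nbytes ? s->nbytes : 16;
    while (cap < need) {
        cap *= 2;
    }
    uint8_t *p = realloc(s->bits, cap);
    if (p == NULL) {
        return -1;
    }
    memset(p + s->nbytes, 0, cap - s->nbytes); /* new bits start cleared */
    s->bits = p;
    s->nbytes = cap;
    return 0;
}

/* Mark `handle` as in progress.
 * Returns 1 if newly marked, 0 if already marked (recursion), -1 on OOM. */
static int handle_bitset_enter(handle_bitset *s, uint32_t handle)
{
    if (handle_bitset_reserve(s, handle) != 0) {
        return -1;
    }
    uint8_t mask = (uint8_t)(1u << (handle % 8));
    if (s->bits[handle / 8] & mask) {
        return 0;
    }
    s->bits[handle / 8] |= mask;
    return 1;
}

/* Clear the mark once the visit of `handle` is finished. */
static void handle_bitset_leave(handle_bitset *s, uint32_t handle)
{
    assert((size_t)handle / 8 < s->nbytes);
    s->bits[handle / 8] &= (uint8_t)~(1u << (handle % 8));
}

static void handle_bitset_free(handle_bitset *s)
{
    free(s->bits);
    s->bits = NULL;
    s->nbytes = 0;
}
```

A caller would start from `handle_bitset s = {NULL, 0};`, treat a return value of 0 from `handle_bitset_enter` as "recursion detected", and pair every successful enter with a leave. As noted above, this only covers objects, since arrays have no handle.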
I think as there's already json specific constant in Zend code ( |
@bukka That would require reserving an additional GC bit. I don't think that makes sense for a JSON-only use case. In any case, closing this one as the overhead is too large.
if (GC_FLAGS(rc) & GC_IMMUTABLE) {
	return SUCCESS;
}
if (zend_hash_index_add_empty_element(&encoder->recursive, (uintptr_t) rc)) {
Leaving a note in case anyone bases functionality on this in the future or restores it:
As I discovered in #7690, using pointers directly as hash indexes leads to a lot of hash collisions and performance issues. Shifting by ZEND_MM_ALIGNED_OFFSET_LOG2 first helps noticeably (and works with both malloc and emalloc).
(E.g. if 44-byte zend_array instances (on 64-bit platforms) are aligned to 16 bytes in practice with emalloc on a platform where the low bit of a pointer is the byte address, then the low four bits of every pointer are zero, so they all land in the same 1 in 16 hash buckets.)
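The collision effect described above can be illustrated with a small standalone model. This is a sketch, not PHP's HashTable: it assumes a power-of-two bucket count selected by masking the low bits, and 16-byte alignment; `ALIGN_LOG2`, `NBUCKETS` and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN_LOG2 4   /* assumption: allocations are aligned to 16 bytes */
#define NBUCKETS   64  /* assumption: power-of-two bucket count */

/* Bucket chosen when the raw pointer value is used as the hash. */
static size_t bucket_raw(uintptr_t p)
{
    return (size_t)(p & (NBUCKETS - 1));
}

/* Bucket chosen when the always-zero alignment bits are shifted out first. */
static size_t bucket_shifted(uintptr_t p)
{
    return (size_t)((p >> ALIGN_LOG2) & (NBUCKETS - 1));
}

/* Count how many distinct buckets are hit by n consecutive
 * 16-byte-aligned addresses starting at base. */
static size_t distinct_buckets(uintptr_t base, size_t n, int shifted)
{
    unsigned char used[NBUCKETS] = {0};
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t p = base + ((uintptr_t)i << ALIGN_LOG2);
        size_t b = shifted ? bucket_shifted(p) : bucket_raw(p);
        if (!used[b]) {
            used[b] = 1;
            count++;
        }
    }
    return count;
}
```

In this model, 64 consecutive aligned addresses starting at `0x1000` occupy only 4 of the 64 buckets when hashed raw (1 in 16), and all 64 buckets once the pointer is shifted right by the alignment.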
if (GC_FLAGS(rc) & GC_IMMUTABLE) {
	return;
}
GC_DELREF(rc);
So, the original code was increasing the reference count of the properties table, and decreasing the reference count of the properties table, to prevent the property table from getting freed during iteration.
Now that we're referencing the object, your PR is calling GC_ADDREF and GC_DELREF to prevent the object from getting freed during iteration.
If this PR were to be reopened, or if someone were to base code on it in the future, would it need to check whether GC_DELREF returns 0 and free the array/object in question, to avoid leaks if jsonSerialize removed the last other reference to a value as a side effect? I.e. call rc_dtor_func if the count is unexpectedly 0:
ZEND_API void ZEND_FASTCALL rc_dtor_func(zend_refcounted *p)
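The pattern being asked about can be modeled without the engine. The sketch below uses a mock refcounted type; `mock_refcounted`, `mock_addref`, `mock_delref`, `mock_dtor` and `mock_unprotect` are hypothetical stand-ins for the role that GC_ADDREF, GC_DELREF and rc_dtor_func would play, and it is not the PR's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a refcounted engine value. */
typedef struct {
    uint32_t refcount;
    int      destroyed; /* set by mock_dtor so the effect is observable */
} mock_refcounted;

static uint32_t mock_addref(mock_refcounted *rc)
{
    return ++rc->refcount;
}

/* Returns the new count, so the caller can see when it reaches 0. */
static uint32_t mock_delref(mock_refcounted *rc)
{
    assert(rc->refcount > 0);
    return --rc->refcount;
}

static void mock_dtor(mock_refcounted *rc)
{
    rc->destroyed = 1;
}

/* End of the protected section: drop the temporary reference and destroy
 * the value if that was the last one, instead of leaking it. */
static void mock_unprotect(mock_refcounted *rc)
{
    if (mock_delref(rc) == 0) {
        mock_dtor(rc);
    }
}
```

If user code drops the only other reference while the value is protected, the temporary reference is the last one left, so a bare decrement without the zero check would leave the value alive with a count of 0 and nobody responsible for freeing it.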
A brand new type for unordered sets of non-null (non-0) pointers, supporting only adding, removal, and membership checks, might have even better performance (for internal use within json.h only, not an exported API). Something along the lines of what igbinary uses: https://github.com/igbinary/igbinary/blob/master/src/php7/hash_si_ptr.c
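A minimal sketch of such a pointer set, assuming open addressing with linear probing, NULL as the empty marker, and a 4-bit shift for 16-byte alignment. This is an illustration written for this thread, not igbinary's implementation; all `ptr_set` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void  **slots; /* NULL marks an empty slot, so NULL cannot be stored */
    size_t  cap;   /* number of slots, a power of two >= 2 */
    size_t  used;  /* kept at or below cap / 2 */
} ptr_set;

static size_t ptr_set_home(const ptr_set *s, const void *p)
{
    /* Drop the alignment bits (assumed 16-byte alignment) before masking. */
    return (size_t)(((uintptr_t)p >> 4) & (s->cap - 1));
}

static int ptr_set_init(ptr_set *s, size_t cap)
{
    s->slots = calloc(cap, sizeof(void *));
    s->cap = cap;
    s->used = 0;
    return s->slots ? 0 : -1;
}

static int ptr_set_contains(const ptr_set *s, const void *p)
{
    size_t i = ptr_set_home(s, p);
    while (s->slots[i] != NULL) {
        if (s->slots[i] == p) return 1;
        i = (i + 1) & (s->cap - 1);
    }
    return 0;
}

static int ptr_set_add(ptr_set *s, void *p);

/* Double the table and reinsert every stored pointer. */
static int ptr_set_grow(ptr_set *s)
{
    ptr_set bigger;
    if (ptr_set_init(&bigger, s->cap * 2) != 0) return -1;
    for (size_t i = 0; i < s->cap; i++) {
        if (s->slots[i] != NULL) ptr_set_add(&bigger, s->slots[i]);
    }
    free(s->slots);
    *s = bigger;
    return 0;
}

/* Returns 1 if added, 0 if already present, -1 on OOM. */
static int ptr_set_add(ptr_set *s, void *p)
{
    assert(p != NULL);
    if ((s->used + 1) * 2 > s->cap && ptr_set_grow(s) != 0) return -1;
    size_t i = ptr_set_home(s, p);
    while (s->slots[i] != NULL) {
        if (s->slots[i] == p) return 0;
        i = (i + 1) & (s->cap - 1);
    }
    s->slots[i] = p;
    s->used++;
    return 1;
}

/* Returns 1 if removed, 0 if absent. Uses backward-shift deletion, so no
 * tombstones accumulate across many add/remove pairs. */
static int ptr_set_remove(ptr_set *s, const void *p)
{
    assert(p != NULL);
    size_t mask = s->cap - 1, i = ptr_set_home(s, p);
    while (s->slots[i] != p) {
        if (s->slots[i] == NULL) return 0;
        i = (i + 1) & mask;
    }
    for (size_t j = (i + 1) & mask; s->slots[j] != NULL; j = (j + 1) & mask) {
        size_t home = ptr_set_home(s, s->slots[j]);
        /* Move the entry into the hole only if its probe path crosses it. */
        if (((j - home) & mask) >= ((j - i) & mask)) {
            s->slots[i] = s->slots[j];
            i = j;
        }
    }
    s->slots[i] = NULL;
    s->used--;
    return 1;
}

static void ptr_set_free(ptr_set *s)
{
    free(s->slots);
    s->slots = NULL;
    s->cap = s->used = 0;
}
```

Recursion protection adds a pointer on entry and removes it on exit, which is why deletion here shifts entries back rather than leaving tombstones. Whether this would actually beat a HashTable in the encoder would need to be measured.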
@@ -26,11 +26,20 @@ struct _php_json_encoder {
	int depth;
	int max_depth;
	php_json_error_code error_code;
HashTable recursive; |
Separately from the previous comment, there's the question of whether the JSON recursion protection should be per-request (stored in request globals and set up in RINIT/RSHUTDOWN) rather than per call to json_encode.
My preference is for the former.
E.g. a JsonSerializable::jsonSerialize implementation calling json_encode($this) would trigger infinite recursion with this PR (but not before this PR) if each call to json_encode had a distinct recursive hash table instance, since object property tables don't really change to different pointers in practice, if that is necessary for json_encode to work.
Just for the record, this was addressed by 53aa53f.
The jsonSerialize() method might access recursion-protected objects/arrays, in which case other operations like var_dump() may break. This also fixes https://bugs.php.net/bug.php?id=81524.
@tstarling Thoughts?