-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Implement ARRAY_UNIQUE_UNSORTED option #9788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
From userland perspective, using sorting for optimization or not is just a implementation detail in php interpreter which I don't have to care. So adding an option for that doesn't sound good to me. IMHO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to #7806 (closed due to inactivity)
This is functionality that I'd wanted
My personal preference is ARRAY_UNIQUE_IDENTICAL for a flag specific to array_unique, instead of ARRAY_UNIQUE_UNSORTED
I'd implemented something similar in the https://pecl.php.net/teds pecl - https://github.com/TysonAndre/pecl-teds/blob/1.2.6/teds.c#L460-L499 as Teds\unique_values
from https://github.com/TysonAndre/pecl-teds/#iterable-functions (Also see StrictHashMap/StrictHashSet) (returning a list rather than preserving keys)
- That pecl would be slightly different from
zend_is_identical*
for the handling ofNAN
, but it should be possible to get something less than quadratic for normal use cases by copying code from that hash table implementation but using zend_is_identical instead of something that setNAN === NAN
on hash collisions (including in arrays) - It's probably only worthwhile to set up a hash table for large array sizes (8? 16? I haven't benchmarked it)
- For documentation purposes, the handling of NAN and https://en.wikipedia.org/wiki/Signed_zero would be useful to specify (Same as PHP's
===
the way you've implemented in your PR would be my preference, I don't expect reviewers to want to introduce yet another definition of equality)
I'd been postponing proposing that since I was waiting to see what the specifics of the proposal for algebraic types following enums would be (if one was created as followup to php 8.1 enums) and how they'd handle spl_object_id/SplObjectStorage/WeakMap
- I'm referring to https://wiki.php.net/rfc/enumerations#future_scope for algebraic types
- Hopefully it'd be the same instance
I wouldn't recommend it here (sort order of types is arbitrary, but could be useful internally), but for reference : https://github.com/TysonAndre/pecl-teds/tree/1.2.6#stable-comparison also exists
@TysonAndre Thank you for your comments!
I wasn't aware of this PR. The approach is interesting. I think it could work well if we improved it in the ways you suggested:
I would prefer that over adding a
Agreed.
IMO,
The former is likely more desirable if technically feasible. (edit: |
Correct, I meant the internal C hashing functionality of that (with the identity check replaced with zend_is_identical) should work |
Closing this in favor of GH-9804. |
In response to GH-9775. This has a complexity of
O(n^2)
and thus is not meant to replace the other options. This is mostly targeted at smaller arrays that contain uncomparable (but unequatable) elements like enums. I'm not sure if this requires an RFC, but I'll at least send an e-mail to the mailing list.