Skip to content

Implement ARRAY_UNIQUE_UNSORTED option #9788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

iluuu1994
Copy link
Member

In response to GH-9775. This has a complexity of O(n^2) and thus is not meant to replace the other options. This is mostly targeted at smaller arrays that contain uncomparable (but unequatable) elements like enums. I'm not sure if this requires an RFC, but I'll at least send an e-mail to the mailing list.

@jhdxr
Copy link
Member

jhdxr commented Oct 20, 2022

From userland perspective, using sorting for optimization or not is just a implementation detail in php interpreter which I don't have to care. So adding an option for that doesn't sound good to me.

IMHO array_unique should be able to fallback to the O(n^2) solution automatically if incomparable elements are detected.

Copy link
Contributor

@TysonAndre TysonAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #7806 (closed due to inactivity)

This is functionality that I'd wanted

My personal preference is ARRAY_UNIQUE_IDENTICAL for a flag specific to array_unique, instead of ARRAY_UNIQUE_UNSORTED

I'd implemented something similar in the https://pecl.php.net/teds pecl - https://github.com/TysonAndre/pecl-teds/blob/1.2.6/teds.c#L460-L499 as Teds\unique_values from https://github.com/TysonAndre/pecl-teds/#iterable-functions (Also see StrictHashMap/StrictHashSet) (returning a list rather than preserving keys)

  • That pecl would be slightly different from zend_is_identical* for the handling of NAN, but it should be possible to get something less than quadratic for normal use cases by copying code from that hash table implementation but using zend_is_identical instead of something that set NAN === NAN on hash collisions (including in arrays)
  • It's probably only worthwhile to set up a hash table for large array sizes (8? 16? I haven't benchmarked it)
  • For documentation purposes, the handling of NAN and https://en.wikipedia.org/wiki/Signed_zero would be useful to specify (Same as PHP's === the way you've implemented in your PR would be my preference, I don't expect reviewers to want to introduce yet another definition of equality)

I'd been postponing proposing that since I was waiting to see what the specifics of the proposal for algebraic types following enums would be (if one was created as followup to php 8.1 enums) and how they'd handle spl_object_id/SplObjectStorage/WeakMap

I wouldn't recommend it here (sort order of types is arbitrary, but could be useful internally), but for reference : https://github.com/TysonAndre/pecl-teds/tree/1.2.6#stable-comparison also exists

@iluuu1994
Copy link
Member Author

iluuu1994 commented Oct 21, 2022

@TysonAndre Thank you for your comments!

Related to #7806 (closed due to inactivity)

I wasn't aware of this PR. The approach is interesting. I think it could work well if we improved it in the ways you suggested:

  • Limit the comparison to array_unique
  • Make the behavior strictly the same as ===

I would prefer that over adding a StrictHashSet due to its size and complexity. I think a map/set type that allows for arbitrary keys/values has its own merits. But that should probably be discussed separately.

Same as PHP's === the way you've implemented in your PR would be my preference, I don't expect reviewers to want to introduce yet another definition of equality

Agreed.

I'd been postponing proposing that since I was waiting to see what the specifics of the proposal for algebraic types following enums would be

IMO, == and === for ADTs with equivalent values should evaluate to true. I see two approaches:

  • Centralize ADT instances and return existing values on creation
  • Add a special case to zend_is_identical

The former is likely more desirable if technically feasible. (edit: StrictHashMap would actually be perfect for that)

@TysonAndre
Copy link
Contributor

TysonAndre commented Oct 21, 2022

The former is likely more desirable if technically feasible. (edit: StrictHashMap would actually be perfect for that)

Correct, I meant the internal C hashing functionality of that (with the identity check replaced with zend_is_identical) should work

@iluuu1994
Copy link
Member Author

Closing this in favor of GH-9804.

@iluuu1994 iluuu1994 closed this Oct 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants