Skip to content

[soon-to-be-rfc?] Implement GH-10526: array_unique: add way to compare items with an identity check (===) #10567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

nielsdos
Copy link
Member

Implements GH-10526 based on the idea of @KapitanOczywisty.

Question I'm guessing this probably needs an RFC even though the issue was not tagged as "requires RFC"?

Approach
This adds a SORT_STRICT flag to use in sorting and in array_unique, although it is most useful in the latter case.

If the types are equal we will use the identity check first before using zend_compare to avoid possibly inconsistent results (e.g. null and false are equal according to zend_compare). If the types are equal, but the values are not identical we can rely on zend_compare to perform the comparison without introducing inconsistencies.

If the types are not equal, we will sort in a way that groups the values of the same type together, and within those groups the values are sorted. We determine the order of the types based on the alphabetical order (as exposed to userland!), which is consistent with the inequality comparisons of zend_compare
(e.g. false < object < true etc.)
Note that double comes before long because the name exposed to userland is float which comes alphabetically before long.

… identity check (===)

Implements phpGH-10526.

This adds a SORT_STRICT flag to use in sorting and in array_unique,
although it is most useful in the latter case.

If the types are equal we will use the identity check first before using
zend_compare to avoid possibly inconsistent results (e.g. null and false
are equal according to zend_compare). If the types are equal, but the
values are not identical we can rely on zend_compare to perform the
comparison without introducing inconsistencies.

If the types are not equal, we will sort in a way that groups the
values of the same type together, and within those groups the values
are sorted. We determine the order of the types based on the
alphabetical order (as exposed to userland!), which is consistent with
the inequality comparisons of `zend_compare`
(e.g. false < object < true etc.)
Note that double comes before long because the name exposed to userland
is float which comes alphabetically before long.
@nielsdos nielsdos changed the title Implement GH-10526: array_unique: add way to compare items with an identity check (===) [soon-to-be-rfc] Implement GH-10526: array_unique: add way to compare items with an identity check (===) Feb 11, 2023
@nielsdos nielsdos changed the title [soon-to-be-rfc] Implement GH-10526: array_unique: add way to compare items with an identity check (===) [soon-to-be-rfc?] Implement GH-10526: array_unique: add way to compare items with an identity check (===) Feb 11, 2023
@nielsdos
Copy link
Member Author

nielsdos commented Feb 11, 2023

Well... I just found out about @iluuu1994 RFC of which I wasn't aware of... At least the positive thing is that I didn't spend too much time on this fortunately 😅

@iluuu1994
Copy link
Member

@nielsdos Different approaches were suggested (array_unique, separate methods with or without namespaces, a Set class, a new iterator API). Crippled by indecision, my RFC is standing still. Not sure what the most reasonable approach is.

uint32_t type2 = Z_TYPE(s->val);

if (type1 == type2) {
return zend_compare(&f->val, &s->val);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think zvals should be dereferenced before comparison, otherwise we will end up with different types being compared against each other - which we wanted to avoid.

php-src/Zend/zend_operators.c

Lines 2244 to 2250 in 4ec3664

if (Z_ISREF_P(op1)) {
op1 = Z_REFVAL_P(op1);
continue;
} else if (Z_ISREF_P(op2)) {
op2 = Z_REFVAL_P(op2);
continue;
}

Copy link
Member Author

@nielsdos nielsdos Feb 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I forgot the fact that &int(x)===int(x)

Comment on lines +356 to +369
static const signed char type_to_order_mapping[] = {
[IS_ARRAY] = 0,
[IS_FALSE] = 1,
[IS_DOUBLE] = 2,
[IS_LONG] = 3,
[IS_NULL] = 4,
[IS_OBJECT] = 5,
[IS_REFERENCE] = 6,
[IS_RESOURCE] = 7,
[IS_STRING] = 8,
[IS_TRUE] = 9,
};

return ZEND_NORMALIZE_BOOL(type_to_order_mapping[type1] - type_to_order_mapping[type2]);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We determine the order of the types based on the alphabetical order (as exposed to userland!)

Creating order mapping is kind of wasteful, and ordering by type id looks more logical in my opinion (less complex < more complex):

php-src/Zend/zend_types.h

Lines 549 to 558 in 4ec3664

#define IS_UNDEF 0
#define IS_NULL 1
#define IS_FALSE 2
#define IS_TRUE 3
#define IS_LONG 4
#define IS_DOUBLE 5
#define IS_STRING 6
#define IS_ARRAY 7
#define IS_OBJECT 8
#define IS_RESOURCE 9

Final order doesn't matter anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with this one tbh. The order at least has to be consistent with the inequality ordering of zend_compare (e.g. false<object<true)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is absolutely no reason to be consistent with zend_compare, that function does non-strict comparisons, which is the opposite of what PR is trying to achieve.

Also I just realized that strings are compared using zendi_smart_strcmp which tries to parse numerical strings, so given values "1e2" and "100", the new flag will remove one of them. Another problematic case would be an array: ["1e2"], ["100"]. I don't think zend_compare should be used after all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is absolutely no reason to be consistent with zend_compare, that function does non-strict comparisons, which is the opposite of what PR is trying to achieve.

Yes, but that's not a good reason imo to introduce more inconsistency. The order does not matter like you said, so why not be consistent with zend_compare anyway?

Also I just realized that strings are compared using zendi_smart_strcmp which tries to parse numerical strings, so given values "1e2" and "100", the new flag will remove one of them. Another problematic case would be an array: ["1e2"], ["100"]. I don't think zend_compare should be used after all.

Ugh that's annoying. Fortunately we can just special-case strings.

@iluuu1994
Copy link
Member

iluuu1994 commented Feb 12, 2023

@nielsdos @KapitanOczywisty There's #9804, which achieves something similar through comparison, however c9e6b9b was later implemented using a hash map, which should have better performance. Given that there doesn't seem to be a clear way forward with https://wiki.php.net/rfc/list_assoc_unique I think going back to adding a flag to array_unique might have the least resistance.

@nielsdos
Copy link
Member Author

Yeah I think Ilija's approach is probably best, but with the change for a flag unique to array_unique as he said. It also doesn't have the arbitrary sorting that this PR has which is nice.

@iluuu1994
Copy link
Member

iluuu1994 commented Feb 12, 2023

The hash map approach is fastest, but also quite over-engineered for this use-case. However, we'd like to use it later on for ADTs (to guarantee uniqueness across associated values) which is why it was implemented this way.

@nielsdos
Copy link
Member Author

Makes sense to me. I'll close this PR since your RFC is the best way to go forward.

@nielsdos nielsdos closed this Feb 12, 2023
@iluuu1994
Copy link
Member

@nielsdos If you want to take over that RFC, feel free to do so. I'm busy with other stuff atm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants