-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
API: membership checks on ExtensionArray containing NA values #37867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 15 commits
7ab6443
7986bc4
466f7cc
a9b75dd
9d0eca1
b0b32ab
c6e42d2
75c45bc
83c9fe4
08c4c98
5a23b1d
4b0c200
92604e9
8a24f0d
fdb9deb
52e2b43
f21890e
6f633c7
d8bdb2e
4e4dbc4
a1583e7
3c2c2b0
237fe45
37219c3
c4a6c36
245c99a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,6 +37,7 @@ | |
is_array_like, | ||
is_dtype_equal, | ||
is_list_like, | ||
is_scalar, | ||
pandas_dtype, | ||
) | ||
from pandas.core.dtypes.dtypes import ExtensionDtype | ||
|
@@ -354,6 +355,19 @@ def __iter__(self): | |
for i in range(len(self)): | ||
yield self[i] | ||
|
||
def __contains__(self, item) -> bool: | ||
""" | ||
Return for `item in self`. | ||
""" | ||
# comparisons of any item to pd.NA always return pd.NA, so e.g. "a" in [pd.NA] | ||
# would raise a TypeError. The implementation below works around that. | ||
if item is self.dtype.na_value: | ||
return self.isna().any() if self._can_hold_na else False | ||
elif is_scalar(item) and isna(item): | ||
return False | ||
jorisvandenbossche marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This could maybe be done in a separate PR, as it is FloatingArray specific (and the NaN behaviour is not yet fully fleshed out), but this check will do the wrong thing for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So a I think I would prefer that would be handled in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not multiple "nan types", to be nitpicky, since pd.NA is not a "nan". But yes, it can contains both pd.NA and np.nan at the moment. But that's not yet fully decided though, long discussion at #32265. Happy to hear your thoughts about it.
Yes, will put it on my list of follow to-do items for FloatingArray that I need to open. So don't worry about it here |
||
else: | ||
return (item == self).any() | ||
|
||
def __eq__(self, other: Any) -> ArrayLike: | ||
""" | ||
Return for `self == other` (element-wise equality). | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,31 @@ def test_can_hold_na_valid(self, data): | |
# GH-20761 | ||
assert data._can_hold_na is True | ||
|
||
def test_contains(self, data, data_missing): | ||
# GH-37867 | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# Tests for membership checks. Membership checks for nan-likes is tricky and | ||
# the settled on rule is: `nan_like in arr` is True if nan_like is | ||
# arr.dtype.na_value and arr.isna().any() is True. Else the check returns False. | ||
|
||
for this_data in [data, data_missing]: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see my comments below on how to actually parameterize this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't its possible to parametrize fixtures? do have an example or hint on how that's done? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. grep for pytest.getfixturevalue in tests There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Couldn't get it to work, I'll try it again tonight. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, but I would simply leave it as is, using |
||
scalar = this_data[~this_data.isna()][0] | ||
|
||
assert scalar in this_data | ||
assert "124jhujbhjhb5" not in data | ||
|
||
na_value = this_data.dtype.na_value | ||
|
||
if this_data.isna().any(): | ||
assert na_value in this_data | ||
else: | ||
assert na_value not in this_data | ||
|
||
# this_data can never contain other nan-likes than na_value | ||
for na_value_type in {None, np.nan, pd.NA, pd.NaT}: | ||
if na_value_type is na_value: | ||
continue | ||
assert na_value_type not in this_data | ||
|
||
def test_memory_usage(self, data): | ||
s = pd.Series(data) | ||
result = s.memory_usage(index=False) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -143,6 +143,11 @@ def test_custom_asserts(self): | |
with pytest.raises(AssertionError, match=msg): | ||
self.assert_frame_equal(a.to_frame(), b.to_frame()) | ||
|
||
@pytest.mark.xfail(reason="comparison method not implemented on JSONArray") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is than issue number for this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The issue number is the first line in the method? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. in the xfail pls There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok. |
||
def test_contains(self, data): | ||
# GH-37867 | ||
super().test_contains(data) | ||
|
||
|
||
class TestConstructors(BaseJSON, base.BaseConstructorsTests): | ||
@pytest.mark.skip(reason="not implemented constructor from dtype") | ||
|
Uh oh!
There was an error while loading. Please reload this page.