BUG: preserve dtype to bool[pyarrow] when calling pyarrow backed Series.isna() #59436

KevsterAmp
Contributor

Test output

============================================================================================================ test session starts =============================================================================================================
platform darwin -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
PyQt5 5.15.9 -- Qt runtime 5.15.8 -- Qt compiled 5.15.8
rootdir: /Users/kev/self/pandas
configfile: pyproject.toml
plugins: localserver-0.0.0, qt-4.4.0, cov-5.0.0, anyio-4.4.0, hypothesis-6.108.7, cython-0.3.1, xdist-3.6.1
collected 2 items

pandas/tests/series/methods/test_isna.py ..

------------------------------------------------------------------------------------------ generated xml file: /Users/kev/self/pandas/test-data.xml ------------------------------------------------------------------------------------------
============================================================================================================ slowest 30 durations ============================================================================================================

(6 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================================================================================= 2 passed in 0.04s ==============================================================================================================

Reproducible Example

import pandas as pd

s = pd.Series([0, None, 4, 5], dtype="u1[pyarrow]")
print(f"s:\n{s}\n")
print(f"s.isna():\n{s.isna()}")

Before

s:
0       0
1    <NA>
2       4
3       5
dtype: uint8[pyarrow]

s.isna():
0    False
1     True
2    False
3    False
dtype: bool

After

s:
0       0
1    <NA>
2       4
3       5
dtype: uint8[pyarrow]

s.isna():
0    False
1     True
2    False
3    False
dtype: bool[pyarrow]

@KevsterAmp KevsterAmp changed the title BUG: preserve dtype to bool[pyarrow] when calling Series.isna() BUG: preserve dtype to bool[pyarrow] when calling pyarrow backed Series.isna() Aug 7, 2024
@@ -206,7 +207,18 @@ def _isna(obj):
    elif isinstance(obj, ABCSeries):
        result = _isna_array(obj._values)
        # box
        result = obj._constructor(result, index=obj.index, name=obj.name, copy=False)
        if isinstance(obj.dtype, ArrowDtype):
Member

I would usually recommend avoiding additional nested if statements; they make the code more verbose and harder to read.
If you look at the signature of the constructor, you can pass dtype=None when you want pandas to handle the dtype internally.

            )
        else:
            result = obj._constructor(
                result, index=obj.index, name=obj.name, copy=False
Member

Suggested change
result, index=obj.index, name=obj.name, copy=False
new_dtype = "bool[pyarrow]" if isinstance(obj.dtype, ArrowDtype) else None
result = obj._constructor(result, index=obj.index, name=obj.name, copy=False, dtype=new_dtype)

Or something of the sort would be more readable.
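
The suggested pattern can be sketched outside pandas internals as follows. This is only an illustration under stated assumptions: `isna_keep_arrow` is a hypothetical helper name, it uses the public `pd.Series` constructor instead of the internal `obj._constructor`, and it assumes a pandas version that exposes `pd.ArrowDtype`. It is not the code from this PR.

```python
import pandas as pd


def isna_keep_arrow(ser: pd.Series) -> pd.Series:
    """Hypothetical helper: like Series.isna(), but request bool[pyarrow]
    when the input Series is pyarrow-backed."""
    # Boolean null mask for the input values.
    mask = ser.isna()
    # One conditional expression replaces the nested if/else around the constructor.
    # dtype=None leaves the result dtype to pandas' own inference (plain bool here).
    new_dtype = "bool[pyarrow]" if isinstance(ser.dtype, pd.ArrowDtype) else None
    return pd.Series(
        mask.to_numpy(), index=ser.index, name=ser.name, dtype=new_dtype, copy=False
    )
```

With pyarrow installed, calling `isna_keep_arrow(pd.Series([0, None, 4, 5], dtype="uint8[pyarrow]"))` should give a result with dtype bool[pyarrow], while NumPy-backed input keeps the plain bool dtype.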

Member

You will also need to add a test case to make sure your changes are covered for this specific situation.

Member

@mroeschke mroeschke left a comment

Since pyarrow operations are implemented as an ExtensionArray, isna needs to return a numpy array (with its bool dtype) by definition: https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray.isna.html#pandas.api.extensions.ExtensionArray.isna

We would need to discuss whether to break this API for ArrowExtensionArray.
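
As a small illustration of the contract described above, the sketch below calls `isna()` on a masked nullable-integer array. Using `Int64` as a stand-in is an assumption made so the example runs without pyarrow; the PR itself concerns ArrowExtensionArray.

```python
import numpy as np
import pandas as pd

# A masked nullable-integer extension array, used here only as a stand-in.
arr = pd.array([0, None, 4, 5], dtype="Int64")

# ExtensionArray.isna() yields a plain NumPy boolean array, not an extension array.
mask = arr.isna()
print(type(mask).__name__, mask.dtype)

# Series.isna() on the same data therefore reports the NumPy bool dtype.
print(pd.Series(arr).isna().dtype)
```

Both print statements report a NumPy bool result, which is the behaviour a bool[pyarrow] return value would change for pyarrow-backed data.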

@KevsterAmp
Contributor Author

I will apply the changes from the review this week. Thanks.

@rhshadrach rhshadrach added Needs Discussion Requires discussion from core team before further action pyarrow dtype retention op with pyarrow dtype -> expect pyarrow result labels Aug 12, 2024
@KevsterAmp
Contributor Author

This seems to break some tests in pandas/tests/extension/test_arrow.py. Any ideas for a fix? I think the tests would also have to change from asserting bool to asserting bool[pyarrow].

@KevsterAmp KevsterAmp requested a review from mroeschke August 21, 2024 22:55
@mroeschke
Member

Thanks for the PR, but given issue #59431, this requires more discussion before opening a PR, so closing.

@mroeschke mroeschke closed this Sep 9, 2024
@KevsterAmp KevsterAmp deleted the return-bool-pyarrow-on-series-isna branch September 12, 2024 12:19
Labels
Needs Discussion Requires discussion from core team before further action pyarrow dtype retention op with pyarrow dtype -> expect pyarrow result
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: isna on pyarrow backed Series is returning Series with bool dtype instead of bool[pyarrow]
4 participants