Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
From discussion in https://github.com/pandas-dev/pandas/pull/59330/files#r1695250192
The fact that algorithms such as value_counts on pd.StringDtype(storage="pyarrow") data return an int64[pyarrow] result is a mistake in our API. While this is well intentioned for performance reasons, it introduces an inconsistency with our extension types.
If we continue to have value_counts on pd.StringDtype data return a pd.Int64Dtype(), we can leave it as an implementation detail whether that Int64Dtype is backed by pyarrow or NumPy. The user interface would not need to change at all; users could simply install (or not install) pyarrow and things would continue to work the same. In the long term, this would also align better with PDEP-13 (#58455).
Feature Description
Make algorithms on StringDtype data that use pd.NA as the missing value sentinel consistently return other pandas extension types (for example, pd.Int64Dtype() from value_counts), and leave it as an implementation detail whether those are backed by pyarrow or not.
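A user-space sketch of the proposed contract, assuming pandas >= 1.3 (for the storage argument of pd.StringDtype). The helper value_counts_as_int64 is hypothetical and not a pandas API: it runs value_counts and casts the counts to pd.Int64Dtype(), so the result dtype no longer depends on the string storage. The raw value_counts dtype printed for each storage depends on the installed pandas version, and the pyarrow storage is skipped when a compatible pyarrow is not installed.

```python
import pandas as pd


def value_counts_as_int64(ser: pd.Series, dropna: bool = True) -> pd.Series:
    # Hypothetical helper (not a pandas API): count values, then cast the
    # counts to the NA-aware pd.Int64Dtype() so callers see one result dtype
    # regardless of which storage backs the string data.
    counts = ser.value_counts(dropna=dropna)
    return counts.astype(pd.Int64Dtype())


for storage in ("python", "pyarrow"):
    try:
        dtype = pd.StringDtype(storage=storage)
    except ImportError:
        # The pyarrow storage requires a compatible pyarrow install.
        continue
    ser = pd.Series(["a", "b", "a", None], dtype=dtype)
    raw = ser.value_counts()  # dtype here depends on storage and pandas version
    normalized = value_counts_as_int64(ser)
    print(storage, raw.dtype, normalized.dtype)
```

In this sketch the Int64 result is always NumPy-backed because of the explicit cast; under the proposal, pandas itself would choose the backing internally while still exposing pd.Int64Dtype() to users.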
Alternative Solutions
Keep the status quo.