Description
Related to the discussion happening in #24674 (comment), further xref #24773, #22384
The question being posed here is what the return type of ExtensionArray.astype(..)
should be.
Currently, it can be either a numpy.ndarray or an ExtensionArray. The astype
method in the base class is actually documented as "Cast to a NumPy array with 'dtype'", which is not fully correct: casting e.g. a DatetimeArray to a period dtype currently gives a PeriodArray EA, not a numpy array.
This raises the question of what e.g. DatetimeArray.astype("datetime64[ns]")
should return, as we actually have a DatetimeArray EA that supports that dtype.
You get similar inconsistencies / dubious cases in e.g. DatetimeArray.astype('int64'),
which currently returns an int64 ndarray, while pd.array(DatetimeArray(), dtype='int64')
will give a PandasArray[int64].
So one idea would be to restrict ExtensionArray.astype
to be a method that converts from one type of ExtensionArray to another type of ExtensionArray, i.e. the output type would be expected to always be an ExtensionArray.
(This is similar to how ndarray.astype
only gives other ndarrays, or pyarrow.Array.cast
only gives other pyarrow arrays; there are other, more explicit methods to convert to a numpy array.)
In practice this would mean returning a PandasArray for numpy dtypes instead of a numpy ndarray. Storing the result in a Series/DataFrame should not change, because there we unpack such a PandasArray anyway.
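To make the proposed contract concrete, here is a minimal pure-Python sketch. It is not pandas code: `ToyArray`, its `astype`, and `to_list` are hypothetical names invented for illustration, with a plain list standing in for the numpy ndarray. The only point is that `astype` always returns the wrapper type, and getting the raw container is a separate, explicit call.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray (not a pandas class)."""

    # Assumed, illustrative mapping from dtype name to a Python caster.
    _CASTERS = {"int64": int, "float64": float, "str": str}

    def __init__(self, values, dtype):
        self._values = list(values)
        self.dtype = dtype

    def astype(self, dtype):
        # Proposed contract: the result is always another ToyArray,
        # never the raw underlying container.
        caster = self._CASTERS[dtype]
        return ToyArray([caster(v) for v in self._values], dtype)

    def to_list(self):
        # Explicit conversion to the raw container; in this sketch it plays
        # the role of an explicit conversion to a numpy ndarray.
        return list(self._values)


arr = ToyArray([1, 2, 3], "int64")
as_float = arr.astype("float64")   # still a ToyArray
raw = as_float.to_list()           # raw container only on explicit request
```

Under this design a caller never has to check which kind of object `astype` returned; the wrapper-to-raw step happens in exactly one, clearly named place.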
cc @TomAugspurger @jreback @jbrockmendel @shoyer @h-vetinari