
API: ExtensionArray.astype to only return extension arrays? #24877

Open
@jorisvandenbossche

Description


Related to the discussion in #24674 (comment); see also #24773 and #22384.

The question being posed here is what the return type of ExtensionArray.astype(..) should be.

Currently, it can return either a numpy.ndarray or an ExtensionArray. The astype method in the base class is even documented as "Cast to a NumPy array with 'dtype'", which is not fully correct: casting, e.g., a DatetimeArray to a period dtype currently gives a PeriodArray EA, not a numpy array.

This raises the question of what, e.g., DatetimeArray.astype("datetime64[ns]") should return, since we do have a DatetimeArray EA that supports exactly that dtype.
Similar inconsistencies or dubious cases arise with, e.g., DatetimeArray.astype('int64'), which currently returns an int64 ndarray, while pd.array(DatetimeArray(), dtype='int64') gives a PandasArray[int64].
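To make the inconsistency concrete without depending on a particular pandas version, here is a minimal, stdlib-only toy model. All names in it (ToyDatetimeArray, ToyPeriodArray, ToyPandasArray, toy_array) are hypothetical stand-ins, not pandas APIs; `array("q")` plays the role of an int64 ndarray, and datetimes are assumed to be stored as int64 nanoseconds. It mimics the current behavior described above: the return type of `astype` depends on the target dtype, and the array constructor disagrees with `astype` for 'int64'.

```python
from array import array

NS_PER_DAY = 86_400_000_000_000  # nanoseconds in one day

# Hypothetical stand-ins, NOT pandas classes. array("q") plays the role of
# an int64 numpy.ndarray; Toy*Array classes play the role of extension arrays.
class ToyExtensionArray:
    def __init__(self, values, dtype):
        self._values = values  # int64 storage
        self.dtype = dtype

class ToyPandasArray(ToyExtensionArray):
    """Thin extension-array wrapper around a bare int64 'ndarray'."""

class ToyPeriodArray(ToyExtensionArray):
    """Stores daily period ordinals (days since the epoch)."""

class ToyDatetimeArray(ToyExtensionArray):
    def __init__(self, ns_values):
        super().__init__(array("q", ns_values), "datetime64[ns]")

    def astype(self, dtype):
        # Mirrors the current, mixed behavior: the return type depends on dtype.
        if dtype == "int64":
            return array("q", self._values)  # a bare "ndarray"
        if dtype == "period[D]":
            ordinals = array("q", (v // NS_PER_DAY for v in self._values))
            return ToyPeriodArray(ordinals, dtype)  # an extension array
        raise TypeError(f"cannot cast datetime64[ns] to {dtype}")

def toy_array(data, dtype):
    """Hypothetical stand-in for pd.array(): numpy dtypes get wrapped."""
    if dtype == "int64":
        return ToyPandasArray(array("q", data._values), dtype)
    raise TypeError(f"unsupported dtype {dtype}")

dta = ToyDatetimeArray([0, NS_PER_DAY])
print(type(dta.astype("int64")).__name__)      # array
print(type(dta.astype("period[D]")).__name__)  # ToyPeriodArray
print(type(toy_array(dta, "int64")).__name__)  # ToyPandasArray
```

The two routes to an int64 result land on different types, which is exactly the kind of dubious case raised above.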

So one idea would be to restrict ExtensionArray.astype to converting from one type of ExtensionArray to another, i.e. the output would always be expected to be an ExtensionArray.
(This would be analogous to how ndarray.astype only returns other ndarrays, and pyarrow.Array.cast only returns other pyarrow arrays; there are other, more explicit methods to convert to a numpy array.)
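A minimal sketch of what that contract could look like, again as a hypothetical stdlib-only toy rather than pandas code: `astype` stays closed over extension arrays, and a separate method is the explicit route to a bare array. The name `to_numpy` is used here only as an illustrative label for "the more explicit method", `array("q")` stands in for an int64 ndarray, and only a couple of dtype strings are handled. It also shows one possible answer to the `DatetimeArray.astype("datetime64[ns]")` question: return a copied DatetimeArray.

```python
from array import array

NS_PER_DAY = 86_400_000_000_000

# Hypothetical sketch of the proposed contract, NOT pandas' actual code.
class ToyExtensionArray:
    def __init__(self, values, dtype):
        self._values = values  # int64 storage (array("q") stands in for ndarray)
        self.dtype = dtype

    def astype(self, dtype):
        # Proposed rule: astype always returns an extension array. A numpy
        # dtype (only "int64" in this toy) is wrapped in ToyPandasArray.
        if dtype == "int64":
            return ToyPandasArray(array("q", self._values))
        raise TypeError(f"cannot cast {self.dtype} to {dtype}")

    def to_numpy(self):
        # The separate, explicit route to a bare "ndarray" (a copy here).
        return array("q", self._values)

class ToyPandasArray(ToyExtensionArray):
    def __init__(self, values):
        super().__init__(values, "int64")

class ToyDatetimeArray(ToyExtensionArray):
    def __init__(self, ns_values):
        super().__init__(array("q", ns_values), "datetime64[ns]")

    def astype(self, dtype):
        # The dtype is natively supported, so stay a (copied) datetime array.
        if dtype == "datetime64[ns]":
            return ToyDatetimeArray(self._values)
        return super().astype(dtype)

dta = ToyDatetimeArray([0, NS_PER_DAY])
print(type(dta.astype("int64")).__name__)             # ToyPandasArray
print(type(dta.astype("datetime64[ns]")).__name__)    # ToyDatetimeArray
print(type(dta.astype("int64").to_numpy()).__name__)  # array
```

The design choice is that callers who want a raw array say so explicitly, so the type of an `astype` result no longer depends on which dtype was requested.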

In practice this would mean returning a PandasArray instead of a numpy ndarray for numpy dtypes. Storing the result in a Series/DataFrame should not change, because such a PandasArray is unpacked there anyway.
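The claim that Series/DataFrame storage is unaffected can be sketched with a similar hypothetical toy (not pandas' actual internals): a container unwraps the thin PandasArray-like wrapper on construction, so it ends up holding the same bare int64 "ndarray" whether `astype` returned the wrapper or the raw array. The attribute name `_ndarray` and both classes are assumptions made for illustration.

```python
from array import array

# Hypothetical stand-ins, NOT pandas classes.
class ToyPandasArray:
    """Thin extension-array wrapper around a bare int64 'ndarray'."""
    def __init__(self, values):
        self._ndarray = values

class ToySeries:
    def __init__(self, data):
        # Unwrap the thin wrapper so the stored values are always the bare
        # "ndarray", whichever form the caller passed in.
        if isinstance(data, ToyPandasArray):
            data = data._ndarray
        self._values = data

raw = array("q", [1, 2, 3])
from_raw = ToySeries(raw)                      # astype returning a bare array
from_wrapped = ToySeries(ToyPandasArray(raw))  # astype returning the wrapper
print(from_raw._values is from_wrapped._values)  # True
```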

cc @TomAugspurger @jreback @jbrockmendel @shoyer @h-vetinari

Metadata


    Labels

    API Design, Astype, Enhancement, ExtensionArray, Needs Discussion, PDEP missing values
