
EA: revisit interface #32586

Closed
@jbrockmendel


This is as good a time as any to revisit the "experimental" EA interface.

My read of the Issues and recollection of threads suggests there are three main groups of topics:

Clarification of the Interface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  1. _values_for_argsort and _values_for_factorize
    • Do we need both? The docs for both say they should be order-preserving.
    • Is it safe to return a view? (Categorical._values_for_argsort makes a copy for no obvious reason)
    • What else can they be used for internally? e.g. in #32467 (CLN: use _values_for_argsort for join_non_unique, join_monotonic), _values_for_argsort is used for ExtensionIndex join_non_unique and join_monotonic.
  2. What characteristics should _ndarray_values have? Is it needed? (DEPR: _ndarray_values vs to_numpy vs __array__ #32412) Update: _ndarray_values has been removed.
  3. What should _from_sequence accept?
    • Should it only be sequences that are unambiguously this dtype?
    • In particular, should DTA/TDA/PA (DatetimeArray, TimedeltaArray, PeriodArray) not accept i8 (int64) values?
  4. What should fillna accept? (Should ExtensionBlock.fillna unbox Series / Index? #22954, EA: fillna should accept same type #32414)
    4.5) Require that __iter__ return native types? (API: ExtensionArrays and conversion to "native" types (eg in tolist, to_dict, iteration, ..) #29738)
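
To make item 1 concrete, here is a minimal pure-Python sketch (not pandas code) of a categorical-like array. `ToyCategorical`, its `argsort`, and the `copy` flag are hypothetical names made up for illustration; only the `_values_for_argsort` name comes from the list above. It shows what "order-preserving" means, and why handing back the internal buffer without a copy is only safe if callers never mutate it.

```python
class ToyCategorical:
    """Hypothetical categorical-like array: integer codes into ordered categories."""

    def __init__(self, values, categories):
        self.categories = list(categories)
        # Position in `categories` defines the sort order of each value.
        self._codes = [self.categories.index(v) for v in values]

    def _values_for_argsort(self, copy=True):
        # Order-preserving: codes[i] < codes[j] exactly when value i sorts
        # before value j. `copy` is an assumption of this sketch, not a
        # pandas parameter.
        return list(self._codes) if copy else self._codes

    def argsort(self):
        keys = self._values_for_argsort()
        # sorted() is stable, so ties keep their original relative order.
        return sorted(range(len(keys)), key=keys.__getitem__)


cat = ToyCategorical(["low", "high", "mid", "low"], categories=["low", "mid", "high"])
order = cat.argsort()  # positions that would sort the values

# Without a copy, the caller holds the internal buffer itself, so mutating
# the result silently changes the array.
alias = cat._values_for_argsort(copy=False)
alias[0] = 2
```

Whether the real implementations can skip the copy depends on whether any internal caller writes into the returned values, which is the open question raised above.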

Ndarray Compat
^^^^^^^^^^^^^^^^^
5) Headaches have been caused by trivial ndarray methods not being on EA
- #31199 size
- #32342 "T" (just the most recent; this has come up a lot)
- #24583 ravel
6) For arithmetic we're going to need something like either tile or broadcast_to
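
As a rough illustration of items 5 and 6, the sketch below shows how cheap these attributes are for a strictly 1-D array-like. `NDArrayCompatMixin` and `ToyArray` are hypothetical names, this is plain Python rather than the pandas implementation, and `tile` here only mimics what numpy.tile does for 1-D input with an integer repeat count.

```python
class NDArrayCompatMixin:
    """Hypothetical mixin giving a 1-D array-like a few trivial ndarray attributes."""

    @property
    def size(self):
        # For a 1-D array the number of elements equals its length.
        return len(self)

    @property
    def T(self):
        # Transposing a 1-D array is a no-op.
        return self

    def ravel(self):
        # A 1-D array is already flat.
        return self


class ToyArray(NDArrayCompatMixin):
    """Hypothetical 1-D array backed by a Python list."""

    def __init__(self, values):
        self._data = list(values)

    def __len__(self):
        return len(self._data)

    def tile(self, reps):
        # Repeat the whole array `reps` times, like numpy.tile on 1-D input.
        return ToyArray(self._data * reps)


arr = ToyArray([1, 2, 3])
tiled = arr.tile(2)
```

Putting the defaults on a shared base would mean individual array authors get `size`, `T` and `ravel` for free and only override them if their array is not 1-D.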

Methods Needed/Wanted For Index/Series/DataFrame/Block
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
7) Suggested Methods (partial list)
- #27264 duplicated
- [x] #23437 _empty
- #28955 apply
- #23179 map
- #22680 hasnans
- [x] #27081 equals
- [x] #24144 _where
- [x] _putmask would be helpful for ExtensionIndex
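
For a sense of what default implementations of two of these could look like, here is a small pure-Python sketch. It is an assumption-laden illustration, not the pandas code: the free functions `duplicated` and `hasnans` and the `na_value` parameter are hypothetical, values are assumed hashable, and the `keep` options are modelled on the existing Series.duplicated semantics.

```python
def duplicated(values, keep="first"):
    """Return a list of booleans marking repeated values.

    keep="first" marks all but the first occurrence, keep="last" marks all
    but the last, and keep=False marks every occurrence of a repeated value.
    """
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last' or False")
    values = list(values)
    if keep is False:
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        return [counts[v] > 1 for v in values]
    # For keep="last", scan from the end and flip the result back.
    seq = values if keep == "first" else values[::-1]
    seen = set()
    flags = []
    for v in seq:
        flags.append(v in seen)
        seen.add(v)
    return flags if keep == "first" else flags[::-1]


def hasnans(values, na_value=None):
    # True if any element is the missing-value sentinel (None in this sketch).
    return any(v is na_value for v in values)


data = ["a", "b", "a", "c", "b"]
first = duplicated(data)
last = duplicated(data, keep="last")
every = duplicated(data, keep=False)
```

A generic fallback along these lines would be slow for large arrays, so array authors would presumably still want to override it with something vectorized.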

I suggest we discuss these in order. Before jumping in, is there anything vital missing from this list? (This is only a small subset of the issues on the tracker.)

cc @pandas-dev/pandas-core @xhochy


Labels: API Design; ExtensionArray (Extending pandas with custom dtypes or arrays.); Needs Discussion (Requires discussion from core team before further action)
