API: take interface for (Extension)Array-likes #20640

Closed
@jorisvandenbossche

Triggered by #20582, I was looking at the take implementation in ExtensionArray and Categorical (which is already an ExtensionArray subclass) and in the rest of pandas:

  • ExtensionArray.take currently uses the "internal pandas"-like behaviour for take: -1 indicates a missing value (the behaviour we need for reindexing etc.)
  • Series.take actually uses the numpy behaviour, where negative values (including -1) start counting from the end of the array-like.

To illustrate the difference with a small example:

In [9]: pd.Categorical(['a', 'b', 'c']).take([0, -1])
Out[9]: 
[a, NaN]
Categories (3, object): [a, b, c]

In [10]: pd.Series(['a', 'b', 'c']).take([0, -1])
Out[10]: 
0    a
1    c
dtype: object
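
To make the "needed for reindexing" point concrete, here is a small sketch of where the -1 marker comes from. It relies on Index.get_indexer, which returns -1 for labels it cannot find; the variable names and the np.where-based fill are only illustrative, not the actual internal code path.

```python
import numpy as np
import pandas as pd

index = pd.Index(["a", "b", "c"])
values = np.array([10.0, 20.0, 30.0])

# get_indexer marks labels that are absent from the index with -1
indexer = index.get_indexer(["a", "d"])

# NumPy semantics: -1 wraps around and silently picks the last element
wrapped = values.take(indexer)

# "Internal pandas"-like semantics: -1 means "missing", so fill with NaN
# (illustrative only; pandas uses its own take helpers internally)
filled = np.where(indexer == -1, np.nan, values.take(indexer))

print(indexer)
print(wrapped)
print(filled)
```

With NumPy semantics the missing label 'd' would pick up the value for 'c', which is why reindexing code needs the fill behaviour.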

This difference is a bit unfortunate IMO. If ExtensionArray.take is a public method (which it is right now), it would be nice if its behaviour were consistent with Series.take.
If we agree on that, I was thinking about the following options:

  • make ExtensionArray.take private for now (e.g. require a _take method for the interface) and keep the "internal pandas"-like behaviour
  • make the default behaviour of ExtensionArray.take consistent with Series.take, but still have the allow_fill/fill_value arguments so that when they are specified it has the "internal pandas"-like behaviour (so that internal code that expects this behaviour, and already passes those keywords, keeps working)
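
A rough sketch of what the second option could look like, written against a plain object-dtype NumPy array rather than a real ExtensionArray. The function name take_sketch, the NaN default for fill_value, and the choice to raise on indices below -1 when allow_fill=True are all assumptions made for illustration, not a proposal for the final signature.

```python
import numpy as np


def take_sketch(arr, indices, allow_fill=False, fill_value=None):
    """Hypothetical take with switchable semantics (illustration only)."""
    arr = np.asarray(arr, dtype=object)
    indices = np.asarray(indices, dtype=np.intp)

    if not allow_fill:
        # Default: NumPy / Series.take semantics, negatives count from the end
        return arr.take(indices)

    # allow_fill=True: "internal pandas"-like semantics, -1 means missing.
    # Rejecting other negative values is an assumption of this sketch.
    if (indices < -1).any():
        raise ValueError("with allow_fill=True, only -1 may be negative")
    if fill_value is None:
        fill_value = np.nan  # assumed default missing-value marker

    mask = indices == -1
    result = np.empty(len(indices), dtype=object)
    result[mask] = fill_value
    result[~mask] = arr.take(indices[~mask])
    return result


print(take_sketch(["a", "b", "c"], [0, -1]))
print(take_sketch(["a", "b", "c"], [0, -1], allow_fill=True))
```

Internal callers that already pass allow_fill=True would keep getting the fill behaviour, while a bare take([0, -1]) would match Series.take.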

Labels: Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff); ExtensionArray (Extending pandas with custom dtypes or arrays)
