
Handle ExtensionArrays in cut #31389

Open
@TomAugspurger

Description


Follow-up to #31290. Currently, `pd.cut` doesn't work with all extension arrays. To support them, I think we'll need one addition to the ExtensionArray interface.

We need an array of integers to pass to `searchsorted` in `ids = ensure_int64(bins.searchsorted(x, side=side))`. I think the only requirement is that the integer-encoded values have the same ordering as the original values, i.e. the encoding is order-preserving (strictly monotonic): `a < b` implies `code(a) < code(b)`.
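A minimal sketch of the ordering requirement, assuming NumPy object arrays of strings stand in for an extension array's values, and that the encoding is computed jointly over `bins` and `x` so both sides share the same codes. `encoded_searchsorted` is a hypothetical helper, not pandas API; the rank encoding from `np.unique(..., return_inverse=True)` is just one order-preserving choice among many.

```python
import numpy as np


def encoded_searchsorted(bins, x, side="right"):
    # Hypothetical helper: jointly rank-encode bins and x. np.unique sorts the
    # combined values and return_inverse gives each element's position in that
    # sorted order, so a < b  <=>  code(a) < code(b).
    combined = np.concatenate([bins, x])
    _, codes = np.unique(combined, return_inverse=True)
    codes = codes.astype(np.int64)
    bin_codes, x_codes = codes[: len(bins)], codes[len(bins):]
    # bins are sorted, so their codes are sorted too and searchsorted is valid.
    return np.searchsorted(bin_codes, x_codes, side=side)
```

With `bins = np.array(["b", "d", "f"], dtype=object)` and `x = np.array(["a", "c", "d", "g"], dtype=object)`, both `encoded_searchsorted(bins, x, side="right")` and `bins.searchsorted(x, side="right")` give `[0, 1, 2, 3]`: the integer codes stand in for the values without changing which bin each value falls into.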

It doesn't matter what value is used for missing values, as long as it's distinct from the codes of the non-missing values.
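A minimal sketch of why the missing-value code is arbitrary, assuming `None` marks missing values in a NumPy object array, and that missing inputs are tracked with a separate mask and assigned to no bin afterwards (marked `-1` here, a hypothetical convention; `pd.cut` itself returns NaN for them). `bin_ids_with_missing` and `na_code` are hypothetical names, not pandas API.

```python
import numpy as np


def bin_ids_with_missing(bins, x, na_code, side="right"):
    # Mask of missing inputs (None stands in for the array's NA value).
    x_mask = np.array([v is None for v in x], dtype=bool)
    # Rank-encode bins and the non-missing x values jointly (order-preserving).
    combined = np.concatenate([bins, x[~x_mask]])
    _, codes = np.unique(combined, return_inverse=True)
    bin_codes = codes[: len(bins)].astype(np.int64)
    # Missing positions get the arbitrary sentinel; valid ones get their rank.
    x_codes = np.full(len(x), na_code, dtype=np.int64)
    x_codes[~x_mask] = codes[len(bins):]
    ids = np.searchsorted(bin_codes, x_codes, side=side)
    # Whatever bin the sentinel landed in, missing inputs are overwritten
    # from the mask, so the choice of na_code does not affect the result.
    ids[x_mask] = -1
    return ids
```

For `bins = np.array(["b", "d", "f"], dtype=object)` and `x = np.array(["a", None, "d", "g"], dtype=object)`, `na_code=-1` and `na_code=10**6` both give `[0, -1, 2, 3]`.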

We can't quite use `factorize(arr)[0]`, since its codes are assigned in order of first appearance and so don't satisfy the ordering requirement.
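A small sketch of why `factorize` codes fall short, assuming a `"string"`-dtype extension array and a hypothetical brute-force checker `is_order_preserving` (not pandas API). `pd.factorize` numbers values by first appearance, so `"c"` gets code 0 and `"a"` gets code 1 even though `"a" < "c"`.

```python
import numpy as np
import pandas as pd


def is_order_preserving(values, codes):
    # Hypothetical check: for every pair of non-missing values, a < b must
    # imply code(a) < code(b), and equal values must get equal codes.
    mask = np.asarray(pd.isna(values))
    vals = [v for v, m in zip(values, mask) if not m]
    cods = [c for c, m in zip(codes, mask) if not m]
    for i in range(len(vals)):
        for j in range(len(vals)):
            if vals[i] < vals[j] and not cods[i] < cods[j]:
                return False
            if vals[i] == vals[j] and cods[i] != cods[j]:
                return False
    return True


values = pd.array(["c", "a", None, "b", "a"], dtype="string")
codes, uniques = pd.factorize(values)  # codes follow first appearance
```

Here `codes` is `[0, 1, -1, 2, 1]` and `is_order_preserving(values, codes)` returns `False`; codes such as `[2, 0, -1, 1, 0]` (ranks in sorted order, with `-1` for the missing value) would pass.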
