
DEPR: values_for_argsort, values_for_factorize, from_factorized #53501

Open
@jbrockmendel

Description

History: when originally designing EAs, there was a hope that many methods could be implemented in terms of a small number of core methods, of which values_for_factorize (vff) and values_for_argsort (vfa) were two of the main ones. Over time we found that many of the places where we used these, other than factorize/argsort, were causing problems, and those uses got pruned.

At this point we are down to only a few internal uses of each. _from_factorized is used only in EA.factorize. vfa is used in EA.argsort, EA.rank, and nargminmax (which in turn is used in EA.argmin/argmax). vff is used in EA.factorize and merge._factorize_keys, and #53475 will restore its use in hash_pandas_object.
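To show what "implemented in terms of a small number of core methods" means in practice, here is a minimal sketch using a hypothetical toy array. `ToyFloatArray` is not a pandas class: it assumes float64 storage with NaN as the missing marker, and only its hook methods (named after the pandas hooks `_values_for_argsort`, `_values_for_factorize` and `_from_factorized`) are array-specific. `argsort`, `argmin`/`argmax` and `factorize` are derived from those hooks. The `_nanargminmax` method is written in the spirit of nargminmax (drop missing entries, apply the reduction, map back to original positions) but is not copied from pandas.

```python
import numpy as np


class ToyFloatArray:
    """Hypothetical toy array (not a pandas class): float64 data, NaN = missing."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype=np.float64)

    def isna(self):
        return np.isnan(self._data)

    # --- array-specific hooks (named after the pandas EA hooks) ---
    def _values_for_argsort(self):
        # An ndarray whose sort order matches the array's sort order.
        return self._data

    def _values_for_factorize(self):
        # An ndarray to hash, plus the value that marks missing entries.
        return self._data, np.nan

    @classmethod
    def _from_factorized(cls, uniques, original):
        # Rebuild an array of this type from the factorized uniques.
        return cls(uniques)

    # --- generic methods written only in terms of the hooks ---
    def argsort(self):
        # NumPy sorts NaN to the end; kind="stable" keeps ties in input order.
        return np.argsort(self._values_for_argsort(), kind="stable")

    def _nanargminmax(self, func):
        # Drop missing entries, apply func, map back to original positions.
        values = self._values_for_argsort()
        mask = self.isna()
        positions = np.arange(len(values))[~mask]
        return positions[func(values[~mask])]

    def argmin(self):
        return self._nanargminmax(np.argmin)

    def argmax(self):
        return self._nanargminmax(np.argmax)

    def factorize(self):
        values, na_value = self._values_for_factorize()
        # NaN != NaN, so a NaN sentinel needs np.isnan rather than ==.
        missing = np.isnan(values) if np.isnan(na_value) else values == na_value
        codes = np.full(len(values), -1, dtype=np.intp)
        seen = {}  # unique value -> code, in order of first appearance
        for i, v in enumerate(values):
            if not missing[i]:
                codes[i] = seen.setdefault(v, len(seen))
        uniques = np.array(list(seen), dtype=values.dtype)
        return codes, self._from_factorized(uniques, self)
```

For example, `ToyFloatArray([3.0, np.nan, 1.0, 3.0, 2.0]).factorize()` gives codes `[0, -1, 1, 0, 2]` and uniques `[3.0, 1.0, 2.0]`, and `argmin()` skips the NaN and returns position 2. The appeal of the original design is visible here: one small hook serves several public methods. The catch is that every such method is then only as fast (and as NumPy-bound) as the hook.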

We should deprecate these patterns entirely.

  1. The status quo regarding whether these are required or merely encouraged is confusing. The solution is to have fewer of these hooks.
  2. Because the default vff casts to object, using it on an EA that doesn't override it is slow wherever it happens.
    2b) In _factorize_keys we special-case MaskedDtype and ArrowDtype to avoid this performance hit. That special-casing is a code smell.
  3. The merge._factorize_keys usage means authors cannot avoid a cast to numpy. This would be a huge pain point for potential GPU/distributed EAs.
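To make the merge concern concrete, here is a rough sketch of what factorizing join keys involves: both sides are mapped to one shared set of integer codes so the join itself can run on integers. This is a hypothetical helper, not pandas' `merge._factorize_keys`: the name is made up, and it uses `np.unique` where pandas uses a hashtable-based factorizer. The part that matters for GPU/distributed arrays is the `np.asarray` step, which pulls the data into NumPy memory.

```python
import numpy as np


def factorize_keys_sketch(left, right):
    """Hypothetical helper, not pandas' merge._factorize_keys.

    Returns integer codes for the left keys, codes for the right keys, and the
    number of distinct keys, with equal keys getting equal codes on both sides.
    """
    # Both inputs are forced into NumPy memory here: the step a GPU or
    # distributed array would have no way to opt out of.
    lk = np.asarray(left)
    rk = np.asarray(right)
    # np.unique sorts the uniques; pandas uses a hashtable factorizer instead.
    uniques, codes = np.unique(np.concatenate([lk, rk]), return_inverse=True)
    return codes[: len(lk)], codes[len(lk):], len(uniques)
```

With left keys `["b", "a", "c"]` and right keys `["a", "d", "b"]`, the left codes are `[1, 0, 2]` and the right codes `[0, 3, 1]`; a shared key such as `"a"` maps to 0 on both sides.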

Implementation-wise, a deprecation could look like:

  1. Deprecate EA.factorize saying in the future it will raise AbstractMethodError.
  2. Move nargminmax to an EA method _nanargminmax.
  3. Deprecate EA._nanargminmax, EA.argsort, EA.rank saying in the future they will raise AbstractMethodError. Can suggest the vfa pattern for interested authors.
  4. Not sure exactly what to do about merge._factorize_keys; will look into it.
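Step 1 could follow a warning-based deprecation. A minimal sketch, assuming a base class whose inherited `factorize` still works through the object-casting vff path but emits a `FutureWarning`, while subclasses that implement `factorize` themselves never see it. `BaseArraySketch` and `FastArray` are hypothetical names, not pandas classes; the real change would eventually raise `AbstractMethodError`, and missing-value handling is omitted for brevity.

```python
import warnings

import numpy as np


class BaseArraySketch:
    """Hypothetical base class (not pandas' ExtensionArray)."""

    def __init__(self, values):
        self._data = np.asarray(values)

    def _values_for_factorize(self):
        # Mimics the slow default described above: cast everything to object.
        return self._data.astype(object), None

    def factorize(self):
        # Inherited default: still works, but warns that it will go away.
        warnings.warn(
            f"{type(self).__name__} relies on the default factorize, which "
            "will raise in a future version; implement factorize directly.",
            FutureWarning,
            stacklevel=2,
        )
        values, _ = self._values_for_factorize()  # missing values ignored here
        seen = {}  # value -> code, in order of first appearance
        codes = np.array([seen.setdefault(v, len(seen)) for v in values], dtype=np.intp)
        return codes, np.array(list(seen), dtype=object)


class FastArray(BaseArraySketch):
    """An array that overrides factorize and therefore never sees the warning."""

    def factorize(self):
        # Works on the native dtype; np.unique returns sorted uniques.
        uniques, codes = np.unique(self._data, return_inverse=True)
        return codes.astype(np.intp), uniques
```

The design point is that the warning fires only on the fallback path, so authors who already implement `factorize` are unaffected, and those who are not get a clear signal before the default disappears.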

Metadata

Assignees

No one assigned

    Labels

    Deprecate (Functionality to remove in pandas), ExtensionArray (Extending pandas with custom dtypes or arrays), Needs Discussion (Requires discussion from core team before further action)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests