DEPR: values_for_argsort, values_for_factorize, from_factorized

History: when EAs were originally designed, there was a hope that many methods could be implemented in terms of a small number of core methods, two of the main ones being _values_for_factorize (vff) and _values_for_argsort (vfa). Over time we found that many of the places where we used these outside of factorize/argsort were causing problems, and those uses were pruned.

At this point we are down to only a few internal uses of each. _from_factorized is used only in EA.factorize. vfa is used in EA.argsort, EA.rank, and nargminmax (which in turn is used in EA.argmin/argmax). vff is used in EA.factorize and merge._factorize_keys, and #53475 will restore its use in hash_pandas_object.
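To make the "small number of core methods" idea concrete, here is a toy, pure-Python sketch of how generic `factorize`, `argsort`, `argmin` and `argmax` can be derived from the vff, `_from_factorized` and vfa hooks. `ToyArray` and its internals are hypothetical; pandas' real implementations operate on NumPy arrays and handle missing values in the sorting methods, which this sketch does not.

```python
from typing import Any, Sequence


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray; not pandas' real code."""

    def __init__(self, data: Sequence[Any]) -> None:
        self._data = list(data)

    # Hooks an array author would override.
    def _values_for_factorize(self) -> "tuple[list[Any], Any]":
        # Hashable values plus the sentinel that marks missing entries.
        return list(self._data), None

    @classmethod
    def _from_factorized(cls, values: "list[Any]", original: "ToyArray") -> "ToyArray":
        # Rebuild an array of the same type from the unique values.
        return cls(values)

    def _values_for_argsort(self) -> "list[Any]":
        # Values whose natural ordering matches the array's ordering.
        return list(self._data)

    # Generic methods derived only from the hooks above.
    def factorize(self) -> "tuple[list[int], ToyArray]":
        values, na_value = self._values_for_factorize()
        seen = {}
        codes = []
        for v in values:
            if v is na_value:
                codes.append(-1)  # missing values get the sentinel code -1
            else:
                # len(seen) is evaluated before insertion, so new values get the next code
                codes.append(seen.setdefault(v, len(seen)))
        uniques = list(seen)  # dicts keep first-seen order
        return codes, self._from_factorized(uniques, self)

    def argsort(self) -> "list[int]":
        keys = self._values_for_argsort()
        return sorted(range(len(keys)), key=keys.__getitem__)

    def argmin(self) -> int:
        keys = self._values_for_argsort()
        return min(range(len(keys)), key=keys.__getitem__)

    def argmax(self) -> int:
        keys = self._values_for_argsort()
        return max(range(len(keys)), key=keys.__getitem__)
```

For example, factorizing `ToyArray(["b", "a", None, "b"])` yields codes `[0, 1, -1, 0]` with uniques `["b", "a"]`. An author who overrides only the hooks gets the four generic methods for free, which is the pattern this issue proposes to deprecate.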

We should deprecate these patterns entirely.

1) The status quo regarding whether these are required or encouraged is confusing. The solution is to _have less stuff_.
2) Because the default vff casts to object, any place we use it on an EA that doesn't override it is slow.
2b) In merge._factorize_keys we special-case MaskedDtype and ArrowDtype to avoid this performance hit. That special-casing is a code smell.
3) The merge._factorize_keys usage means EA authors cannot avoid a cast to numpy. This would be a huge pain point for potential GPU/distributed EAs.
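To make points 2b and 3 concrete, here is a toy, pure-Python model of the dispatch shape being criticised. Every class and function name is hypothetical; the real merge._factorize_keys works on NumPy/Arrow data and special-cases both MaskedDtype and ArrowDtype, whereas only a masked case is modelled here.

```python
class GenericToy:
    """Hypothetical array that only implements the default-style hook."""

    def __init__(self, data):
        self.data = list(data)

    def _values_for_factorize(self):
        # Models the default behaviour: hand back boxed Python objects,
        # standing in for a cast to an object-dtype ndarray.
        return list(self.data), None


class MaskedToy(GenericToy):
    """Hypothetical array storing a validity mask next to its data."""

    def __init__(self, data, mask):
        super().__init__(data)
        self.mask = list(mask)  # True marks a missing entry


class DeviceToy(GenericToy):
    """Hypothetical GPU-style array that ships its own factorize."""

    def factorize(self):
        # Stand-in for an efficient, on-device implementation.
        return _encode(self.data, None), list(dict.fromkeys(self.data))


def _encode(values, na_value):
    # Integer codes in order of first appearance; -1 marks missing values.
    seen = {}
    return [-1 if v is na_value else seen.setdefault(v, len(seen)) for v in values]


def factorize_keys(arr):
    """Toy analogue of merge._factorize_keys: returns (codes, path taken)."""
    if isinstance(arr, MaskedToy):
        # Special case: read data and mask directly, skipping the object cast.
        values = [None if m else v for v, m in zip(arr.data, arr.mask)]
        return _encode(values, None), "masked fast path"
    # Every other array, including DeviceToy with its own factorize, is
    # routed through the hook and pays for the object round-trip.
    values, na_value = arr._values_for_factorize()
    return _encode(values, na_value), "object path"
```

Giving `DeviceToy` a fast path here would take yet another `isinstance` branch inside the helper, since its own `factorize` is never consulted.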

Implementation-wise, a deprecation could look like:

1) Deprecate the default EA.factorize, warning that in the future it will raise AbstractMethodError.
2) Move nanargminmax to an EA method _nanargminmax.
3) Deprecate the defaults for EA._nanargminmax, EA.argsort, and EA.rank, warning that in the future they will raise AbstractMethodError. We can suggest the vfa pattern to interested authors.
4) I'm not yet sure what to do about merge._factorize_keys; I will look into it.
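A minimal sketch of what steps 1 to 3 might look like, assuming a `FutureWarning` is used as the deprecation signal: the base-class defaults keep working but warn, and a subclass that overrides the method sees no warning. `BaseArray`, `FastArray`, `_warn_default` and the `ENFORCE` switch are hypothetical, and `AbstractMethodError` here is a local stand-in for `pandas.errors.AbstractMethodError`. Only `argsort` and `_nanargminmax` are shown; `factorize` and `rank` would follow the same shape.

```python
import warnings

# Hypothetical switch: flip to True to model the post-deprecation behaviour.
ENFORCE = False


class AbstractMethodError(NotImplementedError):
    """Local stand-in for pandas.errors.AbstractMethodError."""


def _warn_default(obj, method):
    """Warn (or, once enforced, raise) when a base-class default is used."""
    msg = (
        f"{type(obj).__name__}.{method} is using the base-class default, "
        "which will raise AbstractMethodError in a future version; "
        f"override {method} directly."
    )
    if ENFORCE:
        raise AbstractMethodError(msg)
    # stacklevel=3 attributes the warning to whoever called the defaulted method.
    warnings.warn(msg, FutureWarning, stacklevel=3)


class BaseArray:
    """Hypothetical base class standing in for ExtensionArray."""

    def __init__(self, data):
        self._data = list(data)

    def _values_for_argsort(self):
        return list(self._data)

    def argsort(self):
        _warn_default(self, "argsort")  # step 3: deprecated default
        keys = self._values_for_argsort()
        return sorted(range(len(keys)), key=keys.__getitem__)

    def _nanargminmax(self, method):
        # Step 2: formerly a free function, now a method authors can override.
        _warn_default(self, "_nanargminmax")  # step 3: deprecated default
        keys = self._values_for_argsort()
        valid = [i for i, k in enumerate(keys) if k is not None]  # skip missing
        pick = min if method == "argmin" else max
        return pick(valid, key=keys.__getitem__)

    def argmin(self):
        return self._nanargminmax("argmin")

    def argmax(self):
        return self._nanargminmax("argmax")


class FastArray(BaseArray):
    """Author-supplied subclass overriding the defaults, so no warnings."""

    def argsort(self):
        return sorted(range(len(self._data)), key=self._data.__getitem__)

    def _nanargminmax(self, method):
        valid = [i for i, v in enumerate(self._data) if v is not None]
        pick = min if method == "argmin" else max
        return pick(valid, key=self._data.__getitem__)
```

In this shape, authors who already override the methods are unaffected, while those relying on the defaults get a warning that names the method to implement.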


DEPR: values_for_argsort, values_for_factorize, from_factorized #53501
