Raise ValueError When Attempting to Rank Object Dtypes #19560

Following the comments in #19481, rank operations on ``object`` dtypes currently have two problems. They:

- Are inherently ambiguous, since they rely on lexical ordering, and
- Behave inconsistently across Series, DataFrame and GroupBy objects and across different arguments

To illustrate the latter:

```python
In [1]: vals = ['apple', 'orange', 'banana']
In [2]: pd.Series(vals).rank()  # this will "work"
Out[2]: 
0    1.0
1    3.0
2    2.0
dtype: float64

In [3]: pd.Series(vals).rank(method='first')  # raises
ValueError: first not supported for non-numeric data

In [4]: pd.DataFrame({'key': ['foo'] * 3, 'vals': vals}).groupby('key').rank(method='first')  # should raise?
Out[4]: 
Empty DataFrame
Columns: []
Index: []
```

(see also #19482)
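The ambiguity in the first point can be seen with a small pure-Python sketch. It assumes that object values are ranked by Python's default `<` comparison, which for strings compares Unicode code points; `lexical_rank` is a hypothetical helper using the `average` tie rule, not pandas' implementation. On `vals` it reproduces the `pd.Series(vals).rank()` output above, but the same ordering puts `"10"` before `"2"` and `"Banana"` before `"apple"`.

```python
def lexical_rank(values):
    """Hypothetical helper: 1-based average ranks using Python's default ordering.

    For strings, `<` compares Unicode code points, i.e. lexical ordering.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Tied values share the mean of positions i..j, shifted to 1-based
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


print(lexical_rank(["apple", "orange", "banana"]))  # [1.0, 3.0, 2.0]
# Numeric strings rank by character, not magnitude: "10" < "2" < "9"
print(lexical_rank(["10", "9", "2"]))  # [1.0, 3.0, 2.0]
# Uppercase code points sort before lowercase: "Banana" < "apple"
print(lexical_rank(["apple", "Banana"]))  # [2.0, 1.0]
```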

With this change, I'd propose that we consistently raise a ValueError when ranking ``object`` dtypes, regardless of which object (Series, DataFrame or GroupBy) performs the operation and regardless of the arguments passed.
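A minimal sketch of the proposal, assuming a single validation helper that Series, DataFrame and GroupBy `rank` would all call before dispatching on `method`. The names `validate_rank_args` and `VALID_METHODS` are hypothetical, and `dtype` is modelled as a plain string rather than a real pandas dtype.

```python
# The five tie-breaking methods accepted by pandas rank
VALID_METHODS = ("average", "min", "max", "first", "dense")


def validate_rank_args(dtype, method="average"):
    """Hypothetical check shared by Series, DataFrame and GroupBy rank.

    `dtype` is a dtype name string standing in for a real pandas dtype.
    """
    # Reject object dtype before looking at `method`, so every caller and
    # every argument combination fails with the same error.
    if dtype == "object":
        raise ValueError("rank is not supported for object dtype")
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")


# Every method fails the same way for object dtype
for method in VALID_METHODS:
    try:
        validate_rank_args("object", method)
    except ValueError as exc:
        print(f"{method}: {exc}")

# Numeric dtypes pass validation and would go on to the existing rank code
validate_rank_args("float64", method="first")
```

Checking the dtype before the method means `method='first'` no longer produces its own error message, and the GroupBy path could not fall through to an empty result.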

One known caveat is that ``Categorical`` types currently use the ``rank_object`` methods in ``algos``. My assumption is that we would want to continue supporting ranking for ordered Categoricals but raise for unordered Categoricals.
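For the Categorical caveat, one possible approach is sketched below, assuming ranking is done on category codes so that an ordered Categorical ranks by its declared category order rather than lexically. `rank_categorical` is a hypothetical helper; it expects every value to appear in `categories` and does not handle missing values.

```python
def rank_categorical(values, categories, ordered):
    """Hypothetical sketch: rank categorical values by their category codes.

    Ordered categoricals rank by position in `categories`; unordered ones raise.
    Assumes every value is present in `categories` (no missing values).
    """
    if not ordered:
        raise ValueError("rank is not supported for unordered categoricals")
    # Map each value to its code, i.e. its position in the declared order
    code_of = {cat: code for code, cat in enumerate(categories)}
    codes = [code_of[v] for v in values]
    # 1-based average ranks over the integer codes; ties share the mean rank
    order = sorted(range(len(codes)), key=codes.__getitem__)
    ranks = [0.0] * len(codes)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and codes[order[j + 1]] == codes[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


sizes = ["medium", "small", "large", "small"]
print(rank_categorical(sizes, ["small", "medium", "large"], ordered=True))
# [3.0, 1.5, 4.0, 1.5]
```

Here lexical ranking would place `"large"` first, which is why the declared order, and not the string encoding, has to drive the result.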
