Description
This was noticed while working on #40288.
Example:
from pandas import DataFrame

df = DataFrame({"A": 3 * [object]})
print(df.rank())
print(df.A.rank())
The last two lines print an empty DataFrame and an empty Series, respectively.
The docstring for numeric_only says:
For DataFrame objects, rank only numeric columns if set to True.
The current behavior with numeric_only=None (the default value) is:
Try with all columns. If a TypeError is raised, try with numeric_only=True.
When numeric_only is True and the Series/DataFrame contains no numeric columns, rank operates on an empty object and returns an empty result.
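To illustrate where the empty result comes from, here is a minimal sketch of the numeric-only step in isolation. It assumes `select_dtypes(include="number")` is a reasonable stand-in for the column selection rank performs internally; that substitution is mine, not something taken from the pandas source.

```python
import pandas as pd

# Same all-object frame as in the example above.
df = pd.DataFrame({"A": 3 * [object]})

# Assumption: select_dtypes stands in for rank's internal numeric-column selection.
numeric_part = df.select_dtypes(include="number")
print(numeric_part.shape)  # (3, 0): three rows, zero columns

# A frame with no columns has nothing to rank, so the result is empty as well.
ranked = numeric_part.rank()
print(ranked.empty)  # True
```

No exception is raised at any point, which is why a caller cannot tell "nothing could be ranked" apart from a successful call.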
This is causing issues when rank is used in transform lists and dictionaries. Namely, we'd like to have partial-failure for TypeErrors, but rank returns an empty result instead of raising a TypeError. This could be special-cased, but it seems to me that returning an empty object (at least when numeric_only is not True) is itself undesirable.
Some options I see are:
- Remove None as an option and change the default from numeric_only=None to numeric_only=False.
- If numeric_only is None and the fallback of selecting only numeric columns results in an empty object, raise a TypeError.
- If selecting only numeric columns results in an empty object (even when numeric_only=True), raise a TypeError.
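As a rough sketch of what the last two options would mean for callers, the helper below raises a TypeError when the numeric-only selection is empty. The name `rank_numeric_or_raise` and the error message are hypothetical, and `select_dtypes` is again an assumed stand-in for the internal selection; this is not how pandas implements rank.

```python
import pandas as pd


def rank_numeric_or_raise(obj: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rank numeric columns, raising if there are none."""
    # Assumption: select_dtypes approximates rank's numeric-only selection.
    numeric = obj.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        # Raise instead of silently returning an empty result.
        raise TypeError("no numeric columns to rank")
    return numeric.rank()


# A mixed frame keeps working: only the numeric column "A" is ranked.
mixed = pd.DataFrame({"A": [3, 1, 2], "B": ["x", "y", "z"]})
print(rank_numeric_or_raise(mixed)["A"].tolist())  # [3.0, 1.0, 2.0]

# An all-object frame now fails loudly.
all_object = pd.DataFrame({"A": 3 * [object]})
try:
    rank_numeric_or_raise(all_object)
except TypeError as err:
    print(f"TypeError: {err}")
```

With this behavior, a transform over a list or dictionary of functions could treat rank like any other function that fails with a TypeError, with no special-casing needed.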