ENH: Add numeric_only to groupby frame ops #46524

Closed
@rhshadrach

Once #46072 is implemented, many groupby ops will default to numeric_only=False in 2.0. However, a number of groupby ops can only ever work on numeric data. For API consistency, I believe these ops should raise when a user tries to operate on non-numeric columns. Consider the example:

import pandas as pd

df = pd.DataFrame({'a': [1, 1], 'b': [3, 4], 'c': [5, 6]})
# Simulate a numeric column that accidentally ended up as object dtype
df['c'] = df['c'].astype(object)
gb = df.groupby('a')
print(gb.mean())

which gives the output

     b
a     
1  3.5

If a user has a numeric column that accidentally ends up as object dtype, the result will silently be missing expected columns. This is why I think we should run the op on all provided data, regardless of whether it is numeric.
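To make the silent drop visible, here is a minimal sketch that passes numeric_only=True explicitly and then lists the non-key columns missing from the result. The variable names `explicit` and `missing` are only illustrative; the frame is the same one used in the example above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [3, 4], "c": [5, 6]})
df["c"] = df["c"].astype(object)
gb = df.groupby("a")

# Request numeric-only behavior explicitly instead of relying on the default.
explicit = gb.mean(numeric_only=True)

# Columns other than the grouping key that did not make it into the result.
missing = [col for col in df.columns if col != "a" and col not in explicit.columns]
print(missing)
```

Here the object-dtype column 'c' is the one left out, which is exactly the case a user is unlikely to notice without checking.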

The following groupby ops have no numeric_only argument and act like numeric_only=True, but only make sense on numeric data.

The following groupby ops have no numeric_only argument and act like numeric_only=True, but make sense on non-numeric data.

For both groups of ops, I propose we add the numeric_only argument defaulting to True in 1.5, which emits a warning message that it will default to False in the future. The warning would only be emitted if setting numeric_only to True/False would give rise to different output; i.e. if there are non-numeric columns that could have been operated on.
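The condition for emitting the warning could look roughly like the sketch below. This is not pandas' implementation: `dropped_by_numeric_only` and `warn_if_default_matters` are hypothetical helpers, the message text is invented for illustration, and `select_dtypes(include="number")` is used as a simplified stand-in for whatever pandas considers numeric internally.

```python
import warnings

import pandas as pd


def dropped_by_numeric_only(df, keys):
    """Return the non-key columns a numeric-only op would leave out (hypothetical helper)."""
    candidates = df.drop(columns=list(keys))
    # Simplifying assumption: "numeric" means whatever select_dtypes treats as a number.
    numeric = candidates.select_dtypes(include="number")
    return [col for col in candidates.columns if col not in numeric.columns]


def warn_if_default_matters(df, keys):
    """Warn only when numeric_only=True and numeric_only=False would differ (hypothetical helper)."""
    dropped = dropped_by_numeric_only(df, keys)
    if dropped:
        warnings.warn(
            f"The default of numeric_only will change to False; columns {dropped} "
            "are currently dropped. Pass numeric_only explicitly to silence this.",
            FutureWarning,
            stacklevel=2,
        )
    return dropped
```

With an all-numeric frame the helper stays silent, so users whose output would not change see no warning.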

It's not ideal to add an argument and deprecate the default value in the same minor release (assuming 1.5 is the last minor release in the 1.x series), however I believe it will be of minor impact to users. The alternatives would be not carrying out the deprecation of numeric_only=True or to leave these ops behaving as if numeric_only=True (with no numeric_only argument). Both of these seem like worse alternatives to me.

cc @jreback @jbrockmendel @jorisvandenbossche @simonjayhawkins @Dr-Irv

Labels: Deprecate (functionality to remove in pandas), Enhancement, Groupby, Needs Discussion (requires discussion from core team before further action)
