BUG: groupby doesn't identify null values when sort=False #48506

Closed
@rhshadrach

Closely related to #48476

As far as I can tell this only occurs when the input dtype to groupby is object.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, pd.NA, None], 'b': [1, 2, 3]})
gb = df.groupby('a', sort=True, dropna=False)
print(gb.sum())

#      b
# a     
# NaN  6

but with sort=False:

df = pd.DataFrame({'a': [np.nan, pd.NA, None], 'b': [1, 2, 3]})
gb = df.groupby('a', sort=False, dropna=False)
print(gb.sum())

#       b
# a      
# NaN   1
# <NA>  2
# None  3

I think we should prefer the sort=True behavior as that is the default value for now, but prefer sort=False in the long run.
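
Until the two code paths agree, one possible workaround is to collapse every null-like value in the key column to a single `np.nan` before grouping. This is only a sketch, not something proposed in this issue: `normalize_nulls` is a hypothetical helper name, and it assumes that treating `np.nan`, `pd.NA`, and `None` as the same missing key is acceptable for your data.

```python
import numpy as np
import pandas as pd


def normalize_nulls(s: pd.Series) -> pd.Series:
    # Hypothetical helper: keep non-null entries, replace every
    # null-like entry (np.nan, pd.NA, None) with the same np.nan.
    return s.where(s.notna(), np.nan)


df = pd.DataFrame({"a": [np.nan, pd.NA, None], "b": [1, 2, 3]})

# Group on the normalized key so all missing keys share one group,
# even with sort=False.
result = (
    df.assign(a=normalize_nulls(df["a"]))
    .groupby("a", sort=False, dropna=False)
    .sum()
)
print(result)
```

With the normalized key, the three rows should land in a single null group whose `b` sum is 6, matching the `sort=True` output shown above.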

Labels

- Algos: Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff
- Bug
- Groupby
- Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
- Regression: Functionality that used to work in a prior pandas version