Inconsistent behavior with None between "normal" pandas functions and groupby's last/first/nth #29645

Closed
@kchodorow

Description

Code Sample, a copy-pastable example if possible

> import numpy as np
> import pandas as pd

> s1 = pd.Series(['x', 'y', None], index=[0, 0, 0])
> s1.isna()
0    False
0    False
0     True
dtype: bool

> s2 = pd.Series(['x', 'y', np.nan], index=[0, 0, 0])
> s2.isna()
0    False
0    False
0     True
dtype: bool

# isna() treats None and np.nan identically, but groupby's last() does not:
> s1.groupby(level=0).last()
0    None
dtype: object

> s2.groupby(level=0).last()
0    y
dtype: object
# Expectation: these should output the same thing.

Problem description

The rest of pandas seems to treat None the same as np.nan (for example, isna() flags both as missing), so it is inconsistent and confusing that groupby's functions do not: last() skips np.nan but returns None as if it were an ordinary value. The expected behavior would be for first/last/nth to treat None and np.nan the way every other pandas function does.
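
One possible workaround, sketched here under the assumption that groups made up entirely of missing values can be left out of the result, is to drop missing values before grouping. `dropna()` uses the same missing-value check as `isna()`, so it removes both None and np.nan. The helper name `last_non_missing` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def last_non_missing(series: pd.Series) -> pd.Series:
    """Return the last non-missing value per level-0 index group.

    Hypothetical helper for illustration only. Groups that contain
    nothing but missing values do not appear in the result.
    """
    # dropna() removes both None and np.nan, so groupby never sees them.
    return series.dropna().groupby(level=0).last()


s1 = pd.Series(['x', 'y', None], index=[0, 0, 0])
s2 = pd.Series(['x', 'y', np.nan], index=[0, 0, 0])

r1 = last_non_missing(s1)
r2 = last_non_missing(s2)

# Both results hold 'y' for group 0, regardless of which missing marker was used.
print(r1)
print(r2)
```

This sidesteps the inconsistency rather than fixing it; the underlying groupby behavior is unchanged.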

Expected Output

> s1.groupby(level=0).last()
0    y
dtype: object

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
...

pandas: 0.24.1
...

Labels: Groupby, Missing-data, Needs Tests, good first issue