Closed
xref #9236
There seems to be an inconsistency in some GroupBy methods when NaT is included in the group key.
- GroupBy.groups includes NaT as a key.
- GroupBy.ngroups doesn't count NaT.
- GroupBy.__iter__ doesn't return the NaT group.
- GroupBy.get_group fails when NaT is specified.
I understand NaT should be included in the group key, in line with the behaviour of other functions such as dropna. Is it OK to fix it to include NaT?
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'values': np.random.randn(8),
...                    'dt': [np.nan, pd.Timestamp('2013-01-01'), np.nan, pd.Timestamp('2013-02-01'),
...                           np.nan, pd.Timestamp('2013-02-01'), np.nan, pd.Timestamp('2013-01-01')]})
>>> grouped = df.groupby('dt')
>>> grouped.groups
{numpy.datetime64('NaT'): [0, 2, 4, 6], numpy.datetime64('2013-01-01T09:00:00.000000000+0900'): [1, 7], numpy.datetime64('2013-02-01T09:00:00.000000000+0900'): [3, 5]}
>>> grouped.ngroups
2
>>> grouped.indices
ValueError: DatetimeIndex with NaT cannot be converted to object
>>> grouped.get_group(pd.NaT)
ValueError: DatetimeIndex with NaT cannot be converted to object
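To make the mismatch easier to see at a glance, here is a minimal sketch that puts the counts from each accessor side by side. The helper name summarize_group_keys is hypothetical (it is not part of pandas), and the length of grouped.groups may differ between pandas versions, so treat the printed numbers as something to check locally rather than a fixed result.

```python
import numpy as np
import pandas as pd


def summarize_group_keys(df, key):
    """Hypothetical helper: report how each GroupBy accessor counts groups."""
    grouped = df.groupby(key)
    return {
        # Number of keys in the dict returned by GroupBy.groups
        "groups": len(grouped.groups),
        # Number of groups reported by GroupBy.ngroups
        "ngroups": grouped.ngroups,
        # Number of (key, frame) pairs produced by iterating the GroupBy
        "iterated": sum(1 for _ in grouped),
        # Number of rows whose key is missing (NaT / NaN)
        "missing_rows": int(df[key].isna().sum()),
    }


df = pd.DataFrame({
    "values": np.arange(8.0),
    "dt": [pd.NaT, pd.Timestamp("2013-01-01"), pd.NaT, pd.Timestamp("2013-02-01"),
           pd.NaT, pd.Timestamp("2013-02-01"), pd.NaT, pd.Timestamp("2013-01-01")],
})

summary = summarize_group_keys(df, "dt")
print(summary)
```

If "groups" is larger than "ngroups" and "iterated", the NaT key is visible in one accessor but not in the others, which is the inconsistency described above.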