BUG: Groupby NaT Handling #6992

Closed
@sinhrks

xref #9236

There seems to be an inconsistency in some GroupBy methods when NaT is included in the group key.

  • GroupBy.groups includes NaT as a key.
  • GroupBy.ngroups doesn't count NaT.
  • GroupBy.__iter__ doesn't return NaT group.
  • GroupBy.get_group fails when NaT is specified.

As I understand it, NaT should be included in the group keys, consistent with the behaviour of other functions such as dropna. Is it OK to fix these methods so that they include NaT?

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'values': np.random.randn(8),
...                    'dt': [np.nan, pd.Timestamp('2013-01-01'), np.nan, pd.Timestamp('2013-02-01'),
...                           np.nan, pd.Timestamp('2013-02-01'), np.nan, pd.Timestamp('2013-01-01')]})
>>> grouped = df.groupby('dt')

>>> grouped.groups
{numpy.datetime64('NaT'): [0, 2, 4, 6], numpy.datetime64('2013-01-01T09:00:00.000000000+0900'): [1, 7], numpy.datetime64('2013-02-01T09:00:00.000000000+0900'): [3, 5]}

>>> grouped.ngroups
2

>>> grouped.indices
# ValueError: DatetimeIndex with NaT cannot be converted to object

>>> grouped.get_group(pd.NaT)
ValueError: DatetimeIndex with NaT cannot be converted to object
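
For readers on a newer pandas, the check below is a minimal sketch of how one might compare `ngroups` against the number of groups actually yielded by iteration. It assumes pandas 1.1 or later, where `groupby` accepts a `dropna` argument; that argument did not exist when this issue was filed. The helper name `group_counts` is hypothetical, and the `values` column uses fixed integers instead of random data so the result is reproducible.

```python
import pandas as pd

# Same key layout as the report: four NaT rows and two distinct dates.
df = pd.DataFrame({
    "values": list(range(8)),
    "dt": [pd.NaT, pd.Timestamp("2013-01-01"), pd.NaT, pd.Timestamp("2013-02-01"),
           pd.NaT, pd.Timestamp("2013-02-01"), pd.NaT, pd.Timestamp("2013-01-01")],
})


def group_counts(frame, **groupby_kwargs):
    """Hypothetical helper: report ngroups next to the iterated group count."""
    grouped = frame.groupby("dt", **groupby_kwargs)
    return {
        "ngroups": grouped.ngroups,
        "iterated": sum(1 for _ in grouped),
    }


# Default behaviour (dropna=True): rows keyed by NaT are left out.
default_counts = group_counts(df)

# Assumes pandas >= 1.1: dropna=False keeps NaT as its own group.
keep_nat_counts = group_counts(df, dropna=False)

print(default_counts)
print(keep_nat_counts)
```

If the two numbers in either dictionary differ, that is the kind of inconsistency described above. `groups` and `get_group` are deliberately left out of the sketch, since their handling of NaT keys has varied between versions.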

Labels: Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)