DataFrame.groupby should return _DataFrameGroupByScalar when grouping by a PeriodIndex #674

Closed
@Daverball

Describe the bug
When grouping a DataFrame by a PeriodIndex, each group key is a Period, i.e. a scalar, so DataFrame.groupby should return _DataFrameGroupByScalar. (On that note, Period appears to be missing from the definition of Scalar in _typing.pyi.)

To Reproduce

import pandas as pd
df = pd.DataFrame({'date': pd.date_range('2020-01-01', '2020-12-31'), 'days': 1})
index = pd.PeriodIndex(df.date, freq='M')
for period, group in df.groupby(index):
    period.start_time  # mypy error [attr-defined]: "Tuple[Any, ...]" has no attribute "start_time"

  • OS: Linux (OpenSUSE Tumbleweed)
  • Python 3.10
  • mypy 1.0.1
  • pandas-stubs 2.0.1.230501
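
For reference, the runtime side of the claim can be checked separately from the type checker. This is a minimal sketch, assuming a pandas 2.x installation and the same frame as in the reproduction above. It only inspects the key objects that iteration yields and says nothing about what the stubs declare.

```python
import pandas as pd

# Same data as the reproduction: one row per day of 2020.
df = pd.DataFrame({"date": pd.date_range("2020-01-01", "2020-12-31"), "days": 1})
index = pd.PeriodIndex(df["date"], freq="M")

# Collect the group keys that iteration yields at runtime.
keys = [key for key, _group in df.groupby(index)]

# Each key is expected to be a single pd.Period rather than a tuple.
print(type(keys[0]).__name__)
print(all(isinstance(key, pd.Period) for key in keys))
```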

Additional comments
This should probably extend to most other specialized Index types, apart from the base Index and MultiIndex. For MultiIndex it is obvious that the key would be non-scalar. For the base Index it depends on which Python objects it holds. Since converting a MultiIndex to and from an Index of builtins.tuple is quite common, it is probably the right move to treat Index as non-scalar by default, or to return an Any variant of DataFrameGroupBy that could be either scalar or non-scalar.
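
One way to picture the proposal is with overloads keyed on the index type. The sketch below is not the pandas-stubs implementation: `Period`, `Index`, `PeriodIndex`, `MultiIndex`, `GroupBy` and `groupby` are all hypothetical stand-ins defined here for illustration. It only shows how a specialized index could map to a scalar key type, MultiIndex to a tuple key type, and the base Index to an Any key type.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Generic, Iterator, Sequence, TypeVar, overload

KeyT = TypeVar("KeyT")


# Hypothetical stand-ins; none of these are the real pandas or pandas-stubs classes.
@dataclass(frozen=True)
class Period:
    label: str


class Index:
    def __init__(self, values: Sequence[Any]) -> None:
        self.values = list(values)


class PeriodIndex(Index):
    """Specialized index whose elements are always Period scalars."""


class MultiIndex(Index):
    """Index whose elements are always tuples."""


class GroupBy(Generic[KeyT]):
    """Iterating yields (key, row_positions); KeyT is the static key type."""

    def __init__(self, by: Index) -> None:
        self._by = by

    def __iter__(self) -> Iterator[tuple[KeyT, list[int]]]:
        groups: dict[Any, list[int]] = {}
        for position, key in enumerate(self._by.values):
            groups.setdefault(key, []).append(position)
        return iter(groups.items())


@overload
def groupby(by: PeriodIndex) -> GroupBy[Period]: ...  # scalar key
@overload
def groupby(by: MultiIndex) -> GroupBy[tuple[Any, ...]]: ...  # non-scalar key
@overload
def groupby(by: Index) -> GroupBy[Any]: ...  # unknown: could be either
def groupby(by: Index) -> GroupBy[Any]:
    return GroupBy(by)


periods = PeriodIndex([Period("2020-01"), Period("2020-01"), Period("2020-02")])
for period, rows in groupby(periods):
    # A type checker sees `period` as Period here, so `.label` is accepted.
    print(period.label, rows)
```

The fallback overload for the base Index uses `Any` as the key type, which corresponds to the "Any variant" option mentioned above. Making the fallback return a tuple key type instead would correspond to treating Index as non-scalar by default.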

Labels: Generic Index, Groupby, Period