SeriesGroupBy.first / last loses categorical dtype #33090

Closed
@dsaxton

On pandas 1.0.3 and master:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")

print(df.groupby("a")["b"].first())
print(df.groupby("a")["b"].last())

gives

a
1    1
2    2
3    3
Name: b, dtype: int64
a
1    1
2    2
3    3
Name: b, dtype: int64

but the dtype of both results should still be categorical, not int64. This seemingly wrong output is explicitly tested for here: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/groupby/aggregate/test_aggregate.py#L461
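
Until this is fixed, one possible workaround is to cast the aggregated result back to the column's original dtype. The snippet below is only a minimal sketch: `first_keep_dtype` is a hypothetical helper name and not part of pandas, and it assumes every aggregated value is still one of the original categories (which holds for `first` and `last`).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")


def first_keep_dtype(frame, key, col):
    """Hypothetical helper: groupby-first, then restore the column's dtype."""
    result = frame.groupby(key)[col].first()
    # On affected versions the result is int64; casting back to the original
    # CategoricalDtype restores the categories. On versions where the dtype
    # is already preserved, the cast changes nothing.
    return result.astype(frame[col].dtype)


restored = first_keep_dtype(df, "a", "b")
print(restored.dtype)
```

The same cast should apply to `last()`. This only papers over the symptom; it does not change what `SeriesGroupBy.first` itself returns.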

Labels: Categorical, Dtype Conversions, Groupby, Needs Tests, Regression, good first issue
