API: groupby with categoricals inconsistent with Series vs DataFrame result #20416

Closed
@jreback

from #17594 (comment)

[15] shows only the observed values, while [16] shows the Cartesian product of the grouping keys, including the unobserved categories. [16] is consistent with how grouping on a categorical works.

In [12]: import numpy as np
    ...: import pandas as pd
    ...: from collections import OrderedDict
    ...: df = pd.DataFrame(OrderedDict([
    ...:     ('food',   pd.Categorical(['BURGER', 'CHIPS', 'SALAD', 'RICE'] * 2)),
    ...:     ('day',    ([0] * 4) + ([1] * 4)),
    ...:     ('weight', np.arange(8)),
    ...: ]))
    ...: 
    ...: df2 = df[df['food'].isin({'BURGER', 'CHIPS'})]
    ...: 

In [13]: df2
Out[13]: 
     food  day  weight
0  BURGER    0       0
1   CHIPS    0       1
4  BURGER    1       4
5   CHIPS    1       5

In [14]: df2.dtypes
Out[14]: 
food      category
day          int64
weight       int64
dtype: object

In [15]: df2.groupby(['day', 'food'])['weight'].mean()
Out[15]: 
day  food  
0    BURGER    0
     CHIPS     1
1    BURGER    4
     CHIPS     5
Name: weight, dtype: int64

In [16]: df2.groupby(['day', 'food']).mean()
Out[16]: 
            weight
day food          
0   BURGER     0.0
    CHIPS      1.0
    RICE       NaN
    SALAD      NaN
1   BURGER     4.0
    CHIPS      5.0
    RICE       NaN
    SALAD      NaN
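
One way to make the Series and DataFrame results agree is to pass `observed` explicitly rather than rely on the default. The following is a minimal sketch, assuming a pandas version whose `groupby` accepts the `observed` keyword; it is not part of the session above, and the default value has varied across releases. The names `observed_only` and `cartesian` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the report (a plain dict stands in for OrderedDict).
df = pd.DataFrame({
    "food": pd.Categorical(["BURGER", "CHIPS", "SALAD", "RICE"] * 2),
    "day": [0] * 4 + [1] * 4,
    "weight": np.arange(8),
})

# Filtering rows does not drop categories: 'food' still has four of them.
df2 = df[df["food"].isin(["BURGER", "CHIPS"])]

# observed=True: only (day, food) pairs that actually occur in df2.
observed_only = df2.groupby(["day", "food"], observed=True)["weight"].mean()

# observed=False: every day crossed with every category, NaN where unobserved.
cartesian = df2.groupby(["day", "food"], observed=False)["weight"].mean()

print(observed_only)
print(cartesian)
```

If the unobserved categories are never wanted, calling `df2["food"].cat.remove_unused_categories()` before grouping is another option, since it removes them from the dtype itself.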
