Closed
Description
from #17594 (comment)
[15] shows only the observed values, while [16] shows the Cartesian product of the grouping keys, including the unobserved categories RICE and SALAD. [16] is consistent with how grouping on a categorical normally works.
In [12]: import numpy as np
    ...: import pandas as pd
    ...: from collections import OrderedDict
    ...: df = pd.DataFrame(OrderedDict([
    ...:     ('food', pd.Categorical(['BURGER', 'CHIPS', 'SALAD', 'RICE'] * 2)),
    ...:     ('day', ([0] * 4) + ([1] * 4)),
    ...:     ('weight', np.arange(8)),
    ...: ]))
    ...:
    ...: df2 = df[df['food'].isin({'BURGER', 'CHIPS'})]
    ...:
In [13]: df2
Out[13]:
     food  day  weight
0  BURGER    0       0
1   CHIPS    0       1
4  BURGER    1       4
5   CHIPS    1       5
In [14]: df2.dtypes
Out[14]:
food      category
day          int64
weight       int64
dtype: object
In [15]: df2.groupby(['day', 'food'])['weight'].mean()
Out[15]:
day  food
0    BURGER    0
     CHIPS     1
1    BURGER    4
     CHIPS     5
Name: weight, dtype: int64
In [16]: df2.groupby(['day', 'food']).mean()
Out[16]:
            weight
day food
0   BURGER     0.0
    CHIPS      1.0
    RICE       NaN
    SALAD      NaN
1   BURGER     4.0
    CHIPS      5.0
    RICE       NaN
    SALAD      NaN
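
For anyone comparing the two behaviours side by side, here is a minimal sketch, assuming pandas 0.23 or later, where `groupby` accepts an `observed` keyword. It rebuilds the frame from the report and requests each result shape explicitly. The variable names `observed_only` and `cartesian` are illustrative only and do not come from the original report.

```python
import numpy as np
import pandas as pd

# Same data as in the report: a categorical column with four categories,
# of which only two survive the filter.
df = pd.DataFrame({
    "food": pd.Categorical(["BURGER", "CHIPS", "SALAD", "RICE"] * 2),
    "day": [0] * 4 + [1] * 4,
    "weight": np.arange(8),
})
df2 = df[df["food"].isin({"BURGER", "CHIPS"})]

# observed=True keeps only the (day, food) combinations present in df2.
observed_only = df2.groupby(["day", "food"], observed=True)["weight"].mean()

# observed=False expands to every day x category combination;
# combinations with no rows get NaN.
cartesian = df2.groupby(["day", "food"], observed=False)["weight"].mean()

print(observed_only)
print(cartesian)
```

Passing `observed` explicitly makes the choice between the two shapes visible at the call site instead of depending on whether a column is selected before aggregating.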