Closed
Description
Pandas version: 0.22
Problem
I am trying to use numpy.size() to count the size of each group produced by a pandas DataFrame groupby(), and I get a strange result.
Can anyone explain how groupby aggregation works in this case?
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [0.11, 0.32, 0.93, 0.65], 'D': ["This", "That", "How", "What"]})
>>> df
   A  B     C     D
0  1  1  0.11  This
1  1  2  0.32  That
2  2  3  0.93   How
3  2  4  0.65  What
>>> df.groupby('A',as_index=False).agg(np.size)
   A  B    C  D
0  1  2  2.0  2
1  2  2  2.0  2
>>> df.groupby('A',as_index=False)['C'].agg(np.size)
   A  C
0  1  8
1  2  8
>>> df.groupby('A',as_index=False)[['C']].agg(np.size)
   A    C
0  1  2.0
1  2  2.0
>>> grouped=df.groupby('A', as_index=False)
>>> grouped['C','D'].agg(np.size)
   A    C  D
0  1  2.0  2
1  2  2.0  2
In the code above, selecting ['C'] after groupby() reports a group size of 8, which equals the correct group size times the number of columns, that is 2 * 4. Selecting [['C']] or ['C','D'] after groupby() gives the correct group size. With as_index set to True, I get a Series with the correct values, but I also want the groupby column 'A' included in the result, so I set as_index to False.
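To make the 2 * 4 arithmetic concrete, here is a small sketch. It only shows what numpy.size() returns for a whole sub-frame versus a single column. Whether pandas 0.22 really passes the whole sub-frame to the function in the ['C'] case is my guess, not something I have confirmed; the names sub, n_cells and n_rows are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 3, 4],
                   'C': [0.11, 0.32, 0.93, 0.65],
                   'D': ["This", "That", "How", "What"]})

# Rows belonging to the group A == 1 (2 rows, 4 columns).
sub = df[df['A'] == 1]

# np.size on a DataFrame counts every cell: rows * columns.
n_cells = np.size(sub)

# np.size on a single column counts only the rows of the group.
n_rows = np.size(sub['C'])

print(n_cells, n_rows)
```

If the aggregation function is handed the full sub-frame rather than column 'C' alone, a result of 8 instead of 2 would follow from this.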
Expected Output
I expect to still get a DataFrame that includes columns 'A' and 'C', where column 'C' holds the group size, which should be 2 in this example.
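For reference, this is a sketch of one way to get that shape of output without going through agg(np.size). It is a workaround, not an explanation of the behaviour above: it uses the groupby size() method and then moves 'A' back into a column with reset_index(). The variable name result is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 3, 4],
                   'C': [0.11, 0.32, 0.93, 0.65],
                   'D': ["This", "That", "How", "What"]})

# size() counts the rows in each group and returns a Series indexed by 'A'.
sizes = df.groupby('A')['C'].size()

# reset_index turns the 'A' index into a regular column;
# name='C' labels the count column as 'C'.
result = sizes.reset_index(name='C')

print(result)
```

This gives a DataFrame with columns 'A' and 'C', with 2 in 'C' for both groups.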