
Why do I get a strange aggregation result from DataFrame groupby()? #26960

Closed
@tjliupeng


Pandas version: 0.22

Problem

I am trying to use numpy.size() to count the size of each group produced by pandas DataFrame.groupby(), and I get a strange result.

Can anyone explain how groupby aggregation works?

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4], 'C': [0.11, 0.32, 0.93, 0.65], 'D': ["This", "That", "How", "What"]})
>>> df
   A  B     C     D
0  1  1  0.11  This
1  1  2  0.32  That
2  2  3  0.93   How
3  2  4  0.65  What
>>> df.groupby('A',as_index=False).agg(np.size)
   A  B    C  D
0  1  2  2.0  2
1  2  2  2.0  2
>>> df.groupby('A',as_index=False)['C'].agg(np.size)
   A  C
0  1  8
1  2  8
>>> df.groupby('A',as_index=False)[['C']].agg(np.size)
   A    C
0  1  2.0
1  2  2.0
>>> grouped=df.groupby('A', as_index=False)
>>> grouped['C','D'].agg(np.size)
   A    C  D
0  1  2.0  2
1  2  2.0  2
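For reference, the per-group, per-column behaviour the session above expects can be sketched by hand as split-apply-combine. This is a minimal illustration under the assumption that `agg(np.size)` should call `np.size` once per column of each group; it is not pandas' actual implementation, and the names `rows` and `manual` are hypothetical.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above.
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4],
                   'C': [0.11, 0.32, 0.93, 0.65],
                   'D': ["This", "That", "How", "What"]})

# Split: iterate over the groups keyed by column 'A'.
# Apply: call np.size on each non-grouping column of the group (a Series).
# Combine: collect one row per group into a new DataFrame.
rows = []
for key, group in df.groupby('A'):
    row = {'A': key}
    for col in ['B', 'C', 'D']:
        row[col] = np.size(group[col])  # number of elements in this column only
    rows.append(row)

manual = pd.DataFrame(rows)
print(manual)
```

Under this assumption every cell is 2, which matches the `[['C']]` and `['C','D']` results but not the single `['C']` result.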

When the groupby() result is indexed with ['C'], each group size is reported as 8, which is the correct group size multiplied by the number of columns in df (2 × 4). When it is indexed with [['C']] or ['C','D'], the group size is correct. Setting as_index=True gives a Series with the correct values, but I also want the grouping column 'A' in the result, so I set as_index=False.
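One plausible explanation for the 8 (an assumption, not confirmed in this issue) is that, in the single-column `['C']` case, `np.size` was handed each group's whole sub-frame instead of the `C` column. `np.size` counts every element, so on a 2 × 4 frame it counts all cells, while on a single column it counts only the rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4],
                   'C': [0.11, 0.32, 0.93, 0.65],
                   'D': ["This", "That", "How", "What"]})

# One group (A == 1) taken as a full sub-frame: 2 rows x 4 columns.
group = df[df['A'] == 1]

frame_size = np.size(group)        # counts every cell in the sub-frame: 2 * 4
series_size = np.size(group['C'])  # counts only the rows of column C

print(frame_size, series_size)
```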

Expected Output

I expect to get a DataFrame that includes columns 'A' and 'C', where column 'C' holds the group size, which should be 2 in this example.
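A sketch of one way to get that frame, assuming the goal is simply a row count per value of 'A': `GroupBy.size()` counts rows per group (NaN included), and `reset_index(name='C')` moves 'A' back into a column. This sidesteps `as_index=False` rather than fixing the behaviour reported above, and has not been checked against pandas 0.22 specifically.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4],
                   'C': [0.11, 0.32, 0.93, 0.65],
                   'D': ["This", "That", "How", "What"]})

# size() returns a Series of row counts indexed by 'A'.
# reset_index(name='C') turns that index into an 'A' column and names
# the counts column 'C'.
result = df.groupby('A')['C'].size().reset_index(name='C')
print(result)
```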
