Consistency with groupby as_index #5755

Closed
@hayd

I've had this half-written for a while, but I wasn't sure how to actually explain it. I'm still not sure, but here goes.

It would be interesting to discuss what people's views on the correctness of these are.

I think there's scope for cleaning up some possible inconsistencies (unfortunately this would mean breaking the API):

In [2]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [3]: g = df.groupby('A')

In [4]: g1 = df.groupby('A', as_index=False)

In [5]: g.head(1)   # I did a dirty hack to get the index like this!
Out[5]: 
     A  B
A        
1 0  1  2
5 2  5  6

In [6]: g1.head(1)
Out[6]: 
   A  B
0  1  2
2  5  6

In [7]: g.nth(0)
Out[7]: 
   B
A   
1  2
5  6

In [8]: g1.nth(0)  # the index doesn't match the original
Out[8]: 
   A  B
0  1  2
1  5  6

In [9]: g.filter(lambda x: x.B>0)
Out[9]: 
   A  B
0  1  2
1  1  4
2  5  6

In [10]: g1.filter(lambda x: x.B>0)
Out[10]: 
   A  B
0  1  2
1  1  4
2  5  6

In [13]: g.apply(lambda x: x.sum()) 
Out[13]: 
   A  B
A      
1  2  6
5  5  6

In [14]: g1.apply(lambda x: x.sum())  # but as_index=False
Out[14]: 
   A  B
A      
1  2  6
5  5  6

In [15]: g.apply(lambda x: x)  # no index ?
Out[15]: 
   A  B
0  1  2
1  1  4
2  5  6

In [16]: g1.apply(lambda x: x) 
Out[16]: 
   A  B
0  1  2
1  1  4
2  5  6

In [18]: g.agg(lambda x: x.sum())
Out[18]: 
   B
A   
1  6
5  6

In [19]: g1.agg(lambda x: x.sum())  # Note: A is not summed
Out[19]: 
   A  B
0  1  6
1  5  6

My opinion is that head/tail/nth should act like filter (i.e. they should ignore as_index, as filter does).
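
To check this kind of consistency on an installed version, one option is to run the same call on both groupby objects and compare the results. The snippet below is only a sketch, not part of the original report: the helper `all_positive` and the variable names are made up for illustration, the filter function is reduced to a scalar bool per group with `.all()` (an assumption about what the filter above was meant to test), and results for other methods such as head/nth/apply may differ from the outputs listed above depending on the pandas version.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")
g1 = df.groupby("A", as_index=False)


def all_positive(group):
    # Hypothetical helper: returns one bool per group.
    return (group.B > 0).all()


# filter selects rows of the original frame, so as_index should not matter.
filtered = g.filter(all_positive)
filtered1 = g1.filter(all_positive)
same_filter = filtered.equals(filtered1)

# A reduction is where as_index is expected to change the result:
# as_index=True puts the group key in the index, as_index=False keeps it as a column.
summed = g.sum()
summed1 = g1.sum()

print("filter ignores as_index:", same_filter)
print("as_index=True  ->", summed.index.name, list(summed.columns))
print("as_index=False ->", summed1.index.name, list(summed1.columns))
```

The same comparison (`.equals` on the two results) could be repeated for head, tail and nth to see which of the two behaviours a given version follows.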
