Closed
Checklist:
- head and tail act like filter
- nth act like filter
- nth on Series should dropna? (discuss?)
- something subtle with some functions in agg/apply (again)
- go through other methods to see if as_index acts correctly
- cythonized vs with apply (BUG/TST: verify that groupby apply with a column aggregation does not return the column, #7002)
- get_group with as_index=False, BUG: Applying function on column of Groupby object with as_index=False does not select column #5764 - returned group for a DataFrame.groupby.cumsum(), cumsum sums the groupby column #5614 / jreback@371a7bf / just need to test for transformers (that they ignore as_index)
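As a rough way to check the last item (transformers ignoring as_index), something like the following could be used. This is only a sketch against the same toy frame used further down, and it assumes a reasonably recent pandas in which cumsum leaves the grouping column out of the result; it is not taken from the linked commits.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

# A transformer returns one row per input row, so there is no group
# index for as_index to control. Both calls should give the same frame.
with_index = df.groupby('A').cumsum()
without_index = df.groupby('A', as_index=False).cumsum()

print(with_index.equals(without_index))
print(with_index['B'].tolist())
```

If the two results ever differ, that would point to a transformer that is still paying attention to as_index.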
I've had this half-written for a while, but I wasn't sure how to actually explain it. I'm still not sure, but here goes.
It would be interesting to discuss what people's views on the correctness of these are.
I think there's scope for cleaning up some possible inconsistencies (unfortunately this would mean breaking the API):
In [2]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
In [3]: g = df.groupby('A')
In [4]: g1 = df.groupby('A', as_index=False)
In [5]: g.head(1)  # I did a dirty hack to get the index like this!
Out[5]:
     A  B
A
1 0  1  2
5 2  5  6
In [6]: g1.head(1)
Out[6]:
   A  B
0  1  2
2  5  6
In [7]: g.nth(0)
Out[7]:
   B
A
1  2
5  6
In [8]: g1.nth(0)  # the index doesn't match the original
Out[8]:
   A  B
0  1  2
1  5  6
In [9]: g.filter(lambda x: x.B>0)
Out[9]:
   A  B
0  1  2
1  1  4
2  5  6
In [10]: g1.filter(lambda x: x.B>0)
Out[10]:
   A  B
0  1  2
1  1  4
2  5  6
In [13]: g.apply(lambda x: x.sum())
Out[13]:
   A  B
A
1  2  6
5  5  6
In [14]: g1.apply(lambda x: x.sum())  # but as_index=False
Out[14]:
   A  B
A
1  2  6
5  5  6
In [15]: g.apply(lambda x: x)  # no index ?
Out[15]:
   A  B
0  1  2
1  1  4
2  5  6
In [16]: g1.apply(lambda x: x)
Out[16]:
   A  B
0  1  2
1  1  4
2  5  6
In [18]: g.agg(lambda x: x.sum())
Out[18]:
   B
A
1  6
5  6
In [19]: g1.agg(lambda x: x.sum())  # Note: A is not summed
Out[19]:
   A  B
0  1  6
1  5  6
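The agg case is the one place where as_index has an uncontroversial meaning, so it may help to state it as a script. The sketch below uses the string 'sum' rather than the lambda from the session (an assumption made to keep the result stable across pandas versions) and checks that as_index=False is equivalent to aggregating and then resetting the index.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

# Default: the group keys become the index and only B is aggregated.
indexed = df.groupby('A').agg('sum')

# as_index=False: the group keys come back as an ordinary column A
# (not summed) next to the aggregated B.
flat = df.groupby('A', as_index=False).agg('sum')

print(indexed)
print(flat)
print(indexed.reset_index().to_dict('list') == flat.to_dict('list'))
```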
My opinion is that head/tail/nth should act like filter (i.e. they should all ignore as_index).
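To make the proposal concrete, here is a small sketch of what "act like filter" would mean for head: the selected rows keep their original index labels whether or not as_index is set. The helper name all_b_positive is made up for illustration, and it returns a scalar bool per group because, as far as I know, current pandas requires that of a filter function (the Series-returning lambda in the session above is from an older version).

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
g = df.groupby('A')
g1 = df.groupby('A', as_index=False)


def all_b_positive(group):
    # Hypothetical predicate: keep a group only if every B is positive.
    return bool((group['B'] > 0).all())


# Under the proposal, head(1) just picks rows out of the original frame,
# so the original labels (0 and 2) survive in both cases.
head_idx = g.head(1).index.tolist()
head_idx_flat = g1.head(1).index.tolist()

# filter already behaves this way: every row is kept with its own label.
filt_idx = g.filter(all_b_positive).index.tolist()
filt_idx_flat = g1.filter(all_b_positive).index.tolist()

print(head_idx, head_idx_flat)
print(filt_idx, filt_idx_flat)
```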