Closed
Hi,
When using GroupBy.apply with a function that calls pandas.DataFrame.sort_values() with inplace=True and also slices the DataFrame, the first group is evaluated N times, where N is the number of groups.
See the following code and output:
>>> import numpy as np
>>> import pandas as pd
>>>
>>> A = np.repeat('a', 8)
>>> B = np.repeat([1, 2, 3, 4], 2)
>>> data = pd.DataFrame.from_dict({'A': A, 'B': B})
>>> print(data)
   A  B
0  a  1
1  a  1
2  a  2
3  a  2
4  a  3
5  a  3
6  a  4
7  a  4
>>>
>>> def func(group_data, sort, slice):
...     if sort:
...         group_data.sort_values('A', inplace=True)
...     if slice:
...         group_data = group_data[:]
...     return group_data
...
>>> B_groups = data.groupby('B')
>>> out_sort_slice = B_groups.apply(lambda x: func(x, sort=True, slice=True))
>>> out_sort_no_slice = B_groups.apply(lambda x: func(x, sort=True, slice=False))
>>> out_no_sort_slice = B_groups.apply(lambda x: func(x, sort=False, slice=True))
>>> out_no_sort_no_slice = B_groups.apply(lambda x: func(x, sort=False, slice=False))
>>>
>>> print(out_sort_slice)
     A  B
B
1 0  a  1
  1  a  1
2 0  a  1
  1  a  1
3 0  a  1
  1  a  1
4 0  a  1
  1  a  1
>>> print(out_sort_no_slice)
   A  B
0  a  1
1  a  1
2  a  2
3  a  2
4  a  3
5  a  3
6  a  4
7  a  4
>>> print(out_no_sort_slice)
     A  B
B
1 0  a  1
  1  a  1
2 2  a  2
  3  a  2
3 4  a  3
  5  a  3
4 6  a  4
  7  a  4
>>> print(out_no_sort_no_slice)
   A  B
0  a  1
1  a  1
2  a  2
3  a  2
4  a  3
5  a  3
6  a  4
7  a  4
As you can see, out_sort_no_slice, out_no_sort_slice and out_no_sort_no_slice produce the expected output, while out_sort_slice contains the first group 4 times. While debugging, you can see that the first group is evaluated 4 times inside func.
Note that the data is already sorted by column 'A' and the slice keeps every row of the DataFrame, so neither step should change the data.
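In case it helps anyone hitting the same thing, here is a minimal sketch of a possible workaround. It assumes the problem is tied to mutating the group in place inside GroupBy.apply, which I have not confirmed. The names func_no_mutation, pieces and out are made up for this example. It sorts without inplace=True and iterates over the groups explicitly instead of calling apply.

```python
import numpy as np
import pandas as pd

# Same data as in the report above.
data = pd.DataFrame({'A': np.repeat('a', 8), 'B': np.repeat([1, 2, 3, 4], 2)})


def func_no_mutation(group_data):
    # Hypothetical variant of func: sort_values without inplace=True
    # returns a new DataFrame and leaves the passed-in group untouched.
    sorted_group = group_data.sort_values('A')
    # Same full slice as in the original function.
    return sorted_group[:]


# Iterate over the groups explicitly, so the function is called
# exactly once per group, and collect the results by group key.
pieces = {key: func_no_mutation(group) for key, group in data.groupby('B')}

# Concatenate the per-group results; the dict keys become the outer
# index level, similar to the group keys in the apply output.
out = pd.concat(pieces)
print(out)
```

With this version every group shows up once in the result, and the outer index level holds the group key.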
I'm using Anaconda Python 3.7 with pandas 0.24.2 on macOS.
Thanks!