GroupBy.apply repeats the first group N times when slicing and sorting in the applied function #25892

Closed
@gilbenzvi

Description

Hi,

When GroupBy.apply is used with a function that both calls pandas.DataFrame.sort_values() with inplace=True and slices the data frame, the first group is evaluated N times, where N is the original number of groups.

See the following code and output:

>>> import numpy as np
>>> import pandas as pd
>>> 
>>> A = np.repeat('a', 8)
>>> B = np.repeat([1, 2, 3, 4], 2)
>>> data = pd.DataFrame.from_dict({'A': A, 'B': B})
>>> print(data)
   A  B
0  a  1
1  a  1
2  a  2
3  a  2
4  a  3
5  a  3
6  a  4
7  a  4
>>> 
>>> def func(group_data, sort, slice):
...     if sort:
...         group_data.sort_values('A', inplace=True)
...     if slice:
...         group_data = group_data[:]
...     return group_data
... 
>>> B_groups = data.groupby('B')
>>> out_sort_slice = B_groups.apply(lambda x: func(x, sort=True, slice=True))
>>> out_sort_no_slice = B_groups.apply(lambda x: func(x, sort=True, slice=False))
>>> out_no_sort_slice = B_groups.apply(lambda x: func(x, sort=False, slice=True))
>>> out_no_sort_no_slice = B_groups.apply(lambda x: func(x, sort=False, slice=False))
>>> 
>>> print(out_sort_slice)
     A  B
B        
1 0  a  1
  1  a  1
2 0  a  1
  1  a  1
3 0  a  1
  1  a  1
4 0  a  1
  1  a  1
>>> print(out_sort_no_slice)
   A  B
0  a  1
1  a  1
2  a  2
3  a  2
4  a  3
5  a  3
6  a  4
7  a  4
>>> print(out_no_sort_slice)
     A  B
B        
1 0  a  1
  1  a  1
2 2  a  2
  3  a  2
3 4  a  3
  5  a  3
4 6  a  4
  7  a  4
>>> print(out_no_sort_no_slice)
   A  B
0  a  1
1  a  1
2  a  2
3  a  2
4  a  3
5  a  3
6  a  4
7  a  4

As you can see, out_sort_no_slice, out_no_sort_slice and out_no_sort_no_slice give the expected output, while out_sort_slice contains the first group 4 times. While debugging, you can see that the first group is evaluated 4 times inside func.
Note that the data is already sorted by column 'A' and that the slice [:] keeps every row of the data frame, so neither step should change the contents of a group.
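
A possible workaround, sketched under the assumption (not confirmed in this report) that the in-place mutation of the group object is what triggers the repetition: sort without inplace=True and build the result by iterating over the groups explicitly. The name func_no_mutation is hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the report above.
data = pd.DataFrame({'A': np.repeat('a', 8),
                     'B': np.repeat([1, 2, 3, 4], 2)})


def func_no_mutation(group_data):
    # Hypothetical variant of func: sort_values without inplace=True
    # returns a new DataFrame, so the object handed in is not modified.
    sorted_data = group_data.sort_values('A')
    # Same full slice as in the report; it keeps every row.
    return sorted_data[:]


# Iterate over the groups and concatenate the per-group results,
# instead of relying on GroupBy.apply to combine them.
pieces = [func_no_mutation(group) for _, group in data.groupby('B')]
out = pd.concat(pieces)
print(out)
```

Unlike out_no_sort_slice above, the result of pd.concat keeps the original row index and does not add a 'B' index level; passing keys to pd.concat would be one way to add it if needed.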

I'm using Anaconda Python 3.7 with Pandas 0.24.2 on Mac.

Thanks!
