Inconsistent groupby-apply output shape and random values returned. #20420

Closed
@jp-belanger

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0],
                   "C": [1, 1, 1, 2, 2]}, index=range(5))
a = df.groupby("C").apply(lambda x: x.A)               # case A
b = df.groupby("C").apply(lambda x: x.A.sort_index())  # case B

In [9]: print(df)
   A  B  C
0  1  6  1
1  2  7  1
2  3  8  1
3  4  9  2
4  5  0  2

In [248]: print(a)
C
1  0    1
   1    2
   2    3
2  3    4
   4    5
Name: A, dtype: int64

In [249]: print(b)
A  0  1   2
C
1  1  2   3
2  4  5  33

Problem description

First, the output of .groupby().apply() seems inconsistent: case A returns the "correct" shape, while in case B the output is transposed.

Second, the value 33 returned in case B is not what I would expect. Group 2 has only two rows (A = 4 and 5), so that third cell has no input value behind it, and the number changes if I call the function multiple times.

This does not only happen when sort_index() is called, but it was the simplest example I could consistently reproduce.
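To make the "changes between calls" observation easier to check, here is a minimal sketch that repeats case B and compares each result with the first. The helper name `run_case_b` and the repeat count of 5 are illustrative choices, not part of the original report, and on a pandas version without this bug the results may simply all be identical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0],
                   "C": [1, 1, 1, 2, 2]}, index=range(5))


def run_case_b(frame):
    # Same call as case B in the code sample (hypothetical helper name).
    return frame.groupby("C").apply(lambda x: x.A.sort_index())


# Repeat the call a few times and compare every result with the first one.
results = [run_case_b(df) for _ in range(5)]
is_stable = all(r.equals(results[0]) for r in results)

# Report the type and shape of the first result and whether all runs agreed.
print(type(results[0]).__name__, results[0].shape, is_stable)
```

If `is_stable` is False, or the reported type is DataFrame rather than Series, the installed version shows the behavior described here.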

Expected Output

C
1  0    1
   1    2
   2    3
2  3    4
   4    5
Name: A, dtype: int64
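As a possible workaround while this is open, the expected shape can be built without relying on how apply combines the per-group results. This is only a sketch: it iterates over the groups and stacks them with pd.concat, using the group keys as the outer index level. The variable name `expected` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0],
                   "C": [1, 1, 1, 2, 2]}, index=range(5))

# Apply the per-group operation explicitly, keyed by group label.
# pd.concat on a dict uses the keys as the outer index level;
# names=["C"] labels that level.
expected = pd.concat(
    {key: group["A"].sort_index() for key, group in df.groupby("C")},
    names=["C"],
)
print(expected)
```

Because each group is handled separately and then concatenated, the result is always a Series with a two-level index, regardless of whether the groups have the same length.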

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-112-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0
