Description
Code Sample, a copy-pastable example if possible
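import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0],
                   "C": [1, 1, 1, 2, 2]}, index=range(5))
# Case A: return column A of each group unchanged
a = df.groupby("C").apply(lambda x: x.A)
# Case B: return column A of each group after sorting its index
b = df.groupby("C").apply(lambda x: x.A.sort_index())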
In [9]: print(df)
   A  B  C
0  1  6  1
1  2  7  1
2  3  8  1
3  4  9  2
4  5  0  2
In [248]: print(a)
C
1  0    1
   1    2
   2    3
2  3    4
   4    5
Name: A, dtype: int64
In [249]: print(b)
A  0  1   2
C
1  1  2   3
2  4  5  33
Problem description
First, the output of .groupby().apply() seems inconsistent: sometimes it returns the "correct" shape, as in case A (a), while in case B (b) the output is transposed.
Second, the value 33 returned in case B is not what I would expect. That number changes if I call the function multiple times.
This does not only happen when sort_index() is called, but it was the simplest example I could consistently reproduce.
Expected Output
C
1  0    1
   1    2
   2    3
2  3    4
   4    5
Name: A, dtype: int64
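For comparison, here is a minimal sketch of one way to build the expected output above without going through .groupby().apply(). It iterates over the groups and concatenates the sorted column A of each group with pd.concat. The name `expected` is only illustrative, and this is a possible workaround under the assumption that pd.concat is not affected by the same problem, not a statement about how apply() is supposed to work internally.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0],
                   "C": [1, 1, 1, 2, 2]}, index=range(5))

# Build one sorted Series per group, keyed by the group label,
# then stack them; the dict keys become the outer index level "C".
expected = pd.concat(
    {key: group["A"].sort_index() for key, group in df.groupby("C")},
    names=["C"],
)
print(expected)
```

The result is a Series named A with a two-level index (group label, original row label), which is the shape that case A already returns.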
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-112-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0