Description
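groupby().describe() returns no result when the key column is named "k", but works when the same column is named "key". This is the weirdest bug I have seen in pandas, though I am guessing (hoping) the fix will not be too difficult.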
Code Sample
Consider the following two code blocks:
Block 1: key column is called "k"
>>> import pandas as pd
>>> df1 = pd.DataFrame({'x': [1, 2, 3, 4, 5] * 3,
...                     'y': [10, 20, 30, 40, 50] * 3,
...                     'z': [100, 200, 300, 400, 500] * 3})
>>> df1['k'] = [(0, 0, 1), (0, 1, 0), (1, 0, 0)] * 5
Block 2: key column is called "key"
>>> df2 = pd.DataFrame({'x': [1, 2, 3, 4, 5] * 3,
...                     'y': [10, 20, 30, 40, 50] * 3,
...                     'z': [100, 200, 300, 400, 500] * 3})
>>> df2['key'] = [(0, 0, 1), (0, 1, 0), (1, 0, 0)] * 5
Note that both blocks use the same static data, so the name of the key column is the only difference and therefore the only possible culprit.
Problem description
Running a simple .groupby().describe() operation produces the following results:
>>> df1.groupby('k').describe()
# No Result
>>> df2.groupby('key').describe()
                        x          y           z
key
(0, 0, 1) count  5.000000   5.000000    5.000000
          mean   3.000000  30.000000  300.000000
          std    1.581139  15.811388  158.113883
          min    1.000000  10.000000  100.000000
          25%    2.000000  20.000000  200.000000
          50%    3.000000  30.000000  300.000000
          75%    4.000000  40.000000  400.000000
          max    5.000000  50.000000  500.000000
(0, 1, 0) count  5.000000   5.000000    5.000000
          mean   3.000000  30.000000  300.000000
          std    1.581139  15.811388  158.113883
          min    1.000000  10.000000  100.000000
          25%    2.000000  20.000000  200.000000
          50%    3.000000  30.000000  300.000000
          75%    4.000000  40.000000  400.000000
          max    5.000000  50.000000  500.000000
(1, 0, 0) count  5.000000   5.000000    5.000000
          mean   3.000000  30.000000  300.000000
          std    1.581139  15.811388  158.113883
          min    1.000000  10.000000  100.000000
          25%    2.000000  20.000000  200.000000
          50%    3.000000  30.000000  300.000000
          75%    4.000000  40.000000  400.000000
          max    5.000000  50.000000  500.000000
Note that groupby().mean(), groupby().sum(), and a few others work fine. As far as I can tell, describe() is the only method affected.
Expected Output
The expected output of df1.groupby('k').describe() is the same as that of df2.groupby('key').describe(), apart from the name of the index.
Output of pd.show_versions()
pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.4
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None