Description
Code Sample
Find the dataset bug.csv here.
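import pandas as pd

# Two header rows give a two-level column header; the first column is the row index.
df = pd.read_csv("bug.csv", header=[0, 1], index_col=[0])
# Group the columns by the "property" level of the header.
g = df.groupby(level="property", axis=1)
thisFails = g.sum()  # raises KeyError: 'a' on pandas 0.25.3
print(thisFails)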
The bug possibly applies to multi-level row indices as well (not just column headers), but I haven't checked that.
Problem description
A groupby object g fails at aggregating the sum if g was created on a df with a multi-level header, with grouping along one of the column levels.
The code worked with pandas 0.23.4 and 0.24.2, but it fails with pandas 0.25.x (0.25.3 at the time of writing).
On failure, the following exception occurs:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'a'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "bug.py", line 5, in <module>
    thisFails = g.sum()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1382, in f
    result[col] = self._try_cast(result[col], self.obj[col])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 2994, in __getitem__
    return self._getitem_multilevel(key)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 3043, in _getitem_multilevel
    loc = self.columns.get_loc(key)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2674, in get_loc
    loc = self._get_level_indexer(key, level=0)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2939, in _get_level_indexer
    code = level_index.get_loc(key)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'a'
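Reading the traceback, the failing step is the lookup self.obj[col] with col = 'a': plain column indexing on a frame with a multi-level header resolves the key against level 0, while 'a' only exists in the "property" level. The sketch below shows that lookup in isolation. The frame is a small hypothetical stand-in for bug.csv, and the first level name "source" and its labels "x" and "y" are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for bug.csv: a two-level column header whose
# second level is named "property" (the first level name is invented).
columns = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "b"), ("y", "a")],
    names=["source", "property"],
)
df = pd.DataFrame([[1.0, 2.0, 3.0]], index=["case 0"], columns=columns)

# Plain indexing looks the key up in level 0, where "a" does not exist.
try:
    df["a"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print("df['a'] raised KeyError:", lookup_failed)

# Selecting by the second level has to name that level explicitly.
by_property = df.xs("a", axis=1, level="property")
print(by_property)
```

This only illustrates why the key 'a' cannot be found; it does not reproduce the groupby code path itself.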
Expected Output
property    a    b    c    d    e    f
dataset
case 0    0.0  2.0  0.0  0.0  1.0  0.0
case 1    3.0  3.0  1.0  1.0  0.0  1.0
case 2    1.0  1.0  0.0  0.0  0.0  1.0
case 3    2.0  3.0  0.0  0.0  0.0  0.0
case 4    2.0  0.0  0.0  0.0  1.0  0.0
case 5    2.0  0.0  0.0  0.0  1.0  0.0
case 6    2.0  0.0  0.0  0.0  1.0  0.0
case 7    0.0  0.0  0.0  0.0  1.0  0.0
case 8    0.0  0.0  0.0  0.0  0.0  1.0
case 9    3.0  1.0  0.0  1.0  0.0  0.0
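A possible workaround that should give output of this shape is to transpose the frame, group along the rows, and transpose back, which avoids grouping with axis=1 altogether. This is a minimal sketch on a hypothetical frame (the level name "source", the labels, and the values are invented), and it has not been checked against pandas 0.25.3 or the actual bug.csv.

```python
import pandas as pd

# Hypothetical data with the same layout as the report: two-level column
# header, second level named "property".
columns = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "b"), ("y", "a"), ("y", "b")],
    names=["source", "property"],
)
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    index=["case 0", "case 1"],
    columns=columns,
)

# Transpose so the header levels become row index levels, sum per
# "property" label, then transpose back to the original orientation.
result = df.T.groupby(level="property").sum().T
print(result)
```

The transposes copy the data, so this may be slow on very wide or very large frames.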
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
pandas : 0.25.3
numpy : 1.17.2
pytz : 2018.4
dateutil : 2.7.2
pip : 19.3.1
setuptools : 41.6.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.1.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.0
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None