
BUG: melt MultiIndex columns using index columns as identifier variables #34129

Closed
@ashtou

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
df = pd.DataFrame(
    [['p', 'q', 'r'],
     ['s', 't', 'u'],
     ['v', 'w', 'x'],
    ],
    index=pd.MultiIndex.from_arrays(
        [list('123'), list('456')],
        names=['ind1', 'ind2']
        ),
    columns=pd.MultiIndex.from_arrays(
        [list('ABC'), list('DEF')])
)

# melt using level1 (or above in other cases) fails
df_l1 = df.reset_index(col_level=1)
print("\nL1 index insert:\n", df_l1)
# NOTE: THIS FAILS!
df_l1 = pd.melt(df_l1, col_level=1, 
    id_vars=['ind1'], value_vars=['D','E'])
print("\nL1 melt:\n", df_l1)

Problem description

Suppose we have a DataFrame with MultiIndex columns and want to melt it, using the index levels as the id_vars.
In this example, calling reset_index(col_level=1) and then melt(col_level=1) fails with a KeyError, as shown below:

L1 index insert:
              A  B  C
  ind1 ind2  D  E  F
0    1    4  p  q  r
1    2    5  s  t  u
2    3    6  v  w  x
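
For reference, the column tuples can be inspected directly to see where the index names end up after reset_index(col_level=1). This is only a small sketch that rebuilds the frame from the code sample; the variable name `cols` is introduced here for illustration. The names 'ind1' and 'ind2' sit in level 1, paired with an empty string in level 0 (the default col_fill).

```python
import pandas as pd

# Same frame as in the code sample above.
df = pd.DataFrame(
    [["p", "q", "r"], ["s", "t", "u"], ["v", "w", "x"]],
    index=pd.MultiIndex.from_arrays(
        [list("123"), list("456")], names=["ind1", "ind2"]
    ),
    columns=pd.MultiIndex.from_arrays([list("ABC"), list("DEF")]),
)

# Insert the index names into column level 1; level 0 is filled with ''.
df_l1 = df.reset_index(col_level=1)

# Full column tuples, one (level 0, level 1) pair per column.
cols = df_l1.columns.tolist()
print(cols)
# [('', 'ind1'), ('', 'ind2'), ('A', 'D'), ('B', 'E'), ('C', 'F')]
```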

FAILS: KeyError: 'ind1'
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'ind1'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
 in 
     17 print("\nL1 index insert:\n", df_l1)
     18 # NOTE: THIS FAILS!
---> 19 df_l1 = pd.melt(df_l1, col_level=1, 
     20     id_vars=['ind1'], value_vars=['D','E'])
     21 print("\nL1 melt:\n", df_l1)

~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/reshape/melt.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
    102     mdata = {}
    103     for col in id_vars:
--> 104         id_data = frame.pop(col)
    105         if is_extension_type(id_data):
    106             id_data = concat([id_data] * K, ignore_index=True)

~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/generic.py in pop(self, item)
    860         3  monkey        NaN
    861         """
--> 862         result = self[item]
    863         del self[item]
    864         try:

~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2993             if self.columns.nlevels > 1:
   2994                 return self._getitem_multilevel(key)
-> 2995             indexer = self.columns.get_loc(key)
   2996             if is_integer(indexer):
   2997                 indexer = [indexer]

~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'ind1'
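
A possible workaround, offered as a sketch rather than a confirmed fix: flatten the columns to the level of interest before calling melt, so that col_level is not needed at all. This assumes the level-1 labels are unique after reset_index, which holds for this example. The names `flat` and `melted` are only illustrative.

```python
import pandas as pd

# Same frame as in the code sample above.
df = pd.DataFrame(
    [["p", "q", "r"], ["s", "t", "u"], ["v", "w", "x"]],
    index=pd.MultiIndex.from_arrays(
        [list("123"), list("456")], names=["ind1", "ind2"]
    ),
    columns=pd.MultiIndex.from_arrays([list("ABC"), list("DEF")]),
)

flat = df.reset_index(col_level=1)

# Keep only level 1 of the columns, giving a plain single-level Index:
# ind1, ind2, D, E, F. Assumes these labels are unique.
flat.columns = flat.columns.get_level_values(1)

# With single-level columns, melt can be called without col_level.
melted = pd.melt(flat, id_vars=["ind1"], value_vars=["D", "E"])
print(melted)
```

This sidesteps the col_level lookup entirely, at the cost of discarding the level-0 labels (A, B, C) from the result.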

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-51-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.3
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.14.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
