Problems when accessing MultiIndex level with duplicated level names #18872

Closed
@toobaz

Description

Code Sample, a copy-pastable example if possible

In [2]: mi = pd.MultiIndex.from_product([['i'], ['ii'], ['iii']])

In [3]: mi.rename([1,5,6]).get_level_values(1) # Interpreted as label
Out[3]: Index(['i'], dtype='object', name=1)

In [4]: mi.rename([1,5,1]).get_level_values(1) # Interpreted as index
Out[4]: Index(['ii'], dtype='object', name=5)

In [5]: mi.rename(['a',5,'a']).get_level_values('a') # ValueError is OK, KeyError is not
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in _get_level_number(self, level)
    670                 raise ValueError('The name %s occurs multiple times, use a '
--> 671                                  'level number' % level)
    672             level = self.names.index(level)

ValueError: The name a occurs multiple times, use a level number

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-5-e8a616f9610f> in <module>()
----> 1 mi.rename(['a',5,'a']).get_level_values('a')

/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in get_level_values(self, level)
    975         Index(['d', 'e', 'f'], dtype='object', name='level_2')
    976         """
--> 977         level = self._get_level_number(level)
    978         values = self._get_level_values(level)
    979         return values

/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in _get_level_number(self, level)
    673         except ValueError:
    674             if not isinstance(level, int):
--> 675                 raise KeyError('Level %s not found' % str(level))
    676             elif level < 0:
    677                 level += self.nlevels

KeyError: 'Level a not found'

In [6]: mi.rename([1, 'a', 'a']).get_level_values(1) # How am I going to access the second level?!
Out[6]: Index(['i'], dtype='object', name=1)

Problem description

There are three distinct problems, but the root cause is (I think) the same:

  1. the first is trivial: the KeyError in In [5] should not appear; the ValueError alone is the correct error
  2. the second is that the interpretation of an integer changes when there is a duplicated name: in Out[3] the integer 1 is read as a label, in Out[4] as a position
  3. the third is that in a situation like In [6], I have no way whatsoever to access the second level, since its name is duplicated and its position (1) is also the name of another level. Sure, this is a perverse example, but I suspect it can bite in cases where users use duplicate labels and pandas internal code adds integer labels
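
All four outputs above are consistent with a lookup that tries the name first and falls back to a positional reading only when the name lookup fails. Below is a simplified pure-Python model of that behaviour, reconstructed from the traceback and the outputs in this report. It is not the pandas source: `observed_level_number` is a hypothetical name, and `names` is a plain list standing in for `MultiIndex.names`.

```python
def observed_level_number(names, level):
    """Hypothetical model of the reported lookup; not the pandas implementation."""
    try:
        if names.count(level) > 1:
            # A duplicated name is reported as a ValueError...
            raise ValueError(
                "The name %s occurs multiple times, use a level number" % level
            )
        # A unique name wins, even when `level` is an integer.
        return names.index(level)
    except ValueError:
        # ...but the same handler also catches "name not found", so a
        # duplicated non-integer name ends up as a misleading KeyError.
        if not isinstance(level, int):
            raise KeyError("Level %s not found" % level)
        # Integers fall back to the positional reading.
        # (Bounds checking is omitted in this sketch.)
        if level < 0:
            level += len(names)
        return level


print(observed_level_number([1, 5, 6], 1))      # name 1 is unique: read as a label
print(observed_level_number([1, 5, 1], 1))      # name 1 is duplicated: read as a position
print(observed_level_number([1, "a", "a"], 1))  # position 1 cannot be reached
```

In this model the duplicate check and the missing-name case share one `except` clause, which would explain both the chained KeyError and the switch between the label and position readings.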

Expected Output

If we were designing this from scratch, the solution would be simple: prioritize the "index" interpretation of an integer over the "label" interpretation, so that the former is always unambiguous. Is this too strong an API change?

If the answer is "no", I will be happy to implement it, possibly with a temporary warning in those cases where the behaviour will change (that is: requested label is integer and is present in the names).
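
To make the proposal concrete, here is a minimal sketch of an index-first rule with the temporary warning, again in pure Python over a plain list of names. `proposed_level_number` is a hypothetical helper, and the choice of `FutureWarning` and of raising `IndexError` for out-of-range integers are assumptions of the sketch, not decisions made in this issue.

```python
import warnings


def proposed_level_number(names, level):
    """Hypothetical index-first lookup; a sketch of the proposal only."""
    nlevels = len(names)
    if isinstance(level, int) and not isinstance(level, bool):
        if level in names:
            # The behaviour changes exactly when an integer is also a name.
            warnings.warn(
                "Integer %r is also a level name; it is interpreted as a "
                "position, not as a label" % level,
                FutureWarning,
                stacklevel=2,
            )
        position = level + nlevels if level < 0 else level
        if not 0 <= position < nlevels:
            raise IndexError(
                "Level %r is out of range for %d levels" % (level, nlevels)
            )
        return position
    # Non-integer levels are always labels, with two distinct errors.
    count = names.count(level)
    if count > 1:
        raise ValueError(
            "The name %s occurs multiple times, use a level number" % level
        )
    if count == 0:
        raise KeyError("Level %s not found" % level)
    return names.index(level)


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # The second level of In [6] becomes reachable by position.
    print(proposed_level_number([1, "a", "a"], 1))
```

Under this rule an integer level name can no longer be used as a label, which is the API change the warning is meant to announce.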

If the answer is "yes", I would like at least to suppress the KeyError in In [5] and to have In [4] raise an error rather than return a result inconsistent with In [3].

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 04db779
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+388.g04db779d4
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
