
Support for Enums in MultiIndex #21298

Closed
@benediamond

Code Sample, a copy-pastable example if possible

import pandas as pd
from enum import Enum

MyEnum = Enum("MyEnum", "A B")

# Raises TypeError: 'values' is not ordered, please explicitly specify the categories order by passing in a categories argument.
df = pd.DataFrame(columns=pd.MultiIndex.from_product(iterables=[MyEnum, [1, 2]]))

# Workaround: passing a categorical Series executes successfully, but...
df = pd.DataFrame(columns=pd.MultiIndex.from_product(iterables=[pd.Series(MyEnum, dtype="category"), [1, 2]]))
# ... this "append" call then raises the same error.
df.append({(MyEnum.A, 1): "abc", (MyEnum.B, 2): "xyz"}, ignore_index=True)

# This works, but is less desirable (can't pass a dict, need to come up with a row indexer, etc.).
df.loc[0, [(MyEnum.A, 1), (MyEnum.B, 2)]] = 'abc', 'xyz'

Problem description

Although Enums can easily be used as column indexers, a TypeError ("'values' is not ordered, ...") is raised when they are used as one of the levels of a MultiIndex.
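The wording of the error ("'values' is not ordered") suggests that the values are being sorted somewhere along the way. The following is a minimal sketch, using only the standard library, of why that would fail for a plain Enum but not for an IntEnum. That pandas attempts such a sort internally is an assumption based on the error message, not something confirmed here; `is_sortable` and `MyIntEnum` are hypothetical names introduced for illustration.

```python
from enum import Enum, IntEnum

MyEnum = Enum("MyEnum", "A B")
MyIntEnum = IntEnum("MyIntEnum", "A B")  # hypothetical counterpart, for comparison only


def is_sortable(values):
    """Return True if sorted() succeeds on the values, False if it raises TypeError."""
    try:
        sorted(values)
    except TypeError:
        return False
    return True


# Plain Enum members do not define "<", so they cannot be sorted.
plain_sortable = is_sortable(MyEnum)

# IntEnum members compare as their underlying integers, so sorting works.
int_sortable = is_sortable(MyIntEnum)
int_values = [int(member) for member in sorted(MyIntEnum)]

print(plain_sortable, int_sortable, int_values)
```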

The MultiIndex (and the DataFrame) can be created successfully if an (ordered) categorical Series is passed to the constructor. In that case, however, appending rows in the usual way with df.append fails with the same error. New rows can still be created using .loc, but this is less convenient: a dict cannot be passed, and a row indexer has to be supplied.

This whole situation can be avoided by using strings instead of an Enum. Alternatively, one can use an IntEnum, but this essentially uses the underlying integers, rather than the names, as the column indexers.
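For reference, the string-based alternative might look like the sketch below, which builds the MultiIndex from the member names and maps them back to Enum members when needed. The variable names (`names`, `columns`, `members`) are illustrative, and the sketch assumes that keeping the names rather than the Enum objects in the index is acceptable.

```python
from enum import Enum

import pandas as pd

MyEnum = Enum("MyEnum", "A B")

# Use the member names (plain strings) as the first level instead of the members.
names = [member.name for member in MyEnum]
columns = pd.MultiIndex.from_product([names, [1, 2]])

# Build a one-row frame directly against the string-based columns.
df = pd.DataFrame([["abc", None, None, "xyz"]], columns=columns)

# Look up a column through the Enum by going via its name.
value = df[(MyEnum.A.name, 1)].iloc[0]

# Recover the Enum members from the first level when they are needed again.
members = [MyEnum[name] for name in columns.get_level_values(0)]
```

The trade-off is that the index itself no longer carries the Enum type, so the conversion has to be done explicitly at the boundaries.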

As the use of enums as columns is perfectly supported in the case of a simple index, it seems a shortcoming that they can't be used in a MultiIndex.

Expected Output

>>> df = pd.DataFrame(columns=pd.MultiIndex.from_product(iterables=[MyEnum, [1, 2]]))
>>> df
Empty DataFrame
Columns: [(MyEnum.A, 1), (MyEnum.A, 2), (MyEnum.B, 1), (MyEnum.B, 2)]
Index: []
>>> df.append({(MyEnum.A, 1): "abc", (MyEnum.B, 2): "xyz"}, ignore_index=True)
  MyEnum.A      MyEnum.B     
         1    2        1    2
0      abc  NaN      NaN  xyz

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
