BUG (regression): Enum (and subclasses) no longer acceptable parameter type for columns #54386

Open
@posita

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas. (See below for why I'm confident this issue remains.)

Reproducible Example

import enum

import pandas


class MyEnum(enum.IntEnum):
    FIRST = 1
    SECOND = 2
    THIRD = 3
    FOURTH = 4
    FIFTH = 5


print(f"pandas version: {pandas.__version__}")
# is_scalar and is_list_like are also exposed publicly under pandas.api.types
print(f"is_scalar(MyEnum) == {pandas.api.types.is_scalar(MyEnum)}")
print(f"is_list_like(MyEnum) == {pandas.api.types.is_list_like(MyEnum)}")
df = pandas.DataFrame(columns=MyEnum)
row = pandas.DataFrame(
    {e: e.value + 1 for e in MyEnum}, columns=MyEnum, index=["plus one"]
)
df = pandas.concat((df, row))
row = pandas.DataFrame(
    {e: e.value * 2 for e in MyEnum}, columns=MyEnum, index=["times two"]
)
df = pandas.concat((df, row))

df.index.name = "Desc"
# DataFrames use enum's values for displaying column names, so we convert them to
# names
df = df.rename(columns={e: e.name for e in MyEnum})
print(df)

Issue Description

When run with pandas~=2.0 (including 2.0.3):

pandas version: 2.0.3
is_scalar(MyEnum) == False
is_list_like(MyEnum) == False
Traceback (most recent call last):
  File "/.../test_case.py", line 17, in <module>
    df = pandas.DataFrame(columns=MyEnum)
  File "/.../python3.10/site-packages/pandas/core/frame.py", line 807, in __init__
    mgr = dict_to_mgr(
  File "/.../python3.10/site-packages/pandas/core/internals/construction.py", line 431, in dict_to_mgr
    arrays = Series(data, index=columns, dtype=object)
  File "/.../python3.10/site-packages/pandas/core/series.py", line 425, in __init__
    index = ensure_index(index)
  File "/.../python3.10/site-packages/pandas/core/indexes/base.py", line 7128, in ensure_index
    return Index(index_like, copy=copy)
  File "/.../python3.10/site-packages/pandas/core/indexes/base.py", line 522, in __new__
    raise cls._raise_scalar_data_error(data)
  File "/.../python3.10/site-packages/pandas/core/indexes/base.py", line 5066, in _raise_scalar_data_error
    raise TypeError(
TypeError: Index(...) must be called with a collection of some kind, <enum 'MyEnum'> was passed

The offending commit appears to be 8020bf1, which looks like it was part of the 2.0.0 development effort. It's hard to tell from #49718 (see also #49348) what motivated the additional check (other than to solve CI issues), and I claim neither familiarity, nor insight, nor competence here. I have not run this against pandas-dev/pandas@main, but the relevant check is unchanged since 2.0.3:

elif not is_list_like(data) and not isinstance(data, memoryview):
    # 2022-11-16 the memoryview check is only necessary on some CI
    # builds, not clear why
    raise cls._raise_scalar_data_error(data)

If it remains desirable to allow enums to be used as column headers, I propose the following change (happy to cut a PR with unit tests, if acceptable):

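        # ...
        # An enum class such as MyEnum is an instance of enum.EnumMeta (not of
        # enum.Enum, which only matches its members), so test the metaclass.
        elif not is_list_like(data) and not isinstance(data, (enum.EnumMeta, memoryview)):
            # 2022-11-16 the memoryview check is only necessary on some CI
            # builds, not clear why
            raise cls._raise_scalar_data_error(data)
        # ...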
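
Any isinstance-based exemption in this check has to account for what is actually passed as columns: MyEnum is the enum class, not one of its members. The following stdlib-only sketch does not touch pandas, and its two-member MyEnum is a shortened stand-in for the one in the example. It shows how isinstance treats the class and its members:

```python
import enum


class MyEnum(enum.IntEnum):
    FIRST = 1
    SECOND = 2


# The class itself is an instance of its metaclass, enum.EnumMeta
# (also exposed as enum.EnumType from Python 3.11 on) ...
class_is_enum_meta = isinstance(MyEnum, enum.EnumMeta)
# ... but it is not an instance of enum.Enum; only its members are.
class_is_enum = isinstance(MyEnum, enum.Enum)
member_is_enum = isinstance(MyEnum.FIRST, enum.Enum)

# Iterating the class yields its members in definition order.
members = list(MyEnum)

print(class_is_enum_meta, class_is_enum, member_is_enum)
print(members)
```

A check written against enum.Enum would therefore match members such as MyEnum.FIRST, while one written against enum.EnumMeta would match the class that the example passes as columns.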

See also #21298 and #22551.
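
Until this is resolved, one way to sidestep the error is to materialize the members before handing them to pandas, since a list passes the is_list_like check. A minimal sketch, assuming a three-member MyEnum and building both rows in a single constructor call instead of with concat (both are simplifications of the example above, not a proposed fix):

```python
import enum

import pandas


class MyEnum(enum.IntEnum):
    FIRST = 1
    SECOND = 2
    THIRD = 3


# Workaround: pass a list of the members rather than the enum class itself.
columns = list(MyEnum)

df = pandas.DataFrame(
    {e: [e.value + 1, e.value * 2] for e in MyEnum},
    columns=columns,
    index=["plus one", "times two"],
)
df.index.name = "Desc"
# Column labels display as the enum values, so convert them to names.
df = df.rename(columns={e: e.name for e in MyEnum})
print(df)
```

This keeps the calling code close to the original, at the cost of having to remember the list(...) wrapper wherever an enum class is used for columns or an index.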

Expected Behavior

When run with pandas==1.5.3:

pandas version: 1.5.3
is_scalar(MyEnum) == False
is_list_like(MyEnum) == False
          FIRST SECOND THIRD FOURTH FIFTH
Desc
plus one      2      3     4      5     6
times two     2      4     6      8    10

Installed Versions

% python -c 'import pandas ; pandas.show_versions()'

INSTALLED VERSIONS
------------------
commit           : 0f437949513225922d851e9581723d82120684a6
python           : 3.10.12.final.0
python-bits      : 64
OS               : Linux
OS-release       : 6.2.6-76060206-generic
Version          : #202303130630~1689015125~22.04~ab2190e SMP PREEMPT_DYNAMIC Mon J
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.0.3
numpy            : 1.25.2
pytz             : 2023.3
dateutil         : 2.8.2
setuptools       : 68.0.0
pip              : 23.2.1
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 8.14.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : 3.7.2
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : None
pyqt5            : None

Labels: Bug; Index (Related to the Index class or subclasses); Regression (Functionality that used to work in a prior pandas version)
