Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas. (See below for why I'm confident this issue remains.)
Reproducible Example
import enum
import pandas
class MyEnum(enum.IntEnum):
    FIRST = 1
    SECOND = 2
    THIRD = 3
    FOURTH = 4
    FIFTH = 5
print(f"pandas version: {pandas.__version__}")
print(f"is_scalar(MyEnum) == {pandas.core.dtypes.common.is_scalar(MyEnum)}")
print(f"is_list_like(MyEnum) == {pandas.core.dtypes.common.is_list_like(MyEnum)}")
df = pandas.DataFrame(columns=MyEnum)
row = pandas.DataFrame(
    {e: e.value + 1 for e in MyEnum}, columns=MyEnum, index=["plus one"]
)
df = pandas.concat((df, row))
row = pandas.DataFrame(
    {e: e.value * 2 for e in MyEnum}, columns=MyEnum, index=["times two"]
)
df = pandas.concat((df, row))
df.index.name = "Desc"
# DataFrames use enum's values for displaying column names, so we convert them to
# names
df = df.rename(columns={e: e.name for e in MyEnum})
print(df)
Issue Description
When run with pandas~=2.0 (including 2.0.3):
pandas version: 2.0.3
is_scalar(MyEnum) == False
is_list_like(MyEnum) == False
Traceback (most recent call last):
  File "/.../test_case.py", line 17, in <module>
    df = pandas.DataFrame(columns=MyEnum)
  File "/.../python3.10/site-packages/pandas/core/frame.py", line 807, in __init__
    mgr = dict_to_mgr(
  File "/.../python3.10/site-packages/pandas/core/internals/construction.py", line 431, in dict_to_mgr
    arrays = Series(data, index=columns, dtype=object)
  File "/.../python3.10/site-packages/pandas/core/series.py", line 425, in __init__
    index = ensure_index(index)
  File "/.../python3.10/site-packages/pandas/core/indexes/base.py", line 7128, in ensure_index
    return Index(index_like, copy=copy)
  File "/.../python3.10/site-packages/pandas/core/indexes/base.py", line 522, in __new__
    raise cls._raise_scalar_data_error(data)
  File "/.../python3.10/site-packages/pandas/core/indexes/base.py", line 5066, in _raise_scalar_data_error
    raise TypeError(
TypeError: Index(...) must be called with a collection of some kind, <enum 'MyEnum'> was passed
The offending commit appears to be 8020bf1, which looks like it was part of the 2.0.0 development effort. It's hard to tell from #49718 (see also #49348) what motivated the additional check (other than to solve CI issues), but I claim neither familiarity, nor insight, nor competence. I have not run this against pandas-dev/pandas@main, but the relevant check is unchanged since 2.0.3:
pandas/pandas/core/indexes/base.py, lines 527 to 530 in 3fe6149
If it remains desirable to allow enums to be used as column headers, I propose the following change (happy to cut a PR with unit tests, if acceptable):
# ...
elif not is_list_like(data) and not isinstance(data, (enum.Enum, memoryview)):
    # 2022-11-16 the memoryview check is only necessary on some CI
    # builds, not clear why
    raise cls._raise_scalar_data_error(data)
# ...
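One caveat worth checking before any PR: the reproducer passes the enum class itself, and a class such as `MyEnum` is an instance of `enum.EnumMeta`, not of `enum.Enum` (only its members are). The stdlib-only sketch below illustrates the difference. It suggests the `isinstance` check proposed above may need `enum.EnumMeta` rather than `enum.Enum`, although that substitution is an assumption and has not been tested against pandas here.

```python
import enum


class MyEnum(enum.IntEnum):
    FIRST = 1
    SECOND = 2


# The enum class is an instance of its metaclass, not of enum.Enum.
class_is_enum_instance = isinstance(MyEnum, enum.Enum)
class_is_enummeta_instance = isinstance(MyEnum, enum.EnumMeta)

# Individual members, by contrast, are instances of enum.Enum.
member_is_enum_instance = isinstance(MyEnum.FIRST, enum.Enum)

print(class_is_enum_instance)      # False
print(class_is_enummeta_instance)  # True
print(member_is_enum_instance)     # True
```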
Expected Behavior
When run with pandas==1.5.3:
pandas version: 1.5.3
is_scalar(MyEnum) == False
is_list_like(MyEnum) == False
          FIRST SECOND THIRD FOURTH FIFTH
Desc
plus one      2      3     4      5     6
times two     2      4     6      8    10
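In the meantime, a possible workaround is to materialize the enum members into a list before passing them as `columns`. This is a minimal sketch, assuming that a plain list passes the list-like check in `Index.__new__`; the shortened three-member `MyEnum` is only for illustration.

```python
import enum

import pandas


class MyEnum(enum.IntEnum):
    FIRST = 1
    SECOND = 2
    THIRD = 3


# list(MyEnum) hands pandas an ordinary list of members instead of the
# enum class, so the scalar-data check should not be triggered.
df = pandas.DataFrame(columns=list(MyEnum))

# IntEnum members compare equal to their integer values.
column_labels = list(df.columns)
print(column_labels)
```

The same substitution (`columns=list(MyEnum)`) should apply to the other `pandas.DataFrame(...)` calls in the reproducer.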
Installed Versions
% python -c 'import pandas ; pandas.show_versions()'
INSTALLED VERSIONS
------------------
commit : 0f437949513225922d851e9581723d82120684a6
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 6.2.6-76060206-generic
Version : #202303130630~1689015125~22.04~ab2190e SMP PREEMPT_DYNAMIC Mon J
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.3
numpy : 1.25.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.14.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None