
BUG: Can't mix tuples and strings in column names with Numpy>=1.24 #50372

Closed
@adrien-berchet

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Use MultiIndex for columns
df = pd.DataFrame(
    [range(0, 4), range(1, 5), range(2, 6)],
    columns=pd.MultiIndex.from_tuples(
        [("a", "aa"), ("a", "ab"), ("b", "ba"), ("b", "bb")]
    ),
)

# Adding a column with only one level
df["single_index"] = 0
print(df.columns)
print(df[[("a", "aa"), ("single_index", "")]])

# Use flat column index
df_flat = df.copy()
df_flat.columns = df_flat.columns.to_flat_index()

# Adding a column with only one level
df_flat["new_single_index"] = 0
print(df_flat.columns)
print(df_flat[[("a", "aa")]])  # This works
print(df_flat[["new_single_index"]])  # This works
print(df_flat[[("a", "aa"), "new_single_index"]])  # This fails

Here is the reported error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 print(df_flat[[("a", "aa"), "new_single_index"]])  # This fails

File /mnt/Data/Work/Perso/pandas/pandas/core/frame.py:3683, in DataFrame.__getitem__(self, key)
   3681     if is_iterator(key):
   3682         key = list(key)
-> 3683     indexer = self.columns._get_indexer_strict(key, "columns")[1]
   3685 # take() does not accept boolean indexers
   3686 if getattr(indexer, "dtype", None) == bool:

File /mnt/Data/Work/Perso/pandas/pandas/core/indexes/base.py:5629, in Index._get_indexer_strict(self, key, axis_name)
   5627 keyarr = key
   5628 if not isinstance(keyarr, Index):
-> 5629     keyarr = com.asarray_tuplesafe(keyarr)
   5631 if self._index_as_unique:
   5632     indexer = self.get_indexer_for(keyarr)

File /mnt/Data/Work/Perso/pandas/pandas/core/common.py:238, in asarray_tuplesafe(values, dtype)
    235 if isinstance(values, list) and dtype in [np.object_, object]:
    236     return construct_1d_object_array_from_listlike(values)
--> 238 result = np.asarray(values, dtype=dtype)
    240 if issubclass(result.dtype.type, str):
    241     result = np.asarray(values, dtype=object)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
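
The traceback ends in a plain np.asarray call on the list of keys, so the failure can probably be reproduced without pandas. The following is a minimal sketch under that assumption. The helper name to_object_array is hypothetical and is not part of pandas or NumPy; it only illustrates that a 1-D object array can hold the mixed keys.

```python
import numpy as np

# Same mixed key list as in the failing selection above.
keys = [("a", "aa"), "new_single_index"]

try:
    np.asarray(keys)  # no dtype given, so NumPy tries to infer a regular shape
except ValueError as exc:
    # NumPy >= 1.24 lands here; older releases only emit a deprecation warning.
    print("np.asarray failed:", exc)


def to_object_array(values):
    """Hypothetical helper: store each key as one element of a 1-D object array."""
    out = np.empty(len(values), dtype=object)
    for i, value in enumerate(values):
        out[i] = value  # a tuple is stored whole, not expanded into a row
    return out


key_array = to_object_array(keys)
print(key_array.shape, key_array.dtype)
```

Filling the array element by element avoids shape inference entirely, which is why the tuple and the string can sit side by side.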

Issue Description

With NumPy >= 1.24, when a DataFrame's column names mix tuples and strings, it is no longer possible to select columns with a list that contains both name types (e.g. df_flat[[("a", "aa"), "new_single_index"]]).

Expected Behavior

Selecting columns with both tuple and string names, i.e. df_flat[[("a", "aa"), "new_single_index"]] in the given example, should work.
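
Until label-based selection handles such a list, one possible workaround is to resolve the labels to positions in plain Python and select with iloc. This is only a sketch, not a proposed fix: the names wanted, positions and subset are illustrative, and it assumes the column labels are unique.

```python
import pandas as pd

# Rebuild a small frame with flat, mixed column labels (as in the example above).
df = pd.DataFrame(
    [range(0, 2), range(1, 3), range(2, 4)],
    columns=pd.MultiIndex.from_tuples([("a", "aa"), ("a", "ab")]),
)
df_flat = df.copy()
df_flat.columns = df_flat.columns.to_flat_index()
df_flat["new_single_index"] = 0

wanted = [("a", "aa"), "new_single_index"]

# Look up each label's position with list.index, which never builds a NumPy array
# from the mixed labels (assumes every label occurs exactly once).
labels = list(df_flat.columns)
positions = [labels.index(label) for label in wanted]

subset = df_flat.iloc[:, positions]
print(subset)
```

Positional selection bypasses the label-to-array conversion shown in the traceback, at the cost of a linear scan over the column labels for each requested name.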

Installed Versions

INSTALLED VERSIONS
------------------
commit : ca04349
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-56-generic
Version : #62~20.04.1-Ubuntu SMP Tue Nov 22 21:24:20 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 2.0.0.dev0+975.gca0434994e
numpy : 1.24.0
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.7.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None
