Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# Use MultiIndex for columns
df = pd.DataFrame([range(0, 4), range(1, 5), range(2, 6)], columns=pd.MultiIndex.from_tuples([("a", "aa"), ("a", "ab"), ("b", "ba"), ("b", "bb")]))
# Adding a column with only one level
df["single_index"] = 0
print(df.columns)
print(df[[("a", "aa"), ("single_index", "")]])
# Use flat column index
df_flat = df.copy()
df_flat.columns = df_flat.columns.to_flat_index()
# Adding a column with only one level
df_flat["new_single_index"] = 0
print(df_flat.columns)
print(df_flat[[("a", "aa")]]) # This works
print(df_flat[["new_single_index"]]) # This works
print(df_flat[[("a", "aa"), "new_single_index"]]) # This fails
Here is the reported error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[7], line 1
----> 1 print(df_flat[[("a", "aa"), "new_single_index"]]) # This fails
File /mnt/Data/Work/Perso/pandas/pandas/core/frame.py:3683, in DataFrame.__getitem__(self, key)
3681 if is_iterator(key):
3682 key = list(key)
-> 3683 indexer = self.columns._get_indexer_strict(key, "columns")[1]
3685 # take() does not accept boolean indexers
3686 if getattr(indexer, "dtype", None) == bool:
File /mnt/Data/Work/Perso/pandas/pandas/core/indexes/base.py:5629, in Index._get_indexer_strict(self, key, axis_name)
5627 keyarr = key
5628 if not isinstance(keyarr, Index):
-> 5629 keyarr = com.asarray_tuplesafe(keyarr)
5631 if self._index_as_unique:
5632 indexer = self.get_indexer_for(keyarr)
File /mnt/Data/Work/Perso/pandas/pandas/core/common.py:238, in asarray_tuplesafe(values, dtype)
235 if isinstance(values, list) and dtype in [np.object_, object]:
236 return construct_1d_object_array_from_listlike(values)
--> 238 result = np.asarray(values, dtype=dtype)
240 if issubclass(result.dtype.type, str):
241 result = np.asarray(values, dtype=object)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
Issue Description
When column names mix tuples and strings, it is no longer possible to select columns with a list that contains both kinds of name (e.g. df_flat[[("a", "aa"), "new_single_index"]]).
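The traceback above ends in np.asarray(values, dtype=dtype), which NumPy 1.24 rejects for a list whose elements have different shapes (a 2-tuple next to a plain string). As an illustration only, the sketch below shows one way to build a 1-D object array in which each label, tuple or string, is a single element. The helper name to_object_array is hypothetical and is not part of pandas or NumPy.

```python
import numpy as np


def to_object_array(labels):
    """Build a 1-D object array holding each label as one element.

    Hypothetical helper for illustration; not a pandas or NumPy API.
    """
    out = np.empty(len(labels), dtype=object)
    for i, label in enumerate(labels):
        # Element-wise assignment stores a tuple as a single object
        # instead of letting NumPy try to expand it into a new dimension.
        out[i] = label
    return out


key = to_object_array([("a", "aa"), "new_single_index"])
print(key.shape, key.dtype)
```

Allocating the array first and filling it element by element sidesteps NumPy's shape inference, which is the step that fails on the mixed list.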
Expected Behavior
Selecting columns with both tuple and string names, i.e. df_flat[[("a", "aa"), "new_single_index"]] in the given example, should work.
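Until list-based selection handles mixed labels, a possible workaround is to look up each label's position individually and select by position. This is a minimal sketch, assuming the columns are unique (so that Index.get_loc returns an integer for every label). The helper name select_mixed_columns is hypothetical, and the frame is a reduced version of the one in the reproducible example.

```python
import pandas as pd


def select_mixed_columns(frame, labels):
    """Select columns by label, one lookup per label.

    Hypothetical helper; assumes frame.columns is unique so that
    get_loc returns an integer position. Raises KeyError for a
    label that is not present.
    """
    positions = [frame.columns.get_loc(label) for label in labels]
    return frame.iloc[:, positions]


# Reduced version of the frame from the report: flat tuple labels
# plus one plain string label.
df = pd.DataFrame(
    [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]],
    columns=pd.MultiIndex.from_tuples(
        [("a", "aa"), ("a", "ab"), ("b", "ba"), ("b", "bb")]
    ),
)
df_flat = df.copy()
df_flat.columns = df_flat.columns.to_flat_index()
df_flat["new_single_index"] = 0

selected = select_mixed_columns(df_flat, [("a", "aa"), "new_single_index"])
print(selected)
```

Looking labels up one at a time never builds an array from the mixed list, so the np.asarray call shown in the traceback is not reached.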
Installed Versions
pandas : 2.0.0.dev0+975.gca0434994e
numpy : 1.24.0
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.7.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None