Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
data = {"a": [0, 1, 2, 3], "b": [1, 3, 3, 1]}
# data_bad = {"a": [0], "b": [0]}
# data_bad = {"a": [4], "b": [4]}
data_bad = {"a": [0, 1], "b": [1, 3]}
df = pd.DataFrame(data)
df_bad = pd.DataFrame(data_bad)
df["y"] = list(zip(df["a"], df["b"]))
df = df.set_index(["a", "b"])
print(df)
print(df_bad)
# print(df.loc[(df_bad["a"], df_bad["b"])]) # does not work
# print(df.loc[(df_bad["a"], df_bad["b"]), :]) # works
# print(df.loc[[(df_bad["a"], df_bad["b"])]]) # does not work
df.loc[(df_bad["a"], df_bad["b"])] = np.nan # blanks all
# df.loc[(df_bad["a"], df_bad["b"]), :] = np.nan # blanks correctly
# df.loc[[(df_bad["a"], df_bad["b"])]] = np.nan # gives error
print(df)
Issue Description
I wanted to post this as a discussion because I don't feel I understand what's going on enough to open a clear issue. It seems that you don't have a discussions tab in this repo though, so I am proceeding here.
It seems that assignment through loc on a MultiIndex is undefined when given a list. I say this because you get an "Index data must be 1-dimensional" error when accessing or assigning using [[]] to force a DataFrame:
df.loc[[(df_bad["a"], df_bad["b"])]]
df.loc[[(df_bad["a"], df_bad["b"])]] = np.nan
What makes me think this is an issue, or an unhandled error, is that this access does not work (it raises KeyError), but the assignment that follows it does:
df.loc[(df_bad["a"], df_bad["b"])]
df.loc[(df_bad["a"], df_bad["b"])] = np.nan
What's more, when giving a column slice, it seems to behave properly (confusing me more):
df.loc[(df_bad["a"], df_bad["b"]), :]
df.loc[(df_bad["a"], df_bad["b"]), :] = np.nan
Can someone shed some more light on what's going on here and what of this is an issue or not?
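One possible reading of the column-slice case, which I have not verified against the pandas source, is that a tuple of list-likes is applied level by level (keep rows whose "a" is in the first list and whose "b" is in the second) rather than pairing the lists element-wise. A minimal sketch under that assumption, where a_keys and b_keys are names made up for this example:

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 3], "b": [1, 3, 3, 1]})
df["y"] = list(zip(df["a"], df["b"]))
df = df.set_index(["a", "b"])

# Paired element-wise these would be (0, 3) and (1, 1),
# and neither of those keys exists in the index.
a_keys = [0, 1]
b_keys = [3, 1]

# Assumption: each list filters its own index level independently.
per_level = df.loc[(a_keys, b_keys), :]
print(per_level)
```

If that reading is right, the rows returned are the ones matching both per-level filters, so the form with the column slice only looks pairwise when the two lists happen to line up with existing keys.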
Expected Behavior
I would expect this to error in all cases if undefined. I would expect the differing behavior when specifying a column slice to be documented.
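For comparison, this is a sketch of the form I would consider unambiguous: build the (a, b) keys explicitly and pass them as a list of tuples, or turn them into a boolean mask for assignment. The names keys, selected and mask are only for this example, and I am not claiming this is the recommended approach.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 3], "b": [1, 3, 3, 1]})
df["y"] = list(zip(df["a"], df["b"]))
df = df.set_index(["a", "b"])
df_bad = pd.DataFrame({"a": [0, 1], "b": [1, 3]})

# Pair the two columns element-wise into complete (a, b) keys.
keys = list(zip(df_bad["a"], df_bad["b"]))

# A list of tuples is treated as several full MultiIndex keys.
selected = df.loc[keys]
print(selected)

# For assignment, mark the rows whose index entry is one of the keys.
mask = df.index.isin(keys)
df.loc[mask, "y"] = np.nan
print(df)
```

With this form only the rows (0, 1) and (1, 3) are touched, which is what I originally wanted from the tuple-of-Series assignment.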
Installed Versions
/usr/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
commit : 2e218d1
python : 3.11.3.final.0
python-bits : 64
OS : Linux
OS-release : 6.4.1-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Sat, 01 Jul 2023 16:17:21 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.3
numpy : 1.25.0
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.1.2
Cython : 0.29.35
pytest : None
hypothesis : None
sphinx : 7.0.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None