Skip to content

BUG: Regression in assert_frame_equal with check_like=True and Multiindex #48975

Closed
@boxblox

Description

@boxblox

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

df_1 = pd.DataFrame([("a", "a", 1), ("b", "b", 2), ("c", "c", 3)])
df_1[0] = df_1[0].astype(pd.CategoricalDtype(["a", "b", "c"]))
df_1[1] = df_1[1].astype(pd.CategoricalDtype(["a", "b", "c"]))

df_2 = pd.DataFrame([("c", "c", 3), ("b", "b", 2), ("a", "a", 1)])
df_2[0] = df_2[0].astype(pd.CategoricalDtype(["a", "b", "c"]))
df_2[1] = df_2[1].astype(pd.CategoricalDtype(["a", "b", "c"]))

assert_frame_equal(
    df_1.set_index([0, 1]),
    df_2.set_index([0, 1]),
    check_like=True,
)```

Gives an `AssertionError`: 

```---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [24], in <cell line: 9>()
      6 df_2[0] = df_2[0].astype(pd.CategoricalDtype(["a", "b", "c"]))
      7 df_2[1] = df_2[1].astype(pd.CategoricalDtype(["a", "b", "c"]))
----> 9 assert_frame_equal(
     10     df_1.set_index([0, 1]),
     11     df_2.set_index([0, 1]),
     12     check_like=True,
     13 )

    [... skipping hidden 3 frame]

File ~/miniconda3/envs/gams40/lib/python3.10/site-packages/pandas/_testing/asserters.py:318, in assert_index_equal.<locals>._check_types(left, right, obj)
    315 if not exact:
    316     return
--> 318 assert_class_equal(left, right, exact=exact, obj=obj)
    319 assert_attr_equal("inferred_type", left, right, obj=obj)
    321 # Skip exact dtype checking when `check_categorical` is False

    [... skipping hidden 1 frame]

File ~/miniconda3/envs/gams40/lib/python3.10/site-packages/pandas/_testing/asserters.py:677, in raise_assert_detail(obj, message, left, right, diff, index_values)
    674 if diff is not None:
    675     msg += f"\n[diff]: {diff}"
--> 677 raise AssertionError(msg)

AssertionError: MultiIndex level [0] are different

MultiIndex level [0] classes are different
[left]:  CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category', name=0)
[right]: Index(['a', 'b', 'c'], dtype='object', name=0)```

Issue Description

The example passes in Pandas 1.4.4, but fails in 1.5.0. Unclear why AssertionError message shows that the class type of [right] might have changed?

Expected Behavior

Expected consistent behavior between Pandas 1.4.4 and 1.5.0.

Installed Versions

INSTALLED VERSIONS

commit : 87cfe4e
python : 3.10.4.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Wed Aug 10 14:25:27 PDT 2022; root:xnu-8020.141.5~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0
numpy : 1.23.1
pytz : 2022.4
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.4.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Metadata

Metadata

Assignees

Labels

RegressionFunctionality that used to work in a prior pandas versionTestingpandas testing functions or related to the test suite

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions