
BUG: DataFrame.equals returns True when compared elements differ in being np.arrays #43867

Open
@MaxGhenis


  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

```python
import pandas as pd
import numpy as np

# Create a minimal DataFrame.
df = pd.DataFrame(dict(x=["a"]))
df
#    x
# 0  a

# Change x to a numpy array.
df_np = df.copy()
df_np.x = [np.array(df.x)]
df_np  # Display that this differs from df.
#      x
# 0  [a]

df_np.equals(df)
# True
```
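To confirm that the two frames really hold different values, you can inspect the cell in row 0 directly. This is a minimal sketch. It builds `df_np` by placing a one-element object array into the cell explicitly, rather than through the attribute assignment above. That substitution is an assumption, made only so the cell is guaranteed to contain an ndarray.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": ["a"]})

# Assumption: store the ndarray in the cell explicitly, standing in for
# the attribute assignment used in the reproducible example.
cells = np.empty(1, dtype=object)
cells[0] = np.array(df["x"], dtype=object)
df_np = pd.DataFrame({"x": cells})

left = df_np.at[0, "x"]  # the value equals() is comparing on one side
right = df.at[0, "x"]    # ...and on the other
print(type(left).__name__, type(right).__name__)  # ndarray str
print(left.shape)  # (1,)
```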

Issue Description

When two DataFrames are compared and some values are NumPy arrays in one frame but plain scalars in the other, equals returns True, even though the frames visibly differ.

Here's the above code in a notebook.

Possibly related: #43008
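One plausible mechanism is that equals compares object-dtype cells with == and reads the result as a single bool. This is an assumption, not something this report establishes. The NumPy-only sketch shows why that approach would fail here. Comparing an array with a scalar happens element-wise, and a one-element array whose only element is True is truthy.

```python
import numpy as np

scalar = "a"                              # what df holds in row 0
wrapped = np.array(["a"], dtype=object)   # what df_np holds in row 0

# == between an ndarray and a scalar is element-wise and returns a
# boolean array, not a single bool.
result = wrapped == scalar
print(result)  # [ True]

# A one-element array is truthy when that element is True, so code that
# does `if left == right:` would treat the two cells as equal.
print(bool(result))  # True

# The Python types differ, which is the difference the report expects
# equals to detect.
print(type(wrapped) is type(scalar))  # False
```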

Expected Behavior

df_np.equals(df) should return False.

Installed Versions

/usr/local/lib/python3.7/dist-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")

INSTALLED VERSIONS

commit : 73c6825
python : 3.7.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.104+
Version : #1 SMP Sat Jun 5 09:50:34 PDT 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.3
numpy : 1.19.5
pytz : 2018.9
dateutil : 2.8.2
pip : 21.1.3
setuptools : 57.4.0
Cython : 0.29.24
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 5.5.0
pandas_datareader: 0.9.0
bs4 : 4.6.3
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.3
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.13.3
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.4.25
tables : 3.4.4
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2

Metadata


Labels

Bug, Nested Data (data where the values are collections: lists, sets, dicts, objects, etc.)
