Skip to content

Equality between DataFrames misbehaves if columns contain NaN #18455

Closed
@toobaz

Description

@toobaz

Code Sample, a copy-pastable example if possible

In [2]: s = pd.DataFrame(-1, index=[1, np.nan, 2,],
   ...:                  columns=[3, np.nan, 1])
   ...: 

In [3]: s + s # good
Out[3]: 
       3.0  NaN    1.0
 1.0    -2    -2    -2
NaN     -2    -2    -2
 2.0    -2    -2    -2

In [4]: s == s # bad
Out[4]: 
       3.0 NaN    1.0
 1.0  True  NaN  True
NaN   True  NaN  True
 2.0  True  NaN  True

Problem description

While it is true that np.nan != np.nan, pandas disregards this in indexes (indeed, s.loc[:, np.nan] works), so it should be coherent.

Expected Output

In [4]: s == s
Out[4]: 
       3.0  NaN   1.0
 1.0  True  True  True
NaN   True  True  True
 2.0  True  True  True

Output of pd.show_versions()

INSTALLED VERSIONS

commit: b45325e
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+201.gb45325e28.dirty
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    Compatpandas objects compatability with Numpy or Python functionsDtype ConversionsUnexpected or buggy dtype conversions

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions