Description
Problem:
Calling testing.assert_frame_equal with mismatched indexes and check_like=True generates unhelpful output.
If you run:
import pandas as pd
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"])
pd.testing.assert_frame_equal(df1, df2, check_like=True)
The output will be:
AssertionError: DataFrame.iloc[:, 0] (column name="A") are different
DataFrame.iloc[:, 0] (column name="A") values are different (33.33333 %)
[index]: [a, b, d]
[left]: [1.0, 2.0, nan]
[right]: [1.0, 2.0, 3.0]
The data in the input DataFrames is not actually different (there is no NaN), but when check_like=True the code calls left.reindex_like(right) before comparing the indexes (and columns), in order to ensure that both frames are ordered the same way.
However, if the indexes contain different values (rather than the same values in a different order), reindex_like fills the data values (row or column) for the mismatched index entries with NaN.
As a result, the subsequent index checks pass, but assert_frame_equal fails with a data-not-equal error (as above).
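The reindexing step described above can be reproduced in isolation. This is a minimal sketch that calls reindex_like directly on the two frames from the example; the variable names (left, right, reindexed) are chosen here for illustration and are not taken from the pandas source.

```python
import pandas as pd

left = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"]
)
right = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"]
)

# Conform left to right's index and columns. The label "d" does not
# exist in left, so its row is filled with NaN; the row "c" is dropped.
reindexed = left.reindex_like(right)

print(reindexed)
print(reindexed.index.equals(right.index))
```

After this step the two indexes compare equal, so only the injected NaN values remain to trip the comparison.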
Even more confusingly, if the values being compared are integers rather than floats, the NaN fill upcasts the reindexed columns to float64 and you get a dtype-not-equal error instead:
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="A") are different
Attribute "dtype" are different
[left]: float64
[right]: int64
These messages are quite unhelpful: the mismatch is in the index, so the error should logically be the same as the one you get when running with check_like=False.
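The dtype variant of the error follows from the same reindexing step. The sketch below assumes integer versions of the example frames (int_left and int_right are names made up for this illustration) and shows the column dtype changing once NaN is introduced.

```python
import pandas as pd

int_left = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "c"])
int_right = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "d"])

# The default integer dtype cannot hold NaN, so filling the missing
# row "d" forces column "A" to be upcast to float64.
int_reindexed = int_left.reindex_like(int_right)

print(int_left["A"].dtype)
print(int_reindexed["A"].dtype)
```

The dtype check then fails before the values are ever compared, which hides the index mismatch even further.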
Applies to:
The code above was run against the latest code from master.
>>> print(pd.__version__)
1.2.0.dev0+950.gd321be6
Solution:
The message for the above assertion failure should be something like:
AssertionError: DataFrame.index are different
DataFrame.index values are different (33.33333 %)
[left]: Index(['a', 'b', 'c'], dtype='object')
[right]: Index(['a', 'b', 'd'], dtype='object')
This is what you get if you run with check_like=False.
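One possible way to get an index-level message while still ignoring ordering is to compare the sorted labels before any reindexing happens. The following is only a sketch under that assumption: assert_frame_equal_like is a hypothetical helper written for this issue, not a pandas function and not a proposed patch, and it assumes the index and column labels are sortable.

```python
import pandas as pd
from pandas.testing import assert_frame_equal, assert_index_equal


def assert_frame_equal_like(left, right):
    """Hypothetical helper: order-insensitive frame comparison."""
    # Compare the label sets first (order-insensitively), so that a
    # genuine index mismatch is reported as an index error.
    assert_index_equal(
        left.index.sort_values(), right.index.sort_values(), obj="DataFrame.index"
    )
    assert_index_equal(
        left.columns.sort_values(),
        right.columns.sort_values(),
        obj="DataFrame.columns",
    )
    # The labels match, so reindexing can no longer introduce NaN.
    assert_frame_equal(left, right, check_like=True)


df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"])

try:
    assert_frame_equal_like(df1, df2)
except AssertionError as err:
    message = str(err)
    print(message)
```

With this ordering of checks, frames that differ only in row or column order still pass, while frames with different labels fail on the index rather than on the data.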