Open
Description
import pandas as pd
left = pd.DataFrame({"a": [1, 2, pd.NA]}, dtype="Int64")
right = pd.DataFrame({"b": [1, 2, pd.NA]}, dtype="Int64")
pd.merge(left, right, how="inner", left_on="a", right_on="b")
# a b
# 0 1 1
# 1 2 2
# 2 <NA> <NA>
(Above is from 1.0.1 and master)
I think when joining on a nullable column we should not be matching NA
with NA
and should only be joining where we have unambiguous equality (as in SQL). Also worth noting that this is the same as what happens when we have NaN
which also seems incorrect, so could be an opportunity to fix this behavior?
pd.merge(left.astype(float), right.astype(float), how="inner", left_on="a", right_on="b")
# a b
# 0 1.0 1.0
# 1 2.0 2.0
# 2 NaN NaN
Expected Output
a b
0 1 1
1 2 2