Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
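from pandas import NA, DataFrame


def test_astype():
    # Both frames share the key ('85', <NA>); column "b" is entirely missing
    df_data = {'a': {4: '162', 7: '85'}, 'b': {4: NA, 7: NA}}
    df2_data = {'a': {139: '85'}, 'b': {139: NA}}
    df = DataFrame(df_data).astype({
        "a": 'string',
        "b": 'string',
    })
    df2 = DataFrame(df2_data).astype({
        "a": 'string',
        "b": 'string',
    })
    sum_df = df.set_index(['a', 'b']).combine_first(
        df2.set_index(['a', 'b']),
    ).reset_index()
    # `assert sum_df` would raise "truth value of a DataFrame is ambiguous";
    # check the intended property instead: each (a, b) key appears only once
    assert not sum_df.duplicated().any(), sum_df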
Problem description
In the resulting sum_df there are duplicate rows such as ('85', nan). This behavior is incorrect: combine_first() should merge rows whose index keys match, including keys that contain missing values, into a single row.
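For comparison, the sketch below builds the key set the report expects, using only the key columns from the example. It is not a fix for combine_first(); it relies on the fact that drop_duplicates() treats missing values in the same position as equal, so ('85', <NA>) collapses to one row. The names left, right and expected are illustrative.

```python
import pandas as pd

# Same key data as the report; column "b" holds only missing values
left = pd.DataFrame({"a": ["162", "85"], "b": [pd.NA, pd.NA]}, dtype="string")
right = pd.DataFrame({"a": ["85"], "b": [pd.NA]}, dtype="string")

# Stack both frames, then drop rows whose (a, b) values repeat;
# drop_duplicates considers <NA> equal to <NA> in the same column
expected = (
    pd.concat([left, right], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)
print(expected)
```

The expected result has two rows, ('162', <NA>) and ('85', <NA>), which is what the combine_first() union of the two MultiIndexes should produce.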