BUG: DataFrame.combine_first() works incorrectly with string indexes #37519

Closed
@sfilatov96

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

from pandas import NA, DataFrame


def test_astype():
    df_data = {'a': {4: '162', 7: '85'}, 'b': {4: NA, 7: NA}}
    df2_data = {'a': {139: '85'}, 'b': {139: NA}}
    df = DataFrame(df_data).astype({
        "a": 'string',
        "b": 'string',
    })
    df2 = DataFrame(df2_data).astype({
        "a": 'string',
        "b": 'string',
    })
    sum_df = df.set_index(['a', 'b']).combine_first(
        df2.set_index(['a', 'b']),
    ).reset_index()
    # Each (a, b) key should appear once; with the bug, ('85', <NA>) appears twice.
    assert not sum_df.duplicated(subset=['a', 'b']).any()

Problem description

The resulting sum_df contains duplicate rows such as ('85', nan). This is incorrect: both frames contain the index key ('85', NA), so combine_first() should merge those rows into a single row.
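To make the expected result concrete, here is a minimal pure-Python sketch of the alignment I would expect. It is not how pandas implements combine_first(). The helper name combine_first_rows is hypothetical, rows are modelled as dicts keyed by (a, b) tuples, None stands in for NA, and the column "v" in the second call is made up for illustration. The sketch assumes that two keys containing a missing value in the same position count as the same key.

```python
# Hypothetical helper, not part of pandas: models index-aligned combine_first
# on plain dicts keyed by (a, b) tuples. None stands in for a missing value.
def combine_first_rows(left, right):
    combined = {}
    # Union of keys: left's keys first, then keys that only exist in right.
    for key in list(left) + [k for k in right if k not in left]:
        left_row = left.get(key, {})
        right_row = right.get(key, {})
        columns = list(left_row) + [c for c in right_row if c not in left_row]
        # Prefer left's value; fall back to right's when left's is missing.
        combined[key] = {
            col: left_row[col] if left_row.get(col) is not None else right_row.get(col)
            for col in columns
        }
    return combined


# Same index keys as in the report (no value columns, only the index).
left = {("162", None): {}, ("85", None): {}}
right = {("85", None): {}}
result = combine_first_rows(left, right)
print(list(result))  # [('162', None), ('85', None)]

# With a made-up value column "v", the shared key is filled from right.
filled = combine_first_rows({("85", None): {"v": None}}, {("85", None): {"v": 1}})
print(filled)  # {('85', None): {'v': 1}}
```

The point is only that ('85', None) shows up once in the combined result, which is what I expected from sum_df.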

Metadata


    Labels

    Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
