DEPR: combine_first (replace with update(..., join='outer'); for both Series/DF) #21859

Open
@h-vetinari

I have always found the mechanics of combine_first very unintuitive, and I constantly need to look at the docs to see what it does. I haven't checked the git history, but it seems that the method was a direct response by wesm to a Stack Overflow question (https://stackoverflow.com/a/9794891). I think the same operation would be much more intuitive as a df.update call. That would be a subset of what #21855 proposes: it introduces join='outer' for DataFrame.update (currently, only 'left' is supported, but even the source code notes "# TODO: Support other joins").

With that new option, df1.combine_first(df2) would be the same as df1.update(df2, join='outer', overwrite=False), except that combine_first has far fewer options and controls (it lacks, for example, update's filter_func and raise_conflict). The only other difference is that df.update currently modifies the frame in place and returns None, see #21858.
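To make the comparison concrete, here is a minimal sketch of how the two methods behave today. The frames df1 and df2 are made-up illustration data. Since join='outer' is only proposed and does not exist yet, the last step emulates it by reindexing to the union of labels first; that emulation is my assumption about what the option would do, not pandas code.

```python
import numpy as np
import pandas as pd

# Made-up example frames: df2 has an extra row "z" and an extra column "c".
df1 = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]}, index=["x", "y"])
df2 = pd.DataFrame(
    {"a": [10.0, 20.0, 30.0], "c": [7.0, 8.0, 9.0]}, index=["x", "y", "z"]
)

# combine_first: returns a new frame on the union of rows and columns;
# values from df1 win wherever df1 is non-null.
combined = df1.combine_first(df2)

# update with the current (left-only) join: works in place, returns None,
# and never adds labels that are missing from the calling frame.
updated = df1.copy()
result = updated.update(df2, overwrite=False)

# Emulation of the proposed join='outer' (assumption): reindex to the
# union of labels first, then fill only the missing values.
outer = df1.reindex(
    index=df1.index.union(df2.index),
    columns=df1.columns.union(df2.columns),
)
outer.update(df2, overwrite=False)

print(combined)
print(updated)
print(outer)
```

In this sketch, updated keeps only the rows and columns of df1, while outer ends up with the same labels and values as combined, which is the equivalence the proposal relies on.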

Since combine_first is quite well established, the deprecation cycle might need to be longer than usual, but I think the update variant is much cleaner and more versatile than this single-purpose function.

Labels: Deprecate (Functionality to remove in pandas), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
