Description
I have always found the mechanics of `combine_first` very unintuitive, and I constantly need to look into the docs to see what's happening. I haven't checked the git history, but it seems that the method was a direct response from wesm to a SO question (https://stackoverflow.com/a/9794891). In particular, I think this would be much more intuitive to do with `df.update`, which is a subset of what #21855 proposes: it introduces `join='outer'` for `DataFrame.update` (currently, only `'left'` is supported, but even the source code notes `# TODO: Support other joins`).
With that new option, `df1.combine_first(df2)` would be the same as `df1.update(df2, join='outer', overwrite=False)`, except that `combine_first` has far fewer options and controls (it has no equivalent of `filter_func` and `raise_conflict`). The only difference is that `df.update` currently modifies the frame in place and returns `None`, see #21858.
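
To make the claimed equivalence concrete, here is a minimal sketch. Since `join='outer'` is not implemented in `DataFrame.update`, the outer join is emulated by reindexing to the union of both indexes and columns first. The helper name `combine_first_via_update` and the sample frames are hypothetical and only serve as illustration; this is not how pandas implements either method.

```python
import numpy as np
import pandas as pd


def combine_first_via_update(left, right):
    """Hypothetical helper: emulate the proposed update(join='outer', overwrite=False)."""
    # Widen `left` to the union of both indexes and columns (the "outer" part).
    out = left.reindex(
        index=left.index.union(right.index),
        columns=left.columns.union(right.columns),
    )
    # overwrite=False only fills positions that are NaN in `out`.
    # update() works in place and returns None, so `out` is returned explicitly.
    out.update(right, overwrite=False)
    return out


# Made-up sample data: df2 has an extra row (index 2) and an extra column "C".
df1 = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]}, index=[0, 1])
df2 = pd.DataFrame({"A": [10.0, 20.0, 30.0], "C": [7.0, 8.0, 9.0]}, index=[0, 1, 2])

via_combine_first = df1.combine_first(df2)
via_update = combine_first_via_update(df1, df2)

# Both approaches keep df1's non-null values and fill the gaps from df2.
pd.testing.assert_frame_equal(via_combine_first, via_update)
```

The explicit `return out` in the helper is needed only because `update` returns `None`, which is the point raised in #21858.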
Since `combine_first` is quite a well-established function, the deprecation cycle might have to be longer than usual, but I think the `update` variant is much cleaner, as well as more versatile, than this single-purpose function.