Description
The state of `update`/`combine_first` in v0.23:
- The `.update` signature does not match between DataFrame/Series (ENH: unify signature for df.update and Series.update #22358).
- `df.update` has a `join`-kwarg that only supports `left`, although the source code itself notes: `# TODO: Support other joins` (ENH: more joins for DataFrame.update #21855).
- `.update` is one of the (very) few pandas-methods that's inplace by default, but does not have an `inplace`-kwarg (ENH: add inplace-kwarg to df.update #22286).
- `.combine_first` is effectively (the not-yet-implemented) `.update(join='outer')`, has an awkward, non-standard name, and far fewer capabilities than `.update` (DEPR: combine_first (replace with update(..., join='outer'); for both Series/DF) #21859).
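To make the difference concrete, here is a small illustration of the current behaviour. The frames `df` and `other` are made-up examples, not taken from any of the linked issues.

```python
import numpy as np
import pandas as pd

# Made-up example data: `df` has a hole at index 1, `other` has an extra row 3.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]}, index=[0, 1, 2])
other = pd.DataFrame({"a": [10.0, 20.0, 40.0]}, index=[0, 1, 3])

# combine_first returns a new object on the union of both indexes and
# only fills the missing values of the caller with values from `other`.
combined = df.combine_first(other)
print(combined["a"].tolist())  # [1.0, 20.0, 3.0, 40.0]

# update modifies the caller in place and returns None; it keeps the
# caller's index (left join) and overwrites with every non-NA value of `other`.
updated = df.copy()
result = updated.update(other)
print(result)                  # None
print(updated["a"].tolist())   # [10.0, 20.0, 3.0]
```

Note that row 3 of `other` never reaches `updated`, because only `join='left'` is supported.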
I tried to make some steps towards #21855 and #21859 by adding an `inplace`-kwarg to `df.update` in #22286, which has stalled in a discussion of whether `update` should ever be inplace at all, and of how to move away from inplace operations generally.
Today, some headway was made with the comment by @jreback:
> So we have `.update` (in-place defaults) and `.combine_first` which is not very standard terminology.
> In an ideal world I think adding `.coalesce` is probably the right thing to do (does R use this term?).
> which is basically a rename of `.combine_first`, and deprecate `.update`.
I'm strongly in favour of this (with the caveat that `.coalesce` should have the capabilities of `update`; I suggested something similar in #21855, and it would also resolve most of the discussion there). And yes, dplyr uses "coalesce", which itself is inspired by SQL: https://cran.r-project.org/web/packages/dplyr/dplyr.pdf#page.15
This discussion is opened on the advice of @jreback, who would like to involve:
> [...] to get some more commentary on this, esp from @jorisvandenbossche and @TomAugspurger (and some off-line discussions that I had with @cpcloud)
Also tagging the other participants of #21855: @gfyoung @toobaz
Summing up this proposal:
- Add `.coalesce` to `generic.py`, à la:
  `def coalesce(self, other, join='left', overwrite=True, filter_func=None, raise_conflict=False):`
  which is not inplace and inherited by DataFrame/Series
- support different joins, at least: `join='left'|'outer'|'inner'|'right'` (most of the discussion in ENH: more joins for DataFrame.update #21855 is about potentially allowing different joins for different axes, and which keywords to use for that).
- (potentially; not essential to the proposal) slowly deprecate `.update` and `.combine_first`
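A rough sketch of what the proposal could look like, written as a free function rather than a method in `generic.py`. Everything here is an assumption for illustration: the function is hypothetical, works on DataFrames only, applies the same join to both axes, and leaves out `filter_func` and `raise_conflict`.

```python
import numpy as np
import pandas as pd


def coalesce(left, other, join="left", overwrite=True):
    """Hypothetical, non-inplace sketch of the proposed `.coalesce`."""
    # Pick the target labels according to the requested join.
    if join == "left":
        index, columns = left.index, left.columns
    elif join == "right":
        index, columns = other.index, other.columns
    elif join == "outer":
        index = left.index.union(other.index)
        columns = left.columns.union(other.columns)
    elif join == "inner":
        index = left.index.intersection(other.index)
        columns = left.columns.intersection(other.columns)
    else:
        raise ValueError(f"unsupported join: {join!r}")

    # Align both frames to the target labels; reindex returns new objects,
    # so neither input is modified.
    a = left.reindex(index=index, columns=columns)
    b = other.reindex(index=index, columns=columns)

    if overwrite:
        # Non-NA values of `other` win; the caller's values fill the rest.
        return b.where(b.notna(), a)
    # Only fill the caller's missing values from `other`.
    return a.where(a.notna(), b)


# Made-up example data.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]}, index=[0, 1, 2])
other = pd.DataFrame({"a": [10.0, 20.0, 40.0]}, index=[0, 1, 3])

print(coalesce(df, other, join="left")["a"].tolist())   # [10.0, 20.0, 3.0]
print(coalesce(df, other, join="outer")["a"].tolist())  # [10.0, 20.0, 3.0, 40.0]
print(coalesce(df, other, join="inner")["a"].tolist())  # [10.0, 20.0]
print(coalesce(df, other, join="outer", overwrite=False)["a"].tolist())  # [1.0, 20.0, 3.0, 40.0]
```

In this sketch, `join='outer'` with `overwrite=False` gives the same values as `df.combine_first(other)` on the example data, and `df` itself is left unchanged by every call.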