ENH: coalesce-method (upgrade for update/combine_first) #22812

Open
@h-vetinari

The state of update/combine_first in v0.23:

I tried to make some progress on #21855 and #21859 by adding an inplace kwarg to df.update in #22286. That PR has stalled in a discussion over whether update should ever operate in place at all, and more generally over how to move away from in-place operations.
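For readers less familiar with the two methods, here is a small illustration of how they differ today. The frames `df` and `other` are made-up example data, not taken from any of the linked issues.

```python
import numpy as np
import pandas as pd

# Made-up example data: `other` has one extra row (index 3).
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
other = pd.DataFrame({"a": [10.0, 20.0, np.nan, 40.0]})

# combine_first returns a new object, keeps the caller's non-NA values,
# fills its holes from `other`, and uses the union of both indexes.
combined = df.combine_first(other)

# update returns None and modifies `df` in place: non-NA values from
# `other` overwrite `df`, and only the labels already in `df` are kept.
result = df.update(other)

print(combined)
print(result)
print(df)
```

So the two methods differ in three ways at once: who wins on conflicts, which labels survive, and whether the caller is mutated.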

Today, some headway was made with the comment by @jreback:

So we have .update (in-place defaults) and .combine_first which is not very standard terminology.
In an ideal world I think adding .coalesce is probably the right thing to do (does R use this term?).
which is basically a rename of .combine_first, and deprecate .update.

I'm strongly in favour of this, with the caveat that coalesce should keep the capabilities of update (I suggested something similar in #21855, and it would also resolve most of the discussion there). And yes, dplyr uses "coalesce", which is itself inspired by SQL: https://cran.r-project.org/web/packages/dplyr/dplyr.pdf#page.15

This discussion is opened on the advice of @jreback, who would like to involve:

[...] to get some more commentary on this, esp from @jorisvandenbossche and @TomAugspurger (and some off-line discussions that I had with @cpcloud )

Also tagging the other participants of #21855: @gfyoung @toobaz

Summing up this proposal:

  1. Add .coalesce to generic.py, à la:
    def coalesce(self, other, join='left', overwrite=True, filter_func=None, raise_conflict=False):
    which is not inplace and inherited by DataFrame/Series
  2. Support different joins, at least: join='left'|'outer'|'inner'|'right'. (Most of the discussion in #21855 is about potentially allowing different joins for different axes, and which keywords to use for that.)
  3. (potentially; not essential to the proposal) slowly deprecate .update and .combine_first
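To make the proposal more concrete, here is a rough sketch of the semantics I have in mind, written as a free function rather than a method. Everything about it is an assumption for illustration: the name and body are hypothetical, `filter_func` and `raise_conflict` are left out, and it only handles DataFrames with the same join applied to both axes.

```python
import numpy as np
import pandas as pd


def coalesce(left, other, join="left", overwrite=True):
    """Hypothetical, non-inplace sketch of the proposed behaviour.

    Not an existing pandas API; filter_func / raise_conflict are omitted.
    """
    # Align both frames on index and columns using the requested join.
    left_al, other_al = left.align(other, join=join)
    if overwrite:
        # update-like: non-NA values of `other` win, `left` fills the rest.
        return other_al.where(other_al.notna(), left_al)
    # combine_first-like: keep `left` and only fill its missing values.
    return left_al.where(left_al.notna(), other_al)


# Made-up example data.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
other = pd.DataFrame({"a": [10.0, 20.0, np.nan, 40.0]})

like_update = coalesce(df, other)  # left join, other wins
like_combine_first = coalesce(df, other, join="outer", overwrite=False)

print(like_update)
print(like_combine_first)
```

With these defaults the result matches what update would write into the caller, but returned as a new object. With join='outer' and overwrite=False it matches combine_first on this example, which is why a single method could plausibly cover both.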

Labels: Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
