ENH: more joins for DataFrame.update #21855

Open
@h-vetinari

This should be non-controversial, as even the source code for DataFrame.update literally says (https://github.com/pandas-dev/pandas/blob/v0.23.3/pandas/core/frame.py#L5054):
# TODO: Support other joins
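
For context, here is a minimal sketch of what the current left-join-only behavior looks like in practice. The frames `df1` and `df2` are made-up illustration data, not taken from the pandas source or tests.

```python
import numpy as np
import pandas as pd

# Illustrative data: df2 has a row ("w") and a column ("b") that df1 lacks.
df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"a": [10.0, 20.0], "b": [5.0, 6.0]}, index=["y", "w"])

# update works in place and returns None.
result = df1.update(df2)

# Only labels already present in df1 are kept (a left join on both axes):
# row "w" and column "b" from df2 never make it into df1.
print(df1)
```

The value at `("y", "a")` is taken from `df2`, but there is currently no way to ask `update` to also bring in the new row or column.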

I looked for an existing issue on this, but did not find one.

Some thoughts that arise:

  • default for join should clearly be 'left'
  • df1.update(df2, join='right', overwrite=some_boolean) would be the same as df2.update(df1, join='left', overwrite=not some_boolean). IMO this is not a terrible redundancy, as it allows each user to choose the order that more easily fits their thought pattern.
  • df1.combine_first(df2) would be the same as df1.update(df2, join='outer', overwrite=False), except that combine_first has far fewer options and controls (it lacks, for example, filter_func and raise_conflict). Actually, I'd very much like to deprecate combine_first before pandas 1.0. The only difference is that update returns None, which should be changed as well IMO. Relevant xrefs: "ENH: add inplace-kwarg to update" #21858 and "DEPR: combine_first (replace with update(..., join='outer'); for both Series/DF)" #21859
  • this should IMO also support a way to control which axes are joined in what way (edit: the below was the original proposal; better variants are discussed in ENH: more joins for DataFrame.update #21855 (comment)).
    • The first way that came to mind would be with an axis=0|1|None-keyword, like in DataFrame.align. However, upon further investigation, I don't believe this to be a good choice, as anything other than axis=None would implicitly have to choose a join for the other axis to actually decide the index/columns of the result.
    • Since "explicit is better than implicit", I'd like to propose a version with just one kwarg, namely:
    join=['left', 'left']  # same as 'left' (and so on for 'inner'|'outer'|'right')
    join=['left', 'inner'] | ['left', 'outer'] etc. (for all other 12 combinations)
    
    • I'd say list and tuple would be reasonable to allow as containers, but not more.
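
To make the proposal above concrete, here is a rough sketch of the semantics emulated with existing pandas calls. `update_joined` is a hypothetical helper written for illustration only; it is not part of pandas and not a proposed implementation. It assumes unique index and column labels, and it returns a new frame instead of mutating in place.

```python
import numpy as np
import pandas as pd


def update_joined(df, other, join="left", overwrite=True):
    """Hypothetical emulation of update(..., join=...); not a pandas API."""
    # A single string applies the same join to both axes.
    if isinstance(join, str):
        join = (join, join)
    elif not isinstance(join, (list, tuple)) or len(join) != 2:
        raise TypeError("join must be a str or a list/tuple of two strs")
    index_how, columns_how = join

    # Decide the labels of the result separately for each axis.
    new_index = df.index.join(other.index, how=index_how)
    new_columns = df.columns.join(other.columns, how=columns_how)

    # Reindex first, then let the existing left-join update fill in values.
    result = df.reindex(index=new_index, columns=new_columns)
    result.update(other, overwrite=overwrite)
    return result


# Made-up illustration data.
df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"a": [10.0, 20.0], "b": [5.0, 6.0]}, index=["y", "w"])

# Keep df1's rows, but take the union of the columns.
mixed = update_joined(df1, df2, join=["left", "outer"])

# Outer join without overwriting, which should match combine_first.
outer = update_joined(df1, df2, join="outer", overwrite=False)
```

With this toy data, `outer` has the same labels and values as `df1.combine_first(df2)` (up to row and column order), and `update_joined(df1, df2, join='right', overwrite=True)` matches `update_joined(df2, df1, join='left', overwrite=False)`, which is the redundancy described in the bullets above.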


Labels: Enhancement; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
