DOC: update the DataFrame.update() docstring #20201

kantologist · 2018-03-10T17:03:52Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.update)  #####################
################################################################################

Modify DataFrame in place using non-NA values from passed
DataFrame.

Aligns on indices.

Parameters
----------
other : DataFrame, or object coercible into a DataFrame
    Index should be similar to one of the columns in this one. If a
    Series is passed, its name attribute must be set, and that will be
    used as the column name in the resulting joined DataFrame.
join : {'left'}, default 'left'
    Indicates which column values overwrite.
overwrite : boolean, default True
    If True then overwrite values for common keys in the calling frame.
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
    Can choose to replace values other than NA. Return True for values
    that should be updated.
raise_conflict : boolean
    If True, will raise an error if the DataFrame and other both
    contain data in the same place.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  y
2  c  e

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e']}, index=[1, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  x
1  b  d
2  c  e

If ``other`` contains NaNs the corresponding values are not updated
in the original dataframe.

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
   A      B
0  1    4.0
1  2  500.0
2  3    6.0

See also
--------
DataFrame.merge : For column(s)-on-columns(s) operations

Returns
-------
updated : DataFrame

################################################################################
################################## Validation ##################################
################################################################################

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

DOC: Improved the docstring of DataFrame.update()

TomAugspurger · 2018-03-10T19:33:29Z

pandas/core/frame.py

@@ -4207,17 +4207,23 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
               raise_conflict=False):
        """
        Modify DataFrame in place using non-NA values from passed


May have to reword this so the first line isn't too long? Does the validation script complain about this?

The validation script actually did not complain, but I'll try to adjust it.

TomAugspurger · 2018-03-10T19:34:17Z

pandas/core/frame.py

        join : {'left'}, default 'left'
+            Indicates which column values overwrite.


Is join='right' valid? Can you do some exploration on what happens?

It's actually not valid. I tried it.

I think you can say "Only left join is implemented, keeping the index and columns of the original object"
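
A minimal sketch of the exploration described above, assuming a recent pandas release where any `join` value other than `'left'` is rejected with `NotImplementedError`. The frame names and values are made up for illustration.

```python
import pandas as pd

# Illustrative frames; the names and values are not from the PR.
df = pd.DataFrame({'A': [1, 2, 3]})
other = pd.DataFrame({'A': [10, 20, 30]})

try:
    df.update(other, join='right')
    raised = False
except NotImplementedError:
    # Only join='left' is implemented, so the call is rejected.
    raised = True

print(raised)
```

The calling frame is left untouched when the call is rejected, which is consistent with the suggested wording that the index and columns of the original object are kept.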

TomAugspurger · 2018-03-10T19:36:31Z

pandas/core/frame.py

@@ -4276,6 +4282,14 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
        0  1    4.0
        1  2  500.0
        2  3    6.0
+
+        See also


"See Also" (note the capitalization), and it should go before the examples.

TomAugspurger · 2018-03-10T19:36:46Z

pandas/core/frame.py

+
+        Returns
+        -------
+        updated : DataFrame


Returns goes right after the Parameters section.

DOC: Improved the docstring of DataFrame.update()

jorisvandenbossche

Can you add some more explanation between the different examples?

jorisvandenbossche · 2018-03-11T14:25:54Z

pandas/core/frame.py

        raise_conflict : boolean
            If True, will raise an error if the DataFrame and other both
            contain data in the same place.

+        Returns
+        -------
+        updated : DataFrame


It actually does not return a DataFrame, as the update is in place.
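
A short sketch of the point above, reusing the frames from the first docstring example. The variable name `result` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, 500, 600]})
new_df = pd.DataFrame({'B': [4, 5, 6]})

# update() modifies df in place, so there is nothing to return.
result = df.update(new_df)

print(result)  # None
```

Because of this, writing `df = df.update(new_df)` would replace the frame with `None`.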

jorisvandenbossche · 2018-03-11T14:27:07Z

pandas/core/frame.py

        join : {'left'}, default 'left'
+            Indicates which column values overwrite.


I think you can say "Only left join is implemented, keeping the index and columns of the original object"

jorisvandenbossche · 2018-03-11T14:27:21Z

pandas/core/frame.py

        overwrite : boolean, default True
-            If True then overwrite values for common keys in the calling frame
+            If True then overwrite values for common keys in the calling frame.


and what if False?

I think in that case only NA values in the calling object are updated.
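
A small sketch of the `overwrite=False` case, assuming the behavior described in the comment above. The frames and their values are invented for illustration, and float data is used so that the NaN does not change the column dtype.

```python
import numpy as np
import pandas as pd

other = pd.DataFrame({'A': [10.0, 20.0, 30.0]})

# overwrite=False: only the NA slot in the calling frame is filled.
keep = pd.DataFrame({'A': [1.0, np.nan, 3.0]})
keep.update(other, overwrite=False)

# overwrite=True (the default): every non-NA value from other wins.
replace = pd.DataFrame({'A': [1.0, np.nan, 3.0]})
replace.update(other)

print(keep['A'].tolist())
print(replace['A'].tolist())
```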

DOC: Improved the docstring of DataFrame.join()

Added Raises Reformatted join. Added dict.update to see also

codecov · 2018-03-12T14:47:09Z

Codecov Report

Merging #20201 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20201      +/-   ##
==========================================
+ Coverage   91.73%   91.76%   +0.02%     
==========================================
  Files         150      150              
  Lines       49144    49144              
==========================================
+ Hits        45083    45095      +12     
+ Misses       4061     4049      -12

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.14% <100%> (+0.02%)` | ⬆️ |
| #single | `41.9% <100%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/frame.py | `97.18% <100%> (ø)` | ⬆️ |
| pandas/core/resample.py | `96.43% <0%> (ø)` | ⬆️ |
| pandas/core/series.py | `93.84% <0%> (ø)` | ⬆️ |
| pandas/core/indexes/timedeltas.py | `91.03% <0%> (ø)` | ⬆️ |
| pandas/core/groupby.py | `92.14% <0%> (ø)` | ⬆️ |
| pandas/core/generic.py | `95.85% <0%> (ø)` | ⬆️ |
| pandas/core/window.py | `96.3% <0%> (ø)` | ⬆️ |
| pandas/plotting/_converter.py | `66.81% <0%> (+1.73%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7b00c5...ce18a26. Read the comment docs.

TomAugspurger · 2018-03-12T14:47:30Z

Made a couple updates. Mind taking a look @kantologist?

DOC: Improved the docstring of DataFrame.update()

kantologist · 2018-03-12T15:39:30Z

So the only remaining issue is the doc validation script complaining about the absence of a Returns section. By the way, I noticed the same Returns section pattern in other in-place functions too.

jorisvandenbossche

So the only remaining issue is the doc validation script complaining about the absence of a Returns section. By the way, I noticed the same Returns section pattern in other in-place functions too.

It's fine to ignore those errors in this case (the methods that don't return something are a minority)

jorisvandenbossche · 2018-03-12T15:43:56Z

pandas/core/frame.py


        Parameters
        ----------
        other : DataFrame, or object coercible into a DataFrame
+            Index should be similar to one of the columns in this one. If a


I don't really understand "Index should be similar to one of the columns in this one". What do you want to say here?

jorisvandenbossche · 2018-03-12T15:45:40Z

pandas/core/frame.py

-            If True then overwrite values for common keys in the calling frame
+            How to handle non-NA values for overlapping keys.
+
+            * True : overwrite values in `self` with values from `other`.


I think we should try to avoid using 'self', as not necessarily every pandas user knows how classes are written.

There is not always an ideal alternative, but I would use "original object" or "calling object" (or use DataFrame instead of object)

jorisvandenbossche · 2018-03-12T15:46:14Z

pandas/core/frame.py

+
+            * True : overwrite values in `self` with values from `other`.
+            * False : only update values that are NA in `self`.
+
        filter_func : callable(1d-array) -> 1d-array<boolean>, default None


"default None" -> "optional"
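
For context on the parameter being discussed, here is a hedged sketch of how `filter_func` behaves when it is supplied: the callable receives the calling frame's column values and returns a boolean array marking which positions may be replaced. The frames and the lambda are illustrative and not taken from the PR.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
other = pd.DataFrame({'A': [10, 20, 30]})

# Only positions where the current value is greater than 1 are updated.
df.update(other, filter_func=lambda values: values > 1)

print(df['A'].tolist())
```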

jorisvandenbossche · 2018-03-12T15:47:35Z

pandas/core/frame.py

@@ -4350,6 +4370,8 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
        1  2  5
        2  3  6

+        The DataFrame's length does not increase as a result of the update.


maybe add something like ", only values at matching index/column labels are updated"
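
A minimal sketch of the suggested wording, assuming illustrative frames that are not in the PR: `other` carries an extra row label and an extra column, and neither is added to the calling frame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=[0, 1, 2])

# Row label 5 and column 'Z' do not exist in df.
other = pd.DataFrame({'A': [10.0, 40.0], 'Z': [7.0, 8.0]}, index=[1, 5])

df.update(other)

# Only the cell at the matching label (row 1, column 'A') changes.
print(df.shape)
print(df['A'].tolist())
```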

DOC: Improve the docstring of DataFrame.update()

jorisvandenbossche

Thanks for the updates! Fixed up some minor formatting issues

jorisvandenbossche · 2018-03-13T21:41:58Z

Thanks for the PR @kantologist !

kantologist added 2 commits March 10, 2018 17:57

DOC: Improved the docstring of DataFrame.update()

10f998d

Merge remote-tracking branch 'upstream/master' into docstring_update

5d9d44b

DOC: Improved the docstring of DataFrame.update()

TomAugspurger requested changes Mar 10, 2018

TomAugspurger added the Docs label Mar 10, 2018

kantologist added 2 commits March 11, 2018 00:25

Merge remote-tracking branch 'upstream/master' into docstring_update

3965924

DOC: Improved the docstring of DataFrame.update()

DOC: Improved the docstring of DataFrame.update()

714ead4

jorisvandenbossche reviewed Mar 11, 2018

kantologist and others added 4 commits March 11, 2018 23:26

DOC: Improved the docstring of DataFrame.join()

ee07bc6

Merge remote-tracking branch 'upstream/master' into docstring_update

6443994

DOC: Improved the docstring of DataFrame.join()

DOC: Improved the docstring of DataFrame.join()

2ce7982

Update returns.

ff0b3fa

Added Raises Reformatted join. Added dict.update to see also

kantologist added 4 commits March 12, 2018 16:09

Merge remote-tracking branch 'upstream/master' into docstring_update

4bb134d

DOC: Improved the docstring of DataFrame.update()

Merge remote-tracking branch 'upstream/master' into docstring_update

8167356

DOC: Improved the docstring of DataFrame.update()

DOC: Improved the docstring of DataFrame.update()

8838631

DOC: Improved the docstring of DataFrame.update()

77b7e3b

jorisvandenbossche reviewed Mar 12, 2018

kantologist and others added 3 commits March 13, 2018 14:52

DOC: Improve the docstring of DataFrame.update()

4c2e352

Merge remote-tracking branch 'upstream/master' into docstring_update

eade529

DOC: Improve the docstring of DataFrame.update()

Update frame.py

ce18a26

jorisvandenbossche approved these changes Mar 13, 2018

jorisvandenbossche merged commit 21ae073 into pandas-dev:master Mar 13, 2018

jorisvandenbossche added this to the 0.23.0 milestone Mar 13, 2018

kantologist deleted the docstring_update branch March 14, 2018 08:04
