Skip to content

DOC: update the DataFrame.update() docstring #20201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 13, 2018

Conversation

kantologist
Copy link
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.update)  #####################
################################################################################

Modify DataFrame in place using non-NA values from passed
DataFrame.

Aligns on indices.

Parameters
----------
other : DataFrame, or object coercible into a DataFrame
    Index should be similar to one of the columns in this one. If a
    Series is passed, its name attribute must be set, and that will be
    used as the column name in the resulting joined DataFrame.
join : {'left'}, default 'left'
    Indicates which column values overwrite.
overwrite : boolean, default True
    If True then overwrite values for common keys in the calling frame.
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
    Can choose to replace values other than NA. Return True for values
    that should be updated.
raise_conflict : boolean
    If True, will raise an error if the DataFrame and other both
    contain data in the same place.

Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  y
2  c  e
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e']}, index=[1, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  x
1  b  d
2  c  e

If ``other`` contains NaNs the corresponding values are not updated
in the original dataframe.

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
   A      B
0  1    4.0
1  2  500.0
2  3    6.0

See also
--------
DataFrame.merge : For column(s)-on-columns(s) operations

Returns
-------
updated : DataFrame

################################################################################
################################## Validation ##################################
################################################################################

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

@@ -4207,17 +4207,23 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
raise_conflict=False):
"""
Modify DataFrame in place using non-NA values from passed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May have to reword this so the first line isn't too long? Does the validation script complain about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation script actually did not complain, but I'll try to adjust it.

join : {'left'}, default 'left'
Indicates which column values overwrite.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is join='right' valid? Can you do some exploration on what happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually not valid. I tried it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can say "Only left join is implemented, keeping the index and columns of the original object"

@@ -4276,6 +4282,14 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
0 1 4.0
1 2 500.0
2 3 6.0

See also
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Also (note capitalization)

should go before the examples.


Returns
-------
updated : DataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rerturns goes right after the Parameters section.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some more explanation between the different examples ?

raise_conflict : boolean
If True, will raise an error if the DataFrame and other both
contain data in the same place.

Returns
-------
updated : DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually does not return a DataFrame, as the update is in place.

join : {'left'}, default 'left'
Indicates which column values overwrite.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can say "Only left join is implemented, keeping the index and columns of the original object"

overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
If True then overwrite values for common keys in the calling frame.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and what if False?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it that case only NA values in the calling object are updated

@codecov
Copy link

codecov bot commented Mar 12, 2018

Codecov Report

Merging #20201 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20201      +/-   ##
==========================================
+ Coverage   91.73%   91.76%   +0.02%     
==========================================
  Files         150      150              
  Lines       49144    49144              
==========================================
+ Hits        45083    45095      +12     
+ Misses       4061     4049      -12
Flag Coverage Δ
#multiple 90.14% <100%> (+0.02%) ⬆️
#single 41.9% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.18% <100%> (ø) ⬆️
pandas/core/resample.py 96.43% <0%> (ø) ⬆️
pandas/core/series.py 93.84% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 91.03% <0%> (ø) ⬆️
pandas/core/groupby.py 92.14% <0%> (ø) ⬆️
pandas/core/generic.py 95.85% <0%> (ø) ⬆️
pandas/core/window.py 96.3% <0%> (ø) ⬆️
pandas/plotting/_converter.py 66.81% <0%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7b00c5...ce18a26. Read the comment docs.

@TomAugspurger
Copy link
Contributor

Made a couple updates. Mind taking a look @kantologist?

@kantologist
Copy link
Contributor Author

kantologist commented Mar 12, 2018

So the only issue is the doc validation script complaining about the absent of a return section. by the way I observed the same pattern of return section for other inplace functions too.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the only issue is the doc validation script complaining about the absent of a return section. by the way I observed the same pattern of return section for other inplace functions too.

It's fine to ignore those errors in this case (the methods that don't return something are a minority)


Parameters
----------
other : DataFrame, or object coercible into a DataFrame
Index should be similar to one of the columns in this one. If a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand "Index should be similar to one of the columns in this one". What do you want to say here?

If True then overwrite values for common keys in the calling frame
How to handle non-NA values for overlapping keys.

* True : overwrite values in `self` with values from `other`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try to avoid using 'self', as not necessarily every pandas user knows how classes are written.

There is not always an ideal alternative, but I would use "original object" or "calling object" (or use DataFrame instead of object)


* True : overwrite values in `self` with values from `other`.
* False : only update values that are NA in `self`.

filter_func : callable(1d-array) -> 1d-array<boolean>, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"default None" -> "optional"

@@ -4350,6 +4370,8 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
1 2 5
2 3 6

The DataFrame's length does not increase as a result of the update.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add something like ", only values at matching index/column labels are updated"

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Fixed up some minor formatting issues

@jorisvandenbossche jorisvandenbossche merged commit 21ae073 into pandas-dev:master Mar 13, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Mar 13, 2018
@jorisvandenbossche
Copy link
Member

Thanks for the PR @kantologist !

@kantologist kantologist deleted the docstring_update branch March 14, 2018 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants