DOC: 'replace' docstring lacking / too complex #17673

Closed
@jorisvandenbossche

I think the replace docstring is lacking in many ways (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html):

  • The explanation of to_replace keyword is both way too complex and lacking an explanation of the simple cases:
    • the simplest case of a scalar value is not clearly mentioned (like df.replace(to_replace=0, value=1)); it is mentioned in the 'str' explanation, but it is not specific to strings
    • the simplest dict case of df.replace({to_replace: replacement}) is not mentioned (the dict explanation starts with explanation of nested dicts)
    • I would personally rewrite the whole explanation of this keyword: start with the basic cases, and only after that (or in the notes) explain the complex cases.
  • There is a reference to the examples section for "examples of each of those", but there is no examples section. We should add one.
  • The 'see also' section references reindex, asfreq and fillna. fillna is fine, but I fail to see the link with the first two. I would rather add a reference to where, for replacing values based on a boolean condition (and the 'see also' should not just refer to the other methods, but also include a sentence on why they are relevant and how they differ)
  • The docstring also uses NDFrame, and this should never be in a public docstring (the substitution of the docstring in generic is failing)
  • I would personally also write separate docstrings for the Series and DataFrame cases. This will give some duplication, but I think it gives room to simplify the docstring (certainly for the simpler Series.replace case). (xref Series.replace and DataFrame.replace have same docstring? #13852)
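
To make the simple cases from the list above concrete, here is a minimal sketch of what a basic examples section could open with. The frame df and its column names are made up for illustration and are not taken from the existing docstring; the mask/where lines only illustrate the "replace based on a boolean condition" alternative mentioned for the 'see also' section.

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 5, 0]})

# Simplest case: replace one scalar with another, everywhere in the frame.
scalar_result = df.replace(to_replace=0, value=1)

# Simplest dict case: {to_replace: replacement}, applied to all columns.
dict_result = df.replace({0: 1})

# Nested dict: {column: {to_replace: replacement}} limits the change
# to the named column.
nested_result = df.replace({"a": {0: 1}})

# Replacing based on a boolean condition is a different operation:
# mask() replaces where the condition is True,
# where() replaces where the condition is False.
mask_result = df.mask(df > 1, -1)
where_result = df.where(df <= 1, -1)
```

The scalar and flat-dict calls are two spellings of the same operation, which is part of why leading with them would help readers before the nested forms are introduced.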

See the tutorial docs (https://pandas.pydata.org/pandas-docs/stable/missing_data.html#replacing-generic-values) for some actual examples.

The underlying reason is, of course, that this function can do way too many things at the same time (or the same things in too many different ways) ... (orthogonal to this, we could maybe also consider whether certain functionality could be moved into its own function).
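
As a small illustration of "the same things in too many different ways", the sketch below spells one scalar replacement four ways on a made-up Series (the variable names are hypothetical). It is not an exhaustive list of the accepted forms.

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series([0, 1, 2, 0])

# Four spellings of "replace every 0 with 9".
positional = s.replace(0, 9)
keyword = s.replace(to_replace=0, value=9)
as_dict = s.replace({0: 9})
as_lists = s.replace([0], [9])

# Collect the results so they can be compared side by side.
results = [positional, keyword, as_dict, as_lists]
all_same = all(r.tolist() == positional.tolist() for r in results)
```

Here all_same evaluates to True, since each call produces the same values.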
