DOC: Series.diff with boolean dtype does not return a series of dtype float #57565

Open
@from-nowhere

Pandas version checks

  • I have checked that the issue still exists on the latest version of the docs on main

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html#pandas.Series.diff
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html#pandas.DataFrame.diff

Documentation problem

The documentation for pandas.Series.diff and pandas.DataFrame.diff states that the output is always of dtype float64, regardless of the dtype of the original series/column. This is not true for series/columns of dtype bool: there, the output is of dtype object.

For example:

import pandas as pd
# pd.__version__ == '2.2.0'

s = pd.Series([True, True, False, False, True])
d = s.diff()

print(d.dtype)  # 'object', not 'float64' as the docs state

Indeed, the underlying function algorithms.diff explicitly differentiates between boolean and integer dtypes.
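
To make the distinction concrete, here is a minimal sketch comparing the two paths on the same data. It assumes the pandas 2.2.0 behaviour reported above; the variable names are illustrative only, and the XOR is reproduced by hand on the NumPy values rather than by calling algorithms.diff directly.

```python
import pandas as pd

bools = pd.Series([True, True, False, False, True])
ints = bools.astype("int64")  # same data, integer dtype

bool_diff = bools.diff()  # boolean path: each element XOR the previous one
int_diff = ints.diff()    # integer path: each element minus the previous one

# Reproduce the boolean result by hand with XOR on neighbouring values.
values = bools.to_numpy()
manual_xor = values[1:] ^ values[:-1]

# Compare the dtypes of the two results.
print(bool_diff.dtype, int_diff.dtype)

# Apart from the leading missing value, the boolean diff matches the XOR.
print(list(bool_diff.iloc[1:]) == list(manual_xor))
```

If the float64 result described in the docs is what is wanted, casting to an integer dtype before calling diff (as done for `ints` here) appears to be one way to get it, at the cost of subtraction semantics (values of -1, 0 and 1) instead of XOR.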

Suggested fix for documentation

The Notes section should read something like this:

Notes
-----
For boolean dtypes, this uses :meth:`operator.xor` rather than
:meth:`operator.sub` and the result's dtype is ``object``.
Otherwise, the result is calculated according to the current dtype in {klass},
however the dtype of the result is always float64.

Labels: Docs; Dtype Conversions (unexpected or buggy dtype conversions); Needs Discussion (requires discussion from core team before further action); Transformations (e.g. cumsum, diff, rank)
