ENH: make Series.ptp() handle missing values #11163

Closed
@ajcr

Currently (in master), Series.ptp() is implemented with np.ptp(), so the method returns nan for any Series that contains one or more missing values:

>>> s = pd.Series([5, 0, np.nan, -3, 2])
>>> s.ptp()
nan

It is simple to write s.max() - s.min() instead, but the ptp() result is surprising because most pandas methods skip missing data by default. I think most users would expect the ptp() method to ignore NaN.
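
To make the difference concrete, here is a minimal sketch of the NaN-skipping behaviour being proposed. The helper name `nan_ptp` and its `skipna` parameter are hypothetical and only illustrate the idea; they are not an existing pandas API.

```python
import numpy as np
import pandas as pd


def nan_ptp(s: pd.Series, skipna: bool = True):
    """Hypothetical helper: peak-to-peak range (max - min) of a Series.

    With skipna=True (the assumed default), missing values are ignored,
    mirroring how Series.max() and Series.min() behave.
    """
    return s.max(skipna=skipna) - s.min(skipna=skipna)


s = pd.Series([5, 0, np.nan, -3, 2])

# np.ptp works on the raw array, so a single NaN propagates to the result.
raw = np.ptp(s.to_numpy())

# The helper skips NaN: 5 - (-3)
skipped = nan_ptp(s)

# Passing skipna=False reproduces the NaN-propagating behaviour.
propagated = nan_ptp(s, skipna=False)
```

Keeping a `skipna` switch would let callers opt back into the NumPy behaviour, in line with other pandas reductions.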

If there is agreement that ptp() should be changed, I would like to work on a pull request!


Extending the idea, it might be useful to have both DataFrame.ptp() and groupby.ptp() methods.

For this example DataFrame...

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

...I would expect the following behaviour:

>>> df.ptp()
a     1.0
b    69.0
c    21.6
dtype: float64

>>> df.ptp(axis=1)
0     2.0
1     9.0
2    70.0
3    45.0
4    39.7
dtype: float64

>>> df.groupby('a').ptp()
    b    c
a         
1  43  8.9
2  61  7.2

Again, if there is any consensus from the community on whether these additional methods should be added, I'd be happy to work on the pull request.
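
In the meantime, the expected results can be approximated with existing reductions. The sketch below assumes that non-numeric columns such as 'd' should simply be excluded, and the variable names (`num`, `col_ptp`, `row_ptp`, `grp_ptp`) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

# Assumption: ptp only makes sense for numeric columns, so drop 'd'.
num = df.select_dtypes(include="number")

# Column-wise range; max() and min() skip NaN by default.
col_ptp = num.max() - num.min()

# Row-wise range across the numeric columns.
row_ptp = num.max(axis=1) - num.min(axis=1)

# Per-group range for each remaining column, grouped by 'a'.
grouped = num.groupby("a")
grp_ptp = grouped.max() - grouped.min()
```

Each result is just max minus min along the relevant axis, so a built-in ptp() would mainly save typing and give the operation a discoverable name.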


Labels: Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Numeric Operations (Arithmetic, Comparison, and Logical operations)
