API: sum of Series of all NaN should return 0 or NaN ? #9422

Closed
@shoyer

Summary

The question is what the sum of a Series of all NaNs should return (which is equivalent to an empty Series after skipping the NaNs): NaN or 0?

In [1]: s = Series([np.nan])                 

In [2]: s.sum(skipna=True)  # skipping NaNs is the default
Out[2]: nan or 0     <---- DISCUSSION POINT

In [3]: s.sum(skipna=False)
Out[3]: nan

This is a discussion point because pandas' internal nansum implementation returns NaN, but when bottleneck is installed, pandas uses bottleneck's implementation of nansum, which returns 0 (for versions >= 1.0).
Bottleneck changed the behaviour from returning NaN to returning 0 to model it after numpy's nansum function.

This has the very annoying consequence that you get different behaviour depending on whether bottleneck (which is only an optional dependency) is installed.
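To make the difference concrete, here is a minimal pure-Python sketch of the two conventions. The function names `nansum_returning_nan` and `nansum_returning_zero` are made up for illustration; this is not the actual pandas or bottleneck code, only a model of the behaviour described above.

```python
import math


def nansum_returning_nan(values):
    """Skip NaNs; return NaN when nothing is left to sum.

    Models the behaviour described above for pandas' internal nansum.
    """
    kept = [v for v in values if not math.isnan(v)]
    return float(sum(kept)) if kept else float("nan")


def nansum_returning_zero(values):
    """Skip NaNs; return 0.0 when nothing is left to sum.

    Models the behaviour described above for bottleneck >= 1.0.
    """
    return float(sum(v for v in values if not math.isnan(v)))


all_nan = [float("nan")]
mixed = [1.0, float("nan"), 2.0]

# The two conventions only disagree on all-NaN (or empty) input.
print(nansum_returning_nan(all_nan), nansum_returning_zero(all_nan))
print(nansum_returning_nan(mixed), nansum_returning_zero(mixed))
```

As soon as at least one non-NaN value is present, both versions agree, so the inconsistency only shows up in the all-NaN/empty case.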

So the decision we need to make is to either:

  • adapt pandas' internal implementation to return 0, so that 0 is returned for all-NaN/empty series in all cases.
  • work around bottleneck's behaviour, or not use it for nansum, in order to consistently return NaN instead of 0
  • choose one of the two options above as the default, but have an option to switch behaviour
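As a rough illustration of the third option, the sketch below uses a hypothetical `all_nan_result` parameter to select the value returned when nothing is left to sum. Both the parameter name and the function are assumptions for the sake of discussion, not an existing pandas API, and the default shown here is arbitrary.

```python
import math


def nansum(values, skipna=True, all_nan_result=float("nan")):
    """Sum values, optionally skipping NaNs.

    `all_nan_result` (hypothetical) is returned when no values remain
    after skipping, i.e. for all-NaN or empty input.
    """
    kept = [v for v in values if not math.isnan(v)] if skipna else list(values)
    if not kept:
        return all_nan_result
    # With skipna=False any NaN in the input propagates through sum().
    return float(sum(kept))


s = [float("nan")]
print(nansum(s))                      # default chosen here: NaN
print(nansum(s, all_nan_result=0.0))  # switched to the "return 0" behaviour
print(nansum(s, skipna=False))        # NaN propagates regardless
```

Whichever default is picked, the result would then no longer depend on whether bottleneck is installed.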

Original title: nansum in bottleneck 1.0 will return 0 for all NaN arrays instead of NaN

xref pydata/bottleneck#96
xref #9421

This matches a change from numpy 1.8 -> 1.9.

We should address this for pandas 0.16.

Should we work around the new behavior (probably the simplest choice) or change nansum in pandas?

Labels: API Design, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)