Description
Summary
The question is what the sum of a Series of all NaNs should return (which is equivalent to an empty Series after skipping the NaNs): NaN or 0?
In [1]: s = Series([np.nan])
In [2]: s.sum(skipna=True) # skipping NaNs is the default
Out[2]: nan or 0 <---- DISCUSSION POINT
In [3]: s.sum(skipna=False)
Out[3]: nan
This is a discussion point because pandas' internal nansum implementation returns NaN, but when bottleneck is installed, pandas uses bottleneck's implementation of nansum instead, which returns 0 (for versions >= 1.0).
Bottleneck changed the behaviour from returning NaN to returning 0 to model it after numpy's nansum function.
This has the very annoying consequence that depending on whether bottleneck is installed or not (which is only an optional dependency), you get a different behaviour.
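For reference, the numpy behaviour that bottleneck modelled itself on can be checked directly. This is a minimal sketch that assumes numpy >= 1.9 is installed; it only exercises numpy, not pandas or bottleneck.

```python
import numpy as np

# An array containing only NaN: after skipping NaNs, nothing is left to sum.
all_nan = np.array([np.nan])

# With numpy >= 1.9, nansum treats NaN as zero, so an all-NaN input sums to 0.0.
result = np.nansum(all_nan)
print(result)
```

Whether pandas should follow this convention for `Series.sum` is exactly what needs deciding here.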
So the decision we need to make is to either:
- adapt pandas' internal implementation to return 0, so that 0 is returned in all cases for all-NaN/empty series.
- work around bottleneck's behaviour, or not use it for nansum, in order to consistently return NaN instead of 0
- choose one of the two options above as the default, but have an option to switch behaviour
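To make the third option concrete, here is a rough sketch in plain Python. The helper name `nansum_with_fallback` and its `all_nan_result` parameter are hypothetical and exist only for illustration; this is not how pandas or bottleneck implement nansum.

```python
import math


def nansum_with_fallback(values, all_nan_result=math.nan):
    """Hypothetical helper: sum values while skipping NaNs.

    When nothing is left after skipping NaNs (all-NaN or empty input),
    return `all_nan_result` instead of hard-coding NaN or 0.
    """
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        # All-NaN and empty inputs both end up here.
        return all_nan_result
    return math.fsum(kept)


# Default mirrors the current pandas-internal behaviour (NaN).
nan_style = nansum_with_fallback([math.nan])

# Passing 0.0 mirrors the bottleneck >= 1.0 / numpy behaviour.
zero_style = nansum_with_fallback([math.nan], all_nan_result=0.0)
```

Either default works with such a switch; what matters is that the result no longer depends on whether an optional dependency happens to be installed.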
Original title: nansum in bottleneck 1.0 will return 0 for all NaN arrays instead of NaN
xref pydata/bottleneck#96
xref #9421
- Tests are turned off for bottleneck >1.0 (xref TST: test_nanops turns off bottleneck for all tests after #10986)
This matches a change from numpy 1.8 -> 1.9.
We should address this for pandas 0.16.
Should we work around the new behavior (probably the simplest choice) or change nansum in pandas?