Skip to content

pandas.DataFrame.sum() skipna behavior does not match documentation #15542

Closed
@jglicoes

Description

@jglicoes

Code Sample, a copy-pastable example if possible

`
import pandas as pd

df = pd.DataFrame([[0,0],[1,1],[np.NaN,0],[np.NaN,1],[np.NaN,np.NaN]], columns = ['a','b'])

df
Out[3]:
a b
0 0.0 0.0
1 1.0 1.0
2 NaN 0.0
3 NaN 1.0
4 NaN NaN

df.sum(axis=1, skipna=True)
Out[4]:
0 0.0
1 2.0
2 0.0
3 1.0
4 0.0
dtype: float64
`

Problem description

The documentation for pandas.DataFrame.sum indicates that:

skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

However, it appears that the actual behavior under the skipna option within this particular environment is for the summation routine to evaluate full rows of na values to 0 rather than the documented value of NA.

Currently this behavior only occurs on within my local windows environment, and NOT within the linux environment my organization's grid runs on. I have provided output for pd.versions for both environments below to help localize this issue.

Expected Output

df.sum(axis=1, skipna=True)
Out[6]:
0 0.0
1 2.0
2 0.0
3 1.0
4 NaN
dtype: float64

Output of pd.show_versions()

Windows version in which issue is present

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Linux version in which issue is NOT present

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-642.15.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.4.1
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.2
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: 0.2.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    Duplicate ReportDuplicate issue or pull requestMissing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolateNumeric OperationsArithmetic, Comparison, and Logical operations

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions