pd.expanding is incorrectly calculating window size when axis=1 #13753

Closed
@seanlaw

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [0, 1, 2, np.nan, 4], 
                   'B': [0, 1, 2, np.nan, 4], 
                   'C': [0, 1, 2, np.nan, 4], 
                   'D': [0, 1, 2, np.nan, 4], 
                   'E': [0, 1, 2, np.nan, 4], 
                   'F': [0, 1, 2, np.nan, 4]})

print(df.expanding(axis=1).sum())

Actual Output

     A    B     C     D     E     F
0  0.0  0.0   0.0   0.0   0.0   0.0
1  1.0  2.0   3.0   4.0   5.0   5.0
2  2.0  4.0   6.0   8.0  10.0  10.0
3  NaN  NaN   NaN   NaN   NaN   NaN
4  4.0  8.0  12.0  16.0  20.0  20.0

However, the correct result should be:

     A    B     C     D     E     F
0  0.0  0.0   0.0   0.0   0.0   0.0
1  1.0  2.0   3.0   4.0   5.0   6.0
2  2.0  4.0   6.0   8.0  10.0  12.0
3  NaN  NaN   NaN   NaN   NaN   NaN
4  4.0  8.0  12.0  16.0  20.0  24.0

Notice that the last column, F, is different. I've tracked this down and found that the _get_window function (for expanding) fails to return the correct window size when both of the following conditions are met:

  1. axis=1 is used instead of axis=0 (default)
  2. The number of rows in the dataframe is less than the number of columns

This happens because the window size is determined with len(obj), which always counts rows regardless of the axis. Instead, it should use obj.shape[self.axis].
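
To make the difference between len(obj) and obj.shape[axis] concrete, here is a minimal sketch. It does not call pandas internals: trailing_sums is a hypothetical helper that only imitates a window capped at a given size, and the transpose at the end is a possible workaround assumed here, not something taken from the report.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: 5 rows, 6 columns.
df = pd.DataFrame({col: [0, 1, 2, np.nan, 4] for col in "ABCDEF"})

rows = len(df)       # 5: len() on a DataFrame always counts rows
cols = df.shape[1]   # 6: length along axis=1


def trailing_sums(values, window):
    """Hypothetical helper: sum the last `window` items at each position."""
    return [sum(values[max(0, i - window + 1):i + 1])
            for i in range(len(values))]


row = df.loc[1].tolist()           # six 1.0 values
buggy = trailing_sums(row, rows)   # window capped at the row count
fixed = trailing_sums(row, cols)   # window sized along axis=1

print(buggy)  # [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
print(fixed)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Possible workaround (an assumption): expand along axis=0 on the
# transposed frame, then transpose back.
workaround = df.T.expanding().sum().T
print(workaround)
```

With the window capped at 5, the sixth value stops growing, which matches the F column in the actual output above; sizing the window from shape[1] gives the corrected values.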

output of pd.show_versions()


commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.18.1+237.ge357ea1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: 0.7.0
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: None
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: 0.9
apiclient: 1.4.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.0

Labels: Bug, Duplicate Report, Reshaping
