Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
from scipy import signal


# Custom window functions; each returns the weights for a window of length win_len.
def equal(win_len: int, **kwargs):
    assert win_len == 5
    return np.array([1.0, 1.0, 1.0, 1.0, 1.0])


def linear(win_len: int, **kwargs):
    assert win_len == 5
    return np.array([0.2, 0.4, 0.6, 0.8, 1.0])


# Register them on scipy.signal.windows so they can be selected by name via win_type.
signal.windows.equal_weight = equal
signal.windows.linear_decay = linear

s = pd.Series([1.0, 0.0] * 10)
print(list(s))
print(list(s.rolling(window=5).var(ddof=0)))
print(list(s.rolling(window=5, win_type="equal_weight").var(ddof=0)))
print(list(s.rolling(window=5, win_type="linear_decay").var(ddof=0)))
Issue Description
The four print calls in the snippet produce the following output:
[1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
[nan, nan, nan, nan, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24]
[nan, nan, nan, nan, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24]
[nan, nan, nan, nan, 0.24000000000000005, 0.24888888888888897, 0.2222222222222223, 0.24888888888888902, 0.1955555555555556, 0.24000000000000007, 0.248888888888889, 0.22222222222222232, 0.24888888888888897, 0.19555555555555562, 0.23999999999999996, 0.24888888888888888, 0.2222222222222222, 0.24888888888888897, 0.1955555555555555, 0.23999999999999996]
However, the last line is wrong. In a window of size 5, the alternating 1/0 sequence is either 10101 or 01010, and both patterns have a variance of 0.24 whether the weights are all equal or linearly decaying (feel free to verify this). This means the last three prints should all produce
[nan, nan, nan, nan, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24]
but that is not the case: the elements of the last line are well off 0.24. The first non-NaN value is close enough, but the four values after it are noticeably off, and the same five-value cycle then repeats.
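A quick way to check the 0.24 figure: the sketch below computes the ddof=0 weighted variance of both window patterns with `np.average`, using the weights returned by the `linear` function in the reproducible example and, for comparison, equal weights. It assumes the usual population definition, sum(w * (x - mean)^2) / sum(w), which is what the expectation in this report implies for ddof=0. It does not call pandas, and `weighted_var_ddof0` is a hypothetical helper written only for this check.

```python
import numpy as np

# Weights returned by the report's `linear` window function, plus equal weights.
linear_weights = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
equal_weights = np.ones(5)

# The only two 5-element windows that occur in the alternating 1/0 series.
patterns = {
    "10101": np.array([1.0, 0.0, 1.0, 0.0, 1.0]),
    "01010": np.array([0.0, 1.0, 0.0, 1.0, 0.0]),
}


def weighted_var_ddof0(x, w):
    """Population (ddof=0) weighted variance: sum(w * (x - mean)**2) / sum(w)."""
    mean = np.average(x, weights=w)
    return np.average((x - mean) ** 2, weights=w)


for name, x in patterns.items():
    print(name,
          weighted_var_ddof0(x, linear_weights),
          weighted_var_ddof0(x, equal_weights))
```

All four printed variances should equal 0.24 up to floating-point rounding.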
Expected Behavior
list(s.rolling(window=5, win_type="linear_decay").var(ddof=0))
should be
[nan, nan, nan, nan, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24]
or something very close to that, up to floating-point rounding.
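To produce the full expected list independently of pandas, a brute-force reference can recompute every full window from scratch. The sketch below is a minimal version under stated assumptions: NumPy 1.20 or later (for `sliding_window_view`), ddof=0 only, no missing values, and `weights[0]` multiplying the oldest value in each window and `weights[-1]` the newest. For this symmetric alternating input the weight orientation does not change the result. `weighted_rolling_var_ddof0` is a hypothetical helper, not a pandas API.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def weighted_rolling_var_ddof0(values, weights):
    """Hypothetical reference: weighted rolling variance with ddof=0.

    Every full window is recomputed from scratch. weights[0] is applied to
    the oldest value in the window and weights[-1] to the newest (assumed).
    Positions before the first full window are NaN, matching the padding in
    the pandas output above. Assumes the input contains no NaNs.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    win = len(weights)
    out = np.full(len(values), np.nan)
    if len(values) < win:
        return out
    # Shape (n - win + 1, win): one row per full window.
    windows = sliding_window_view(values, win)
    w_sum = weights.sum()
    # Weighted mean of each window, then weighted mean squared deviation.
    means = windows @ weights / w_sum
    sq_dev = (windows - means[:, None]) ** 2
    out[win - 1:] = sq_dev @ weights / w_sum
    return out


series = [1.0, 0.0] * 10
print(list(weighted_rolling_var_ddof0(series, [0.2, 0.4, 0.6, 0.8, 1.0])))
```

Every value from index 4 onward should be 0.24 up to rounding, which makes the helper usable as an oracle when comparing against `s.rolling(window=5, win_type="linear_decay").var(ddof=0)`.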
Installed Versions
INSTALLED VERSIONS
commit : 0f43794
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-1042-azure
Version : #49-Ubuntu SMP Tue Jul 11 17:28:46 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.3
numpy : 1.21.5
pytz : 2022.1
dateutil : 2.8.2
setuptools : 59.6.0
pip : 22.0.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.31.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
Note that the issue also occurs with the latest versions of numpy and scipy (1.25.2 and 1.11.1, respectively, as of filing).