Description
I noticed (in #7884) that ewmvar, ewmstd, ewmvol, ewmcov, rolling_var, and rolling_std return 0.0 for a single value (assuming min_periods=0), whereas Series.std, Series.var, ewmcorr, expanding_cov, expanding_corr, rolling_cov, and rolling_corr all return NaN for a single value. expanding_std and expanding_var raise ValueError: min_periods (2) must be <= window (1).
I think all of these should return NaN for a single value. At any rate, I would expect greater consistency one way or the other.
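To illustrate why NaN seems the natural answer, here is a minimal sketch in plain Python (not the pandas implementation; sample_variance is a hypothetical helper written for this example). With the unbiased estimator the divisor is n - 1, so a single observation gives 0/0, which is undefined; only the biased estimator (divisor n) yields 0.0.

```python
import math

def sample_variance(values, ddof=1):
    """Variance with divisor (n - ddof); hypothetical helper, not a pandas function."""
    n = len(values)
    denom = n - ddof
    if denom <= 0:
        # Not enough observations for this estimator: the result is undefined.
        return math.nan
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / denom

single = [1.0]
unbiased = sample_variance(single, ddof=1)  # divisor is 0, so undefined
biased = sample_variance(single, ddof=0)    # divisor is 1, squared deviation is 0
print(unbiased, biased)
```

Under this reading, the functions that return 0.0 for a single point behave as if they fell back to the biased estimator, while the ones that return NaN treat the unbiased estimate as undefined.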
Mildly related: when calculating the correlation of a constant series with itself, Series.corr(), expanding_corr, and rolling_corr return NaN, while ewmcorr sometimes returns NaN, sometimes 1, and sometimes -1, due to numerical accuracy issues. Actually, as shown in a separate comment below, rolling_corr also produces inconsistent results for a constant subseries following different prior values.
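The following sketch shows one way such instability can arise. It assumes (this is not confirmed from the pandas source) that the correlation is computed as cov / sqrt(var_x * var_y) from separately computed moments; corr_from_moments is a hypothetical helper. For a constant series all three moments should be exactly zero, giving 0/0. If rounding leaves a tiny residue instead, the ratio collapses to 1 or -1 depending on the sign of that residue.

```python
import math

def corr_from_moments(cov, var_x, var_y):
    """Correlation from precomputed moments; hypothetical helper for illustration."""
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # 0/0: the correlation of a constant series is undefined.
        return math.nan
    return cov / denom

# Exact arithmetic: every moment of a constant series is zero.
exact = corr_from_moments(0.0, 0.0, 0.0)

# Simulated rounding residue (a power of two keeps the arithmetic exact).
eps = 2.0 ** -100
pos = corr_from_moments(eps, eps, eps)      # positive residue
neg = corr_from_moments(-eps, -eps, -eps)   # negative residue; the product of variances is still positive
print(exact, pos, neg)
```

With a residue of either sign the denominator is positive, so the result takes the sign of the covariance residue, which would be consistent with seeing NaN, 1, or -1 for different constant inputs.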
Inconsistencies in calculating second moments of a single point:
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)]
In [1]: from pandas import Series, ewmvar, ewmstd, ewmvol, ewmcov, rolling_var, rolling_std, ewmcorr, expanding_cov, expanding_corr, expanding_std, expanding_var, rolling_cov, rolling_corr
In [2]: s = Series([1.])
In [3]: ewmvar(s, halflife=2., min_periods=0)
Out[3]:
0 0
dtype: float64
In [4]: ewmstd(s, halflife=2., min_periods=0)
Out[4]:
0 0
dtype: float64
In [5]: ewmvol(s, halflife=2., min_periods=0)
Out[5]:
0 0
dtype: float64
In [6]: ewmcov(s, s, halflife=2., min_periods=0)
Out[6]:
0 0
dtype: float64
In [7]: rolling_var(s, window=2, min_periods=0)
Out[7]:
0 0
dtype: float64
In [8]: rolling_std(s, window=2, min_periods=0)
Out[8]:
0 0
dtype: float64
In [9]: s.std()
Out[9]: nan
In [10]: s.var()
Out[10]: nan
In [11]: ewmcorr(s, s, halflife=2., min_periods=0)
Out[11]:
0 NaN
dtype: float64
In [12]: expanding_cov(s, s, min_periods=0)
Out[12]:
0 NaN
dtype: float64
In [13]: expanding_corr(s, s, min_periods=0)
Out[13]:
0 NaN
dtype: float64
In [16]: rolling_cov(s, s, window=3, min_periods=0)
Out[16]:
0 NaN
dtype: float64
In [17]: rolling_corr(s, s, window=3, min_periods=0)
Out[17]:
0 NaN
dtype: float64
In [14]: expanding_std(s, min_periods=0)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-7320ee579e6a> in <module>()
----> 1 expanding_std(s, min_periods=0)
C:\Python34\lib\site-packages\pandas\stats\moments.py in f(arg, min_periods, freq, center, **kwargs)
825 return func(arg, window, minp, **kwds)
826 return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
--> 827 center=center, **kwargs)
828
829 return f
C:\Python34\lib\site-packages\pandas\stats\moments.py in _rolling_moment(arg, window, func, minp, axis, freq, center, how, args, kwargs, **kwds)
345 result = np.apply_along_axis(calc, axis, values)
346 else:
--> 347 result = calc(values)
348
349 rs = return_hook(result)
C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(x)
339 arg = _conv_timerule(arg, freq, how)
340 calc = lambda x: func(x, window, minp=minp, args=args, kwargs=kwargs,
--> 341 **kwds)
342 return_hook, values = _process_data_structure(arg)
343 # actually calculate the moment. Faster way to do this?
C:\Python34\lib\site-packages\pandas\stats\moments.py in call_cython(arg, window, minp, args, kwargs, **kwds)
823 def call_cython(arg, window, minp, args=(), kwargs={}, **kwds):
824 minp = check_minp(minp, window)
--> 825 return func(arg, window, minp, **kwds)
826 return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
827 center=center, **kwargs)
C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(*a, **kw)
604 how='median')
605
--> 606 _ts_std = lambda *a, **kw: _zsqrt(algos.roll_var(*a, **kw))
607 rolling_std = _rolling_func(_ts_std, 'Unbiased moving standard deviation.',
608 check_minp=_require_min_periods(1))
C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos.roll_var (pandas\algos.c:28990)()
C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos._check_minp (pandas\algos.c:16394)()
ValueError: min_periods (2) must be <= window (1)
In [15]: expanding_var(s, min_periods=0)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-28e2018aee68> in <module>()
----> 1 expanding_var(s, min_periods=0)
C:\Python34\lib\site-packages\pandas\stats\moments.py in f(arg, min_periods, freq, center, **kwargs)
825 return func(arg, window, minp, **kwds)
826 return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
--> 827 center=center, **kwargs)
828
829 return f
C:\Python34\lib\site-packages\pandas\stats\moments.py in _rolling_moment(arg, window, func, minp, axis, freq, center, how, args, kwargs, **kwds)
345 result = np.apply_along_axis(calc, axis, values)
346 else:
--> 347 result = calc(values)
348
349 rs = return_hook(result)
C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(x)
339 arg = _conv_timerule(arg, freq, how)
340 calc = lambda x: func(x, window, minp=minp, args=args, kwargs=kwargs,
--> 341 **kwds)
342 return_hook, values = _process_data_structure(arg)
343 # actually calculate the moment. Faster way to do this?
C:\Python34\lib\site-packages\pandas\stats\moments.py in call_cython(arg, window, minp, args, kwargs, **kwds)
823 def call_cython(arg, window, minp, args=(), kwargs={}, **kwds):
824 minp = check_minp(minp, window)
--> 825 return func(arg, window, minp, **kwds)
826 return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
827 center=center, **kwargs)
C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos.roll_var (pandas\algos.c:28990)()
C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos._check_minp (pandas\algos.c:16394)()
ValueError: min_periods (2) must be <= window (1)
In [20]: show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.2
numpy: 1.9.0b1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
Instability in ewmcorr of a constant series with itself:
In [1]: from pandas import Series, ewmcorr, expanding_corr, rolling_corr
In [2]: s = Series([5., 5., 5.])
In [3]: s.corr(s)
Out[3]: nan
In [4]: expanding_corr(s, s)
Out[4]:
0 NaN
1 NaN
2 NaN
dtype: float64
In [5]: rolling_corr(s, s, window=3)
Out[5]:
0 NaN
1 NaN
2 NaN
dtype: float64
In [6]: ewmcorr(s, s, halflife=3.)
Out[6]:
0 -1
1 -1
2 -1
dtype: float64
In [9]: ewmcorr(Series([3., 3., 3.]), halflife=3.)
Out[9]:
0 NaN
1 1
2 1
dtype: float64