BUG: Inconsistencies in calculating second moments of a single value #7900

Closed
@seth-p

I noticed (in #7884) that ewmvar, ewmstd, ewmvol, ewmcov, rolling_var, and rolling_std return 0.0 for a single value (assuming min_periods=0), whereas Series.std, Series.var, ewmcorr, expanding_cov, expanding_corr, rolling_cov, and rolling_corr all return NaN for a single value. expanding_std and expanding_var raise ValueError: min_periods (2) must be <= window (1).

I think all of these should return NaN for a single value. At any rate, I would expect greater consistency one way or the other.
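To illustrate why NaN seems the natural answer for a single value: the unbiased estimator divides by n - 1, which is zero when n = 1. The following is only a minimal sketch in plain Python, not pandas' actual implementation; the name sample_var and its ddof parameter are assumptions made for illustration.

```python
import math


def sample_var(values, ddof=1):
    """Variance with divisor n - ddof (hypothetical helper, not a pandas API).

    Returns NaN when the divisor is not positive, e.g. a single value
    with ddof=1, instead of reporting a misleading 0.0.
    """
    n = len(values)
    if n - ddof <= 0:
        return math.nan
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


single = sample_var([1.0])             # unbiased estimate: undefined, so NaN
biased = sample_var([1.0], ddof=0)     # population variance: 0.0
pair = sample_var([1.0, 3.0])          # two values: well defined
```

Under this convention, 0.0 for a single value is only correct for the biased (ddof=0) estimator, which is one way to read the current split between the two groups of functions.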

On a related note, when calculating the correlation of a constant series with itself, Series.corr(), expanding_corr, and rolling_corr return NaN, while ewmcorr sometimes returns NaN, sometimes 1, and sometimes -1, due to numerical accuracy issues. In fact, as shown in a separate comment below, rolling_corr also produces inconsistent results for a constant subseries that follows different prior values.
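One way to get a stable answer for the constant-series case is to check the variances before dividing, since the correlation is cov / sqrt(var_x * var_y) and is undefined when either variance is zero. This is a rough sketch in plain Python under that assumption; safe_corr is a hypothetical helper, not how pandas computes correlations, and a streaming or weighted implementation would likely need a tolerance or an explicit "all values equal" check rather than an exact comparison with zero.

```python
import math


def safe_corr(x, y):
    """Pearson correlation that returns NaN for degenerate input.

    Hypothetical helper for illustration only. NaN is returned when
    there are fewer than two observations or when either series has
    zero variance, so rounding noise can never be turned into +1 or -1.
    """
    n = len(x)
    if n != len(y) or n < 2:
        return math.nan
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x <= 0.0 or var_y <= 0.0:
        return math.nan
    # The common 1 / (n - 1) factors cancel, so raw sums are enough here.
    return cov / math.sqrt(var_x * var_y)


constant = safe_corr([5.0, 5.0, 5.0], [5.0, 5.0, 5.0])   # NaN
increasing = safe_corr([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
reversed_ = safe_corr([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```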

Inconsistencies in calculating second moments of a single point:

Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 2.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from pandas import Series, ewmvar, ewmstd, ewmvol, ewmcov, rolling_var, rolling_std, ewmcorr, expanding_cov, expanding_corr, expanding_std, expanding_var, rolling_cov, rolling_corr
In [2]: s = Series([1.])

In [3]: ewmvar(s, halflife=2., min_periods=0)
Out[3]:
0    0
dtype: float64

In [4]: ewmstd(s, halflife=2., min_periods=0)
Out[4]:
0    0
dtype: float64

In [5]: ewmvol(s, halflife=2., min_periods=0)
Out[5]:
0    0
dtype: float64

In [6]: ewmcov(s, s, halflife=2., min_periods=0)
Out[6]:
0    0
dtype: float64

In [7]: rolling_var(s, window=2, min_periods=0)
Out[7]:
0    0
dtype: float64

In [8]: rolling_std(s, window=2, min_periods=0)
Out[8]:
0    0
dtype: float64

In [9]: s.std()
Out[9]: nan

In [10]: s.var()
Out[10]: nan

In [11]: ewmcorr(s, s, halflife=2., min_periods=0)
Out[11]:
0   NaN
dtype: float64

In [12]: expanding_cov(s, s, min_periods=0)
Out[12]:
0   NaN
dtype: float64

In [13]: expanding_corr(s, s, min_periods=0)
Out[13]:
0   NaN
dtype: float64

In [16]: rolling_cov(s, s, window=3, min_periods=0)
Out[16]:
0   NaN
dtype: float64

In [17]: rolling_corr(s, s, window=3, min_periods=0)
Out[17]:
0   NaN
dtype: float64


In [14]: expanding_std(s, min_periods=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-7320ee579e6a> in <module>()
----> 1 expanding_std(s, min_periods=0)

C:\Python34\lib\site-packages\pandas\stats\moments.py in f(arg, min_periods, freq, center, **kwargs)
    825             return func(arg, window, minp, **kwds)
    826         return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
--> 827                                center=center, **kwargs)
    828
    829     return f

C:\Python34\lib\site-packages\pandas\stats\moments.py in _rolling_moment(arg, window, func, minp, axis, freq, center, how, args, kwargs, **kwds)
    345         result = np.apply_along_axis(calc, axis, values)
    346     else:
--> 347         result = calc(values)
    348
    349     rs = return_hook(result)

C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(x)
    339     arg = _conv_timerule(arg, freq, how)
    340     calc = lambda x: func(x, window, minp=minp, args=args, kwargs=kwargs,
--> 341                           **kwds)
    342     return_hook, values = _process_data_structure(arg)
    343     # actually calculate the moment. Faster way to do this?

C:\Python34\lib\site-packages\pandas\stats\moments.py in call_cython(arg, window, minp, args, kwargs, **kwds)
    823         def call_cython(arg, window, minp, args=(), kwargs={}, **kwds):
    824             minp = check_minp(minp, window)
--> 825             return func(arg, window, minp, **kwds)
    826         return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
    827                                center=center, **kwargs)

C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(*a, **kw)
    604                                how='median')
    605
--> 606 _ts_std = lambda *a, **kw: _zsqrt(algos.roll_var(*a, **kw))
    607 rolling_std = _rolling_func(_ts_std, 'Unbiased moving standard deviation.',
    608                             check_minp=_require_min_periods(1))

C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos.roll_var (pandas\algos.c:28990)()

C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos._check_minp (pandas\algos.c:16394)()

ValueError: min_periods (2) must be <= window (1)

In [15]: expanding_var(s, min_periods=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-28e2018aee68> in <module>()
----> 1 expanding_var(s, min_periods=0)

C:\Python34\lib\site-packages\pandas\stats\moments.py in f(arg, min_periods, freq, center, **kwargs)
    825             return func(arg, window, minp, **kwds)
    826         return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
--> 827                                center=center, **kwargs)
    828
    829     return f

C:\Python34\lib\site-packages\pandas\stats\moments.py in _rolling_moment(arg, window, func, minp, axis, freq, center, how, args, kwargs, **kwds)
    345         result = np.apply_along_axis(calc, axis, values)
    346     else:
--> 347         result = calc(values)
    348
    349     rs = return_hook(result)

C:\Python34\lib\site-packages\pandas\stats\moments.py in <lambda>(x)
    339     arg = _conv_timerule(arg, freq, how)
    340     calc = lambda x: func(x, window, minp=minp, args=args, kwargs=kwargs,
--> 341                           **kwds)
    342     return_hook, values = _process_data_structure(arg)
    343     # actually calculate the moment. Faster way to do this?

C:\Python34\lib\site-packages\pandas\stats\moments.py in call_cython(arg, window, minp, args, kwargs, **kwds)
    823         def call_cython(arg, window, minp, args=(), kwargs={}, **kwds):
    824             minp = check_minp(minp, window)
--> 825             return func(arg, window, minp, **kwds)
    826         return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
    827                                center=center, **kwargs)

C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos.roll_var (pandas\algos.c:28990)()

C:\Python34\lib\site-packages\pandas\algos.pyd in pandas.algos._check_minp (pandas\algos.c:16394)()

ValueError: min_periods (2) must be <= window (1)


In [20]: show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.2
numpy: 1.9.0b1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

Instability in ewmcorr of a constant series with itself:

In [1]: from pandas import Series, ewmcorr, expanding_corr, rolling_corr

In [2]: s = Series([5., 5., 5.])

In [3]: s.corr(s)
Out[3]: nan

In [4]: expanding_corr(s, s)
Out[4]:
0   NaN
1   NaN
2   NaN
dtype: float64

In [5]: rolling_corr(s, s, window=3)
Out[5]:
0   NaN
1   NaN
2   NaN
dtype: float64

In [6]: ewmcorr(s, s, halflife=3.)
Out[6]:
0   -1
1   -1
2   -1
dtype: float64

In [9]: ewmcorr(Series([3., 3., 3.]), halflife=3.)
Out[9]:
0   NaN
1     1
2     1
dtype: float64

Labels: Bug; Numeric Operations (Arithmetic, Comparison, and Logical operations)
