ENH: Add support for calculating EWMA with a time component #34839

Merged
merged 54 commits into from
Jul 6, 2020
54 commits
f220181
Add documentaiton for new time and half-life parameters
Jun 9, 2020
a3d490f
clarify only for mean
Jun 9, 2020
f90f8a7
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 10, 2020
c7ce08b
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 10, 2020
66c9799
update docstring and calculate distances in the __init__
Jun 12, 2020
144bcaf
refactor distances, remove note about only applying to mean
Jun 12, 2020
4297cae
Have ewm functions accept distances
Jun 12, 2020
d200bb7
Only commit to mean
Jun 12, 2020
f3e3b6b
fetch an ewma_time_function
Jun 12, 2020
7ee69be
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 14, 2020
be0ad74
Fill out more ewm_time
Jun 14, 2020
28bf607
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 14, 2020
52fdea5
Add typing for new arguments
Jun 14, 2020
6059bc2
Clarify docs
Jun 14, 2020
78f3399
Grammar
Jun 14, 2020
f3925b4
Finish algorithm, clean constructors
Jun 14, 2020
9b53518
Begin adding tests
Jun 14, 2020
77d8c9d
Adjust some parameters to fix tests
Jun 15, 2020
6b630b3
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 17, 2020
4a214a1
Add result of ewm operation
Jun 17, 2020
7276505
lint
Jun 17, 2020
62b9050
isort
Jun 17, 2020
4d1535f
Add spacing in colons
Jun 17, 2020
5300594
Add whatsnew entry
Jun 17, 2020
1f085f1
Lint cython file
Jun 17, 2020
693bee2
Fix typing validations
Jun 17, 2020
1a4c61f
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 17, 2020
93bcb3d
lint
Jun 17, 2020
ae8c7d2
spelling
Jun 17, 2020
7a707a2
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 19, 2020
016764f
Add versionadded tags
Jun 19, 2020
d809e51
Get correct error message
Jun 19, 2020
37962fb
Merge split string
Jun 19, 2020
90e2a28
black file
Jun 19, 2020
5e45c3f
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 21, 2020
e2e875f
Correct algo to match base case with halflife=1
Jun 21, 2020
4a9c494
Include nans in the test
Jun 21, 2020
5a1f244
Add test with variable spaced times
Jun 21, 2020
b9b3dfe
Add doc example
Jun 21, 2020
76a02b4
change EWM abbreviation
Jun 21, 2020
2d681aa
Lint
Jun 22, 2020
067e4fb
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 23, 2020
d0dff28
Add documentation to computation.rst
Jun 23, 2020
7477f0e
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 25, 2020
39c241d
let center_of_mass_calculation raise any halflife errors
Jun 25, 2020
999cffb
Add back halflife check
Jun 26, 2020
988a378
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 26, 2020
58ca9fc
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 26, 2020
c430349
Change halflife check
Jun 26, 2020
a5c06de
Merge remote-tracking branch 'upstream/master' into time_ewma
Jun 27, 2020
e540287
Add ASV
Jun 27, 2020
a8e86e3
Test timezones
Jun 27, 2020
ed141d3
Lint
Jun 28, 2020
1debd80
Merge remote-tracking branch 'upstream/master' into time_ewma
Jul 6, 2020
7 changes: 7 additions & 0 deletions asv_bench/benchmarks/rolling.py
@@ -91,11 +91,18 @@ class EWMMethods:
    def setup(self, constructor, window, dtype, method):
        N = 10 ** 5
        arr = (100 * np.random.random(N)).astype(dtype)
        times = pd.date_range("1900", periods=N, freq="23s")
        self.ewm = getattr(pd, constructor)(arr).ewm(halflife=window)
        self.ewm_times = getattr(pd, constructor)(arr).ewm(
            halflife="1 Day", times=times
        )

    def time_ewm(self, constructor, window, dtype, method):
        getattr(self.ewm, method)()

    def time_ewm_times(self, constructor, window, dtype, method):
        self.ewm_times.mean()


class VariableWindowMethods(Methods):
    params = (
19 changes: 19 additions & 0 deletions doc/source/user_guide/computation.rst
@@ -1095,6 +1095,25 @@ and **alpha** to the EW functions:
one half.
* **Alpha** specifies the smoothing factor directly.

.. versionadded:: 1.1.0

When a sequence of ``times`` is also given, ``halflife`` can be specified as a timedelta-convertible
value: the amount of time it takes for an observation to decay to half its value.

.. ipython:: python

    df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
    df
    times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
    df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()

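The following formula is used to compute the exponentially weighted mean with an input vector of times:

.. math::

    y_t = \frac{\sum_{i=0}^t 0.5^\frac{t_{t} - t_{i}}{\lambda} x_{i}}{\sum_{i=0}^t 0.5^\frac{t_{t} - t_{i}}{\lambda}},

where :math:`\lambda` is the ``halflife`` and :math:`t_{i}` is the time of observation :math:`x_{i}`.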

Here is an example for a univariate time series:

.. ipython:: python
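As a worked check of the time-weighted mean in the hunk above, here is the second row of the docstring's sample data, taking the decay constant to be the 4-day halflife (an illustrative calculation, not part of the PR). At 2020-01-03 the observation 0 is two days old and the current observation is 1:

```latex
y_1 = \frac{0.5^{2/4} \cdot 0 + 0.5^{0/4} \cdot 1}{0.5^{2/4} + 0.5^{0/4}}
    = \frac{1}{0.707107 + 1}
    \approx 0.585786
```

This agrees with the second value (0.585786) in the docstring example output for ``halflife='4 days'``.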
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -329,6 +329,7 @@ Other enhancements
- :meth:`DataFrame.to_excel` can now also write OpenOffice spreadsheet (.ods) files (:issue:`27222`)
- :meth:`~Series.explode` now accepts ``ignore_index`` to reset the index, similarly to :meth:`pd.concat` or :meth:`DataFrame.sort_values` (:issue:`34932`).
- :meth:`read_csv` now accepts string values like "0", "0.0", "1", "1.0" as convertible to the nullable boolean dtype (:issue:`34859`)
- :class:`pandas.core.window.ExponentialMovingWindow` now supports a ``times`` argument that allows ``mean`` to be calculated with observations spaced by the timestamps in ``times`` (:issue:`34839`)

.. ---------------------------------------------------------------------------

61 changes: 53 additions & 8 deletions pandas/_libs/window/aggregations.pyx
@@ -8,7 +8,7 @@ from libc.stdlib cimport malloc, free

import numpy as np
cimport numpy as cnp
from numpy cimport ndarray, int64_t, float64_t, float32_t
from numpy cimport ndarray, int64_t, float64_t, float32_t, uint8_t
cnp.import_array()


@@ -1752,6 +1752,51 @@ def roll_weighted_var(float64_t[:] values, float64_t[:] weights,
# ----------------------------------------------------------------------
# Exponentially weighted moving average

def ewma_time(ndarray[float64_t] vals, int minp, ndarray[int64_t] times,
              int64_t halflife):
    """
    Compute exponentially-weighted moving average using halflife and time
    distances.

    Parameters
    ----------
    vals : ndarray[float_64]
    minp : int
    times : ndarray[int64]
    halflife : int64

    Returns
    -------
    ndarray
    """
Contributor:

As a follow-up, it should be possible to implement ewma with this one.

    cdef:
        Py_ssize_t i, num_not_nan = 0, N = len(vals)
        bint is_not_nan
        float64_t last_result
        ndarray[uint8_t] mask = np.zeros(N, dtype=np.uint8)
        ndarray[float64_t] weights, observations, output = np.empty(N, dtype=np.float64)

    if N == 0:
        return output

    last_result = vals[0]

    for i in range(N):
Contributor:

Can you put this in a nogil block? Do we have timings on this?

Member Author:

I tried to nogil this but could not, since we are doing array arithmetic and broadcasting.

            weights = 0.5 ** ((times[i] - times[mask.view(np.bool_)]) / halflife)
            observations = vals[mask.view(np.bool_)]
            last_result = np.sum(weights * observations) / np.sum(weights)

I can add an ASV for this.

Contributor:

OK, sure on the ASV.

I think you can convert the mask view to an np.take operation if needed (but you can always just open an issue for this).

        is_not_nan = vals[i] == vals[i]
        num_not_nan += is_not_nan
        if is_not_nan:
            mask[i] = 1
            weights = 0.5 ** ((times[i] - times[mask.view(np.bool_)]) / halflife)
            observations = vals[mask.view(np.bool_)]
            last_result = np.sum(weights * observations) / np.sum(weights)

        if num_not_nan >= minp:
            output[i] = last_result
        else:
            output[i] = NaN

    return output
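The loop in `ewma_time` can be mirrored in plain Python to make its behaviour easier to follow. The sketch below is illustrative only and is not part of the PR: `ewma_time_reference` is a hypothetical name, and it assumes `datetime` objects and a `timedelta` halflife instead of the int64 nanosecond arrays the Cython function receives. NaN observations are skipped and the last result is carried forward, as in the code above.

```python
import math
from datetime import datetime, timedelta


def ewma_time_reference(values, times, halflife, min_periods=1):
    """Pure-Python mirror of the ewma_time loop (illustrative, hypothetical)."""
    output = []
    seen_values = []  # non-NaN observations seen so far
    seen_times = []   # their timestamps
    last_result = values[0] if values else math.nan
    for value, now in zip(values, times):
        if not math.isnan(value):
            seen_values.append(value)
            seen_times.append(now)
            # timedelta / timedelta yields a float number of half-lives elapsed
            weights = [0.5 ** ((now - t) / halflife) for t in seen_times]
            weighted_sum = sum(w * x for w, x in zip(weights, seen_values))
            last_result = weighted_sum / sum(weights)
        # Emit NaN until enough non-NaN observations have been seen
        enough = len(seen_values) >= min_periods
        output.append(last_result if enough else math.nan)
    return output


times = [datetime(2020, 1, day) for day in (1, 3, 10, 15, 17)]
values = [0.0, 1.0, 2.0, math.nan, 4.0]
result = ewma_time_reference(values, times, timedelta(days=4))
print([round(r, 6) for r in result])
```

With the sample data from the docstring example, the printed values should agree with the documented output for `halflife='4 days'` to the displayed precision.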


def ewma(float64_t[:] vals, float64_t com, bint adjust, bint ignore_na, int minp):
"""
@@ -1761,9 +1806,9 @@ def ewma(float64_t[:] vals, float64_t com, bint adjust, bint ignore_na, int minp
----------
vals : ndarray (float64 type)
com : float64
adjust: int
ignore_na: bool
minp: int
adjust : int
ignore_na : bool
minp : int

Returns
-------
@@ -1831,10 +1876,10 @@ def ewmcov(float64_t[:] input_x, float64_t[:] input_y,
input_x : ndarray (float64 type)
input_y : ndarray (float64 type)
com : float64
adjust: int
ignore_na: bool
minp: int
bias: int
adjust : int
ignore_na : bool
minp : int
bias : int

Returns
-------
2 changes: 2 additions & 0 deletions pandas/core/generic.py
@@ -10518,6 +10518,7 @@ def ewm(
adjust=True,
ignore_na=False,
axis=0,
times=None,
):
axis = self._get_axis_number(axis)
return ExponentialMovingWindow(
@@ -10530,6 +10531,7 @@ def ewm(
adjust=adjust,
ignore_na=ignore_na,
axis=axis,
times=times,
)

cls.ewm = ewm
99 changes: 85 additions & 14 deletions pandas/core/window/ewm.py
@@ -1,18 +1,21 @@
import datetime
from functools import partial
from textwrap import dedent
from typing import Optional, Union

import numpy as np

from pandas._libs.tslibs import Timedelta
import pandas._libs.window.aggregations as window_aggregations
from pandas._typing import FrameOrSeries
from pandas._typing import FrameOrSeries, TimedeltaConvertibleTypes
from pandas.compat.numpy import function as nv
from pandas.util._decorators import Appender, Substitution, doc

from pandas.core.dtypes.common import is_datetime64_ns_dtype
from pandas.core.dtypes.generic import ABCDataFrame

from pandas.core.base import DataError
import pandas.core.common as com
import pandas.core.common as common
from pandas.core.window.common import _doc_template, _shared_docs, zsqrt
from pandas.core.window.rolling import _flex_binary_moment, _Rolling

@@ -32,7 +35,7 @@ def get_center_of_mass(
halflife: Optional[float],
alpha: Optional[float],
) -> float:
valid_count = com.count_not_none(comass, span, halflife, alpha)
valid_count = common.count_not_none(comass, span, halflife, alpha)
if valid_count > 1:
raise ValueError("comass, span, halflife, and alpha are mutually exclusive")

@@ -76,10 +79,17 @@ class ExponentialMovingWindow(_Rolling):
span : float, optional
Specify decay in terms of span,
:math:`\alpha = 2 / (span + 1)`, for :math:`span \geq 1`.
halflife : float, optional
halflife : float, str, timedelta, optional
Specify decay in terms of half-life,
:math:`\alpha = 1 - \exp\left(-\ln(2) / halflife\right)`, for
:math:`halflife > 0`.

        If ``times`` is specified, the time unit (str or timedelta) over which an
        observation decays to half its value. Only applicable to ``mean()``;
        the halflife value will not apply to the other functions.

.. versionadded:: 1.1.0

alpha : float, optional
Specify smoothing factor :math:`\alpha` directly,
:math:`0 < \alpha \leq 1`.
@@ -124,6 +134,18 @@ class ExponentialMovingWindow(_Rolling):
axis : {0, 1}, default 0
The axis to use. The value 0 identifies the rows, and 1
identifies the columns.
times : str, np.ndarray, Series, default None

.. versionadded:: 1.1.0

Times corresponding to the observations. Must be monotonically increasing and
``datetime64[ns]`` dtype.

If str, the name of the column in the DataFrame representing the times.

If 1-D array like, a sequence with the same shape as the observations.

Only applicable to ``mean()``.

Returns
-------
@@ -159,6 +181,17 @@ class ExponentialMovingWindow(_Rolling):
2 1.615385
3 1.615385
4 3.670213

Specifying ``times`` with a timedelta ``halflife`` when computing mean.

>>> times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
>>> df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
B
0 0.000000
1 0.585786
2 1.523889
3 1.523889
4 3.233686
"""

_attributes = ["com", "min_periods", "adjust", "ignore_na", "axis"]
@@ -168,20 +201,49 @@ def __init__(
        obj,
        com: Optional[float] = None,
        span: Optional[float] = None,
        halflife: Optional[float] = None,
        halflife: Optional[Union[float, TimedeltaConvertibleTypes]] = None,
        alpha: Optional[float] = None,
        min_periods: int = 0,
        adjust: bool = True,
        ignore_na: bool = False,
        axis: int = 0,
        times: Optional[Union[str, np.ndarray, FrameOrSeries]] = None,
    ):
        self.com: Optional[float]
        self.obj = obj
        self.com = get_center_of_mass(com, span, halflife, alpha)
        self.min_periods = max(int(min_periods), 1)
        self.adjust = adjust
        self.ignore_na = ignore_na
        self.axis = axis
        self.on = None
        if times is not None:
            if isinstance(times, str):
                times = self._selected_obj[times]
            if not is_datetime64_ns_dtype(times):
                raise ValueError("times must be datetime64[ns] dtype.")
            if len(times) != len(obj):
                raise ValueError("times must be the same length as the object.")
            if not isinstance(halflife, (str, datetime.timedelta)):
                raise ValueError(
                    "halflife must be a string or datetime.timedelta object"
                )
            self.times = np.asarray(times.astype(np.int64))
            self.halflife = Timedelta(halflife).value
            # Halflife is no longer applicable when calculating COM
            # But allow COM to still be calculated if the user passes other decay args
            if common.count_not_none(com, span, alpha) > 0:
                self.com = get_center_of_mass(com, span, None, alpha)
            else:
                self.com = None
        else:
            if halflife is not None and isinstance(halflife, (str, datetime.timedelta)):
                raise ValueError(
                    "halflife can only be a timedelta convertible argument if "
                    "times is not None."
                )
            self.times = None
            self.halflife = None
            self.com = get_center_of_mass(com, span, halflife, alpha)

@property
def _constructor(self):
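The two `halflife` checks in the `__init__` hunk above can be summarised in a small standalone function. This is a minimal sketch under stated assumptions: `check_halflife` and its `has_times` flag are hypothetical names introduced for illustration, and only the `halflife` type rules are reproduced (not the dtype or length checks on `times`).

```python
from datetime import timedelta


def check_halflife(halflife, has_times):
    """Hypothetical helper mirroring the halflife type checks in __init__."""
    is_time_like = isinstance(halflife, (str, timedelta))
    if has_times and not is_time_like:
        # With times, halflife must be a str or timedelta (None is rejected too)
        raise ValueError("halflife must be a string or datetime.timedelta object")
    if not has_times and is_time_like:
        # Without times, a time-like halflife has nothing to be measured against
        raise ValueError(
            "halflife can only be a timedelta convertible argument if "
            "times is not None."
        )
    return halflife


print(check_halflife("4 days", has_times=True))
print(check_halflife(2.0, has_times=False))
```

A float `halflife` combined with `times`, or a string `halflife` without `times`, raises `ValueError` in this sketch, matching the two branches shown in the diff.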
@@ -277,14 +339,23 @@ def mean(self, *args, **kwargs):
            Arguments and keyword arguments to be passed into func.
        """
        nv.validate_window_func("mean", args, kwargs)
        window_func = self._get_roll_func("ewma")
        window_func = partial(
            window_func,
            com=self.com,
            adjust=self.adjust,
            ignore_na=self.ignore_na,
            minp=self.min_periods,
        )
        if self.times is not None:
            window_func = self._get_roll_func("ewma_time")
            window_func = partial(
                window_func,
                minp=self.min_periods,
                times=self.times,
                halflife=self.halflife,
            )
        else:
            window_func = self._get_roll_func("ewma")
            window_func = partial(
                window_func,
                com=self.com,
                adjust=self.adjust,
                ignore_na=self.ignore_na,
                minp=self.min_periods,
            )
        return self._apply(window_func)

@Substitution(name="ewm", func_name="std")
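The dispatch in `mean()` picks one of two kernels and binds its keyword arguments with `functools.partial`, so that the result can be called with the values alone. The sketch below shows that pattern in isolation; the two kernels are stand-ins that return a tag instead of computing anything, and `pick_kernel` is a hypothetical name, not pandas API.

```python
from functools import partial


def ewma(vals, com, adjust, ignore_na, minp):
    # Stand-in for the fixed-step kernel: reports which kernel ran
    return ("ewma", com)


def ewma_time(vals, minp, times, halflife):
    # Stand-in for the time-aware kernel
    return ("ewma_time", halflife)


def pick_kernel(times=None, halflife=None, com=None, min_periods=1):
    """Choose a kernel the way mean() does and pre-bind its keyword arguments."""
    if times is not None:
        return partial(ewma_time, minp=min_periods, times=times, halflife=halflife)
    return partial(ewma, com=com, adjust=True, ignore_na=False, minp=min_periods)


fixed_step = pick_kernel(com=0.5)([1.0, 2.0])
time_aware = pick_kernel(times=[0, 1], halflife=10)([1.0, 2.0])
print(fixed_step, time_aware)
```

Either way the caller ends up with a function of the values only, which is why the same `_apply` call can serve both branches.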
8 changes: 7 additions & 1 deletion pandas/tests/window/conftest.py
@@ -1,4 +1,4 @@
from datetime import datetime
from datetime import datetime, timedelta

import numpy as np
from numpy.random import randn
@@ -302,3 +302,9 @@ def series():
def which(request):
    """Turn parametrized which as fixture for series and frame"""
    return request.param


@pytest.fixture(params=["1 day", timedelta(days=1)])
def halflife_with_times(request):
    """Halflife argument for EWM when times is specified."""
    return request.param