
ENH: passing custom win_type to rolling window #51540

Open
@nvasilevv

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

This change would allow users to define custom window types and pass them to win_type for a rolling window. The current implementation only accepts the names of window functions defined in scipy.signal, which can be restrictive. The win_type code path is very fast, and it would be nice to open it up to user-defined weights. As far as I can see, the only alternative available today is to write a function that generates the weights and applies them to a pd.Series, and then pass it to .rolling().apply(lambda s: func(s)). This works, but in my experience it is slow even when engine='numba' is passed. The relevant check in the current implementation:

if not isinstance(self.win_type, str):
    raise ValueError(f"Invalid win_type {self.win_type}")
signal = import_optional_dependency(
    "scipy.signal", extra="Scipy is required to generate window weight."
)
self._scipy_weight_generator = getattr(signal, self.win_type, None)
if self._scipy_weight_generator is None:
    raise ValueError(f"Invalid win_type {self.win_type}")

Feature Description

Similar to how window accepts a BaseIndexer subclass, win_type could accept either a str or a callable. Requiring a subclass here is also plausible, though probably undesirable.

The following part would then need to be adjusted:

if not isinstance(self.win_type, str):
    raise ValueError(f"Invalid win_type {self.win_type}")
signal = import_optional_dependency(
    "scipy.signal", extra="Scipy is required to generate window weight."
)
self._scipy_weight_generator = getattr(signal, self.win_type, None)
if self._scipy_weight_generator is None:
    raise ValueError(f"Invalid win_type {self.win_type}")

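This could become something like this:

if callable(self.win_type):
    # User-supplied weight generator; scipy is not needed on this branch.
    self._scipy_weight_generator = self.win_type
elif isinstance(self.win_type, str):
    signal = import_optional_dependency(
        "scipy.signal", extra="Scipy is required to generate window weight."
    )
    self._scipy_weight_generator = getattr(signal, self.win_type, None)
    if self._scipy_weight_generator is None:
        raise ValueError(f"Invalid win_type {self.win_type}")
else:
    # Neither a window name nor a callable.
    raise ValueError(f"Invalid win_type {self.win_type}")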

In the end, the user would be responsible for providing a properly defined function. Keyword arguments would still be forwarded to the weight generator, as they are now.
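To make the "properly defined function" requirement concrete, here is a minimal sketch of what a user-supplied weight generator could look like under this proposal. It assumes the callable would be invoked the same way as the scipy generators, with the window length as the first positional argument followed by the keyword arguments given to the aggregation. The names halflife_weights and check_weights are hypothetical and not part of pandas.

```python
import numpy as np


def halflife_weights(n, halflife=10.0):
    """Hypothetical custom weight generator: n weights that halve every
    `halflife` steps going back in time (the last weight is the largest)."""
    age = np.arange(n - 1, -1, -1)  # n-1 for the oldest point, 0 for the newest
    return 0.5 ** (age / halflife)


def check_weights(func, n, **kwargs):
    """Hypothetical validation step: call the generator and verify its output."""
    weights = np.asarray(func(n, **kwargs), dtype=float)
    if weights.shape != (n,):
        raise ValueError(f"expected {n} weights, got shape {weights.shape}")
    if not np.isfinite(weights).all():
        raise ValueError("weights must be finite")
    return weights


# Assumed call convention: window length first, then keyword arguments.
weights = check_weights(halflife_weights, 5, halflife=2.0)
```

Up to a constant factor, these weights should match the exponential window used under Additional Context (negative tau, center=0, sym=False), so a weighted mean computed with either would be expected to agree.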

Alternative Solutions

The alternative is to use rolling(...).apply(...) with custom weights, which is already available. However, as mentioned above, this is slow even when numba is used and the weights function is properly vectorized. There might be another alternative that escapes me at the moment 🤔.
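One further possibility, offered as an unbenchmarked sketch rather than a recommendation, is to bypass rolling() entirely and compute the weighted mean with NumPy's sliding_window_view (available since NumPy 1.20). The helper weighted_rolling_mean below is hypothetical; it handles a single 1-D array, gives NaNs no special treatment, and no claim is made here about how its speed compares to win_type.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def weighted_rolling_mean(values, weights):
    """Weighted rolling mean of a 1-D array; the first len(weights) - 1
    entries are NaN because their window is incomplete."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    window = len(weights)
    out = np.full(values.shape, np.nan)
    if len(values) >= window:
        # Shape (len(values) - window + 1, window): one row per full window.
        windows = sliding_window_view(values, window)
        out[window - 1:] = windows @ weights / weights.sum()
    return out


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
result = weighted_rolling_mean(values, weights=[1.0, 2.0, 1.0])
```

For a DataFrame, the same helper could be applied column by column, at the cost of materializing one weighted product per column.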

Additional Context

Here is a minimal working example. It uses exponential weights. The "built-in" win_type implementation is by far the fastest, even when compared to rolling().apply() with numba.

from scipy import signal
import numpy as np
import pandas as pd

halflife = 120
window = 500
lags = 60
tolerance = 1e-10
# A negative tau makes the weights grow towards the most recent observation.
tau = -(halflife / np.log(2))

# Explicit, normalized weights for the rolling().apply() alternatives below.
weights_2 = signal.windows.exponential(window, tau=tau, center=0, sym=False)
weights_2 = weights_2 / weights_2.sum()

df = pd.DataFrame(np.random.random(size=(1000, 1000)))
argument = 1 + df
argument[argument <= 0] = np.nan

%%time
alt_1 = np.log(argument)\
          .fillna(0)\
          .shift(lags)\
          .rolling(window=window, win_type="exponential")\
          .mean(tau=-(halflife / np.log(2)), center=0, sym=False)\
          .replace(0, np.nan)

CPU times: total: 688 ms
Wall time: 688 ms

%%time
alt_2 = np.log(argument)\
          .fillna(0)\
          .shift(lags)\
          .rolling(window)\
          .apply(lambda a: np.average(a, weights=weights_2), engine='numba', raw=True)\
          .replace(0, np.nan)

CPU times: total: 2.7 s
Wall time: 2.77 s

%%time
alt_3 = np.log(argument)\
          .fillna(0)\
          .shift(lags)\
          .rolling(window)\
          .apply(lambda a: np.average(a, weights=weights_2))\
          .replace(0, np.nan)

CPU times: total: 30.4 s
Wall time: 30.4 s

pd.testing.assert_frame_equal(alt_1, alt_2, rtol=tolerance)
pd.testing.assert_frame_equal(alt_1, alt_3, rtol=tolerance)
