Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
This change would allow the user to define custom window types and pass them to win_type for a rolling window. The current implementation relies on the set of window functions defined in scipy.signal.windows, which can be viewed as something of a restriction. The implementation behind win_type is very fast, and it would be nice to open it up to user-defined windows. As I see it, the currently available alternative is to write a function that generates the weights and applies them to a pd.Series object, and then pass that function to .rolling().apply(lambda s: func(s)). Though possible, in my experience this can be slow even when engine='numba' is passed as a keyword argument.
pandas/pandas/core/window/rolling.py
Lines 1136 to 1143 in 87fd0b5
Feature Description
Similar to the BaseIndexer, win_type could support both a str and a Callable. Subclassing could be undesirable here, though it is also plausible.
Then the part below should be adjusted.
pandas/pandas/core/window/rolling.py
Lines 1136 to 1143 in 87fd0b5
This could become something like this:
if not (isinstance(self.win_type, str) or callable(self.win_type)):
    raise ValueError(f"Invalid win_type {self.win_type}")
if callable(self.win_type):
    # User-supplied weight generator; used as-is
    self._scipy_weight_generator = self.win_type
else:
    # String name; look it up among the scipy window functions
    self._scipy_weight_generator = getattr(signal, self.win_type, None)
    if self._scipy_weight_generator is None:
        raise ValueError(f"Invalid win_type {self.win_type}")
In the end, the user will be responsible for providing a properly defined function. Kwargs would still be supported, as they are at the moment.
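To make the proposed dispatch concrete, here is a minimal standalone sketch of it. The names resolve_weight_generator and linear_ramp are hypothetical and not part of pandas; the sketch only assumes that a user-defined window would follow the scipy convention of taking the window length as its first argument.

```python
import importlib
from typing import Callable, Union


def resolve_weight_generator(win_type: Union[str, Callable]) -> Callable:
    """Hypothetical helper: map win_type to a weight-generating function."""
    if callable(win_type):
        # A user-supplied callable is trusted and returned unchanged
        return win_type
    if not isinstance(win_type, str):
        raise ValueError(f"Invalid win_type {win_type}")
    # Import scipy lazily, only when a named window is requested
    windows = importlib.import_module("scipy.signal.windows")
    generator = getattr(windows, win_type, None)
    if generator is None:
        raise ValueError(f"Invalid win_type {win_type}")
    return generator


def linear_ramp(M, sym=True):
    """Hypothetical user-defined window: weights 1, 2, ..., M."""
    return [float(i) for i in range(1, M + 1)]


generator = resolve_weight_generator(linear_ramp)
weights = generator(4)
```

Checking callable() first keeps the string lookup untouched, so existing calls such as win_type="exponential" would behave as they do today.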
Alternative Solutions
The alternative is to use rolling(...).apply(...) with custom weights, which is available at the moment. However, as mentioned above, this is slow even when numba is used and the weights function is properly vectorized. There may be other alternatives that escape me at the moment 🤔.
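One more workaround that might be worth trying, though it is not benchmarked here, is to bypass rolling(...).apply(...) and compute the weighted mean directly with NumPy. The sketch below uses a hypothetical helper, weighted_rolling_mean, for a single 1-D column, and assumes the input contains no NaN values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def weighted_rolling_mean(values, weights):
    """Hypothetical helper: trailing weighted mean over a 1-D array."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    window = len(weights)
    # Positions without a full window stay NaN, as with rolling(window)
    out = np.full(values.shape[0], np.nan)
    if values.shape[0] >= window:
        # Each row of the view is one full window, oldest value first
        windows = sliding_window_view(values, window)
        out[window - 1:] = windows @ weights / weights.sum()
    return out


result = weighted_rolling_mean([1.0, 2.0, 3.0, 4.0], [1.0, 3.0])
```

This still leaves the user writing and maintaining their own rolling logic, which is what a callable win_type would avoid.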
Additional Context
Here is a minimal working example. It uses the exponential function as weights. The "built-in" win_type implementation is by far the fastest, even when compared to rolling + numba.
from scipy import signal
import numpy as np
import pandas as pd

halflife = 120
window = 500
lags = 60
tolerance = 1e-10
tau = -(halflife / np.log(2))

# Precomputed, normalized exponential weights for the rolling().apply() variants
weights_2 = signal.windows.exponential(window, tau=tau, center=0, sym=False)
weights_2 = weights_2 / weights_2.sum()

df = pd.DataFrame(np.random.random(size=(1000, 1000)))
argument = 1 + df
argument[argument <= 0] = np.nan
%%time
alt_1 = np.log(argument)\
    .fillna(0)\
    .shift(lags)\
    .rolling(window=window, win_type="exponential")\
    .mean(tau=-(halflife / np.log(2)), center=0, sym=False)\
    .replace(0, np.nan)
CPU times: total: 688 ms
Wall time: 688 ms
%%time
alt_2 = np.log(argument)\
    .fillna(0)\
    .shift(lags)\
    .rolling(window)\
    .apply(lambda a: np.average(a, weights=weights_2), engine='numba', raw=True)\
    .replace(0, np.nan)
CPU times: total: 2.7 s
Wall time: 2.77 s
%%time
alt_3 = np.log(argument)\
    .fillna(0)\
    .shift(lags)\
    .rolling(window)\
    .apply(lambda a: np.average(a, weights=weights_2))\
    .replace(0, np.nan)
CPU times: total: 30.4 s
Wall time: 30.4 s
pd.testing.assert_frame_equal(alt_1, alt_2, rtol=tolerance)
pd.testing.assert_frame_equal(alt_1, alt_3, rtol=tolerance)
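A note on the negative tau used above: assuming the formula documented for scipy.signal.windows.exponential, w(n) = exp(-|n - center| / tau), setting center=0 and tau = -(halflife / ln 2) gives w(n) = 2^(n / halflife). The weights therefore double every halflife steps towards the end of the window, which is the most recent observation. The sketch below checks this identity with plain NumPy, without calling scipy.

```python
import numpy as np

halflife = 120
window = 500
tau = -(halflife / np.log(2))

n = np.arange(window)
# Formula assumed from the scipy documentation, with center=0
exp_form = np.exp(-np.abs(n - 0) / tau)
# Equivalent closed form: the weight doubles every `halflife` steps
pow_form = 2.0 ** (n / halflife)

assert np.allclose(exp_form, pow_form)
ratio = pow_form[halflife] / pow_form[0]
```

Here ratio compares the weight at position halflife with the weight at position 0, and it equals 2 up to floating-point error.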