Currently the exponential functions, such as pd.ewma, use a min_periods argument to ensure there's enough data to generate a valid value. While this works well for the rolling functions, it's not effective for exponential functions, because every point keeps some weight forever, albeit an ever-decreasing one:
In [4]: series=pd.Series(range(200))
In [5]: series[20:190]=pd.np.nan
In [6]: pd.ewma(series, span=10, min_periods=15)
Out[6]:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 10.277660
15 11.172348
16 12.080052
17 12.999407
18 13.929141
19 14.868084
20 14.868084
21 14.868084
22 14.868084
23 14.868084
24 14.868084
25 14.868084
26 14.868084
27 14.868084
28 14.868084
29 14.868084
...
170 14.868084
171 14.868084
172 14.868084
173 14.868084
174 14.868084
175 14.868084
176 14.868084
177 14.868084
178 14.868084
179 14.868084
180 14.868084
181 14.868084
182 14.868084
183 14.868084
184 14.868084
185 14.868084
186 14.868084
187 14.868084
188 14.868084
189 14.868084
190 190.000000
191 190.550000
192 191.132890
193 191.748020
194 192.394502
195 193.071240
196 193.776953
197 194.510212
198 195.269468
199 196.053089
dtype: float64
I think what we want is a min_weight argument: if you specify 0.5, the non-missing points must account for at least 50% of the total weight before a value is calculated. For rolling functions, this would be equivalent to min_periods being half of window.
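To make the proposal concrete, here is a rough sketch of one possible interpretation, not an existing pandas feature. The helper name ewma_min_weight and its min_weight parameter are hypothetical. It assumes the current Series.ewm API (rather than pd.ewma) and defines "weight" as the share of an infinite history's total weight, alpha * (1 - alpha)**k for a point k periods back, that falls on non-missing points.

```python
import numpy as np
import pandas as pd


def ewma_min_weight(series, span, min_weight):
    """Hypothetical helper: EW mean, masked where observed weight is too low.

    Assumption: a point k periods back carries normalized weight
    alpha * (1 - alpha)**k, so a full infinite history sums to 1.
    """
    alpha = 2.0 / (span + 1.0)
    observed = series.notna().to_numpy(dtype=float)

    # Running share of the total weight that sits on non-missing points.
    weight = np.empty(len(series))
    running = 0.0
    for i, flag in enumerate(observed):
        running = (1.0 - alpha) * running + alpha * flag
        weight[i] = running

    # Ordinary EW mean; it carries the last value across runs of NaN.
    mean = series.ewm(span=span, adjust=True, ignore_na=False).mean()

    # Keep a value only where enough weight comes from real data.
    return mean.where(weight >= min_weight)


series = pd.Series(np.arange(200, dtype=float))
series.iloc[20:190] = np.nan
result = ewma_min_weight(series, span=10, min_weight=0.5)
```

With these inputs the result goes missing a few periods into the gap, instead of repeating the last value for 170 rows, and becomes valid again a few periods after the data resumes.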
What are people's thoughts?