Description
I've made a PR (#17871) to add pipe functionality to GroupBy objects. Resampler should IMO have the same .pipe behaviour as GroupBy objects. Then you could do the below in a single pass:
df.resample('3M').pipe(lambda resampled: resampled.Open.first() - resampled.close.last())
That is, for each resampled period, you could reuse a Resampler object multiple times in a pipe. The alternative would be to do it in several lines, which would be less readable, or to use apply, which would be slower.
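For comparison, here is a rough sketch of the "several lines" alternative. The frame `prices`, its `Open`/`Close` columns and the `'2D'` frequency are made up for illustration and are not taken from a real dataset:

```python
import pandas as pd

# Hypothetical data: four daily rows with Open/Close columns.
idx = pd.date_range('2017-01-01', periods=4)
prices = pd.DataFrame(
    {'Open': [1.0, 2.0, 3.0, 4.0], 'Close': [1.5, 2.5, 3.5, 4.5]},
    index=idx,
)

# Without a proper pipe, the Resampler has to be bound to a name first
# and then reused across separate statements.
resampled = prices.resample('2D')
first_open = resampled['Open'].first()
last_close = resampled['Close'].last()
spread = first_open - last_close
```

With a pipe that passes the Resampler itself, the last four statements could collapse into one chained expression.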
Currently, however, Resampler.pipe/DatetimeResampler.pipe implicitly converts to a dataframe of means before piping, which to me seems wrong/unintuitive:
>>> df.resample('3M').pipe(lambda x: x.max() - x.min())
C:\Users\TP\Anaconda3\envs\pandasdev\Scripts\ipython-script.py:1: FutureWarning:
.resample() is now a deferred operation
You called pipe(...) on this deferred object which materialized it into a dataframe
by implicitly taking the mean. Use .resample(...).mean() instead
To demonstrate:
First set-up:
>>> d = pd.date_range('2017-01-01', periods=4)
>>> df = pd.DataFrame(dict(B=[1,2,3, 4]), index=d)
>>> r = df.resample('2D')
If we call pipe we get an unexpected result:
>>> r.pipe(lambda x: x.max() - x.min())
B 2.0
dtype: float64
The reason for the unexpected result is that under the hood the above is:
>>> r.mean().pipe(lambda x: x.max() - x.min())
B 2.0
dtype: float64
Expected would be:
>>> def diff(r):
...     return r.max() - r.mean()
>>> diff(r) # r.pipe(diff) should give the same result as this
B
2017-01-01 0.5
2017-01-03 0.5
IMO, if users want the mean before piping, they should call mean explicitly before pipe.
Adding a pipe was very easy for GroupBy and has very good use cases. I propose adding a (proper) pipe to Resampler also.
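To make the proposed semantics concrete, here is a minimal sketch that reuses the set-up from the demonstration. The free function `pipe` is a hypothetical stand-in written for this example, not the actual pandas implementation. It only shows the intended behaviour: the Resampler itself is handed to the function, with no implicit mean.

```python
import pandas as pd

def pipe(obj, func, *args, **kwargs):
    """Hypothetical helper: call func on obj unchanged (no implicit aggregation)."""
    return func(obj, *args, **kwargs)

d = pd.date_range('2017-01-01', periods=4)
df = pd.DataFrame(dict(B=[1, 2, 3, 4]), index=d)
r = df.resample('2D')

def diff(r):
    # Both aggregations run on the same Resampler, one value per 2-day bin.
    return r.max() - r.mean()

# Proposed behaviour: same result as calling diff(r) directly.
result = pipe(r, diff)

# Behaviour described in the warning: the mean is taken first, so the
# function sees a DataFrame and reduces over all bins instead.
mean_first = pipe(r.mean(), lambda x: x.max() - x.min())
```

Here `result` keeps one row per resampled period, while `mean_first` collapses to a single value per column, which is the discrepancy this issue is about.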