Skip to content

BUG/ENH: Add how kwarg to rolling_* functions [fix #6297] #6845

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 22, 2014

Conversation

jsexauer
Copy link
Contributor

@jsexauer jsexauer commented Apr 9, 2014

Fixes #6297

Figured while I was in the area, would submit a PR to do this too. Let me know if you want to see more unit tests than the one.

@jreback
Copy link
Contributor

jreback commented Apr 9, 2014

median should also have a how=median I think

pls add a similar test for min/ median as well

the docs in computation.rst might need a bit of tweaking as well

@jreback jreback added this to the 0.14.0 milestone Apr 9, 2014
@jsexauer
Copy link
Contributor Author

Updated to include median, additional tests, and added documentation in computation.rst

@jreback
Copy link
Contributor

jreback commented Apr 12, 2014

can u add a note near the how that u updated in computation.rst to explain when the how is used (eg only in intra day resample a when the moment freq is > day)

@jorisvandenbossche
Copy link
Member

Expressing a concern: I don't know if I find this a very good idea. Some things that worry me:

  • I think it is actually a lot more explicit to just ask the user to do the resample themselves, instead of adding yet an extra argument and trying to do two things in one. So for example instead of doing:

    pd.rolling_max(series, window=1, freq='D', how='max')
    

    just do

    pd.rolling_max(series.resample(freq='D', how='max'), window=1)
    

    This is a little longer, but personally I find this more explicit and clear what is going on for someone reading this code.

  • as was noted in the issue, it is not really a bug, but just a strange implementation (but a documented one). And changing the default value for how to max for max or median for median, etc can also possibly break code, and given my previous point, I don't know if this is worth it.

  • Maybe the main reason why I am hesitant about this is that I have always found this freq keyword argument very confusing (although it is clear what it does, the explanation in the docstring was in the past just totally not clear, but this is better now). When I first encountered this, I expected that I could use freq to specify the window as a frequency when having timeseries data (eg a frequency of '15min' instead of window=15 if your data are every minute). A functionality that I would find much more useful.

But it's maybe just a stupid feeling, and you can certainly convince me that this is a good idea. Even now writing it down my hesitation is already not that strong anymore :-)

@jorisvandenbossche
Copy link
Member

Hmm, thinking more thoroughly about it, with some use case in my head. If you have an irregular timeseries, it is indeed resampling it to a regular timeseries what you want, otherwise the rolling function does not make much sense. My idea of specifying the window as a frequency when not having a regular timeseries is also not really well defined what this should do, without first resampling it.
So forget what I said. This functionality is indeed useful within the rolling_ functions.

Concern of breaking backwards compatibility still holds, but maybe not too many people will rely on this? (for min/max/median) It is also not really possible to do this with a deprecation warning I think.


- ``how``: optionally specify method for down or re-sampling. Default is
is min for ``rolling_min``, max for ``rolling_max``, median for ``rolling_median``,
and mean for all other rolling functions. See :meth:`DataFrame.resample`'s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some spaces are missing at the beginning of the line to align this with the previous lines.

@jorisvandenbossche
Copy link
Member

Added some comments on the docs.

For the release notes. I think there should also be a mention in the API changes section? (for the change of the default for min/max/median)

@jreback
Copy link
Contributor

jreback commented Apr 12, 2014

@jorisvandenbossche I think this actually a bug (but can put in the API section)
if u want a rolling_max then it should really be the max and not the max of a meaned daily number which does not make much sense

only affects max,min,median

@jorisvandenbossche
Copy link
Member

@jreback Yeah, I agree after some better thinking that is normally (in most cases) not what you want, it's just that is was documented. So in that sense certainly an api change.

But you can forget my concerns :-)

@jreback
Copy link
Contributor

jreback commented Apr 21, 2014

@jsexauer can you rebase on master; looks fine otherwise

@jorisvandenbossche ?

ping when ready

@jorisvandenbossche
Copy link
Member

Yes, I am OK with this, just my two formatting comments of above.

@jsexauer
Copy link
Contributor Author

@jreback Ready for merging.

@jreback
Copy link
Contributor

jreback commented Apr 22, 2014

ok...ping when green

@jsexauer
Copy link
Contributor Author

It is green...?

jreback added a commit that referenced this pull request Apr 22, 2014
BUG/ENH: Add how kwarg to rolling_* functions [fix #6297]
@jreback jreback merged commit 92852bd into pandas-dev:master Apr 22, 2014
@jreback
Copy link
Contributor

jreback commented Apr 22, 2014

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Bug Datetime Datetime data dtype Frequency DateOffsets Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rolling_max on a TimeSeries with freq='D' returns incorrect results
3 participants