BUG: resample closed='left' not binning correctly. #4197

Closed
@nehalecky

Description

related: http://stackoverflow.com/questions/21329425/resampling-a-pandas-dataframe-with-loffset-introduces-an-additional-offset-of-an

Hey pandas team. Sorry to have gone MIA the past week, I've been super busy with work. I promise to contribute more soon (and look forward to it). :)

Still, I wanted to note that I came across what I believe to be a bug in resample() when trying to change which side of each bin is closed with closed='left'. I know there have been a few changes to the resample() API since Wes' book; however, I don't believe they changed this functionality, but I have been wrong before :)

The bug can be reproduced with the example from Wes' book, which generates 12 minutes of data:

In [3]: rng = pd.date_range('1/1/2000', periods=12, freq='T')
In [4]: ts = pd.Series(np.arange(12), index=rng)
In [5]: ts
Out[5]: 
2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int64

We can do a simple resample to 5 minutes:

In [6]: ts.resample('5min', how='sum')
Out[6]: 
2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int64
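For readers on a recent pandas release, a roughly equivalent sketch of the same 5-minute sum follows. It assumes pandas 0.18 or later, where the `how=` argument gave way to calling the aggregation on the resampler, and it uses the `'min'` alias because `'T'` is deprecated in recent releases.

```python
import numpy as np
import pandas as pd

# Same 12 minutes of data as above; 'min' is the minutely frequency alias
rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# In current pandas the aggregation is a method on the resampler, not how=
five_min = ts.resample('5min').sum()
print(five_min)
```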

For my use, I need this resampling to be 'backwards looking', so that the sum at each resampled timestamp covers the preceding five minutes (e.g. the value at 00:05:00 is the sum of 00:00:00 through 00:04:00). The documentation (and Wes' book) suggest this is achieved by binning with closed='left'; however, this produces the same output as above:

In [7]: ts.resample('5min', how='sum', closed='left')
Out[7]: 
2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int64
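One possible reading of this result (a hedged note, not confirmed in this thread): for minute-based frequencies pandas already defaults to `closed='left'` and `label='left'`, so passing `closed='left'` explicitly reproduces the default bins. The sketch below, assuming a current pandas version, checks that and contrasts it with `closed='right'`, which does move observations between bins.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# closed='left' is the default for minute frequencies, so these match
default = ts.resample('5min').sum()
left = ts.resample('5min', closed='left').sum()
print(default.equals(left))

# closed='right' makes each bin (start, end]; label stays 'left' by default,
# so 00:00 lands in the bin labelled 1999-12-31 23:55
right = ts.resample('5min', closed='right').sum()
print(right)
```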

I was looking for the following result (note that the first timestamp is 00:05:00 and the trailing partial bin is dropped):

2000-01-01 00:05:00    10
2000-01-01 00:10:00    35
Freq: 5T, dtype: int64
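For context, `closed` controls which edge of each bin is inclusive, while `label` controls which edge is used as the bin's timestamp. A minimal sketch, assuming a current pandas version, that keeps `closed='left'` but labels each bin by its right edge produces the desired labels; the trailing partial bin (00:10 and 00:11, reported at 00:15) is still present.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Bins are [start, end); label='right' stamps each bin with its end edge,
# so the sum of 00:00 through 00:04 is reported at 00:05
labelled = ts.resample('5min', closed='left', label='right').sum()
print(labelled)
```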

I am able to generate this by combining loffset='5min' and then slicing the resulting Series to remove the last bin:

In [10]: ts.resample('5min', how='sum', closed='left', loffset='5min')[:-1]
Out[10]: 
2000-01-01 00:05:00    10
2000-01-01 00:10:00    35
Freq: 5T, dtype: int64
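The `loffset` argument was deprecated in pandas 1.1 and removed in 2.0, so on current releases the same workaround can be sketched by shifting the result's index after aggregating. This is an assumed modern equivalent rather than the original code; it uses `.iloc` for the positional slice.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Stand-in for loffset='5min': move every label forward by one bin width
shifted = ts.resample('5min', closed='left').sum()
shifted.index = shifted.index + pd.Timedelta('5min')

# Same slice as the original workaround: drop the last bin by position
trimmed = shifted.iloc[:-1]
print(trimmed)
```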

but this is hardly ideal, since it's not known in advance whether the time series ends exactly on a bin boundary, i.e. whether the final resampled bin is complete!
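One way to avoid hard-coding the `[:-1]` slice, sketched under the assumption that the input is regularly sampled at a known step, is to count the observations in each bin and keep only the complete ones. `sum_complete_bins` is a hypothetical helper for illustration, not a pandas API.

```python
import numpy as np
import pandas as pd


def sum_complete_bins(series, rule='5min', step='1min'):
    """Backward-looking bin sums, keeping only bins with a full set of samples.

    Hypothetical helper: assumes `series` is regularly sampled every `step`.
    """
    per_bin = pd.Timedelta(rule) // pd.Timedelta(step)  # e.g. 5 samples per bin
    binned = series.resample(rule, closed='left', label='right')
    sums = binned.sum()
    counts = binned.count()  # count() skips NaN values
    return sums[counts == per_bin]


rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

full = sum_complete_bins(ts)             # series ends mid-bin: last bin dropped
exact = sum_complete_bins(ts.iloc[:10])  # series ends on a bin boundary
print(full)
print(exact)
```

Because `count()` ignores NaN, bins that contain missing values are dropped as well; whether that is desirable depends on the use case.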

Apologies if I am missing something; any thoughts, help or guidance are welcome!
Thanks so much.

Labels: Datetime, Indexing, Resample
