(edit):
import pandas as pd
import datetime as dt
import numpy as np
datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 0)
tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])
The first item is at 9:30, which is not a multiple of 8 minutes past midnight (570 minutes leaves a remainder of 2). We'd like df.resample('8T', base='start').first() to be equivalent to df.resample('8T', base=2).first().
In [13]: df.resample("8T", base=2).first()
Out[13]:
                      A
2014-09-01 09:30:00   0
2014-09-01 09:38:00   8
2014-09-01 09:46:00  16
2014-09-01 09:54:00  24
2014-09-01 10:02:00  32
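The base=2 value above can be derived from the first timestamp. The following is a minimal sketch using only the standard library; base_for_start is a hypothetical helper name, not a pandas function, and it assumes that base counts whole minutes from midnight.

```python
import datetime as dt


def base_for_start(first_timestamp, freq_minutes):
    """Hypothetical helper: minute offset that aligns bins to first_timestamp.

    Assumes bins are anchored at midnight and that the offset is given
    in whole minutes, as with base=2 in the example above.
    """
    midnight = first_timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes_since_midnight = int((first_timestamp - midnight).total_seconds() // 60)
    return minutes_since_midnight % freq_minutes


start = dt.datetime(2014, 9, 1, 9, 30)
for freq in [3, 5, 6, 8, 16]:
    # 9:30 is 570 minutes after midnight; the remainder is the needed offset
    print(freq, base_for_start(start, freq))
```

For the 8-minute case this gives 2, matching the base=2 call shown above.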
It seems that for 1-minute bar data, resample() has a bug when the sampling frequency is a multiple of 8 minutes. The code below illustrates the bug by resampling at 3, 5, 6, 8, and 16 minutes. For frequencies of 3, 5, and 6 minutes, the first entry of the resampled dataframe index starts at the base timestamp (9:30 in this case), while for frequencies of 8 and 16 minutes the resampled index starts at 9:26 and 9:18 respectively.
import pandas as pd
import datetime as dt
import numpy as np
datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 0)
tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])
for freq in [3, 5, 6, 8, 16]:
    print(freq)
    # .first() replaces the older how='first' argument
    print(df.resample(str(freq) + 'Min', base=30).first().head(2))
produces the following output:
3
                     A
2014-09-01 09:30:00  0
2014-09-01 09:33:00  3
5
                     A
2014-09-01 09:30:00  0
2014-09-01 09:35:00  5
6
                     A
2014-09-01 09:30:00  0
2014-09-01 09:36:00  6
8
                     A
2014-09-01 09:26:00  0
2014-09-01 09:34:00  4
16
                     A
2014-09-01 09:18:00  0
2014-09-01 09:34:00  4
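The first index values in this output can be reproduced with a simplified model. The sketch below is not pandas' actual implementation; it assumes bin edges sit at midnight plus base minutes plus whole multiples of the frequency, and first_bin_edge is a hypothetical name.

```python
import datetime as dt


def first_bin_edge(first_timestamp, freq_minutes, base_minutes):
    """Hypothetical model: latest bin edge at or before first_timestamp.

    Assumes edges lie at midnight + base_minutes + k * freq_minutes.
    """
    midnight = first_timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes_since_midnight = int((first_timestamp - midnight).total_seconds() // 60)
    # Distance back from the first timestamp to the nearest earlier edge
    overshoot = (minutes_since_midnight - base_minutes) % freq_minutes
    return first_timestamp - dt.timedelta(minutes=overshoot)


start = dt.datetime(2014, 9, 1, 9, 30)
for freq in [3, 5, 6, 8, 16]:
    print(freq, first_bin_edge(start, freq, base_minutes=30))
```

Under that assumption, the model yields 09:30 for 3, 5, and 6 minutes, 09:26 for 8 minutes, and 09:18 for 16 minutes, which agrees with the output above.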