Closed
Description
I've got a pandas data frame defined like this, using pandas 0.11.0:
last_4_weeks_range = pandas.date_range(
start=datetime.datetime(2001, 5, 4), periods=28)
last_4_weeks = pandas.DataFrame(
[{'REST_KEY': 1, 'DLY_TRN_QT': 80, 'DLY_SLS_AMT': 90,
'COOP_DLY_TRN_QT': 30, 'COOP_DLY_SLS_AMT': 20}] * 28 +
[{'REST_KEY': 2, 'DLY_TRN_QT': 70, 'DLY_SLS_AMT': 10,
'COOP_DLY_TRN_QT': 50, 'COOP_DLY_SLS_AMT': 20}] * 28,
index=last_4_weeks_range.append(last_4_weeks_range))
last_4_weeks.sort(inplace=True)
and when I go to resample it:
In [265]: last_4_weeks.resample('7D', how='sum')
Out[265]:
COOP_DLY_SLS_AMT COOP_DLY_TRN_QT DLY_SLS_AMT DLY_TRN_QT REST_KEY
2001-05-04 280 560 700 1050 21
2001-05-11 280 560 700 1050 21
2001-05-18 280 560 700 1050 21
2001-05-25 280 560 700 1050 21
2001-06-01 0 0 0 0 0
I end up with an extra empty bin I wouldn't expect to see -- 2001-06-01. I wouldn't expect that bin to be there, as my 28 days are evenly divisible into the 7 day resample I'm performing. I've tried messing around with the closed kwarg, but I can't escape that extra bin. This seems like a bug, and it messes up my mean calculations when I try to do
In [266]: last_4_weeks.groupby('REST_KEY').resample('7D', how='sum').mean(level=0)
Out[266]:
COOP_DLY_SLS_AMT COOP_DLY_TRN_QT DLY_SLS_AMT DLY_TRN_QT REST_KEY
REST_KEY
1 112 168 504 448 5.6
2 112 280 56 392 11.2
as the numbers are being divided by 5 rather than 4. (I also wouldn't expect REST_KEY to show up in the aggregation columns as it's part of the groupby, but that's really a smaller problem.)