Description
Code Sample, a copy-pastable example if possible
Pandas 0.18 code:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.ones((150, 4)), columns=['A', 'B', 'C', 'D'],
                  index=pd.date_range('2014-01-01', freq='D', periods=150))
df2 = pd.DataFrame(np.zeros((150, 4)), columns=['A', 'B', 'C', 'D'],
                   index=pd.date_range('2014-01-01', freq='D', periods=150))
df = pd.concat([df, df2])
print df.groupby('B').mean()
print df.groupby('B').resample('MS').mean().head()
print 'shape : ',df.groupby('B').resample('MS').mean().shape
print df.groupby('B').apply(lambda x:x.resample('MS').mean()).head()
print 'shape : ',df.groupby('B').apply(lambda x:x.resample('MS').mean()).shape
print df.groupby('B').mean()
print df.groupby('B').resample('H').mean().head()
print 'shape : ',df.groupby('B').resample('H').mean().shape
print df.groupby('B').apply(lambda x:x.resample('H').mean()).head()
print 'shape : ',df.groupby('B').apply(lambda x:x.resample('H').mean()).shape
print 'pd version', pd.__version__
Pandas 0.17 equivalent code:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.ones((150, 4)), columns=['A', 'B', 'C', 'D'],
                  index=pd.date_range('2014-01-01', freq='D', periods=150))
df2 = pd.DataFrame(np.zeros((150, 4)), columns=['A', 'B', 'C', 'D'],
                   index=pd.date_range('2014-01-01', freq='D', periods=150))
df = pd.concat([df, df2])
print df.groupby('B').mean()
print df.groupby('B').resample('MS').head()
print 'shape : ',df.groupby('B').resample('MS').shape
print df.groupby('B').apply(lambda x:x.resample('MS')).head()
print 'shape : ',df.groupby('B').apply(lambda x:x.resample('MS')).shape
print df.groupby('B').mean()
print df.groupby('B').resample('H').head()
print 'shape : ',df.groupby('B').resample('H').shape
print df.groupby('B').apply(lambda x:x.resample('H')).head()
print 'shape : ',df.groupby('B').apply(lambda x:x.resample('H')).shape
print 'pd version', pd.__version__
Expected Output
Pandas 0.18 code output:
       A    C    D
B
0.0  0.0  0.0  0.0
1.0  1.0  1.0  1.0
                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-02-01  0.0  0.0  0.0  0.0
    2014-03-01  0.0  0.0  0.0  0.0
    2014-04-01  0.0  0.0  0.0  0.0
    2014-05-01  0.0  0.0  0.0  0.0
shape : (10, 4)
                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-02-01  0.0  0.0  0.0  0.0
    2014-03-01  0.0  0.0  0.0  0.0
    2014-04-01  0.0  0.0  0.0  0.0
    2014-05-01  0.0  0.0  0.0  0.0
shape : (10, 4)
       A    C    D
B
0.0  0.0  0.0  0.0
1.0  1.0  1.0  1.0
                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-01-02  0.0  0.0  0.0  0.0
    2014-01-03  0.0  0.0  0.0  0.0
    2014-01-04  0.0  0.0  0.0  0.0
    2014-01-05  0.0  0.0  0.0  0.0
shape : (300, 4)
                           A    B    C    D
B
0.0 2014-01-01 00:00:00  0.0  0.0  0.0  0.0
    2014-01-01 01:00:00  NaN  NaN  NaN  NaN
    2014-01-01 02:00:00  NaN  NaN  NaN  NaN
    2014-01-01 03:00:00  NaN  NaN  NaN  NaN
    2014-01-01 04:00:00  NaN  NaN  NaN  NaN
shape : (7154, 4)
pd version 0.18.0
Pandas 0.17 equivalent code output:
   A  C  D
B
0  0  0  0
1  1  1  1
              A  C  D
B
0 2014-01-01  0  0  0
  2014-02-01  0  0  0
  2014-03-01  0  0  0
  2014-04-01  0  0  0
  2014-05-01  0  0  0
shape : (10, 3)
              A  B  C  D
B
0 2014-01-01  0  0  0  0
  2014-02-01  0  0  0  0
  2014-03-01  0  0  0  0
  2014-04-01  0  0  0  0
  2014-05-01  0  0  0  0
shape : (10, 4)
   A  C  D
B
0  0  0  0
1  1  1  1
                         A    C    D
B
0 2014-01-01 00:00:00    0    0    0
  2014-01-01 01:00:00  NaN  NaN  NaN
  2014-01-01 02:00:00  NaN  NaN  NaN
  2014-01-01 03:00:00  NaN  NaN  NaN
  2014-01-01 04:00:00  NaN  NaN  NaN
shape : (7154, 3)
                         A    B    C    D
B
0 2014-01-01 00:00:00    0    0    0    0
  2014-01-01 01:00:00  NaN  NaN  NaN  NaN
  2014-01-01 02:00:00  NaN  NaN  NaN  NaN
  2014-01-01 03:00:00  NaN  NaN  NaN  NaN
  2014-01-01 04:00:00  NaN  NaN  NaN  NaN
shape : (7154, 4)
pd version 0.17.1
Issues:
In pandas 0.18.0, column B is not dropped from the columns when resample is applied after groupby. It should be dropped and moved into the index, as it is in the simple df.groupby('B').mean() example (and as pandas 0.17.1 did for groupby followed by resample).
In pandas 0.18.0, the behavior is correct when downsampling (the 'MS' example) but wrong when upsampling (the 'H' example). In that case the DataFrame is not upsampled and stays at freq='D': the result has shape (300, 4) instead of the 7154 rows returned by the apply version.
A workaround is to use df.groupby('B').apply(lambda x: x.resample('H').mean()), but it is inelegant to say the least and does not solve the problem of B remaining in the columns.
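For anyone hitting the same thing, here is a minimal sketch of a variant of the apply workaround that also keeps B out of the columns, by selecting the value columns before calling apply. This is only an illustration, not something confirmed as the intended API: the names value_cols, upsampled and hours_per_group are made up for the example, it is written with Python 3 print calls, and it uses '60min' as the hourly frequency string (an assumption made so the same string is accepted by both older and newer pandas releases).

```python
import numpy as np
import pandas as pd

# Same data as in the report: two daily frames (ones and zeros) stacked together.
idx = pd.date_range('2014-01-01', freq='D', periods=150)
cols = ['A', 'B', 'C', 'D']
df = pd.concat([
    pd.DataFrame(np.ones((150, 4)), columns=cols, index=idx),
    pd.DataFrame(np.zeros((150, 4)), columns=cols, index=idx),
])

# Select the value columns before apply, so the grouping column B ends up
# only in the index of the result. '60min' stands in for the hourly 'H'.
value_cols = ['A', 'C', 'D']  # hypothetical name, chosen for this sketch
upsampled = df.groupby('B')[value_cols].apply(
    lambda g: g.resample('60min').mean()
)

# Rows expected per group after upsampling: one per hour from the first
# to the last daily timestamp, both ends included.
hours_per_group = len(pd.date_range(idx[0], idx[-1], freq='60min'))

print(upsampled.shape)
print(2 * hours_per_group)
```

With two groups, the row count of upsampled should equal 2 * hours_per_group, which is the 7154 rows shown for the apply output above, and the columns should be A, C and D only. The explicit column selection is a design choice for the sketch: it avoids relying on how apply treats the grouping column, which is exactly what differs between versions in this report.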