Lots of unexpected behavior using resample after groupby #12923

Closed
@BreitA

Code Sample, a copy-pastable example if possible

Pandas 0.18 code:

from __future__ import print_function  # keeps the print calls valid on Python 2 and 3

import numpy as np
import pandas as pd

# Two frames on the same daily index: one filled with ones, one with zeros
df = pd.DataFrame(np.ones((150, 4)), columns=['A', 'B', 'C', 'D'],
                  index=pd.date_range('2014-01-01', freq='D', periods=150))
df2 = pd.DataFrame(np.zeros((150, 4)), columns=['A', 'B', 'C', 'D'],
                   index=pd.date_range('2014-01-01', freq='D', periods=150))

df = pd.concat([df, df2])

# Downsampling to month start ('MS'): groupby().resample() vs. groupby().apply()
print(df.groupby('B').mean())
print(df.groupby('B').resample('MS').mean().head())
print('shape : ', df.groupby('B').resample('MS').mean().shape)
print(df.groupby('B').apply(lambda x: x.resample('MS').mean()).head())
print('shape : ', df.groupby('B').apply(lambda x: x.resample('MS').mean()).shape)

# Upsampling to hourly ('H'): groupby().resample() vs. groupby().apply()
print(df.groupby('B').mean())
print(df.groupby('B').resample('H').mean().head())
print('shape : ', df.groupby('B').resample('H').mean().shape)
print(df.groupby('B').apply(lambda x: x.resample('H').mean()).head())
print('shape : ', df.groupby('B').apply(lambda x: x.resample('H').mean()).shape)
print('pd version', pd.__version__)

Pandas 0.17 equivalent code:

from __future__ import print_function  # keeps the print calls valid on Python 2 and 3

import numpy as np
import pandas as pd

# Same data as above
df = pd.DataFrame(np.ones((150, 4)), columns=['A', 'B', 'C', 'D'],
                  index=pd.date_range('2014-01-01', freq='D', periods=150))
df2 = pd.DataFrame(np.zeros((150, 4)), columns=['A', 'B', 'C', 'D'],
                   index=pd.date_range('2014-01-01', freq='D', periods=150))

df = pd.concat([df, df2])

# In 0.17, resample() aggregates directly, so there is no trailing .mean()
print(df.groupby('B').mean())
print(df.groupby('B').resample('MS').head())
print('shape : ', df.groupby('B').resample('MS').shape)
print(df.groupby('B').apply(lambda x: x.resample('MS')).head())
print('shape : ', df.groupby('B').apply(lambda x: x.resample('MS')).shape)

print(df.groupby('B').mean())
print(df.groupby('B').resample('H').head())
print('shape : ', df.groupby('B').resample('H').shape)
print(df.groupby('B').apply(lambda x: x.resample('H')).head())
print('shape : ', df.groupby('B').apply(lambda x: x.resample('H')).shape)
print('pd version', pd.__version__)

Expected Output

Pandas 0.18 code Output :

       A    C    D
B
0.0  0.0  0.0  0.0
1.0  1.0  1.0  1.0
                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-02-01  0.0  0.0  0.0  0.0
    2014-03-01  0.0  0.0  0.0  0.0
    2014-04-01  0.0  0.0  0.0  0.0
    2014-05-01  0.0  0.0  0.0  0.0
shape : (10, 4)
                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-02-01  0.0  0.0  0.0  0.0
    2014-03-01  0.0  0.0  0.0  0.0
    2014-04-01  0.0  0.0  0.0  0.0
    2014-05-01  0.0  0.0  0.0  0.0
shape : (10, 4)
       A    C    D
B
0.0  0.0  0.0  0.0
1.0  1.0  1.0  1.0
                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-01-02  0.0  0.0  0.0  0.0
    2014-01-03  0.0  0.0  0.0  0.0
    2014-01-04  0.0  0.0  0.0  0.0
    2014-01-05  0.0  0.0  0.0  0.0
shape : (300, 4)
                           A    B    C    D
B
0.0 2014-01-01 00:00:00  0.0  0.0  0.0  0.0
    2014-01-01 01:00:00  NaN  NaN  NaN  NaN
    2014-01-01 02:00:00  NaN  NaN  NaN  NaN
    2014-01-01 03:00:00  NaN  NaN  NaN  NaN
    2014-01-01 04:00:00  NaN  NaN  NaN  NaN
shape : (7154, 4)
pd version 0.18.0

Pandas 0.17 equivalent code Output :

   A  C  D
B
0  0  0  0
1  1  1  1
              A  C  D
B
0 2014-01-01  0  0  0
  2014-02-01  0  0  0
  2014-03-01  0  0  0
  2014-04-01  0  0  0
  2014-05-01  0  0  0
shape : (10, 3)
              A  B  C  D
B
0 2014-01-01  0  0  0  0
  2014-02-01  0  0  0  0
  2014-03-01  0  0  0  0
  2014-04-01  0  0  0  0
  2014-05-01  0  0  0  0
shape : (10, 4)
   A  C  D
B
0  0  0  0
1  1  1  1
                         A    C    D
B
0 2014-01-01 00:00:00    0    0    0
  2014-01-01 01:00:00  NaN  NaN  NaN
  2014-01-01 02:00:00  NaN  NaN  NaN
  2014-01-01 03:00:00  NaN  NaN  NaN
  2014-01-01 04:00:00  NaN  NaN  NaN
shape : (7154, 3)
                         A    B    C    D
B
0 2014-01-01 00:00:00    0    0    0    0
  2014-01-01 01:00:00  NaN  NaN  NaN  NaN
  2014-01-01 02:00:00  NaN  NaN  NaN  NaN
  2014-01-01 03:00:00  NaN  NaN  NaN  NaN
  2014-01-01 04:00:00  NaN  NaN  NaN  NaN
shape : (7154, 4)
pd version 0.17.1

Issues:

In pandas 0.18.0, column B is not dropped from the columns when resample is applied after groupby. It should be dropped and moved into the index, as happens in the simple example that calls .mean() directly after groupby (and as 0.17.1 did for groupby('B').resample(...)).
In pandas 0.18.0, groupby('B').resample(...) behaves correctly when downsampling (the 'MS' example) but incorrectly when upsampling (the 'H' example): the DataFrame is not upsampled and stays at freq='D', giving shape (300, 4) instead of the (7154, 4) obtained through apply.
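
The row counts quoted in the outputs can be sanity-checked without groupby at all. The following is a minimal sketch, assuming the same 150-day daily index as the example; the names hourly_bins, monthly_bins and n_groups are illustrative, and pd.offsets.Hour() is used in place of the 'H' alias only so that the snippet does not depend on how the hourly alias is spelled in a given pandas version.

```python
import pandas as pd

# Same daily index as in the example: 150 days starting 2014-01-01.
idx = pd.date_range("2014-01-01", freq="D", periods=150)

# Bins an hourly upsample of one group spans: first to last timestamp.
hourly_bins = pd.date_range(idx[0], idx[-1], freq=pd.offsets.Hour())
# Bins a month-start downsample of one group spans.
monthly_bins = pd.date_range(idx[0], idx[-1], freq="MS")

n_groups = 2  # column B takes the values 0.0 and 1.0

print("hourly rows :", len(hourly_bins) * n_groups)   # 7154, as in the apply output
print("monthly rows:", len(monthly_bins) * n_groups)  # 10, as in the 'MS' output
print("daily rows  :", len(idx) * n_groups)           # 300, the shape seen with 'H' in 0.18.0
```

A result of 300 rows for the 'H' case is therefore just the original daily data regrouped, which is consistent with no upsampling having taken place.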

A workaround is to use df.groupby('B').apply(lambda x: x.resample('H').mean()), but it is inelegant to say the least and does not solve the issue of B not being dropped from the columns.
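
For reference, here is a sketch of that workaround with one addition that is not part of the report: the value columns are selected before apply, so that B ends up only in the index. The names value_cols and upsampled are illustrative, pd.offsets.Hour() stands in for 'H', and the snippet was written against the general groupby/apply API rather than verified on 0.18.0, so its behavior there is an assumption.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2014-01-01", freq="D", periods=150)
cols = ["A", "B", "C", "D"]
df = pd.concat([
    pd.DataFrame(np.ones((150, 4)), columns=cols, index=idx),
    pd.DataFrame(np.zeros((150, 4)), columns=cols, index=idx),
])

# Hypothetical helper list: every column except the grouping key.
value_cols = [c for c in df.columns if c != "B"]

# Resample each group separately; the group key becomes the outer index level.
upsampled = (
    df.groupby("B")[value_cols]
      .apply(lambda g: g.resample(pd.offsets.Hour()).mean())
)

print(upsampled.head())
print("shape : ", upsampled.shape)
```

Selecting columns up front keeps the lambda unchanged and avoids a separate drop step afterwards; whether that is preferable to dropping B from the result is a matter of taste.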
