Description
import pandas as pd
from itertools import cycle, islice
N = 4
df = pd.DataFrame(index=pd.date_range('2000', periods=N))
df['col1'] = list(islice(cycle(['A', 'B']), N))
df['col2'] = list(islice(cycle(['a', 'b', 'c']), N))
print(df)
returns
           col1 col2
2000-01-01    A    a
2000-01-02    B    b
2000-01-03    A    c
2000-01-04    B    a
If I then run the following groupbys, lines 2 and 4 return the same result (a Series with a MultiIndex), but line 3 returns the result of running .unstack() on line 1.
print(df.groupby(['col1', pd.TimeGrouper('W')]).size()) # 1. multiindex
print(df.groupby(['col2', pd.TimeGrouper('W')]).size()) # 2. multiindex
print(df.groupby('col1').resample('W').size()) # 3. unstacked
print(df.groupby('col2').resample('W').size()) # 4. multiindex
col1
A     2000-01-02    1
      2000-01-09    1
B     2000-01-02    1
      2000-01-09    1
dtype: int64

col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64

      2000-01-02  2000-01-09
col1
A              1           1
B              1           1

col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64
This seems inconsistent to me: if 1 and 3 do not return the same result, then neither should 2 and 4. Is this a bug, or is df.groupby(col).resample('x') supposed to behave like this, and is it supposed to behave differently from df.groupby([col, pd.TimeGrouper(x)])?
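In the meantime, one possible workaround is to normalize whatever comes back to a single shape before comparing. This is only a sketch: `as_long_series` is a helper name made up for illustration, and it uses `pd.Grouper(freq='W')`, which replaced `pd.TimeGrouper` in later pandas releases. Whether the resample call returns a Series or a DataFrame may depend on the pandas version, so the helper accepts either.

```python
import pandas as pd
from itertools import cycle, islice


def as_long_series(result):
    """Return a MultiIndex Series whether `result` is stacked or unstacked.

    Hypothetical helper: if the groupby result came back as a wide
    DataFrame (dates as columns), stack it back into long form.
    """
    if isinstance(result, pd.DataFrame):
        result = result.stack()
    return result


# Same frame as in the report above.
N = 4
df = pd.DataFrame(index=pd.date_range('2000', periods=N))
df['col1'] = list(islice(cycle(['A', 'B']), N))
df['col2'] = list(islice(cycle(['a', 'b', 'c']), N))

# pd.Grouper(freq='W') is the later spelling of pd.TimeGrouper('W').
via_grouper = as_long_series(df.groupby(['col1', pd.Grouper(freq='W')]).size())
via_resample = as_long_series(df.groupby('col1').resample('W').size())

print(via_grouper)
print(via_resample)
```

After this step both results should be two-level Series, so they can be compared directly regardless of which shape pandas originally returned.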
Note that in df, col1 has two groups of identical size, while col2 has groups of unequal size. I'm not sure if that's the cause, but the inconsistency goes away for some choices of N (e.g. all four lines above return a multiindex for N=3 and N=24?!).
print(df.groupby('col1').size())
print(df.groupby('col2').size())
col1
A    2
B    2
dtype: int64

col2
a    2
b    1
c    1
dtype: int64