df.groupby(col).resample('x') returning either unstacked or multiindex result #13255

Closed
@mikepqr

import pandas as pd
from itertools import cycle, islice
N = 4
df = pd.DataFrame(index=pd.date_range('2000', periods=N))
df['col1'] = list(islice(cycle(['A', 'B']), N))
df['col2'] = list(islice(cycle(['a', 'b', 'c']), N))
print(df)

returns

           col1 col2
2000-01-01    A    a
2000-01-02    B    b
2000-01-03    A    c
2000-01-04    B    a

If I then do the following groupbys, lines 2 and 4 return the same result (a multiindex), but line 3 returns the result of running .unstack on line 1.

print(df.groupby(['col1', pd.TimeGrouper('W')]).size())  # 1. multiindex
print(df.groupby(['col2', pd.TimeGrouper('W')]).size())  # 2. multiindex
print(df.groupby('col1').resample('W').size())  # 3. unstacked
print(df.groupby('col2').resample('W').size())  # 4. multiindex

returns

col1
A     2000-01-02    1
      2000-01-09    1
B     2000-01-02    1
      2000-01-09    1
dtype: int64
col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64
      2000-01-02  2000-01-09
col1
A              1           1
B              1           1
col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64
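
The relationship between the two shapes can be sketched as follows. This is a minimal illustration rather than part of the original report: it assumes a newer pandas in which pd.TimeGrouper is no longer available and uses pd.Grouper(freq='W') in its place, and the names stacked and wide are made up for the example.

```python
from itertools import cycle, islice

import pandas as pd

# Rebuild the frame from the report (N = 4).
N = 4
df = pd.DataFrame(index=pd.date_range('2000', periods=N))
df['col1'] = list(islice(cycle(['A', 'B']), N))
df['col2'] = list(islice(cycle(['a', 'b', 'c']), N))

# Assumption: pd.Grouper(freq='W') stands in for pd.TimeGrouper('W').
# Grouping on two keys and calling .size() gives a Series whose index
# has two levels (col1, week end date).
stacked = df.groupby(['col1', pd.Grouper(freq='W')]).size()

# .unstack() moves the inner (date) level into the columns, which is
# the wide layout described as "unstacked" in the report.
wide = stacked.unstack()

print(type(stacked).__name__, stacked.index.nlevels)
print(type(wide).__name__, wide.shape)
```

Comparing the type of each result (Series versus DataFrame) is a quick way to tell which of the two layouts a given expression produced.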

This seems like an inconsistency to me: if 1 and 3 do not return the same result, then neither should 2 and 4. Is this a bug, or is df.groupby(col).resample('x') supposed to behave like this, and is it supposed to behave differently from df.groupby([col, pd.TimeGrouper(x)])?

Note that in df, col1 has two groups of identical size, while col2 has groups of unequal size. I'm not sure if that's the problem, but this inconsistency goes away for some choices of N (e.g. all four lines above return a multiindex for N=3 and N=24?!)

print(df.groupby('col1').size())
print(df.groupby('col2').size())

returns

col1
A    2
B    2
dtype: int64
col2
a    2
b    1
c    1
dtype: int64
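
To check how the result shape depends on N, something along these lines could be used. It is only a sketch: make_frame and result_shapes are hypothetical helper names introduced here, and pd.Grouper(freq='W') is assumed as the replacement for pd.TimeGrouper on newer pandas versions. What it prints depends on the pandas version installed, so no expected output is shown.

```python
from itertools import cycle, islice

import pandas as pd


def make_frame(n):
    """Build the frame from the report for an arbitrary number of rows."""
    df = pd.DataFrame(index=pd.date_range('2000', periods=n))
    df['col1'] = list(islice(cycle(['A', 'B']), n))
    df['col2'] = list(islice(cycle(['a', 'b', 'c']), n))
    return df


def result_shapes(n):
    """Label each of the four expressions as 'multiindex' or 'unstacked'."""
    df = make_frame(n)
    # Assumption: pd.Grouper(freq='W') stands in for pd.TimeGrouper('W').
    results = {
        '1': df.groupby(['col1', pd.Grouper(freq='W')]).size(),
        '2': df.groupby(['col2', pd.Grouper(freq='W')]).size(),
        '3': df.groupby('col1').resample('W').size(),
        '4': df.groupby('col2').resample('W').size(),
    }
    # A DataFrame means the dates ended up in the columns (unstacked);
    # a Series means they stayed in the index (multiindex).
    return {
        label: 'unstacked' if isinstance(res, pd.DataFrame) else 'multiindex'
        for label, res in results.items()
    }


for n in (3, 4, 24):
    print(n, result_shapes(n))
```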
