BUG: custom window not working in groupby #35755

Closed
@MaxHalford

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy as np
import pandas as pd
readings = pd.DataFrame(
    [
        ('A', 'Saturday', 101),
        ('A', 'Sunday', 88),
        ('A', 'Saturday', 103),
        ('A', 'Sunday', 82),
        ('A', 'Saturday', 100),
        ('B', 'Saturday', 27),
        ('B', 'Sunday', 13),
        ('B', 'Saturday', 21),
        ('B', 'Sunday', 17),
        ('B', 'Saturday', 25)
    ],
    columns=['building', 'day', 'reading']
)

class ShiftedWindow(pd.api.indexers.BaseIndexer):

    def __init__(self, window_size):
        self.window_size = window_size

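    def get_window_bounds(self, num_values=0, min_periods=None, center=None, closed=None):
        # Window for row i covers [i - window_size, i), so the current row is excluded.
        starts = np.arange(-self.window_size, num_values - self.window_size)
        ends = starts + self.window_size
        # Clip the first windows so they do not start before row 0.
        starts[:self.window_size] = 0

        return starts, ends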

readings.groupby('building')['reading'].rolling(window=ShiftedWindow(2), min_periods=1).mean()

Problem description

I've defined a custom window that uses the previous values, and therefore ignores the current value. It's very useful for, say, target encoding on time series.
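To make the intended window concrete, here is a minimal NumPy-only sketch of the bounds that `get_window_bounds` computes for one group of five rows with `window_size=2`. The helper name `shifted_window_bounds` is made up for illustration and only mirrors the arithmetic in `ShiftedWindow`; it does not go through pandas at all.

```python
import numpy as np


def shifted_window_bounds(num_values, window_size):
    """Hypothetical helper mirroring ShiftedWindow.get_window_bounds."""
    starts = np.arange(-window_size, num_values - window_size)
    ends = starts + window_size
    starts[:window_size] = 0  # clip windows that would start before row 0
    return starts, ends


values = np.array([101, 88, 103, 82, 100], dtype=float)  # building A readings
starts, ends = shifted_window_bounds(len(values), window_size=2)

# Mean over the half-open range [start, end); an empty window gives NaN.
means = [
    float(values[s:e].mean()) if e > s else float("nan")
    for s, e in zip(starts, ends)
]

print(starts.tolist())  # [0, 0, 0, 1, 2]
print(ends.tolist())    # [0, 1, 2, 3, 4]
print(means)            # [nan, 101.0, 94.5, 95.5, 92.5]
```

The first window is empty, which is why the first row of each group should be NaN.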

Expected Output

I would be expecting the following output:

>>> readings.groupby('building')['reading'].apply(lambda x: x.shift(1).rolling(2, min_periods=1).mean())
0      NaN
1    101.0
2     94.5
3     95.5
4     92.5
5      NaN
6     27.0
7     20.0
8     17.0
9     19.0
Name: reading, dtype: float64

Instead, I'm getting:

>>> readings.groupby('building')['reading'].rolling(window=ShiftedWindow(2), min_periods=1).mean()
building   
A         0    101.0
          1     88.0
          2    103.0
          3     82.0
          4    100.0
B         5     27.0
          6     13.0
          7     21.0
          8     17.0
          9     25.0
Name: reading, dtype: float64

My custom window works as expected without groupby. I also checked whether get_window_bounds gets called when a groupby is used, and it is not. It seems that my custom window is being ignored entirely: the output is just the raw readings.
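Until the custom indexer is honoured under groupby, one possible workaround is to shift within each group and then apply a plain fixed-size rolling window. This is only a sketch: the helper `previous_mean` and the column name `prev_mean` are made up for illustration, and it uses `transform` so that the result keeps the original row index.

```python
import pandas as pd

readings = pd.DataFrame({
    "building": ["A"] * 5 + ["B"] * 5,
    "reading": [101, 88, 103, 82, 100, 27, 13, 21, 17, 25],
})


def previous_mean(s, window=2):
    # Shift by one so the current row is excluded, then average the
    # previous `window` rows (fewer at the start of each group).
    return s.shift(1).rolling(window, min_periods=1).mean()


# transform applies the helper per group and aligns the result to the original index.
readings["prev_mean"] = readings.groupby("building")["reading"].transform(previous_mean)

print(readings["prev_mean"].tolist())
# [nan, 101.0, 94.5, 95.5, 92.5, nan, 27.0, 20.0, 17.0, 19.0]
```

This reproduces the expected values, but it calls a Python function once per group, so it is likely slower than a native groupby rolling on frames with many groups.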

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 19.5.0
Version : Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fsspec : 0.5.2
fastparquet : None
gcsfs : None
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.9
tables : 3.5.2
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.45.1
