API: Should groupby.rolling be treated as a transform #51751

Open
@rhshadrach

I don't have a lot of familiarity with window ops, but thought this was an interesting question from our Reddit AMA:

import numpy as np
import pandas as pd

# example DataFrame with 3 columns: date, id and a random value
dates = list(pd.date_range(start="2019-01-01", end="2019-12-01", freq="MS"))
length = len(dates)
n = 2
ids = sorted(list(range(n)) * length)
values = np.random.randint(low=0, high=10, size=length).tolist()
df = pd.DataFrame({"date": dates * n, "id": ids, "value": values * n})

print(df.head())
#         date  id  value
# 0 2019-01-01   0      8
# 1 2019-02-01   0      8
# 2 2019-03-01   0      7
# 3 2019-04-01   0      2
# 4 2019-05-01   0      1

print(df.rolling(3, on="date")["value"].max().head())
# 0    NaN
# 1    NaN
# 2    8.0
# 3    8.0
# 4    7.0
# Name: value, dtype: float64

print(df.groupby("id").rolling(3, on="date")["value"].max().head())
# id  date      
# 0   2019-01-01    NaN
#     2019-02-01    NaN
#     2019-03-01    8.0
#     2019-04-01    8.0
#     2019-05-01    7.0
# Name: value, dtype: float64

When doing DataFrame.rolling with a reduction, the operation is a transform: the result has the same index as the input. Given this, should DataFrameGroupBy.rolling also be treated as a transform, i.e. not include the groupers or the on column in the result, but instead return the index of the original frame? As with other transforms, users could then use the original object to recover the corresponding on values or grouping data.
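
To make the question concrete, here is a rough sketch of what a transform-like result could look like, emulated with the existing groupby transform and a per-group rolling call. This is only an illustration and not a proposed implementation; the seeded generator and the names `current`, `as_transform`, and `rolling_max` are made up for this example. It relies on the frame already being sorted by id, as in the example above.

```python
import numpy as np
import pandas as pd

# Same shape of data as the example above, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
dates = list(pd.date_range(start="2019-01-01", end="2019-12-01", freq="MS"))
n = 2
ids = sorted(list(range(n)) * len(dates))
values = rng.integers(low=0, high=10, size=len(dates)).tolist()
df = pd.DataFrame({"date": dates * n, "id": ids, "value": values * n})

# Current behavior: the result is indexed by the grouper and the on column.
current = df.groupby("id").rolling(3, on="date")["value"].max()

# Transform-like emulation: roll within each group, keep the original index.
as_transform = df.groupby("id")["value"].transform(lambda s: s.rolling(3).max())

# The original frame still carries id and date, so they can be recovered
# by aligning on the shared index.
with_result = df.assign(rolling_max=as_transform)
print(with_result.head())
```

With an integer window, both approaches compute the same values here; only the index of the result differs.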

cc @mroeschke

Labels: API - Consistency, Groupby, Needs Discussion, Window