Cython-optimized expanding apply transformations #12430

Closed
@bschreck

I was generating lots of features for time-dependent data and ended up writing a lot of expanding apply operations in Cython. Would the community want something like this? Imagine you have a DataFrame with an "entity" column, a "time" column, and a numeric "feature" column, and you want to calculate the expanding sum/mean/mode/etc. of the feature column for each entity.

This is currently not well optimized in pandas, especially for computing the mode of categorical variables, where keeping track of state saves a lot of time.
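To illustrate what "keeping track of state" could mean for an expanding mode, here is a minimal pure-Python sketch. It is not the Cython code described in this issue; the function name `expanding_mode` and the tie-breaking rule (the earlier mode is kept on ties) are assumptions made for illustration. The idea is that only the count of the newly seen value changes at each step, so the mode can be updated in constant time instead of being recomputed over the whole prefix.

```python
from collections import Counter


def expanding_mode(values):
    """Return the running mode after each element of `values`.

    Hypothetical helper for illustration only. On ties, the value that
    reached the current maximum count first is kept.
    """
    counts = Counter()          # running count per distinct value
    best, best_count = None, 0  # current mode and its count
    out = []
    for v in values:
        counts[v] += 1
        # Only v's count changed, so the mode can only change to v.
        if counts[v] > best_count:
            best, best_count = v, counts[v]
        out.append(best)
    return out


result = expanding_mode(["a", "b", "b", "a", "a"])
print(result)
```

A naive version would recount every prefix, which is quadratic in the group length; the stateful version does a single pass per group.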

Example of a use case:

import pandas as pd
import cython_opt  # the author's Cython functions are defined here (not part of pandas)

df = pd.DataFrame({"project_id": [1,1,1,1,2,2,2,2], "value": [3,4,5,6, 10,11,12,13]})
# group on the project_id column (use level='project_id' instead if it is the index)
df.groupby('project_id')['value'].transform(lambda x: cython_opt.expanding_mean(x.values))
# output like {"project_id": [1,1,1,1,2,2,2,2], "value": [3, 3.5, 4, 4.5, 10, 10.5, 11, 11.5]}
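For comparison, the same per-group expanding mean can be computed with pandas alone, which gives a baseline to check a custom implementation against. This is only a sketch: it assumes `project_id` is an ordinary column, as in the example data above, and it does not use the `cython_opt` module, which is not publicly available.

```python
import pandas as pd

# Same example data as in the use case above.
df = pd.DataFrame({
    "project_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "value": [3, 4, 5, 6, 10, 11, 12, 13],
})

# Expanding mean within each project, aligned to the original rows.
baseline = df.groupby("project_id")["value"].transform(
    lambda s: s.expanding().mean()
)
print(baseline.tolist())
```

Timing this baseline against an optimized function on realistic data would show how much the custom code actually gains.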

I can provide lots more examples and functions (I wrote around 20 of these).

Labels: Apply, Enhancement, Performance, Window
