Cython-optimized expanding apply transformations

While generating lots of features for time-dependent data, I ended up writing many expanding apply operations in Cython. Would the community want something like this? Imagine you have a DataFrame with an "entity" column, a "time" column and a numeric "feature" column, and you want to calculate the expanding sum/mean/mode/etc. of the feature column for each entity, in time order.
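To pin down the semantics, here is a minimal sketch of a few of these per-entity expanding statistics using pandas' built-in cumulative groupby methods. The column names `entity`, `time` and `feature` follow the description above, and the toy data is made up for illustration. This is a baseline for comparison, not the proposed Cython code.

```python
import pandas as pd

# Toy frame with the three columns described above (data is illustrative)
df = pd.DataFrame({
    "entity": ["a", "a", "b", "a", "b"],
    "time": [1, 2, 1, 3, 2],
    "feature": [1.0, 3.0, 10.0, 5.0, 20.0],
})

# Expanding statistics are only meaningful in time order within each entity
df = df.sort_values(["entity", "time"])
g = df.groupby("entity")["feature"]

df["expanding_sum"] = g.cumsum()
# Running mean = running total / number of rows seen so far in the group
df["expanding_mean"] = g.cumsum() / (g.cumcount() + 1)
df["expanding_max"] = g.cummax()
print(df)
```

Sum, mean and max fit this pattern because each has a cheap built-in cumulative form; statistics like the mode do not, which is where a custom stateful loop starts to pay off.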

This is currently not well optimized in pandas, especially for computing the mode of categorical variables, where keeping track of state between steps (instead of recomputing the statistic from scratch at every row) saves a lot of time.
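The state-tracking idea for the mode can be sketched in plain Python: keep a running count per category plus the current leader, so each new row costs O(1) instead of rescanning the whole prefix. The function name `expanding_mode` is hypothetical, and the tie-breaking rule (keep the current leader until another value's count strictly exceeds it) is an assumption; `Series.mode` would instead return all tied values.

```python
from collections import defaultdict

import pandas as pd


def expanding_mode(values):
    """Hypothetical running mode in O(n): per-value counts plus the current leader.

    Assumed tie-breaking: the leader is kept until another value's count
    strictly exceeds it.
    """
    counts = defaultdict(int)
    best, best_count = None, 0
    out = []
    for v in values:
        counts[v] += 1
        # Only the value just seen can overtake the leader, so one comparison suffices
        if counts[v] > best_count:
            best, best_count = v, counts[v]
        out.append(best)
    return out


colors = pd.DataFrame({"entity": [1, 1, 1, 1, 2, 2],
                       "color": ["r", "g", "g", "r", "b", "b"]})
# Wrap the list in a Series with the group's index so transform aligns it
colors["expanding_mode"] = colors.groupby("entity")["color"].transform(
    lambda s: pd.Series(expanding_mode(s), index=s.index)
)
print(colors)
```

A naive version that calls `.mode()` on every prefix does O(n) work per row, i.e. O(n²) per entity; carrying the counts forward is what makes the single pass possible.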

Example of a use case:

```
import pandas as pd
import cython_opt  # cython functions are defined here

df = pd.DataFrame({"project_id": [1, 1, 1, 1, 2, 2, 2, 2],
                   "value": [3, 4, 5, 6, 10, 11, 12, 13]})
# project_id is a column, not an index level, so group by column name
df.groupby('project_id')['value'].transform(lambda x: cython_opt.expanding_mean(x.values))
# output values: [3, 3.5, 4, 4.5, 10, 10.5, 11, 11.5]
```
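Since `cython_opt` is not published, the example above cannot be run as-is. A minimal sketch with a NumPy stand-in for `expanding_mean` (assumed to take a 1-D array and return an array of the same length; this is not the author's Cython implementation) reproduces the output and can be cross-checked against pandas' built-in `groupby(...).expanding().mean()`:

```python
import numpy as np
import pandas as pd


def expanding_mean(values):
    """Hypothetical NumPy stand-in for cython_opt.expanding_mean."""
    values = np.asarray(values, dtype=float)
    # Divide each running total by the number of elements seen so far
    return np.cumsum(values) / np.arange(1, len(values) + 1)


df = pd.DataFrame({"project_id": [1, 1, 1, 1, 2, 2, 2, 2],
                   "value": [3, 4, 5, 6, 10, 11, 12, 13]})
result = df.groupby("project_id")["value"].transform(lambda x: expanding_mean(x.values))

# Built-in equivalent: the result is indexed by (project_id, original index),
# so drop the group level and restore the original row order before comparing
builtin = df.groupby("project_id")["value"].expanding().mean().droplevel(0).sort_index()
print(result.tolist())
```

The built-in path is the natural baseline any Cython version would need to beat.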

I can provide many more examples and functions (I wrote around 20 of these).


Cython-optimized expanding apply transformations #12430

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Cython-optimized expanding apply transformations #12430

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions