Closed
Description
I have been generating a lot of features for time-dependent data and ended up writing many expanding-window apply operations in Cython. Would the community want something like this? Imagine a DataFrame with an "entity" column, a "time" column, and a numeric "feature" column, where you want the expanding sum/mean/mode/etc. of the feature column for each entity.
This is currently not well optimized in pandas, especially for computing the mode of categorical variables, where keeping track of state from one row to the next saves a lot of time.
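To make the "keeping track of state" point concrete, here is a minimal pure-Python sketch of an expanding mode that updates running counts instead of recomputing the mode over the whole prefix at every row. The function name `expanding_mode`, the `category` column, and the tie-breaking rule (the first value to reach the highest count wins) are assumptions for illustration; this is not the actual Cython code, just the idea behind it.

```python
from collections import Counter

import pandas as pd


def expanding_mode(values):
    """Return the mode of values[:i+1] for every i, in a single pass."""
    counts = Counter()   # running count per distinct value
    best = None          # current mode
    best_count = 0       # count of the current mode
    out = []
    for v in values:
        counts[v] += 1
        # Only the value just seen can overtake the current mode.
        if counts[v] > best_count:
            best, best_count = v, counts[v]
        out.append(best)
    return out


# Hypothetical example data: one categorical feature per entity.
df = pd.DataFrame({
    "project_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "category": ["a", "b", "b", "a", "c", "c", "d", "d"],
})

df["mode_so_far"] = df.groupby("project_id")["category"].transform(
    lambda s: pd.Series(expanding_mode(s.tolist()), index=s.index)
)
```

Each row costs one dictionary update, so a group of n rows is processed in O(n) rather than recounting the prefix at every step.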
Example of a use case:
import pandas as pd
import cython_opt  # my own module; the Cython functions are defined here
df = pd.DataFrame({"project_id": [1, 1, 1, 1, 2, 2, 2, 2], "value": [3, 4, 5, 6, 10, 11, 12, 13]})
# group by the entity key, then compute the expanding mean within each group
df["value"] = df.groupby("project_id")["value"].transform(lambda x: cython_opt.expanding_mean(x.values))
# output like {"project_id": [1,1,1,1,2,2,2,2], "value": [3, 3.5, 4, 4.5, 10, 10.5, 11, 11.5]}
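For comparison, the same result can be reproduced with what pandas already ships, which may be useful as a baseline when benchmarking. This is only a sketch on the toy data above; the variable names are mine, and I am not claiming either variant is what `cython_opt.expanding_mean` does internally.

```python
import pandas as pd

df = pd.DataFrame({
    "project_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "value": [3, 4, 5, 6, 10, 11, 12, 13],
})

grouped = df.groupby("project_id")["value"]

# Variant 1: built-in expanding window applied inside each group.
mean_expanding = grouped.transform(lambda x: x.expanding().mean())

# Variant 2: running sum divided by running row count, fully vectorized.
mean_cumsum = grouped.cumsum() / (grouped.cumcount() + 1)
```

Variant 2 only works for statistics that decompose into cumulative operations (sum, count, mean), which is part of why something like the expanding mode needs a dedicated stateful implementation.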
I can provide many more examples and functions (I wrote around 20 of these).