REGR: groupby.transform with a UDF performance

```
import numpy as np
import pandas as pd

pd.options.mode.copy_on_write = False  # set to True for the CoW runs
size = 10_000
df = pd.DataFrame(
    {
        'a': np.random.randint(0, 100, size),
        'b': np.random.randint(0, 100, size),
        'c': np.random.randint(0, 100, size),
    }
).set_index(['a', 'b']).sort_index()

gb = df.groupby(['a', 'b'])

# %timeit is an IPython magic; run this in IPython or Jupyter
%timeit gb.transform(lambda x: x == x.shift(-1).fillna(0))

# 2.0.x - CoW=False
# 1.46 s ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 
# 2.0.x - CoW=True
# 1.47 s ± 6.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 
# main - CoW=False
# 4.35 s ± 50.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# 
# main - CoW=True
# 9.11 s ± 76.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

I encountered this while trying to update some code to use Copy-on-Write (CoW). The regression exists without CoW (roughly 3x slower on main than on 2.0.x) and is worse with it (roughly 6x slower). I haven't investigated the cause yet.

cc @phofl @jorisvandenbossche 

PS: This code should not have been using transform with a UDF :smile:
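
For readers who hit the same slowdown, the PS hints at a workaround: this particular UDF can likely be written without `transform` at all. The following is a minimal sketch, assuming the only goal is to compare each row with the next row in its group; the names `slow` and `fast` are illustrative, and the frame is smaller than the one in the benchmark. It does not address the regression itself.

```python
import numpy as np
import pandas as pd

# Smaller frame than in the benchmark above, same shape of data.
rng = np.random.default_rng(0)
size = 500
df = pd.DataFrame(
    {
        "a": rng.integers(0, 10, size),
        "b": rng.integers(0, 10, size),
        "c": rng.integers(0, 10, size),
    }
).set_index(["a", "b"]).sort_index()

gb = df.groupby(["a", "b"])

# UDF version from the report: the lambda is called once per group.
slow = gb.transform(lambda x: x == x.shift(-1).fillna(0))

# Built-in version: GroupBy.shift works on all groups in one call, and the
# result keeps the original index, so it lines up with df["c"] directly.
fast = df["c"].eq(gb["c"].shift(-1).fillna(0))
```

Here `fast` is a boolean Series, while `slow` is a one-column DataFrame; compare `slow["c"]` against `fast` to check that they agree on your own data.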

REGR: groupby.transform with a UDF performance #55256
