Performance regression in stat_ops.FrameMultiIndexOps.time_op #35050

Closed
@TomAugspurger

https://pandas.pydata.org/speed/pandas/index.html#stat_ops.FrameMultiIndexOps.time_op?p-level=1&p-op=%27std%27&commits=c9144ca54dcc924995acae3d9dcb890a5802d7c0

import pandas as pd
import numpy as np

# Three-level MultiIndex with 10 * 100 * 100 = 100,000 rows
levels = [np.arange(10), np.arange(100), np.arange(100)]
codes = [
    np.arange(10).repeat(10000),
    np.tile(np.arange(100).repeat(100), 10),
    np.tile(np.tile(np.arange(100), 100), 10),
]
index = pd.MultiIndex(levels=levels, codes=codes)
df = pd.DataFrame(np.random.randn(len(index), 4), index=index)
# IPython magic: time the std aggregation grouped on the second index level
%timeit df.groupby(level=1).std()

The benchmark history points to #34372 (cc @rhshadrach), but there was also an earlier slowdown before that change.
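For anyone bisecting this outside IPython, here is a minimal sketch that swaps the `%timeit` magic for the standard-library `timeit` module, so the same measurement can be run as a plain script against different pandas builds. The helper name `build_frame` and the `repeat`/`number` settings are assumptions chosen for illustration, not part of the ASV benchmark.

```python
import timeit

import numpy as np
import pandas as pd


def build_frame():
    """Build the 100,000-row, 3-level MultiIndex frame from the report."""
    levels = [np.arange(10), np.arange(100), np.arange(100)]
    codes = [
        np.arange(10).repeat(10000),
        np.tile(np.arange(100).repeat(100), 10),
        np.tile(np.tile(np.arange(100), 100), 10),
    ]
    index = pd.MultiIndex(levels=levels, codes=codes)
    return pd.DataFrame(np.random.randn(len(index), 4), index=index)


df = build_frame()

# Assumed settings: 3 repeats of 5 calls each; report the best per-call time.
number = 5
timings = timeit.repeat(lambda: df.groupby(level=1).std(), repeat=3, number=number)
best_ms = min(timings) / number * 1000
print(f"pandas {pd.__version__}: best of 3 = {best_ms:.2f} ms per call")
```

Running the script once per commit (for example under `git bisect run`) and comparing the printed timings should show where each slowdown landed. Absolute numbers will vary by machine.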

Labels: Performance (memory or execution speed performance), Regression (functionality that used to work in a prior pandas version)
