Resample on MultiIndex level takes much more time than on normal column #28635

Closed
@pgrudzinski

Description

Code Sample

import pandas as pd
import numpy as np

n=80000
g=5

index = pd.MultiIndex.from_product([
    np.arange(g),
    pd.to_timedelta(np.arange(n), unit='s')
])
data = pd.DataFrame(
    np.random.randint(0,1000,size=(len(index))),
    index=index
)

%timeit data.groupby(level=0).resample('10s',level=1).mean()

# 3.93 s ± 295 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit data.reset_index(1).groupby(level=0).resample('10s',on='level_1').mean()

# 157 ms ± 3.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Problem description

resample seems to take much more time when resampling on a level of a MultiIndex than on a regular data column (about 3.93 s versus 157 ms in the timings above). The second, faster approach is more convoluted and is not what first comes to mind.
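Until the two paths perform alike, the faster form can be wrapped in a small helper so the call site stays readable. The sketch below is not part of pandas: the helper name `resample_level_via_column` and the level names `group` and `time` are assumptions made for illustration, and the data is smaller than in the report. It uses only `reset_index`, `groupby`, and `resample(..., on=...)`, as in the code sample above, and checks that both approaches give the same means.

```python
import numpy as np
import pandas as pd


def resample_level_via_column(df, group_level, time_level, rule):
    """Hypothetical helper: move the time level into a column, then resample on it."""
    flat = df.reset_index(time_level)
    return flat.groupby(level=group_level).resample(rule, on=time_level)


# Small named-level version of the frame from the report (3 groups, 600 seconds each).
index = pd.MultiIndex.from_product(
    [np.arange(3), pd.to_timedelta(np.arange(600), unit="s")],
    names=["group", "time"],
)
data = pd.DataFrame(
    {"value": np.random.randint(0, 1000, size=len(index))},
    index=index,
)

# Workaround path: resample on a regular column.
fast = resample_level_via_column(data, "group", "time", "10s").mean()

# Original path: resample directly on the MultiIndex level.
slow = data.groupby(level="group").resample("10s", level="time").mean()

# Both paths should produce the same per-group, per-bin means.
results_match = np.allclose(slow["value"].to_numpy(), fast["value"].to_numpy())
```

The helper returns the resampler rather than an aggregated frame, so any aggregation (`mean`, `sum`, and so on) can be chained by the caller.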

Expected Output

Both operations should take around the same amount of time, with the second possibly taking slightly longer because of the additional reset_index operation. If the difference is expected, then the first operation should show a warning hinting at the faster alternative.
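To check this expectation outside IPython, where `%timeit` is not available, the comparison can be reproduced with the standard-library `timeit` module. This is only a rough sketch: the level names `group` and `time`, the function names, and the reduced size `n = 2000` are assumptions chosen to keep the run short, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n, g = 2000, 5  # smaller than the n=80000 used in the report

index = pd.MultiIndex.from_product(
    [np.arange(g), pd.to_timedelta(np.arange(n), unit="s")],
    names=["group", "time"],
)
data = pd.DataFrame(
    {"value": np.random.randint(0, 1000, size=len(index))},
    index=index,
)


def on_level():
    # Resample directly on the MultiIndex level.
    return data.groupby(level="group").resample("10s", level="time").mean()


def on_column():
    # Move the time level to a column first, then resample on that column.
    flat = data.reset_index("time")
    return flat.groupby(level="group").resample("10s", on="time").mean()


# Take the best of a few single runs of each variant.
level_time = min(timeit.repeat(on_level, number=1, repeat=3))
column_time = min(timeit.repeat(on_column, number=1, repeat=3))

print(f"resample on level:  {level_time:.4f} s")
print(f"resample on column: {column_time:.4f} s")
```

Taking the minimum over repeats reduces noise from other processes; it does not replace the mean and standard deviation that `%timeit` reports.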

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
