
DataFrame.corr(method="kendall") calculation is slow #28329

Closed
@dsaxton

Description

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 300))

df.corr(method="kendall")
# 21.6 s ± 686 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

DataFrame.corr(method="kendall") doesn't scale particularly well, perhaps because it's the only named correlation method that isn't Cythonized at the moment. We just call kendalltau from scipy repeatedly in a Python for loop, once per pair of distinct columns (https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7454). For the 300-column frame above, that is 300 × 299 / 2 = 44,850 separate calls. It may be worthwhile to try to implement something more efficient within _libs/algos.pyx.
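One possible way to avoid the per-pair Python calls is sketched below. It assumes the data has no missing values. Kendall's tau-b for columns a and b equals the sum of sign(a_i - a_j) * sign(b_i - b_j) over row pairs i < j, divided by the square root of the product of the number of untied row pairs in each column. If the sign vectors are stacked as columns, the whole correlation matrix becomes one normalized Gram matrix, which NumPy can build from matrix products. The function name kendall_tau_b_matrix and its chunk_size parameter are hypothetical. This is neither the pandas implementation nor the algorithm scipy uses.

```python
import numpy as np


def kendall_tau_b_matrix(values, chunk_size=10_000):
    """Kendall's tau-b for every pair of columns of a 2-D array (sketch).

    Hypothetical helper, assumes no missing values. Row pairs are processed
    in chunks so the intermediate sign array has at most chunk_size rows.
    """
    values = np.asarray(values, dtype=float)
    n, k = values.shape
    # Every unordered pair of rows (i, j) with i < j.
    rows_i, rows_j = np.triu_indices(n, k=1)

    # numer[a, b] accumulates sign(da) * sign(db) over row pairs, i.e.
    # concordant minus discordant pairs (pairs tied in a or b add 0).
    numer = np.zeros((k, k))
    for start in range(0, rows_i.size, chunk_size):
        stop = start + chunk_size
        signs = np.sign(values[rows_i[start:stop]] - values[rows_j[start:stop]])
        numer += signs.T @ signs

    # The diagonal holds, per column, the number of row pairs NOT tied.
    untied = np.diag(numer)
    denom = np.sqrt(np.outer(untied, untied))
    # Constant columns give 0 / 0, which becomes NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return numer / denom


# Example usage on a small random array.
rng = np.random.default_rng(0)
tau = kendall_tau_b_matrix(rng.standard_normal((200, 5)))
print(tau.round(3))
```

To mirror df.corr, the result could be wrapped as pd.DataFrame(tau, index=df.columns, columns=df.columns). The trade-off is that each column pair now costs O(n²) multiply-adds done inside BLAS instead of a per-pair routine called from Python, so this is only likely to pay off for moderate row counts. The chunk_size parameter bounds the intermediate sign array to roughly chunk_size × k floats. Pairwise handling of missing values, which pandas performs, is not covered.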

Relevant discussion: #28151

Labels: Performance (memory or execution speed performance)