Description
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 300))
df.corr(method="kendall")
# 21.6 s ± 686 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
`DataFrame.corr(method="kendall")` doesn't scale particularly well, perhaps because it's the only named correlation method that isn't Cythonized at the moment (we just call `kendalltau` from scipy repeatedly in a Python for loop: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7454). It may be worthwhile to try to implement something more efficient within `_libs/algos.pyx`.
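To make the cost concrete, here is a rough sketch of the pairwise pattern described above. It is not the actual pandas code: `pairwise_kendall` is a hypothetical helper written only for illustration, and it assumes the data has no missing values. It makes one `scipy.stats.kendalltau` call per unordered pair of columns, so the 300-column frame above would need 300 * 299 / 2 = 44,850 separate scipy calls.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import kendalltau


def pairwise_kendall(df):
    """Hypothetical helper: Kendall's tau for every column pair, one scipy call per pair.

    Assumes no missing values. Returns the correlation matrix and the
    number of kendalltau calls that were made.
    """
    n_cols = df.shape[1]
    out = np.eye(n_cols)  # each column has tau == 1 with itself
    n_calls = 0
    for i, j in combinations(range(n_cols), 2):
        x = df.iloc[:, i].to_numpy()
        y = df.iloc[:, j].to_numpy()
        tau, _pvalue = kendalltau(x, y)
        out[i, j] = out[j, i] = tau  # the matrix is symmetric
        n_calls += 1
    return pd.DataFrame(out, index=df.columns, columns=df.columns), n_calls


# Small frame so the example finishes quickly.
rng = np.random.default_rng(0)
small_df = pd.DataFrame(rng.standard_normal((50, 5)))
result, n_calls = pairwise_kendall(small_df)
```

With 5 columns the loop makes 10 calls. The call count grows quadratically with the number of columns, and each call carries Python-level overhead, which is consistent with the timing reported above.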
Relevant discussion: #28151