Closed
Description
Code Sample
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 300))
# Using the Spearman method directly: about 12.19 seconds on my computer
df.corr(method='spearman')
# Using the detour (rank first, then Pearson): about 0.72 seconds on my computer
rank_df = df.rank()
rank_df.corr(method='pearson')
Problem description
I found that computing the Spearman correlation via the detour (ranking the data first, then applying the Pearson method) is much faster: about 0.72 seconds versus 12.19 seconds on my computer. The resulting correlation coefficients differ slightly when the data contains NaN values. I'd like to understand where this difference comes from, and whether the Spearman method could be made faster by reusing the existing Pearson implementation.
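
A likely explanation for the NaN discrepancy, offered as a sketch and not as a statement about pandas internals: `df.rank()` ranks each column over all of its own non-NaN values, whereas `df.corr(method='spearman')` appears to work on pairwise-complete observations, so for each pair of columns it drops rows where either value is NaN and ranks only what remains. The helper `pairwise_spearman` and the toy data below are hypothetical, introduced purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy data with NaNs in two columns (illustrative values only).
df = pd.DataFrame(rng.standard_normal((50, 3)), columns=["a", "b", "c"])
df.iloc[::7, 0] = np.nan
df.iloc[3::11, 1] = np.nan

direct = df.corr(method="spearman")        # built-in Spearman
detour = df.rank().corr(method="pearson")  # rank whole columns, then Pearson


def pairwise_spearman(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: rank only the rows where both values are present."""
    both = x.notna() & y.notna()
    return x[both].rank().corr(y[both].rank())  # Series.corr defaults to Pearson


manual_ab = pairwise_spearman(df["a"], df["b"])

print("direct:", direct.loc["a", "b"])
print("detour:", detour.loc["a", "b"])
print("manual:", manual_ab)

# Without NaNs the two routes should agree up to floating-point error.
clean = pd.DataFrame(rng.standard_normal((50, 3)))
print(np.allclose(clean.corr(method="spearman"), clean.rank().corr()))
```

If pairwise handling is not needed, calling `df.dropna()` before either route should make the two results agree, at the cost of discarding every row that contains any NaN.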