df.corr Spearman method performance #28366

Closed
@Hiill

Description

Code Sample

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 300))

# Using the Spearman method: about 12.19 seconds on my computer
df.corr(method='spearman')

# Using the detour (rank first, then Pearson): about 0.72 seconds on my computer
rank_df = df.rank()
rank_df.corr(method='pearson')
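
To make the comparison reproducible, here is a minimal timing sketch. The helper `time_call` is a hypothetical name introduced only for illustration, the frame is smaller than in the sample above so that it runs quickly, and the timings will vary by machine and pandas version. It also checks that the two approaches agree when the data contains no NaN values.

```python
import time

import numpy as np
import pandas as pd


def time_call(func):
    """Hypothetical helper: run func once, return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


# Seeded generator so the comparison is repeatable; no NaN values here.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 20)))

# Built-in Spearman correlation.
direct, t_direct = time_call(lambda: df.corr(method="spearman"))

# Detour: rank every column, then take the Pearson correlation of the ranks.
detour, t_detour = time_call(lambda: df.rank().corr(method="pearson"))

print(f"spearman: {t_direct:.4f} s, rank + pearson: {t_detour:.4f} s")
print("results match:", np.allclose(direct, detour))
```

A single run is a rough measurement; `timeit` with several repetitions would give steadier numbers.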

Problem description

I found that computing the Spearman correlation via the detour (ranking the data first, then computing the Pearson correlation of the ranks) is much faster: about 0.72 seconds versus 12.19 seconds in the sample above. When the data contains NaN values, the resulting correlation coefficients differ slightly between the two approaches. I'd like to understand where this difference comes from, and whether the Spearman method could be made faster, for example by reusing the existing Pearson implementation.
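
One plausible source of the difference with NaN values is the order of ranking and NaN removal. `df.rank()` ranks each column over all of its own non-NaN values, while a pairwise Spearman correlation ranks only the rows where both columns are present. The sketch below contrasts the two orders on a tiny hand-made frame (the columns `a` and `b` are illustrative). Whether `df.corr(method='spearman')` takes the pairwise route internally is an assumption here, not something the sample above confirms; its result is printed last so it can be compared.

```python
import numpy as np
import pandas as pd

# Column "a" has one missing value; the matching value in "b" (3.0)
# sits in the middle of b's range, so dropping that row shifts b's ranks.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [1.0, 2.0, 10.0, 4.0, 3.0],
})

# Detour: rank each column over all of its non-NaN values, then Pearson.
# Pearson then skips the incomplete row, but b's ranks still count it.
rank_first = df.rank().corr(method="pearson").loc["a", "b"]

# Pairwise: drop rows with a NaN in either column first, then rank, then Pearson.
complete = df.dropna()
pairwise = complete.rank().corr(method="pearson").loc["a", "b"]

# Built-in Spearman, printed for comparison only (assumed, not verified here,
# to follow the pairwise route).
builtin = df.corr(method="spearman").loc["a", "b"]

print("rank first, then Pearson:", rank_first)
print("drop NaN rows, then rank:", pairwise)
print("df.corr(method='spearman'):", builtin)
```

With no NaN values the two orders produce identical ranks, which would explain why the outputs only diverge on data with missing entries.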
