QST: is there a more straightforward approach to calculate correlation between two sets of columns? #52776

Open
@lcrmorin

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://codereview.stackexchange.com/questions/284462/python-correlation-function

Question about pandas

I want to calculate the correlation between two sets of columns, namely features and targets, where the number of features is much larger than the number of targets. The documented way to calculate correlation is to compute the whole correlation matrix and then select a subset of rows/columns, as shown in the code snippet below. However, because the number of features is large, this performs a lot of unnecessary calculations, since most of the coefficients are filtered out.

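import pandas as pd

train = pd.read_csv('train.csv')
targets = ['y1', 'y2']
features = [c for c in train.columns if c not in targets]
# Computes the full (features + targets) square matrix, then keeps only the features x targets block
train.corr().loc[features, targets]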

Is there a more efficient way to do this? I've tried corrwith too, but it only seems to accept DataFrames with the same subset of columns.
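For illustration, here is a minimal sketch of two ways to compute only the features x targets block. It is not an official pandas recipe. The helper name `cross_corr` and the synthetic data standing in for `train.csv` are assumptions. The NumPy version also assumes all columns are numeric, contain no missing values, and are not constant, whereas `DataFrame.corr` handles NaN pairwise. The second variant relies on `DataFrame.corrwith` accepting a Series, called once per target.

```python
import numpy as np
import pandas as pd


def cross_corr(df, features, targets):
    """Pearson correlation of each feature with each target.

    Hypothetical helper: assumes numeric, NaN-free, non-constant columns.
    """
    x = df[features].to_numpy(dtype=float)
    y = df[targets].to_numpy(dtype=float)
    # Center each column, then scale it to unit Euclidean norm.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x /= np.linalg.norm(x, axis=0)
    y /= np.linalg.norm(y, axis=0)
    # (n_features x n_rows) @ (n_rows x n_targets) gives the cross-correlation block.
    return pd.DataFrame(x.T @ y, index=features, columns=targets)


# Synthetic data standing in for train.csv (assumption for this example).
rng = np.random.default_rng(0)
train = pd.DataFrame(
    rng.normal(size=(200, 6)),
    columns=['f1', 'f2', 'f3', 'f4', 'y1', 'y2'],
)
targets = ['y1', 'y2']
features = [c for c in train.columns if c not in targets]

# Variant 1: standardize once, then a single matrix product.
fast = cross_corr(train, features, targets)

# Variant 2: one corrwith call per target, each against a Series.
via_corrwith = pd.concat(
    {t: train[features].corrwith(train[t]) for t in targets}, axis=1
)
```

Both variants avoid the features x features part of the matrix, which dominates the cost when there are many more features than targets. If the data contains missing values, the `corrwith` variant is probably the safer choice, since the NumPy version would propagate NaN.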
