Description
Research
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://codereview.stackexchange.com/questions/284462/python-correlation-function
Question about pandas
I want to calculate the correlation between two sets of columns, namely features and targets, where the number of features is far greater than the number of targets. The documented way to calculate correlation is to compute the whole correlation matrix and then select a subset of its rows and columns, as shown in the code snippet below. However, because the number of features is large, this performs a lot of unnecessary computation, since most of the coefficients are then filtered out.
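import pandas as pd

train = pd.read_csv('train.csv')
targets = ['y1', 'y2']
features = [c for c in train.columns if c not in targets]
# Computes the full correlation matrix, then keeps only the feature rows and target columns
train.corr().loc[features, targets]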
Is there a more efficient way to do this? I've also tried corrwith, but it only seems to accept DataFrames with the same set of columns.
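For reference, here is a rough sketch of the kind of computation I have in mind, restricted to the feature/target block only. The helper names `cross_corr` and `cross_corr_corrwith` are made up for illustration and are not part of pandas. The first version assumes all columns are numeric and contain no missing values (unlike `DataFrame.corr`, it does not handle NaN pairwise, and constant columns produce NaN). The second version stays within pandas by calling `DataFrame.corrwith` once per target Series. I have not benchmarked either.

```python
import numpy as np
import pandas as pd


def cross_corr(df, features, targets):
    """Pearson correlation of every feature column with every target column.

    Assumes numeric columns without NaN (hypothetical helper, not a pandas API).
    """
    x = df[features].to_numpy(dtype=float)
    y = df[targets].to_numpy(dtype=float)
    # Center each column, then scale it to unit Euclidean norm.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x /= np.linalg.norm(x, axis=0)
    y /= np.linalg.norm(y, axis=0)
    # The dot product of two centered, unit-norm columns is their Pearson r.
    return pd.DataFrame(x.T @ y, index=features, columns=targets)


def cross_corr_corrwith(df, features, targets):
    """Same result using pandas only: one corrwith call per target Series."""
    return pd.concat(
        {t: df[features].corrwith(df[t]) for t in targets}, axis=1
    )


# Small synthetic frame standing in for train.csv (illustrative data only).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(100, 6)),
    columns=["f1", "f2", "f3", "f4", "y1", "y2"],
)
demo_targets = ["y1", "y2"]
demo_features = [c for c in demo.columns if c not in demo_targets]

block = cross_corr(demo, demo_features, demo_targets)
block_pd = cross_corr_corrwith(demo, demo_features, demo_targets)
```

Both versions should return a DataFrame with the features as rows and the targets as columns, matching `train.corr().loc[features, targets]` on data without missing values, while only computing len(features) x len(targets) coefficients.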