Skip to content

ENH: Add axis argument to Dataframe.corr #35984

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from
Closed

ENH: Add axis argument to Dataframe.corr #35984

wants to merge 17 commits into from

Conversation

kc611
Copy link

@kc611 kc611 commented Aug 30, 2020

@pep8speaks
Copy link

pep8speaks commented Aug 30, 2020

Hello @kc611! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-31 15:28:08 UTC

@kc611 kc611 closed this Aug 30, 2020
@kc611 kc611 reopened this Aug 30, 2020
Copy link
Member

@dsaxton dsaxton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kc611 for the PR

@dsaxton dsaxton added Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations labels Aug 30, 2020
@dsaxton dsaxton changed the title Added axis argument to Dataframe.corr ENH: Add axis argument to Dataframe.corr Aug 30, 2020
@dsaxton
Copy link
Member

dsaxton commented Aug 30, 2020

Also can you add a whatsnew note for v1.2.0?

@kc611
Copy link
Author

kc611 commented Aug 30, 2020

@dsaxton Thanks for the review. I made the changes accordingly.

I was also thinking about adding a callable method in the test but since this is a minor improvement (and most probably will work for every case for which Dataframe.corr worked normally,since we're only transposing the original matrix), I dropped the idea. Any changes to tests are welcome.

And also under what section of whatsnew note for v1.2.0 should I add the note?

@dsaxton
Copy link
Member

dsaxton commented Aug 30, 2020

And also under what section of whatsnew note for v1.2.0 should I add the note?

You can add it under enhancements (a bullet within other enhancements is probably fine)

Copy link
Member

@dsaxton dsaxton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @kc611 nice work

@kc611
Copy link
Author

kc611 commented Aug 30, 2020

LGTM @kc611 nice work

Thanks

@dsaxton dsaxton added this to the 1.2 milestone Aug 30, 2020
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kc611 , just leaving a small request

@kc611
Copy link
Author

kc611 commented Aug 31, 2020

@MarcoGorelli Thanks for review, I added the type annotations as requested.

But it seems that the tests are failing and I cannot figure out the exact reason. Can you assist me.

def corr(
self,
method: Union[str, Callable[[np.ndarray, np.ndarray], np.float64]] = "pearson",
min_periods: Optional[int] = 1, axis: Union[str, int] = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running black on this file should fix the linting, the other CI issues seem unrelated

https://pandas.pydata.org/docs/dev/development/contributing.html#python-pep8-black

Copy link
Author

@kc611 kc611 Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dsaxton. The linting issue is fixed but CI issue still persists

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me, though I'm not the best person to comment on whether doing it this way (taking the transpose) is most efficient. @dsaxton @WillAyd @jreback in case you have comments

def test_corr_axes(self, meth):
# https://github.com/pandas-dev/pandas/issues/35002
df = pd.DataFrame(np.random.normal(size=(10, 4)))
expected = df.T.corr(meth, axis=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's usually encouraged to explicitly write out the expected DataFrame so that expected doesn't go down the same code path as result

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I could do that but, wouldn't it just be a test for Dataframe.corr function itself. Since the original operations to be done on matrix itself are left unchanged.

Personally I don't think explicitly writing Dataframe in this case is needed, unless (as you suggested) instead of taking a transpose we implement a workaround involving changing the main function itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sure, perhaps wait for others' comments then

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test comes close to being circular but I think it's probably okay here. In this case it's hard to explicitly construct the expected DataFrame for all methods "from scratch" without either trivial input data or messy juggling of different scipy functions.

def corr(self, method="pearson", min_periods=1) -> "DataFrame":
def corr(
self,
method: Union[str, Callable[[np.ndarray, np.ndarray], np.float64]] = "pearson",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this signature to an alias and put it in pandas._typing, cal it MethodWithCallable

@@ -8162,12 +8174,22 @@ def corr(self, method="pearson", min_periods=1) -> "DataFrame":
dogs cats
dogs 1.0 0.3
cats 0.3 1.0
>>> df.corr(method=histogram_intersection, axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line before

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment as well

idx = cols.copy()
mat = numeric_df.to_numpy(dtype=float, na_value=np.nan, copy=False)

if axis == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we have to transpose the results?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do since the result is symmetric

@pytest.mark.parametrize("meth", ["pearson", "spearman", "kendall"])
def test_corr_axes(self, meth):
# https://github.com/pandas-dev/pandas/issues/35002
df = pd.DataFrame(np.random.normal(size=(10, 4)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put axis labels that are differnt for rows / columns and this should fail (need to handle that)

@dsaxton
Copy link
Member

dsaxton commented Sep 20, 2020

@kc611 Can you merge master and address @jreback comments above?

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 21, 2020
@dsaxton
Copy link
Member

dsaxton commented Oct 21, 2020

@kc611 Can you address comments and merge master?

@kc611 kc611 closed this Oct 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add axis argument to DataFrame.corr
5 participants