-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Add axis argument to Dataframe.corr #35984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kc611
commented
Aug 30, 2020
•
edited
Loading
edited
- closes Add axis argument to DataFrame.corr #35002
- tests added / passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kc611 for the PR
Also can you add a whatsnew note for v1.2.0? |
@dsaxton Thanks for the review. I made the changes accordingly. I was also thinking about adding a callable method in the test but since this is a minor improvement (and most probably will work for every case for which Dataframe.corr worked normally,since we're only transposing the original matrix), I dropped the idea. Any changes to tests are welcome. And also under what section of whatsnew note for v1.2.0 should I add the note? |
You can add it under enhancements (a bullet within other enhancements is probably fine) |
Co-authored-by: Daniel Saxton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @kc611 nice work
Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kc611 , just leaving a small request
@MarcoGorelli Thanks for review, I added the type annotations as requested. But it seems that the tests are failing and I cannot figure out the exact reason. Can you assist me. |
pandas/core/frame.py
Outdated
def corr( | ||
self, | ||
method: Union[str, Callable[[np.ndarray, np.ndarray], np.float64]] = "pearson", | ||
min_periods: Optional[int] = 1, axis: Union[str, int] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running black on this file should fix the linting, the other CI issues seem unrelated
https://pandas.pydata.org/docs/dev/development/contributing.html#python-pep8-black
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dsaxton. The linting issue is fixed but CI issue still persists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_corr_axes(self, meth): | ||
# https://github.com/pandas-dev/pandas/issues/35002 | ||
df = pd.DataFrame(np.random.normal(size=(10, 4))) | ||
expected = df.T.corr(meth, axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's usually encouraged to explicitly write out the expected DataFrame so that expected
doesn't go down the same code path as result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I could do that but, wouldn't it just be a test for Dataframe.corr function itself. Since the original operations to be done on matrix itself are left unchanged.
Personally I don't think explicitly writing Dataframe in this case is needed, unless (as you suggested) instead of taking a transpose we implement a workaround involving changing the main function itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK sure, perhaps wait for others' comments then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test comes close to being circular but I think it's probably okay here. In this case it's hard to explicitly construct the expected DataFrame for all methods "from scratch" without either trivial input data or messy juggling of different scipy functions.
def corr(self, method="pearson", min_periods=1) -> "DataFrame": | ||
def corr( | ||
self, | ||
method: Union[str, Callable[[np.ndarray, np.ndarray], np.float64]] = "pearson", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this signature to an alias and put it in pandas._typing, cal it MethodWithCallable
@@ -8162,12 +8174,22 @@ def corr(self, method="pearson", min_periods=1) -> "DataFrame": | |||
dogs cats | |||
dogs 1.0 0.3 | |||
cats 0.3 1.0 | |||
>>> df.corr(method=histogram_intersection, axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment as well
idx = cols.copy() | ||
mat = numeric_df.to_numpy(dtype=float, na_value=np.nan, copy=False) | ||
|
||
if axis == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we have to transpose the results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we do since the result is symmetric
@pytest.mark.parametrize("meth", ["pearson", "spearman", "kendall"]) | ||
def test_corr_axes(self, meth): | ||
# https://github.com/pandas-dev/pandas/issues/35002 | ||
df = pd.DataFrame(np.random.normal(size=(10, 4))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put axis labels that are differnt for rows / columns and this should fail (need to handle that)
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@kc611 Can you address comments and merge master? |