ENH: Add axis argument to Dataframe.corr #35984
Changes from all commits
f6ef933
3648aeb
c588bc4
0dbde9d
feecd8d
91c1e3e
fb89cbe
d2a87f8
725f36a
f1c884b
0552fa4
3c4f88c
0f1e817
5871508
b98401b
8fc6cda
f2e6e84
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,7 @@ | |
TYPE_CHECKING, | ||
Any, | ||
AnyStr, | ||
Callable, | ||
Dict, | ||
FrozenSet, | ||
Hashable, | ||
|
@@ -5787,7 +5788,7 @@ def nsmallest(self, n, columns, keep="first") -> "DataFrame": | |
population GDP alpha-2 | ||
Tuvalu 11300 38 TV | ||
Anguilla 11300 311 AI | ||
Iceland 337000 17036 IS | ||
Iceland 337000 17036 IS | ||
|
||
When using ``keep='last'``, ties are resolved in reverse order: | ||
|
||
|
@@ -8116,9 +8117,14 @@ def _series_round(s, decimals): | |
# ---------------------------------------------------------------------- | ||
# Statistical methods, etc. | ||
|
||
def corr(self, method="pearson", min_periods=1) -> "DataFrame": | ||
def corr( | ||
self, | ||
method: Union[str, Callable[[np.ndarray, np.ndarray], np.float64]] = "pearson", | ||
min_periods: Optional[int] = 1, | ||
axis: Union[str, int] = 0, | ||
) -> "DataFrame": | ||
""" | ||
Compute pairwise correlation of columns, excluding NA/null values. | ||
Compute pairwise correlation of rows or columns, excluding NA/null values. | ||
|
||
Parameters | ||
---------- | ||
|
@@ -8140,6 +8146,12 @@ def corr(self, method="pearson", min_periods=1) -> "DataFrame": | |
to have a valid result. Currently only available for Pearson | ||
and Spearman correlation. | ||
|
||
axis : {0 or 'index', 1 or 'columns'}, default 0 | ||
The axis to use. 0 or 'index' to compute column-wise, 1 or 'columns' for | ||
row-wise. | ||
kc611 marked this conversation as resolved.
|
||
|
||
.. versionadded:: 1.2.0 | ||
|
||
Returns | ||
------- | ||
DataFrame | ||
|
@@ -8162,12 +8174,22 @@ def corr(self, method="pearson", min_periods=1) -> "DataFrame": | |
dogs cats | ||
dogs 1.0 0.3 | ||
cats 0.3 1.0 | ||
>>> df.corr(method=histogram_intersection, axis=1) | ||
Review comment: blank line before
Review comment: add a comment as well
||
0 1 2 3 | ||
0 1.0 0.3 0.2 0.3 | ||
1 0.3 1.0 0.0 0.1 | ||
2 0.2 0.0 1.0 0.2 | ||
3 0.3 0.1 0.2 1.0 | ||
""" | ||
numeric_df = self._get_numeric_data() | ||
cols = numeric_df.columns | ||
axis = numeric_df._get_axis_number(axis) | ||
cols = numeric_df._get_agg_axis(axis) | ||
idx = cols.copy() | ||
mat = numeric_df.to_numpy(dtype=float, na_value=np.nan, copy=False) | ||
|
||
if axis == 1: | ||
Review comment: Don't we have to transpose the results?
Reply: I don't think we do, since the result is symmetric.
||
mat = mat.transpose() | ||
|
||
if method == "pearson": | ||
correl = libalgos.nancorr(mat, minp=min_periods) | ||
elif method == "spearman": | ||
|
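The transpose step in this hunk, and the reply that the result is symmetric, can be checked with released pandas, which has no `axis` argument on `DataFrame.corr`. The following is a minimal sketch under that assumption: it computes row-wise correlation as `df.T.corr()` and cross-checks it against `np.corrcoef`. The data and labels are made up for illustration and are not taken from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the row labels differ from the column labels on purpose.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(4, 6)),
    index=["r0", "r1", "r2", "r3"],
    columns=list("abcdef"),
)

# Row-wise correlation in released pandas: correlate the columns of df.T.
row_corr = df.T.corr(method="pearson")

# Cross-check against NumPy, which treats each row as a variable by default.
expected = np.corrcoef(df.to_numpy())

# The result is labelled by the original row index on both axes ...
assert list(row_corr.index) == list(df.index)
assert list(row_corr.columns) == list(df.index)
# ... and it is symmetric, so transposing the output changes nothing.
assert np.allclose(row_corr.to_numpy(), row_corr.to_numpy().T)
assert np.allclose(row_corr.to_numpy(), expected)
```

Only the input matrix needs transposing. The output is a square, symmetric matrix over whichever axis was treated as the set of variables.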
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -175,6 +175,15 @@ def test_corr_int(self): | |
df3.cov() | ||
df3.corr() | ||
|
||
@td.skip_if_no_scipy | ||
@pytest.mark.parametrize("meth", ["pearson", "spearman", "kendall"]) | ||
def test_corr_axes(self, meth): | ||
# https://github.com/pandas-dev/pandas/issues/35002 | ||
df = pd.DataFrame(np.random.normal(size=(10, 4))) | ||
Review comment: Put axis labels that are different for rows / columns and this should fail (need to handle that).
||
expected = df.T.corr(meth, axis=0) | ||
Review comment: I think it's usually encouraged to explicitly write out the expected DataFrame so that
Reply: Yeah, I could do that, but wouldn't it then just be a test of the DataFrame.corr function itself, since the original operations on the matrix are left unchanged? Personally, I don't think explicitly writing out the DataFrame is needed in this case unless (as you suggested) we replace the transpose with a workaround that changes the main function itself.
Reply: OK sure, perhaps wait for others' comments then.
Review comment: The test comes close to being circular, but I think it's probably okay here. In this case it's hard to explicitly construct the expected DataFrame for all methods "from scratch" without either trivial input data or messy juggling of different scipy functions.
||
result = df.corr(meth, axis=1) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
@td.skip_if_no_scipy | ||
@pytest.mark.parametrize( | ||
"nullable_column", [pd.array([1, 2, 3]), pd.array([1, 2, None])] | ||
|
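Two review concerns on `test_corr_axes` were that rows and columns should carry different labels and that the expected frame is derived from `corr` itself. One possible way to address both is sketched below. Because the `axis` argument exists only in this PR, the hypothetical helper `row_wise_corr` stands in for `df.corr(meth, axis=1)` by using the transpose. The expected frame is built from `np.corrcoef`, with Spearman treated as Pearson on ranks. `"kendall"` is left out here to avoid a SciPy dependency.

```python
import numpy as np
import pandas as pd


def row_wise_corr(df: pd.DataFrame, method: str) -> pd.DataFrame:
    # Hypothetical helper standing in for the proposed ``df.corr(method, axis=1)``.
    return df.T.corr(method)


def check_row_wise_corr(method: str) -> None:
    rng = np.random.default_rng(42)
    # Distinct row and column labels, as requested in the review.
    df = pd.DataFrame(
        rng.normal(size=(10, 4)),
        index=[f"row{i}" for i in range(10)],
        columns=["a", "b", "c", "d"],
    )
    values = df.to_numpy()
    if method == "spearman":
        # Spearman is Pearson on ranks; each row is a variable, so rank within rows.
        values = df.rank(axis=1).to_numpy()
    # Expected frame built independently of DataFrame.corr.
    expected = pd.DataFrame(np.corrcoef(values), index=df.index, columns=df.index)

    result = row_wise_corr(df, method)
    pd.testing.assert_frame_equal(result, expected)


for method in ("pearson", "spearman"):
    check_row_wise_corr(method)
```

With distinct labels, a result that reused the column labels instead of the row index would fail `assert_frame_equal`, which is the failure mode the reviewer asked the test to expose.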
Review comment: Can you move this signature to an alias and put it in pandas._typing, and call it MethodWithCallable?
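As a rough illustration of that request, the `method` annotation from the new `corr` signature could be pulled into a single alias. The sketch below defines the alias locally under the name suggested in the review. It is not imported from `pandas._typing`, and `describe_method` is a hypothetical helper that only shows how a signature would use the alias.

```python
from typing import Callable, Union

import numpy as np

# Name suggested in the review; defined locally here, not imported from pandas.
MethodWithCallable = Union[str, Callable[[np.ndarray, np.ndarray], np.float64]]


def describe_method(method: MethodWithCallable = "pearson") -> str:
    # Hypothetical helper: returns the method name, or the callable's __name__.
    if isinstance(method, str):
        return method
    return getattr(method, "__name__", "callable")


def histogram_intersection(a: np.ndarray, b: np.ndarray) -> np.float64:
    # Custom correlation callable of the kind the docstring example passes to corr.
    return np.minimum(a, b).sum().round(decimals=1)
```

Keeping the union in one alias would let the `corr` signature read `method: MethodWithCallable = "pearson"` rather than repeating the full `Union[...]` expression.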