ENH: Implement multi-column DataFrame.quantiles
#44301
Merged
45 commits
385cff4
First pass at multi-column quantiles
charlesbluca 2c55c68
Update docstring
charlesbluca 4aae08c
Add tests for table quantiles
charlesbluca 8cc274f
Migrate quantiles code to quantile(method='table')
charlesbluca ee28436
Add handling for degenerate case
charlesbluca b45463b
Merge remote-tracking branch 'upstream/master' into multi-col-quantiles
charlesbluca 259ff59
Fix incorrect assertion in test_quantile_multi
charlesbluca c074638
Improve non-numeric exclusion test
charlesbluca ec24040
Resolve test_quantile_box failures
charlesbluca 3cc5d6d
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
charlesbluca 3b42472
Rename res to res_df
charlesbluca e6229c6
Resolve sparse test failures
charlesbluca 54240eb
Remove try/except block to try and resolve new failures
charlesbluca cec798f
Check if tests resolve when we only use transpose to unwrap
charlesbluca 04dbdfd
Add back in try / except block
charlesbluca 7bf7d18
Use if / else instead of try / except
charlesbluca 34f5c68
Merge branch 'main' into multi-col-quantiles
charlesbluca aded5dd
Merge branch 'main' into multi-col-quantiles
charlesbluca 9e3c300
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
charlesbluca 939c735
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
charlesbluca ae16a24
Merge branch 'main' into multi-col-quantiles
charlesbluca d601d4e
Merge branch 'main' into multi-col-quantiles
charlesbluca d058765
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
charlesbluca c495fea
Merge branch 'main' into multi-col-quantiles
charlesbluca 1c411fa
Merge branch 'main' into multi-col-quantiles
charlesbluca a019f15
Merge branch 'main' into multi-col-quantiles
charlesbluca 44d14a7
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke c9dd92f
Use pytest fixture to parameterize TestDataFrameQuantile
mroeschke 6fd8d49
Add tests validating arguments, remove unnecessary tolist()
mroeschke 4ebab82
Add whatsnew note
mroeschke 79789cd
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke 5a13fea
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke eae90bc
Add xfails for arraymanager
mroeschke b431cf0
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke 58264ed
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke 250222b
Add ignores
mroeschke 85bb06a
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke 9884d07
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke a5977dc
Improve assertin of test_quantile
mroeschke 90de88e
Add xfail marker for arraymanager
mroeschke b9a10c8
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke 9db6c26
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke 0dee399
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke c46fcbc
Merge remote-tracking branch 'upstream/main' into multi-col-quantiles
mroeschke 016f81b
Fix typing again
mroeschke
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,7 +83,10 @@ | |
npt, | ||
) | ||
from pandas.compat._optional import import_optional_dependency | ||
from pandas.compat.numpy import function as nv | ||
from pandas.compat.numpy import ( | ||
function as nv, | ||
np_percentile_argname, | ||
) | ||
from pandas.util._decorators import ( | ||
Appender, | ||
Substitution, | ||
|
@@ -11129,6 +11132,7 @@ def quantile( | |
axis: Axis = 0, | ||
numeric_only: bool | lib.NoDefault = no_default, | ||
interpolation: QuantileInterpolation = "linear", | ||
method: Literal["single", "table"] = "single", | ||
) -> Series | DataFrame: | ||
""" | ||
Return values at the given quantile over requested axis. | ||
|
@@ -11157,6 +11161,10 @@ def quantile( | |
* higher: `j`. | ||
* nearest: `i` or `j` whichever is nearest. | ||
* midpoint: (`i` + `j`) / 2. | ||
method : {'single', 'table'}, default 'single' | ||
Whether to compute quantiles per-column ('single') or over all columns | ||
('table'). When 'table', the only allowed interpolation methods are | ||
'nearest', 'lower', and 'higher'. | ||
|
||
Returns | ||
------- | ||
|
@@ -11186,6 +11194,17 @@ def quantile( | |
0.1 1.3 3.7 | ||
0.5 2.5 55.0 | ||
|
||
Specifying `method='table'` will compute the quantile over all columns. | ||
|
||
>>> df.quantile(.1, method="table", interpolation="nearest") | ||
a 1 | ||
b 1 | ||
Name: 0.1, dtype: int64 | ||
>>> df.quantile([.1, .5], method="table", interpolation="nearest") | ||
a b | ||
0.1 1 1 | ||
0.5 3 100 | ||
|
||
Specifying `numeric_only=False` will also compute the quantile of | ||
datetime and timedelta data. | ||
|
||
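To illustrate the difference between `method='single'` and `method='table'` described in the docstring above, here is a minimal pure-Python sketch. It is not the pandas implementation: the helper names (`nearest_index`, `quantile_single`, `quantile_table`) and the sample rows are hypothetical, the input is assumed to be non-empty, and ties in "nearest" are resolved with Python's built-in `round`, which may not match NumPy in every case.

```python
def nearest_index(n, q):
    """Map quantile q to a position in 0..n-1 using 'nearest' interpolation.

    Assumption of this sketch: ties are resolved by Python's round().
    """
    return round(q * (n - 1))


def quantile_single(rows, q):
    """Per-column quantile: each column is sorted independently."""
    columns = list(zip(*rows))
    return tuple(sorted(col)[nearest_index(len(col), q)] for col in columns)


def quantile_table(rows, q):
    """Whole-table quantile: rows are sorted lexicographically and one
    existing row is returned."""
    ordered = sorted(rows)  # tuples compare column by column
    return ordered[nearest_index(len(ordered), q)]


# Hypothetical sample data: column "a" ascends while column "b" descends.
rows = [(1, 40), (2, 30), (3, 20), (4, 10)]
single = quantile_single(rows, 0.0)
table = quantile_table(rows, 0.0)
print(single, table)
```

With `q=0.0`, the per-column result combines the minimum of each column and so need not correspond to any input row, whereas the table result is always one of the input rows.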
|
@@ -11212,13 +11231,18 @@ def quantile( | |
# error: List item 0 has incompatible type "Union[float, Union[Union[ | ||
# ExtensionArray, ndarray[Any, Any]], Index, Series], Sequence[float]]"; | ||
# expected "float" | ||
res_df = self.quantile( | ||
[q], # type: ignore[list-item] | ||
res_df = self.quantile( # type: ignore[call-overload] | ||
[q], | ||
axis=axis, | ||
numeric_only=numeric_only, | ||
interpolation=interpolation, | ||
method=method, | ||
) | ||
res = res_df.iloc[0] | ||
if method == "single": | ||
res = res_df.iloc[0] | ||
else: | ||
# cannot directly iloc over sparse arrays | ||
res = res_df.T.iloc[:, 0] | ||
if axis == 1 and len(self) == 0: | ||
# GH#41544 try to get an appropriate dtype | ||
dtype = find_common_type(list(self.dtypes)) | ||
|
@@ -11246,11 +11270,47 @@ def quantile( | |
res = self._constructor([], index=q, columns=cols, dtype=dtype) | ||
return res.__finalize__(self, method="quantile") | ||
|
||
# error: Argument "qs" to "quantile" of "BlockManager" has incompatible type | ||
# "Index"; expected "Float64Index" | ||
res = data._mgr.quantile( | ||
qs=q, axis=1, interpolation=interpolation # type: ignore[arg-type] | ||
) | ||
valid_method = {"single", "table"} | ||
Review comment: Maybe I am not getting something, but why isn't this just
|
||
if method not in valid_method: | ||
raise ValueError( | ||
f"Invalid method: {method}. Method must be in {valid_method}." | ||
) | ||
if method == "single": | ||
# error: Argument "qs" to "quantile" of "BlockManager" has incompatible type | ||
# "Index"; expected "Float64Index" | ||
res = data._mgr.quantile( | ||
qs=q, axis=1, interpolation=interpolation # type: ignore[arg-type] | ||
) | ||
elif method == "table": | ||
valid_interpolation = {"nearest", "lower", "higher"} | ||
if interpolation not in valid_interpolation: | ||
raise ValueError( | ||
f"Invalid interpolation: {interpolation}. " | ||
f"Interpolation must be in {valid_interpolation}" | ||
) | ||
# handle degenerate case | ||
if len(data) == 0: | ||
if data.ndim == 2: | ||
dtype = find_common_type(list(self.dtypes)) | ||
else: | ||
dtype = self.dtype | ||
return self._constructor([], index=q, columns=data.columns, dtype=dtype) | ||
|
||
q_idx = np.quantile( # type: ignore[call-overload] | ||
np.arange(len(data)), q, **{np_percentile_argname: interpolation} | ||
) | ||
|
||
by = data.columns | ||
if len(by) > 1: | ||
keys = [data._get_label_or_level_values(x) for x in by] | ||
indexer = lexsort_indexer(keys) | ||
else: | ||
by = by[0] | ||
k = data._get_label_or_level_values(by) # type: ignore[arg-type] | ||
indexer = nargsort(k) | ||
|
||
res = data._mgr.take(indexer[q_idx], verify=False) | ||
res.axes[1] = q | ||
|
||
result = self._constructor(res) | ||
return result.__finalize__(self, method="quantile") | ||
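The `q_idx` step above maps each quantile to an integer row position, which may be why `method='table'` accepts only interpolations that land on an existing row. Below is a hedged sketch of that mapping in plain Python, assuming positions run over `0..n_rows-1`. `row_position` and `VALID_INTERPOLATION` are hypothetical names, not pandas or NumPy APIs, and tie-breaking for "nearest" follows Python's `round`, which is an assumption here.

```python
import math

# Interpolations that always select an existing row (mirrors the check in the diff).
VALID_INTERPOLATION = ("nearest", "lower", "higher")


def row_position(n_rows, q, interpolation="nearest"):
    """Return the integer row position for quantile q among n_rows sorted rows."""
    if interpolation not in VALID_INTERPOLATION:
        raise ValueError(
            f"Invalid interpolation: {interpolation}. "
            f"Interpolation must be in {VALID_INTERPOLATION}"
        )
    if n_rows < 1:
        raise ValueError("need at least one row")
    pos = q * (n_rows - 1)  # fractional position between the first and last row
    if interpolation == "lower":
        return math.floor(pos)
    if interpolation == "higher":
        return math.ceil(pos)
    return round(pos)  # "nearest"; tie handling is an assumption of this sketch


lower = row_position(4, 0.5, "lower")
higher = row_position(4, 0.5, "higher")
print(lower, higher)
```

Interpolations such as "linear" or "midpoint" would need a value between two rows, so the sketch rejects them in the same way the validation in the diff does.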