ENH: Add sort keyword to stack #53282
pandas/core/reshape/reshape.py
- level_codes = sorted(set(mi_cols.codes[-1]))
+ level_codes = unique(mi_cols.codes[-1])
+ if sort:
+     level_codes = sorted(level_codes)
This changes level_codes to a list? Can/should we preserve it as an ndarray?
Good idea. Used np.sort here instead.
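To make the list-versus-ndarray point concrete, here is a minimal sketch. It assumes that the public pd.unique behaves like the internal unique helper in the diff (first-appearance order, ndarray result), and the codes array is made up for illustration rather than taken from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mi_cols.codes[-1]; not data from the PR.
codes = np.array([1, 0, 1, 0, 2], dtype=np.int8)

# Original approach: always sorted, but the result is a Python list.
as_list = sorted(set(codes))

# pd.unique keeps the order of first appearance and returns an ndarray.
unsorted_codes = pd.unique(codes)

# With sort=True: np.sort orders the codes and keeps the ndarray type.
sorted_codes = np.sort(unsorted_codes)

print(type(as_list).__name__, type(sorted_codes).__name__)
print(unsorted_codes.tolist(), sorted_codes.tolist())
```

Using np.sort rather than the built-in sorted keeps level_codes the same type on both the sort=True and sort=False paths.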
If the default were False (as mentioned in #15105), would it be straightforward to reproduce the current behavior with something like
Yeah, that's a likely future behavior where we default to False and then deprecate this keyword, but we need to set it to True for backward compat.
Any other comments here @jbrockmendel?
* ENH: Add sort keyword to stack
* Removed commented
* Use np.sort
@@ -711,7 +714,7 @@ def _convert_level_number(level_num: int, columns: Index):
  roll_columns = roll_columns.swaplevel(lev1, lev2)
  this.columns = mi_cols = roll_columns

- if not mi_cols._is_lexsorted():
+ if not mi_cols._is_lexsorted() and sort:
I'm wondering if this is behaving as intended. In the example below, I would expect the rows to be swapped (the stacked 2nd level of the index being 0 1 instead of 1 0):
from pandas import DataFrame, MultiIndex

levels = ((0, 1), (1, 0))
stack_lev = 1
columns = MultiIndex(levels=levels, codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
df = DataFrame(columns=columns, data=[range(4)])
df_stacked = df.stack(stack_lev, sort=True)
print(df_stacked)
#      0  1
# 0 1  0  2
#   0  1  3
# Expected?
#      0  1
# 0 0  1  3
#   1  0  2
mi_cols._is_lexsorted() is checking if the codes are lexsorted (they are [[0, 0, 1, 1], [0, 1, 0, 1]] here) but not if the values are sorted (they are [[0, 1], [1, 0]] here).
Should sort be sorting the level values?
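The distinction between sorted codes and sorted level values can be checked with public API only. This is a rough sketch: the np.lexsort comparison is an assumed stand-in for what the private _is_lexsorted() tests, not a copy of its implementation.

```python
import numpy as np
from pandas import MultiIndex

levels = ((0, 1), (1, 0))
columns = MultiIndex(levels=levels, codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

# Codes are lexsorted if lexsorting them leaves the positions unchanged.
# np.lexsort treats the LAST key as primary, so the codes are reversed.
codes = [np.asarray(c) for c in columns.codes]
order = np.lexsort(codes[::-1])
codes_lexsorted = bool((order == np.arange(len(columns))).all())

# The level values are a separate question: is each level itself sorted?
level_values_sorted = all(lev.is_monotonic_increasing for lev in columns.levels)

print(codes_lexsorted)      # the codes pass the lexsort check
print(level_values_sorted)  # the second level is stored as [1, 0]
print(columns.get_level_values(1).tolist())
```

Because the second level is stored as [1, 0], the codes [0, 1, 0, 1] map to the labels [1, 0, 1, 0], which is why a check on codes alone does not trigger a sort here.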
Nice find. Yeah, I agree, and I think the level values should be sorted here.
Thanks, I plan on putting up a PR for this.
Thanks!
sort=False option to stack/unstack/pivot #15105