
ENH: Add sort keyword to stack #53282


Merged: 6 commits merged on May 30, 2023


@mroeschke (Member) commented May 17, 2023

@mroeschke added the Enhancement and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels May 17, 2023
@mroeschke added this to the 2.1 milestone May 17, 2023
```diff
-level_codes = sorted(set(mi_cols.codes[-1]))
+level_codes = unique(mi_cols.codes[-1])
+if sort:
+    level_codes = sorted(level_codes)
```
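A minimal sketch of how the old and new lines in the diff differ, assuming the unique used in the patch behaves like the public pd.unique (values returned in order of first appearance). The codes array and the new_level_codes wrapper are hypothetical, written only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical codes for the last column level (illustration only).
codes = np.array([1, 0, 1, 0], dtype=np.int8)

# Before the patch: deduplicate through a set, then always sort (returns a list).
old_level_codes = sorted(set(codes))

def new_level_codes(codes, sort):
    # After the patch: keep codes in order of first appearance ...
    level_codes = pd.unique(codes)
    # ... and sort only when sort=True (mirrors the diff, so this yields a list).
    if sort:
        level_codes = sorted(level_codes)
    return level_codes

print([int(c) for c in old_level_codes])               # [0, 1]
print([int(c) for c in new_level_codes(codes, True)])  # [0, 1]
print([int(c) for c in new_level_codes(codes, False)]) # [1, 0]
```

With sort=False the codes keep the order in which they first appear in the columns; with sort=True the result matches the old always-sorted output, but as a Python list rather than an ndarray.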
Member

This changes level_codes to a list? Can/should we preserve it as an ndarray?

Member Author

Good idea. Used np.sort here instead.
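A small sketch of the difference the reviewer pointed out, using a hypothetical level_codes array shaped like pd.unique output: the builtin sorted returns a Python list, while np.sort returns a new ndarray with the same dtype and leaves the input untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical first-appearance-ordered codes, as pd.unique would return them.
level_codes = pd.unique(np.array([1, 0, 1, 0], dtype=np.int8))

as_list = sorted(level_codes)    # builtin sorted: always returns a Python list
as_array = np.sort(level_codes)  # np.sort: returns a sorted copy as an ndarray

print(type(as_list).__name__, type(as_array).__name__)  # list ndarray
print(as_array.dtype == level_codes.dtype)              # True
```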

@jbrockmendel (Member)

If the default were False (as mentioned in #15105), would it be straightforward to reproduce the current behavior with something like df.stack(...).some_other_method()? If so, longer-term we'd ideally have users do that and not have a keyword.

@mroeschke (Member Author)

Yeah, that's a likely future behavior: we default to False and then deprecate this keyword. For now, though, it needs to default to True for backward compatibility.

@mroeschke (Member Author)

Any other comments here, @jbrockmendel?

@mroeschke merged commit 563dd81 into pandas-dev:main May 30, 2023
@mroeschke deleted the enh/stack/sort branch May 30, 2023 20:41
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Jun 5, 2023
* ENH: Add sort keyword to stack

* Removed commented

* Use np.sort
```diff
@@ -711,7 +714,7 @@ def _convert_level_number(level_num: int, columns: Index):
             roll_columns = roll_columns.swaplevel(lev1, lev2)
         this.columns = mi_cols = roll_columns
 
-    if not mi_cols._is_lexsorted():
+    if not mi_cols._is_lexsorted() and sort:
```
@rhshadrach (Member), Jun 12, 2023

I'm wondering if this is behaving as intended. In the example below, I would expect the rows to be swapped (the stacked second level of the index being 0 1 instead of 1 0).

```python
from pandas import DataFrame, MultiIndex

levels = ((0, 1), (1, 0))
stack_lev = 1
columns = MultiIndex(levels=levels, codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
df = DataFrame(columns=columns, data=[range(4)])
df_stacked = df.stack(stack_lev, sort=True)
print(df_stacked)
#      0  1
# 0 1  0  2
#   0  1  3

# Expected?
#      0  1
# 0 0  1  3
#   1  0  2
```

mi_cols._is_lexsorted() checks whether the codes are lexsorted (here they are [[0, 0, 1, 1], [0, 1, 0, 1]]) but not whether the level values are sorted (here they are [[0, 1], [1, 0]]).
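A small sketch of the codes-versus-values distinction, built on the same column index as the example above but without calling stack, so it does not depend on how a given pandas version implements the sort keyword. codes_are_lexsorted is a hypothetical helper meant to mirror what the comment says _is_lexsorted checks; it is not the pandas internal itself.

```python
from pandas import MultiIndex

# Same column index as in the example: level 1 has unsorted values [1, 0].
mi = MultiIndex(levels=[[0, 1], [1, 0]], codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

def codes_are_lexsorted(index: MultiIndex) -> bool:
    # Hypothetical helper: are the per-row code tuples in lexicographic order?
    pairs = list(zip(*(c.tolist() for c in index.codes)))
    return pairs == sorted(pairs)

print(codes_are_lexsorted(mi))     # True: codes are (0,0), (0,1), (1,0), (1,1)
print(mi.is_monotonic_increasing)  # False: the actual values are not sorted

# Rebuilding from the tuples factorizes each level with sorted level values,
# and then the codes themselves are no longer lexsorted.
rebuilt = MultiIndex.from_tuples(list(mi))
print(rebuilt.levels[1].tolist())     # [0, 1]
print(codes_are_lexsorted(rebuilt))   # False
```

Once the level values are sorted, the codes stop being lexsorted, so a check on codes alone cannot tell whether the values are in order.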

Should sort be sorting the level values?

Member Author

Nice find. Yeah, I agree; I think the level values should be sorted here.

Member

Thanks, I plan on putting up a PR for this.

Member Author

Thanks!

Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023

3 participants