ENH: Add sort keyword to stack #53282

mroeschke · 2023-05-17T21:12:59Z

xref #15105 (sort=False option to stack/unstack/pivot)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

jbrockmendel · 2023-05-18T14:27:46Z

pandas/core/reshape/reshape.py

-    level_codes = sorted(set(mi_cols.codes[-1]))
+    level_codes = unique(mi_cols.codes[-1])
+    if sort:
+        level_codes = sorted(level_codes)


This changes level_codes to a list? Can/should we preserve it as an ndarray?

Good idea. Used np.sort here instead
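
To illustrate the list-versus-ndarray point, here is a minimal sketch. It assumes the public pd.unique as a stand-in for the internal unique helper used in the diff, and the codes array is made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical codes for the last column level (not taken from the PR).
codes = np.array([2, 0, 1, 0, 2], dtype=np.int8)

# Old approach: sorted(set(...)) always produces a Python list.
as_list = sorted(set(codes))

# New approach: unique keeps an ndarray in order of first appearance,
# and np.sort keeps it an ndarray when sorting is requested.
uniq = pd.unique(codes)
as_array = np.sort(uniq)

print(type(as_list).__name__, type(uniq).__name__, type(as_array).__name__)
```

Keeping the ndarray avoids a container-type change depending on whether sorting is requested.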

jbrockmendel · 2023-05-18T14:31:26Z

If the default were False (mentioned in #15105), would it be straightforward to reproduce the current behavior with something like df.stack(...).some_other_method()? If so, longer-term ideally we'd have users do that and not have a keyword.

mroeschke · 2023-05-18T17:51:57Z

Yeah, that's a likely future behavior where we default to False and then deprecate this keyword, but we need to set it to True for backward compat.
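
As a rough sketch of the "stack, then call another method" idea: the thread does not name the method, so sort_index is an assumption here. Note that sort_index orders every index level, not only the stacked one, so this only approximates the sorted result. The frame below is made up for the example, and the sort keyword itself is deliberately not used.

```python
import pandas as pd

# Hypothetical frame with unsorted row and column labels.
df = pd.DataFrame([[1, 2], [3, 4]], index=["y", "x"], columns=["b", "a"])

# Stack without relying on a sort keyword, then sort explicitly.
stacked = df.stack()
resorted = stacked.sort_index()

print(resorted)
```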

mroeschke · 2023-05-30T20:33:50Z

Any other comments here @jbrockmendel?

* ENH: Add sort keyword to stack
* Removed commented
* Use np.sort

rhshadrach · 2023-06-12T22:43:24Z

pandas/core/reshape/reshape.py

@@ -711,7 +714,7 @@ def _convert_level_number(level_num: int, columns: Index):
            roll_columns = roll_columns.swaplevel(lev1, lev2)
        this.columns = mi_cols = roll_columns

-    if not mi_cols._is_lexsorted():
+    if not mi_cols._is_lexsorted() and sort:


I'm wondering if this is behaving as intended. In the example below, I would think that the rows would be swapped (the stacked 2nd level of the index being 0 1 instead of 1 0)

    levels = ((0, 1), (1, 0))
    stack_lev = 1
    columns = MultiIndex(levels=levels, codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
    df = DataFrame(columns=columns, data=[range(4)])
    df_stacked = df.stack(stack_lev, sort=True)
    print(df_stacked)
    #      0  1
    # 0 1  0  2
    #   0  1  3

    # Expected?
    #      0  1
    # 0 0  1  3
    #   1  0  2

mi_cols._is_lexsorted() is checking if the codes are lexsorted (they are [[0, 0, 1, 1], [0, 1, 0, 1]] here) but not if the values are sorted (they are [[0, 1], [1, 0]] here).

Should sort be sorting the level values?

Nice find. Yeah I agree and I think the level values should be sorted here

Thanks, I plan on putting up a PR for this.
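
The codes-versus-values distinction raised in this thread can be checked with public attributes only. This is a minimal sketch, not the PR's implementation; it avoids the private _is_lexsorted and compares the codes and the labels directly.

```python
import pandas as pd

# Same MultiIndex as in the review comment: the second level's values
# are stored as [1, 0], so code 0 maps to label 1 and code 1 to label 0.
mi = pd.MultiIndex(levels=[[0, 1], [1, 0]], codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

# Rows of integer codes, one tuple per column position.
code_rows = list(zip(*(c.tolist() for c in mi.codes)))
codes_are_lexsorted = code_rows == sorted(code_rows)

# The actual labels that those codes point to.
labels = list(mi)
labels_are_sorted = labels == sorted(labels)

print(codes_are_lexsorted, labels_are_sorted)
```

A check based only on the codes therefore reports "sorted" for this index even though its labels are not in sorted order.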

mroeschke added 2 commits May 17, 2023 14:10

ENH: Add sort keyword to stack

855277b

Removed commented

2b333b9

mroeschke added Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 17, 2023

mroeschke added this to the 2.1 milestone May 17, 2023

jbrockmendel reviewed May 18, 2023

mroeschke added 2 commits May 18, 2023 10:43

Merge remote-tracking branch 'upstream/main' into enh/stack/sort

796ed09

Use np.sort

66688c1

mroeschke added 2 commits May 22, 2023 12:23

Merge remote-tracking branch 'upstream/main' into enh/stack/sort

9cf3423

Merge remote-tracking branch 'upstream/main' into enh/stack/sort

f549144

jbrockmendel approved these changes May 30, 2023

mroeschke merged commit 563dd81 into pandas-dev:main May 30, 2023

mroeschke deleted the enh/stack/sort branch May 30, 2023 20:41

topper-123 pushed a commit to topper-123/pandas that referenced this pull request Jun 5, 2023

ENH: Add sort keyword to stack (pandas-dev#53282)

32c9ea9

* ENH: Add sort keyword to stack
* Removed commented
* Use np.sort

rhshadrach reviewed Jun 12, 2023

rhshadrach mentioned this pull request Jun 13, 2023

BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels #53636

Closed

This was referenced Jun 25, 2023

BUG: DataFrame.stack sometimes sorting the resulting index #53825

Merged

REGR: DataFrame.stack was sometimes sorting resulting index #53969

Closed

Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023

ENH: Add sort keyword to stack (pandas-dev#53282)

d82fc63

* ENH: Add sort keyword to stack
* Removed commented
* Use np.sort
