
sort_index fails on MultiIndex'ed DataFrame resulting from groupby.apply #15687

Closed
@8one6


I suspect this belongs to a family of bugs caused by pandas failing to notice when a MultiIndex that was once lexsorted loses its lexsortedness. I reported one example of this a couple of years ago (see #8017), but this related bug persists.
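To make the lexsortedness point concrete, here is a minimal pure-Python sketch. It assumes a MultiIndex is modeled as one list of labels per level plus one list of integer codes per level (pandas exposes these as `levels` and `labels`, the latter renamed `codes` in later releases), and that the inner level stores its labels in order of first appearance rather than sorted order. This is a hypothetical model, not pandas' actual sorting code.

```python
# Hypothetical model of a two-level MultiIndex: labels per level, integer codes per row.
# Assumption: the inner level keeps labels in order of first appearance ("newz" before "newa").
levels = [["a", "b"], ["newz", "newa"]]
codes = [[0, 0, 1, 1], [0, 1, 0, 1]]


def is_sorted(rows):
    """True if each row tuple is <= the next one (lexicographic order)."""
    return all(a <= b for a, b in zip(rows, rows[1:]))


# One tuple of integer codes per row, e.g. (0, 0), (0, 1), ...
code_rows = list(zip(*codes))
# Decode each code tuple back into its labels, e.g. ("a", "newz"), ("a", "newa"), ...
label_rows = [tuple(levels[i][c] for i, c in enumerate(row)) for row in code_rows]

print(is_sorted(code_rows))   # True: the codes are lexsorted
print(is_sorted(label_rows))  # False: "newz" sorts after "newa"
```

If a sort routine trusted a codes-only check like the first one, it could conclude the rows are already in order and return them unchanged, which would match the output shown in this report.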

On pandas 0.19.0:

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame(
    np.random.randn(8, 2),
    index=pd.MultiIndex.from_product([['a', 'b'], ['big', 'small'], ['red', 'blue']], names=['letter', 'size', 'color']),
    columns=['near', 'far']
)
df = df.sort_index()

def my_func(group):
    group.index = ['newz', 'newa']
    return group

res = df.groupby(level=['letter', 'size']).apply(my_func).sort_index()

print(res)
###OUTPUT###
                       near       far
letter size                          
a      big   newz  0.978738  2.240893
             newa  1.764052  0.400157
       small newz  0.950088 -0.151357
             newa  1.867558 -0.977278
b      big   newz  0.144044  1.454274
             newa -0.103219  0.410599
       small newz  0.443863  0.333674
             newa  0.761038  0.121675

So before the apply call, df was properly sorted on its row index. However, as you can see, res is not: within each (letter, size) group, newz comes before newa, even though the expression that creates res ends with a sort_index() call.
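One way to confirm this is to check the row labels from the printed output for lexicographic order. The sketch below does that in plain Python on tuples copied from the output; the helper name `first_out_of_order` is hypothetical. On a live DataFrame, `res.index.is_monotonic_increasing` should give a comparable yes/no answer.

```python
# Row labels copied from the printed output above: (letter, size, inner label).
rows = [
    ("a", "big", "newz"), ("a", "big", "newa"),
    ("a", "small", "newz"), ("a", "small", "newa"),
    ("b", "big", "newz"), ("b", "big", "newa"),
    ("b", "small", "newz"), ("b", "small", "newa"),
]


def first_out_of_order(rows):
    """Return the first index i with rows[i] > rows[i + 1], or None if rows are sorted."""
    for i in range(len(rows) - 1):
        if rows[i] > rows[i + 1]:
            return i
    return None


print(first_out_of_order(rows))          # 0: ("a", "big", "newz") > ("a", "big", "newa")
print(first_out_of_order(sorted(rows)))  # None once the tuples are truly sorted
```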

This is a bug, right? I would think we want people to be able to assume that any time they call sort_index, the result comes out lexicographically sorted, no?
