BUG: set_index with passing key of first level of MI produces invalid result #24683

Closed
@jorisvandenbossche

I haven't yet found a small reproducible example, but with the actual (also small) data I see the following problem:

In [47]: subjects_url = 'https://physionet.org/pn4/sleep-edfx/ST-subjects.xls'   
    ...: data = pd.read_excel(subjects_url, header=[0, 1])

In [48]: data.head()                                            
Out[48]: 
  Subject - age - sex           Placebo night            Temazepam night           
                   Nr Age M1/F2      night nr lights off        night nr lights off
0                   1  60     1             1   23:01:00               2   23:48:00
1                   2  35     2             2   23:27:00               1   00:00:00
2                   4  18     2             1   23:53:00               2   22:37:00
3                   5  32     2             2   23:23:00               1   23:34:00
4                   6  35     2             1   23:28:00               2   23:26:00

Calling set_index with a key from the first level of the column MultiIndex (which I think is not supported) does return a result, but an invalid one, as shown by the repr raising an error:

In [49]: res = data.set_index('Subject - age - sex')                       

In [50]: res                                         
Out[50]: ---------------------------------------------------------------------------
...
TypeError: unsupported format string passed to numpy.ndarray.__format__
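
For anyone without access to the spreadsheet, here is a minimal sketch of a frame with the same column layout. The values are copied from the first three rows of data.head() above; the frame itself and the name `sub` are stand-ins, not the original data. It only shows that a first-level key selects a 2D sub-frame rather than a single column, which may be where the 2D index values come from (an assumption, not something confirmed here).

```python
import pandas as pd

# Synthetic stand-in for the Excel sheet (values taken from the first rows of data.head()).
columns = pd.MultiIndex.from_tuples([
    ("Subject - age - sex", "Nr"),
    ("Subject - age - sex", "Age"),
    ("Subject - age - sex", "M1/F2"),
    ("Placebo night", "night nr"),
])
data = pd.DataFrame(
    [[1, 60, 1, 1], [2, 35, 2, 2], [4, 18, 2, 1]],
    columns=columns,
)

# A first-level key selects every column under it, so the result is a DataFrame.
sub = data["Subject - age - sex"]
print(type(sub).__name__, sub.shape)
print(list(sub.columns))
```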

The invalid part is that res.index seems to be an Int64Index, but is backed by a 2D array:

In [51]: res.index                                                     
Out[51]: 
Int64Index([ 1, 60,  1,  2, 35,  2,  4, 18,  2,  5, 32,  2,  6, 35,  2,  7, 51,
             2,  8, 66,  2,  9, 47,  1, 10, 20,  2, 11, 21,  2, 12, 21,  1, 13,
            22,  1, 14, 20,  1, 15, 66,  2, 16, 79,  2, 17, 48,  2, 18, 53,  2,
            19, 28,  2, 20, 24,  1, 21, 34,  2, 22, 56,  1, 24, 48,  2],
           dtype='int64', name='Subject - age - sex')

In [52]: res.index.values                                                  
Out[52]: 
array([[ 1, 60,  1],
       [ 2, 35,  2],
       [ 4, 18,  2],
       [ 5, 32,  2],
       [ 6, 35,  2],
       [ 7, 51,  2],
       [ 8, 66,  2],
       [ 9, 47,  1],
       [10, 20,  2],
       [11, 21,  2],
       [12, 21,  1],
       [13, 22,  1],
       [14, 20,  1],
       [15, 66,  2],
       [16, 79,  2],
       [17, 48,  2],
       [18, 53,  2],
       [19, 28,  2],
       [20, 24,  1],
       [21, 34,  2],
       [22, 56,  1],
       [24, 48,  2]])
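
A possible workaround, sketched on synthetic stand-in data rather than the original sheet: pass the full column tuples instead of the first-level key, so that each key resolves to a single column. The helper `index_is_one_dimensional` is hypothetical, just a quick check that the index values are 1D, which is not the case for res.index.values above.

```python
import numpy as np
import pandas as pd


def index_is_one_dimensional(index):
    """Hypothetical helper: True if the index is backed by 1D values."""
    return np.asarray(index.values).ndim == 1


# Synthetic stand-in for the Excel sheet (values taken from the first rows of data.head()).
columns = pd.MultiIndex.from_tuples([
    ("Subject - age - sex", "Nr"),
    ("Subject - age - sex", "Age"),
    ("Subject - age - sex", "M1/F2"),
    ("Placebo night", "night nr"),
])
data = pd.DataFrame(
    [[1, 60, 1, 1], [2, 35, 2, 2], [4, 18, 2, 1]],
    columns=columns,
)

# Collect the full (level 0, level 1) tuples under the first-level key.
keys = [col for col in data.columns if col[0] == "Subject - age - sex"]

# Each tuple names exactly one column, so the result is a three-level MultiIndex.
res = data.set_index(keys)
print(res.index.nlevels)
print(index_is_one_dimensional(res.index))
```

With the full tuples, the three subject columns become index levels and only the remaining column stays in the frame.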

This was run on up-to-date master (0.24.dev).
