
BUG - sparse dataframes lose multi-index column names #11600

Closed
@Ezekiel-Kruglick

Description


From SO: http://stackoverflow.com/questions/33702198/do-python-pandas-sparse-dataframes-lose-multi-index-column-names-or-am-i-doing-i

The bug is simple in concept: a MultiIndex with named column levels loses those names when converted to a sparse DataFrame.

Minimal example. First, create a DataFrame with MultiIndex rows and columns:

In[2]: import pandas as pd
In[3]: miindex = pd.MultiIndex.from_product([["x","y"], ["10","20"]],names=['row-foo', 'row-bar'])
micol = pd.MultiIndex.from_product([['a','b','c'], ["1","2"]],names=['col-foo', 'col-bar'])
df = pd.DataFrame(index=miindex, columns=micol).sortlevel().sortlevel(axis=1)
df = df.fillna(value=3.14)
df
Out[3]: 
col-foo             a           b           c      
col-bar             1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
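For readers on current pandas, where `DataFrame.sortlevel` no longer exists, a roughly equivalent setup is sketched below. It assumes pandas >= 1.0, swaps in `sort_index` for `sortlevel`, and fills the values directly in the constructor instead of calling `fillna`; it is not the reporter's original code.

```python
import pandas as pd

# Same row and column MultiIndexes as in the report, with named levels.
miindex = pd.MultiIndex.from_product(
    [["x", "y"], ["10", "20"]], names=["row-foo", "row-bar"]
)
micol = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
)

# sort_index stands in for the removed sortlevel(); every cell is 3.14.
df = (
    pd.DataFrame(3.14, index=miindex, columns=micol)
    .sort_index()
    .sort_index(axis=1)
)

print(list(df.columns.names))  # ['col-foo', 'col-bar']
print(list(df.index.names))    # ['row-foo', 'row-bar']
```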

This gives us a nice test MultiIndex with row and column level names. Now if I convert it to a sparse DataFrame and display it, the column level names are gone:

In[4]: ds = df.to_sparse()
ds
Out[4]: 
                    a           b           c      
                    1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
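A quick way to detect this kind of loss is to compare `columns.names` before and after the conversion. The sketch below assumes pandas >= 1.0, where `to_sparse()` was removed and sparse data is expressed by casting to `pd.SparseDtype`; `column_names_preserved` is a hypothetical helper, and the "lost" frame only simulates the reported behaviour by clearing the names.

```python
import pandas as pd


def column_names_preserved(before: pd.DataFrame, after: pd.DataFrame) -> bool:
    """Return True if the column level names survived a conversion (hypothetical helper)."""
    return list(before.columns.names) == list(after.columns.names)


miindex = pd.MultiIndex.from_product(
    [["x", "y"], ["10", "20"]], names=["row-foo", "row-bar"]
)
micol = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
)
df = pd.DataFrame(3.14, index=miindex, columns=micol)

# Modern stand-in for df.to_sparse(): cast every column to a sparse dtype.
sparse = df.astype(pd.SparseDtype("float64", fill_value=0.0))
print(column_names_preserved(df, sparse))

# Simulate the behaviour reported in this issue by clearing the column level names.
lost = sparse.copy()
lost.columns = lost.columns.set_names([None, None])
print(column_names_preserved(df, lost))  # False
```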

And if I convert the sparse version back to dense, the column level names are still gone:

In[6]: ds.to_dense()
Out[6]: 
                    a           b           c      
                    1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
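Until the conversion itself is fixed, one possible workaround is to copy the level names back from the original frame after the round trip. The helper `restore_column_names` below is hypothetical, and the frame with missing names is simulated rather than produced by `to_sparse()`/`to_dense()`.

```python
import pandas as pd


def restore_column_names(result: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `result` whose column level names come from `reference`."""
    if result.columns.nlevels != reference.columns.nlevels:
        raise ValueError("column indexes have different numbers of levels")
    fixed = result.copy()
    fixed.columns = fixed.columns.set_names(list(reference.columns.names))
    return fixed


micol = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
)
df = pd.DataFrame(3.14, index=range(4), columns=micol)

# Simulated result of a conversion that dropped the column level names.
round_tripped = df.copy()
round_tripped.columns = round_tripped.columns.set_names([None, None])

fixed = restore_column_names(round_tripped, df)
print(list(fixed.columns.names))  # ['col-foo', 'col-bar']
```

This only patches the metadata after the fact; it does not change how the sparse conversion behaves.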

I AM aware that displaying the sparse version calls to_dense(), but the loss appears to happen during the conversion to sparse. I'm exploring a move to sparse to reduce memory usage in a code base, and my attempts to access the levels by name within the sparse DataFrame raise "KeyError: 'Level not found'".
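To illustrate why name-based access fails once the names are gone, and one way around it, the sketch below clears the column level names (simulating the reported state rather than calling `to_sparse()`), shows that selecting by level name raises `KeyError`, and then selects by level position instead, which does not depend on the names. Using positional levels is a suggested workaround, not a fix for the underlying bug.

```python
import pandas as pd

miindex = pd.MultiIndex.from_product(
    [["x", "y"], ["10", "20"]], names=["row-foo", "row-bar"]
)
micol = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
)
df = pd.DataFrame(3.14, index=miindex, columns=micol)

# Simulate the state described in the report: column level names are gone.
unnamed = df.copy()
unnamed.columns = unnamed.columns.set_names([None, None])

# Selecting by level name fails because "col-bar" no longer exists.
raised = False
try:
    unnamed.xs("1", axis=1, level="col-bar")
except KeyError as exc:
    raised = True
    print("lookup by name failed:", exc)

# Selecting by level position (1 == second column level) still works.
by_position = unnamed.xs("1", axis=1, level=1)
print(by_position.columns.tolist())  # ['a', 'b', 'c']
```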
