Closed
Description
The bug is simple in concept: a DataFrame whose columns are a MultiIndex with level names loses those names when it is converted to a sparse DataFrame.
Minimal example - first create a multi-index dataframe:
In[2]: import pandas as pd
In[3]: miindex = pd.MultiIndex.from_product([["x","y"], ["10","20"]],names=['row-foo', 'row-bar'])
micol = pd.MultiIndex.from_product([['a','b','c'], ["1","2"]],names=['col-foo', 'col-bar'])
df = pd.DataFrame(index=miindex, columns=micol).sortlevel().sortlevel(axis=1)
df = df.fillna(value=3.14)
df
Out[3]:
col-foo             a           b           c
col-bar             1     2     1     2     1     2
row-foo row-bar
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
This gives a test DataFrame with named levels on both the rows and the columns. If I now convert it to a sparse DataFrame and display it, the column level names are gone.
In[4]: ds = df.to_sparse()
ds
Out[4]:
                    a           b           c
                    1     2     1     2     1     2
row-foo row-bar
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
If I convert the sparse version back to dense, those level names are still gone.
In[6]: ds.to_dense()
Out[6]:
                    a           b           c
                    1     2     1     2     1     2
row-foo row-bar
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
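The loss can also be checked without relying on the printed output, by comparing `columns.names` directly. This is a minimal sketch: `lost_column_level_names` is a hypothetical helper (not part of pandas), and the second frame is built by hand to stand in for the result of the sparse round trip rather than produced by `to_sparse()`.

```python
import pandas as pd


def lost_column_level_names(before, after):
    """Return the column level names present in `before` but missing from `after`."""
    kept = set(after.columns.names)
    return [name for name in before.columns.names if name not in kept]


micol = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
)
named = pd.DataFrame(3.14, index=["x", "y"], columns=micol)

# Hand-built stand-in for the frame that comes back from the round trip:
# same column labels, but with the level names cleared.
unnamed = named.copy()
unnamed.columns = micol.set_names([None, None])

print(lost_column_level_names(named, unnamed))  # ['col-foo', 'col-bar']
```

Running the same comparison on `df` and `ds` from the session above should show whether the names are already missing right after the conversion.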
I am aware that displaying the sparse version calls to_dense(), but the loss appears to happen at the conversion to sparse. I'm exploring a move to sparse to reduce memory usage in a code base, and my attempts to access the levels within the sparse dataframe raise "KeyError: 'Level not found'".
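A possible stopgap is to record the column level names before converting and reattach them afterwards. The sketch below is only an assumption about how that could look: `convert_keeping_column_names` and `lossy_convert` are hypothetical names, and `lossy_convert` merely simulates a conversion that drops the names so the example does not depend on `to_sparse()` being available.

```python
import pandas as pd


def convert_keeping_column_names(df, convert):
    """Apply `convert` to `df`, then restore the original column level names."""
    names = list(df.columns.names)
    out = convert(df)
    # set_names returns a new index, so the input frame's columns are untouched.
    out.columns = out.columns.set_names(names)
    return out


def lossy_convert(frame):
    # Stand-in for a conversion that drops the column level names.
    out = frame.copy()
    out.columns = out.columns.set_names([None, None])
    return out


miindex = pd.MultiIndex.from_product(
    [["x", "y"], ["10", "20"]], names=["row-foo", "row-bar"]
)
micol = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
)
df = pd.DataFrame(3.14, index=miindex, columns=micol)

restored = convert_keeping_column_names(df, lossy_convert)
print(list(restored.columns.names))
print(list(restored.columns.get_level_values("col-foo")))
```

With the names back in place, lookups by level name such as `get_level_values("col-foo")` work again. Whether reassigning `columns` behaves the same way on an actual sparse DataFrame has not been verified here.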