Sparse Dataframe with multiindex error when slicing  #21231

Closed
@Xparx

Code Sample

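import numpy as np
import pandas as pd

# Build a 5x5 sparse frame of 0.0/1.0 values (roughly 30% ones).
spdf = pd.DataFrame(np.random.rand(5, 5) > 0.7).astype(float).to_sparse(fill_value=0)

# Assign two-level column labels. ("A", 1) appears twice, so the MultiIndex is non-unique.
spdf.columns = pd.MultiIndex.from_tuples((("A", 1), ("A", 1), ("B", 1), ("B", 2), ("C", 2)))

spdf["A"]  # Raises ValueError (traceback below)
spdf.to_dense()["A"]  # Works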

Problem description

I could not find this specific issue among the existing sparse issues.
It seems that a sparse DataFrame cannot handle MultiIndex slicing the way a dense (regular) DataFrame can: selecting a first-level column label raises a ValueError instead of returning the matching columns.

The error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-bf1d8d920880> in <module>()
----> 1 spdf["A"]

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/sparse/frame.py in __getitem__(self, key)
    439             return self._getitem_array(key)
    440         else:
--> 441             return self._get_item_cache(key)
    442 
    443     def get_value(self, index, col, takeable=False):

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   2484         res = cache.get(item)
   2485         if res is None:
-> 2486             values = self._data.get(item)
   2487             res = self._box_item_values(item, values)
   2488             cache[item] = res

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   4130                 raise TypeError("cannot label index with a null key")
   4131 
-> 4132             indexer = self.items.get_indexer_for([item])
   4133             return self.reindex_indexer(new_axis=self.items[indexer],
   4134                                         indexer=indexer, axis=0,

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_indexer_for(self, target, **kwargs)
   3367         if self.is_unique:
   3368             return self.get_indexer(target, **kwargs)
-> 3369         indexer, _ = self.get_indexer_non_unique(target, **kwargs)
   3370         return indexer
   3371 

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/indexes/multi.py in get_indexer_non_unique(self, target)
   2046     @Appender(_index_shared_docs['get_indexer_non_unique'] % _index_doc_kwargs)
   2047     def get_indexer_non_unique(self, target):
-> 2048         return super(MultiIndex, self).get_indexer_non_unique(target)
   2049 
   2050     def reindex(self, target, method=None, level=None, limit=None,

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_indexer_non_unique(self, target)
   3357             tgt_values = target._ndarray_values
   3358 
-> 3359         indexer, missing = self._engine.get_indexer_non_unique(tgt_values)
   3360         return _ensure_platform_int(indexer), missing
   3361 

pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_indexer_non_unique()

pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine._extract_level_codes()

~/.virtualenvs/default/lib/python3.5/site-packages/pandas/core/indexes/multi.py in _codes_to_ints(self, codes)
     72         # Shift the representation of each level by the pre-calculated number
     73         # of bits:
---> 74         codes <<= self.offsets
     75 
     76         # Now sum and OR are in fact interchangeable. This is a simple

ValueError: non-broadcastable output operand with shape (1,1) doesn't match the broadcast shape (1,2)
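
The final frame of the traceback is a NumPy in-place shift (`codes <<= self.offsets`). The message itself can be reproduced with plain NumPy, independent of pandas. The sketch below is only an illustration of that error class: the array shapes are taken from the message, while the values and the reading of the shapes (one level of codes, offsets for two levels) are assumptions, not the actual state inside `_codes_to_ints`.

```python
import numpy as np

# Assumed stand-ins, shaped after the error message:
# codes has shape (1, 1); offsets has one entry per level, here 2.
codes = np.zeros((1, 1), dtype="uint64")
offsets = np.array([1, 0], dtype="uint64")

try:
    # An in-place operation must write into `codes`, which cannot grow
    # from (1, 1) to the broadcast shape (1, 2).
    codes <<= offsets
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The out-of-place form allocates a new array, so broadcasting succeeds.
shifted = codes << offsets
print(shifted.shape)
```

This suggests that the codes built for the lookup key `"A"` cover fewer levels than the index has, which fits a partial (first-level only) key being passed down the non-unique indexer path.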

Expected Output

The sparse dataframe should have the same capabilities as the dense one.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.6.0-040600-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: None
numpy: 1.14.3
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
