Skip to content

SparseDataFrame loc multiple index is slow #14310

Closed
@DeoLeung

Description

@DeoLeung
In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 0, 0],[0,0,0]])

# non sparse
In [19]: %time df.loc[0]
CPU times: user 367 µs, sys: 14 µs, total: 381 µs
Wall time: 374 µs
Out[19]:
0    1
1    0
2    0
Name: 0, dtype: int64

In [20]: %time df.loc[[0]]
CPU times: user 808 µs, sys: 57 µs, total: 865 µs
Wall time: 827 µs
Out[20]:
   0  1  2
0  1  0  0

In [3]: df = df.to_sparse()

# sparse
In [5]: %time df.loc[0]
CPU times: user 429 µs, sys: 14 µs, total: 443 µs
Wall time: 436 µs
Out[5]:
0    1.0
1    0.0
2    0.0
Name: 0, dtype: float64
BlockIndex
Block locations: array([0], dtype=int32)
Block lengths: array([3], dtype=int32)

In [6]: %time df.loc[[0]]
CPU times: user 4.07 ms, sys: 406 µs, total: 4.48 ms
Wall time: 6.16 ms
Out[6]:
     0    1    2
0  1.0  0.0  0.0

It seems in SparseDataFrame, selecting as dataframe, the time increased much more than it does in DataFrame, I would expect it runs faster(on my 1m x 500 SparseDataFrame it takes seconds for the loc operation)

## INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: en_GB.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 24.0.2
Cython: None
numpy: 1.11.0
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: None
boto: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    PerformanceMemory or execution speed performanceSparseSparse Data Type

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions