BUG: SparseDataFrame coerces input to dense matrix if string-type index is given #22630

Closed
@scottgigante

Code Sample, a copy-pastable example if possible

import scipy.sparse as sp
import pandas as pd
import numpy as np
shape = (500000, 50000)
data = np.repeat(1, 10000)
i = np.random.choice(shape[0], 10000, replace=False)
j = np.random.choice(shape[1], 10000, replace=False)
X = sp.coo_matrix((data, (i, j)), shape=shape)

# This works fine: build with an integer index, then relabel with strings.
df = pd.SparseDataFrame(X, index=np.arange(shape[0]))
df.index = np.arange(shape[0]).astype(str)
# Passing the string index to the constructor instead requires 400GB of
# memory and takes an hour.
df = pd.SparseDataFrame(X, index=np.arange(shape[0]).astype(str))

Problem description

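pd.SparseDataFrame densifies its input when it is given a string index. The same data is constructed without densifying when an integer index is passed, and that index can then be replaced with strings afterwards, so the behavior is both wasteful and very confusing for the user.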
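
For a sense of scale, here is a rough back-of-the-envelope estimate of what densifying this input costs. It is only a sketch: it assumes 8-byte int64 values (what np.repeat(1, 10000) produces on 64-bit Linux) and 4-byte int32 row/column indices in the COO matrix, and it ignores any extra copies or object overhead pandas may add on top.

```python
import numpy as np

shape = (500000, 50000)  # same shape as in the code sample
nnz = 10000              # number of stored (nonzero) entries

# Assumption: values are int64, COO row/column indices are int32.
value_size = np.dtype(np.int64).itemsize
index_size = np.dtype(np.int32).itemsize

# A dense array stores every cell, zero or not.
dense_bytes = shape[0] * shape[1] * value_size

# COO stores one value plus one row index and one column index per entry.
coo_bytes = nnz * (value_size + 2 * index_size)

print(f"one dense copy: {dense_bytes / 1e9:.0f} GB")
print(f"COO matrix:     {coo_bytes / 1e3:.0f} kB")
```

Under these assumptions a single dense copy is already in the hundreds of gigabytes, while the sparse input is a few hundred kilobytes, which is in line with the order of magnitude reported above.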

Expected Output

The data frame should be created in a matter of seconds, without coercing to a dense matrix.
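
As a sketch of the expected behavior, the snippet below builds a sparse frame with a string index directly from a scipy matrix and checks that it stays sparse. Note the assumptions: it does not use pd.SparseDataFrame from this report (pandas 0.23.4) but pd.DataFrame.sparse.from_spmatrix, which is only available in newer pandas releases (0.25 and later), and it uses a much smaller shape than the code sample so that it runs quickly.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Assumption: a reduced shape for illustration, not the one from the report.
shape = (5000, 500)
nnz = 100

rng = np.random.default_rng(0)
i = rng.choice(shape[0], nnz, replace=False)
j = rng.choice(shape[1], nnz, replace=False)
X = sp.coo_matrix((np.ones(nnz, dtype=np.int64), (i, j)), shape=shape)

# String-typed index, as in the failing call from the code sample.
str_index = np.arange(shape[0]).astype(str)

# Newer-pandas constructor; requires pandas >= 0.25.
df = pd.DataFrame.sparse.from_spmatrix(X, index=str_index)

# Every column should carry a sparse dtype, and the stored fraction
# should match the input matrix rather than 1.0.
all_sparse = all(isinstance(dtype, pd.SparseDtype) for dtype in df.dtypes)
print("all columns sparse:", all_sparse)
print("density:", df.sparse.density)
```

If the constructor kept the data sparse, the reported density equals nnz / (rows * columns) and the string labels appear unchanged in df.index.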

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.3-arch1-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.7.3
pip: 18.0
setuptools: 40.0.0
Cython: 0.28.5
numpy: 1.15.0
scipy: 1.1.0
pyarrow: 0.10.0
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.7
feather: 0.4.0
matplotlib: 2.2.3
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.4
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0

Labels: Performance (memory or execution speed performance), Sparse (sparse data type)