Pandas wrongly reads csr_matrix with explicit 0 #28992

Closed
@PrimozGodec

Description

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> from scipy.sparse import csr_matrix
>>> a = csr_matrix([[1, 0, 0], [2, 2, 0], [0, 2, 2]])
>>> a[0, 2] = 0
>>> a.todense()
matrix([[1, 0, 0],
        [2, 2, 0],
        [0, 2, 2]], dtype=int64)

>>> pd.DataFrame.sparse.from_spmatrix(a)
   0  1  2
0  1  0  0
1  2  2  0
2  0  2  0

Observe the difference in the bottom right value of the second output: it is 0, while the dense matrix has 2 there.

Problem description

When a sparse matrix that contains an explicit zero is imported into pandas with the from_spmatrix function, at least one value that should be non-zero becomes zero. See the example above: both outputs should show the same values.

Expected Output

When converting a sparse matrix to a pandas DataFrame, the values of the sparse matrix should stay the same.
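Until this is fixed, one possible workaround is to detect and drop the explicit zeros before converting. The sketch below is not part of the report. It assumes that removing the stored zeros with scipy's `eliminate_zeros()` sidesteps the problem, which I have not verified on pandas 0.25.1. The helper names `has_explicit_zeros` and `to_sparse_frame` are made up for illustration, and the matrix is built directly from CSR arrays so that the explicit zero at `[0, 2]` is guaranteed to be stored.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Same values as the matrix in the report, with the explicit zero at [0, 2]
# written straight into the CSR arrays (row 0 stores columns 0 and 2).
data = np.array([1, 0, 2, 2, 2, 2])
indices = np.array([0, 2, 0, 1, 1, 2])
indptr = np.array([0, 2, 4, 6])
a = csr_matrix((data, indices, indptr), shape=(3, 3))


def has_explicit_zeros(matrix):
    """Return True if the matrix stores entries whose value is zero."""
    # nnz counts stored entries; count_nonzero() counts entries that are != 0.
    return matrix.nnz != matrix.count_nonzero()


def to_sparse_frame(matrix):
    """Hypothetical helper: drop explicit zeros from a copy, then convert."""
    cleaned = matrix.copy()
    cleaned.eliminate_zeros()  # works in place; the caller's matrix is untouched
    return pd.DataFrame.sparse.from_spmatrix(cleaned)


print(has_explicit_zeros(a))  # True for this matrix
df = to_sparse_frame(a)
print(df.sparse.to_dense())
```

Working on a copy keeps the original matrix unchanged, in case the stored zeros carry meaning elsewhere in the calling code.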

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
pandas : 0.25.1
numpy : 1.16.5
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 40.8.0
Cython : None
pytest : 5.2.0
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.1
