BUG: SparseDataFrame constructor issues #16807

Closed
@kbattocchi

I've noticed a bunch of issues with SparseDataFrame's constructor, particularly around support for non-float64 datatypes:

>>> pd.SparseDataFrame(columns=list("ab"), index=range(4), default_fill_value=0.0)
    a   b
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN

- [ ] when data is not provided and the dtype is set to int64, the constructor call fails

>>> pd.SparseDataFrame(columns=list("ab"), index=range(4), dtype=np.int64)
ValueError: cannot convert float NaN to integer
- [ ] when data is an empty sparse matrix with int64 entries and no dtype is specified, the dtype of the SparseDataFrame is still float64. Passing dtype=np.int64 explicitly does not help: the result is the same.
>>> pd.SparseDataFrame(coo_matrix((4,3), dtype=np.int64))
    0   1   2
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
>>> pd.SparseDataFrame(coo_matrix((4,3), dtype=np.int64), dtype=np.int64)
    0   1   2
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
- [ ] probably due to the same issue, given a sparse matrix whose underlying elements are int64, any column that contains a 0 is treated as float64 and the 0 is shown as NaN:
>>> pd.SparseDataFrame(coo_matrix(np.arange(12).reshape(4,3), dtype=np.int64))
     0   1   2
0  NaN   1   2
1  3.0   4   5
2  6.0   7   8
3  9.0  10  11

(As long as at least one value in the column is non-zero, this can be worked around by setting default_fill_value; if all of the values in a column are zero, the workaround no longer works.)
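The behaviour above is consistent with missing entries being filled with NaN regardless of the requested dtype. That is an assumption about the cause, not something confirmed in this report. The NumPy-only sketch below illustrates the mechanism on column 0 of the last example; `densify` is a hypothetical helper written for illustration and is not part of pandas.

```python
import numpy as np

# Column 0 of np.arange(12).reshape(4, 3) is [0, 3, 6, 9]. A sparse matrix
# stores only the non-zero entries (3, 6, 9) together with their row positions.
stored = np.array([3, 6, 9], dtype=np.int64)
rows = np.array([1, 2, 3])


def densify(fill_value):
    """Hypothetical helper: rebuild the dense column using fill_value for gaps."""
    # The result dtype must be able to hold both the stored values and the fill.
    dtype = np.result_type(stored.dtype, np.asarray(fill_value).dtype)
    out = np.full(4, fill_value, dtype=dtype)
    out[rows] = stored
    return out


with_nan = densify(np.nan)  # NaN fill forces the column to float64
with_zero = densify(0)      # an integer fill keeps the column int64

# An int64 array cannot hold NaN at all, which matches the ValueError
# reported for dtype=np.int64 with no data.
error = None
try:
    np.zeros(4, dtype=np.int64)[0] = np.nan
except ValueError as exc:
    error = str(exc)

print(with_nan.dtype, with_zero.dtype, error)
```

If this is the cause, a fix would need to choose a fill value compatible with the requested or inferred dtype (for example 0 for integer data) rather than always using NaN.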

Output of pd.show_versions()

INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.2
pytest: 2.9.2
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: 0.4.0


Labels: Bug, Missing-data, Sparse
