Reindexing a sparse data structure with a different index results in losing the dtype #26123

Open
@hdinh

Code Sample, a copy-pastable example if possible

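import pandas as pd
import numpy as np

# Reproduces on pandas 0.24.2; pd.SparseSeries was removed in pandas 1.0.
# np.bool_ is used because the np.bool alias was deprecated in NumPy 1.20.
for dtype in (np.float32, np.int32, np.bool_):
    s = pd.SparseSeries([1, 0], dtype=dtype)
    print('original:', s.dtype)
    # Label 2 is not in the original index, so reindexing introduces a new entry.
    print('reindexed:', s.reindex([0, 1, 2]).dtype)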

Output

original: Sparse[float32, nan]
reindexed: Sparse[float64, nan]
original: Sparse[int32, 0]
reindexed: Sparse[float64, 0]
original: Sparse[bool, False]
reindexed: Sparse[float64, False]

Problem description

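After reindexing, the resulting sparse series always has subtype float64 (for example Sparse[float64, nan]) instead of the subtype it was created with. As the output above shows, the fill value is kept; only the subtype changes.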

This looks to be a regression from v0.23.4. Perhaps it has to do with the new SparseArray rework in v0.24.x.
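
For readers on pandas 1.0 or later, where `SparseSeries` no longer exists, the same situation can be set up with a regular `Series` backed by `pd.arrays.SparseArray`. The sketch below shows one possible workaround, not a fix: reindex, then cast back to the original sparse dtype. The helper name `reindex_keep_dtype` is hypothetical, and reusing the sparse fill value for newly introduced labels is an assumption that may not suit every use case (for integer and boolean data it avoids introducing NaN, which could not be cast back).

```python
import numpy as np
import pandas as pd


def reindex_keep_dtype(s, new_index, fill_value=None):
    """Hypothetical helper: reindex a sparse Series, then restore its dtype."""
    if fill_value is None:
        # Assumption: fill new labels with the sparse fill value
        # (nan for floats, 0 for ints, False for bools).
        fill_value = s.dtype.fill_value
    out = s.reindex(new_index, fill_value=fill_value)
    # Cast back in case reindexing changed the subtype.
    return out.astype(s.dtype)


for subtype in (np.float32, np.int32, np.bool_):
    s = pd.Series(pd.arrays.SparseArray([1, 0], dtype=subtype))
    fixed = reindex_keep_dtype(s, [0, 1, 2])
    print("original:", s.dtype, "| after workaround:", fixed.dtype)
```

Note that the intermediate result of `reindex` may still be upcast before the cast back, so this restores the dtype but does not avoid the temporary allocation.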

Expected Output

Ideally the dtype would not be lost. That wasn't entirely the case in v0.23.4 either, but at least the sparse float dtypes were not upcast. My use case for casting to a sparse dtype is to save space, so the conversion to float64 breaks things.
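
To put a rough number on the space cost, the following sketch compares the bytes needed for the explicitly stored values before and after an upcast to float64. The count `n_stored` and the helper `stored_bytes` are hypothetical, and the sparse index overhead is ignored, so this is only a back-of-the-envelope estimate.

```python
import numpy as np

# Hypothetical number of explicitly stored (non-fill) values in a sparse column.
n_stored = 1_000_000


def stored_bytes(subtype, n=n_stored):
    """Bytes needed for n stored values of a NumPy subtype (index overhead ignored)."""
    return n * np.dtype(subtype).itemsize


for subtype in (np.float32, np.int32, np.bool_):
    before = stored_bytes(subtype)
    after = stored_bytes(np.float64)
    print(f"{np.dtype(subtype).name}: {before} -> {after} bytes ({after // before}x)")
```

Under these assumptions the stored values double in size for float32 and int32 and grow eightfold for bool.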

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-16-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: 0.12.0
xarray: None
IPython: 7.3.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 2.2.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Labels: Bug, Regression (functionality that used to work in a prior pandas version), Sparse (sparse data type)
