Description
Code Sample, a copy-pastable example if possible
>>> import pandas as pd
>>> c = pd.Categorical(list('abcabc'))
>>> c
[a, b, c, a, b, c]
Categories (3, object): [a, b, c]
>>> pd.Series(c).dtype
CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)
>>> pd.Series(c).to_sparse().dtype
dtype('O')
>>> pd.SparseArray(c)
[a, b, c, a, b, c]
Fill: nan
IntIndex
Indices: array([0, 1, 2, 3, 4, 5], dtype=int32)
>>> pd.SparseArray(c).dtype
dtype('O')
>>> pd.SparseSeries(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joel/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/pandas/core/sparse/series.py", line 175, in __init__
    length = len(index)
TypeError: object of type 'NoneType' has no len()
>>> pd.DataFrame({'a': c})['a']
0 a
1 b
2 c
3 a
4 b
5 c
Name: a, dtype: category
Categories (3, object): [a, b, c]
>>> pd.SparseDataFrame({'a': c})['a']
0 a
1 b
2 c
3 a
4 b
5 c
Name: a, dtype: object
BlockIndex
Block locations: array([0], dtype=int32)
Block lengths: array([6], dtype=int32)
Problem description
- Categoricals are upcast to object dtype when put into SparseArray and SparseDataFrame (or when calling Series.to_sparse()). This is inconsistent with the categorical dtype retained by dense Series and DataFrame.
- SparseSeries raises an error when constructed with a categorical argument. This is inconsistent with the SparseArray and SparseDataFrame behaviour.
Expected Output
SparseDataFrame({'a': c})['a'].dtype == SparseSeries(c).dtype == SparseArray(c).dtype == Series(c).dtype
or at a minimum:
SparseSeries(c) raises no error, and produces object dtype.
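For reference, here is a minimal sketch of the dense-side behaviour that the sparse containers would be expected to match. It only exercises Series and DataFrame, because the sparse classes are not available in every pandas version. The helper name keeps_categorical is hypothetical and not part of pandas.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

c = pd.Categorical(list("abcabc"))

# Dense containers: both are expected to retain the categorical dtype.
series_dtype = pd.Series(c).dtype
frame_dtype = pd.DataFrame({"a": c})["a"].dtype


def keeps_categorical(dtype):
    """Hypothetical helper: True if the dtype is a categorical dtype."""
    return isinstance(dtype, CategoricalDtype)


print(keeps_categorical(series_dtype), keeps_categorical(frame_dtype))
print(series_dtype == frame_dtype)
```

The same keeps_categorical check could be pointed at the dtype of a sparse container to test the stricter expectation; for the weaker one it would be enough that constructing SparseSeries(c) does not raise.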
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
pandas: 0+unknown
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None