Previously, you could check for the SparseDataFrame class, but now we want people to store sparse data in normal DataFrames/Series.
So for a library developer, what is the recommended way to check for sparse data?
For a Series (or SparseArray), pd.api.types.is_sparse still works on a plain Series with sparse data:
In [42]: df = pd.DataFrame({'a': pd.SparseArray([1, np.nan, 1])})
In [43]: s = df['a']
In [44]: type(s)
Out[44]: pandas.core.series.Series
In [45]: pd.api.types.is_sparse(s)
Out[45]: True
but that function does not work for a DataFrame. So you could apply it to each column's dtype and reduce the result with any or all, depending on your requirements:
In [56]: df = pd.DataFrame({'a': pd.SparseArray([1, np.nan, 1]), 'b': [1, 2, 3]})
In [57]: df.dtypes.apply(pd.api.types.is_sparse)
Out[57]:
a     True
b    False
dtype: bool
In [58]: df.dtypes.apply(pd.api.types.is_sparse).any()
Out[58]: True
In [59]: df.dtypes.apply(pd.api.types.is_sparse).all()
Out[59]: False
So that actually works quite OK, now that I am writing it out.
Do we want to make this even easier somehow? Or document this?
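If we did want to make it easier, one possible shape is a small helper that wraps the per-column check. This is only a rough sketch, not an existing pandas API: the names sparse_columns and has_sparse are hypothetical. It tests each dtype with isinstance(dtype, pd.SparseDtype) rather than pd.api.types.is_sparse, since newer pandas versions deprecate is_sparse in favour of that check, and it builds the example data with pd.arrays.SparseArray, which is where newer versions expose SparseArray.

```python
import numpy as np
import pandas as pd


def sparse_columns(df):
    """Return a boolean Series marking which columns of df have a sparse dtype.

    Hypothetical helper for illustration; not part of pandas.
    """
    return df.dtypes.apply(lambda dtype: isinstance(dtype, pd.SparseDtype))


def has_sparse(df, require_all=False):
    """Return True if any column (or every column, if require_all) is sparse.

    Hypothetical helper for illustration; not part of pandas.
    """
    mask = sparse_columns(df)
    return bool(mask.all() if require_all else mask.any())


# Same data as in the session above: one sparse column, one dense column.
df = pd.DataFrame({
    "a": pd.arrays.SparseArray([1, np.nan, 1]),
    "b": [1, 2, 3],
})

print(sparse_columns(df))
print(has_sparse(df))
print(has_sparse(df, require_all=True))
```

Note that with require_all=True a DataFrame without any columns would count as sparse, because all() over an empty Series is True; a library that cares about that case would need to handle it explicitly.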