
Recommended way to check for sparse data (DataFrame or Series) #26706

Open
@jorisvandenbossche

Previously, you would check for the SparseDataFrame class, but now we want people to store sparse data in regular DataFrames/Series.

So for a library developer, what is the recommended way to check for sparse data?

For a Series (or SparseArray), pd.api.types.is_sparse still works on a plain Series holding sparse data:

In [42]: df = pd.DataFrame({'a': pd.SparseArray([1, np.nan, 1])}) 

In [43]: s = df['a']

In [44]: type(s)
Out[44]: pandas.core.series.Series

In [45]: pd.api.types.is_sparse(s)
Out[45]: True
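A minimal sketch of the same Series-level check, written against the dtype directly. It assumes pandas 0.24 or later, where SparseArray is exposed as pd.arrays.SparseArray and pd.SparseDtype is public. The helper name series_is_sparse is hypothetical, not pandas API. Newer pandas releases deprecate pd.api.types.is_sparse in favour of an isinstance(dtype, pd.SparseDtype) test, so this form may age better:

```python
import numpy as np
import pandas as pd

# A regular Series backed by a SparseArray (pd.arrays.SparseArray is the
# public location of SparseArray in pandas >= 0.24).
s = pd.Series(pd.arrays.SparseArray([1, np.nan, 1]))

def series_is_sparse(obj):
    """Hypothetical helper: True if a Series or array has a sparse dtype."""
    # Inspect the dtype object itself instead of calling is_sparse.
    return isinstance(obj.dtype, pd.SparseDtype)

print(series_is_sparse(s))                     # True
print(series_is_sparse(pd.Series([1, 2, 3])))  # False
```

Because a DataFrame has no single `.dtype`, this helper is only meant for Series and arrays.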

But for a DataFrame that function does not work. Instead, you can apply it to each column's dtype and combine the results with any() or all(), depending on your requirements:

In [56]: df = pd.DataFrame({'a': pd.SparseArray([1, np.nan, 1]), 'b': [1, 2, 3]})

In [57]: df.dtypes.apply(pd.api.types.is_sparse)
Out[57]: 
a     True
b    False
dtype: bool

In [58]: df.dtypes.apply(pd.api.types.is_sparse).any()
Out[58]: True

In [59]: df.dtypes.apply(pd.api.types.is_sparse).all()
Out[59]: False
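One way library code might package the any/all pattern above is sketched below. The helpers sparse_column_mask and has_sparse_data are hypothetical names, not pandas API; they assume pandas 0.24 or later and, like the session above, test each column's dtype, but use isinstance(dtype, pd.SparseDtype) instead of is_sparse:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": pd.arrays.SparseArray([1, np.nan, 1]), "b": [1, 2, 3]})

def sparse_column_mask(df):
    """Hypothetical helper: boolean Series, True where a column's dtype is sparse."""
    return pd.Series(
        [isinstance(dtype, pd.SparseDtype) for dtype in df.dtypes],
        index=df.columns,
        dtype=bool,
    )

def has_sparse_data(df, how="any"):
    """Hypothetical helper.

    how="any": at least one column is sparse.
    how="all": every column is sparse.
    """
    mask = sparse_column_mask(df)
    if how == "any":
        return bool(mask.any())
    if how == "all":
        return bool(mask.all())
    raise ValueError("how must be 'any' or 'all'")

print(sparse_column_mask(df).tolist())  # [True, False]
print(has_sparse_data(df, how="any"))   # True
print(has_sparse_data(df, how="all"))   # False
```

Returning plain Python bools (rather than NumPy bools) keeps the result easy to use in `if` statements and identity checks in calling code.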

Now that I've written it out, that actually works quite well.
Do we want to make this even easier somehow, or should we just document it?

Labels: Enhancement, Needs Discussion, Sparse
