API: getting the "densified" values of a Categorical (preserving categories dtype) ? #26410

Open
@jorisvandenbossche

Assume we have a Categorical and want to convert it to a dense (not encoded) array. We have np.asarray(..) and the to_dense() method (which uses asarray under the hood):

In [1]: cat = pd.Categorical(['a', 'b', 'a'])

In [2]: np.asarray(cat) 
Out[2]: array(['a', 'b', 'a'], dtype=object)

In [3]: cat.to_dense() 
Out[3]: array(['a', 'b', 'a'], dtype=object)

In addition, we also have get_values:

In [4]: cat.get_values() 
Out[4]: array(['a', 'b', 'a'], dtype=object)

get_values is mostly the same, except that it returns an Index for datetime/period/timedelta categories and, for integer categories with missing values, an object array instead of a float array:

In [10]: cat = pd.Categorical(pd.date_range("2012", periods=3))

In [11]: cat.to_dense()
Out[11]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

In [12]: cat.get_values()
Out[12]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D')

As a result, get_values preserves the dtype somewhat better (although only for datetime-like categories specifically; it will not do so for an arbitrary EA).
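
To make the integer case mentioned above concrete, here is a minimal sketch of the np.asarray side of that comparison only (the get_values side is left out). It assumes a Categorical built with from_codes so that the categories are certain to be integers; the missing value forces the dense result to float:

```python
import numpy as np
import pandas as pd

# Code -1 marks a missing value; the categories themselves are integers.
int_cat = pd.Categorical.from_codes([0, -1, 1], categories=[1, 2])

int_dense = np.asarray(int_cat)

# The integer categories cannot hold NaN, so the dense array is float.
print(int_cat.categories.dtype, "->", int_dense.dtype)
```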

While looking into the deprecation of get_values (#26409), I was wondering: do we want some method to actually get a "dense" version of the array, but with the exact same dtype? (so returning an EA in case the categories have an extension dtype)

And should we deprecate to_dense() ?
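
For illustration, a rough sketch of what a dtype-preserving "dense" conversion could return. The name densify is hypothetical (it is not an existing pandas API), and the sketch assumes that taking from the categories' backing array with the codes is an acceptable way to do this:

```python
import pandas as pd


def densify(cat):
    """Hypothetical helper: dense values keeping the categories' dtype."""
    # cat.codes holds positions into cat.categories; -1 marks a missing value.
    # With allow_fill=True, take() fills -1 positions with a missing value
    # instead of treating -1 as "last element".
    return cat.categories.array.take(cat.codes, allow_fill=True)


# Timezone-aware categories, so the dtype is an extension dtype.
tz_cat = pd.Categorical(pd.date_range("2012", periods=3, tz="UTC"))
tz_dense = densify(tz_cat)

print(type(tz_dense).__name__, tz_dense.dtype)
```

The result here is an array with the same dtype as the categories, rather than an ndarray or an Index.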
