Description
Assume we have a Categorical and want to convert it to a dense (non-encoded) array. We have np.asarray(..) and the to_dense() method (which uses np.asarray under the hood):
In [1]: cat = pd.Categorical(['a', 'b', 'a'])
In [2]: np.asarray(cat)
Out[2]: array(['a', 'b', 'a'], dtype=object)
In [3]: cat.to_dense()
Out[3]: array(['a', 'b', 'a'], dtype=object)
In addition, we also have get_values:
In [4]: cat.get_values()
Out[4]: array(['a', 'b', 'a'], dtype=object)
get_values is mostly the same, with two exceptions: it returns an Index for datetime/period/timedelta categories, and for integer categories with missing values it returns an object array instead of a float array:
In [10]: cat = pd.Categorical(pd.date_range("2012", periods=3))
In [11]: cat.to_dense()
Out[11]:
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
'2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
In [12]: cat.get_values()
Out[12]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D')
As a result, get_values preserves the dtype somewhat better (although only for datetime-like categories; it will not do this for an arbitrary EA).
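The integer case mentioned above is not shown in the session, so here is a minimal sketch of the np.asarray side of it. It only uses np.asarray, since that is the one conversion path discussed here that does not depend on to_dense() or get_values being available; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Integer categories plus one missing value (stored as code -1).
int_cat = pd.Categorical([1, 2, None])

# The categories themselves keep their integer dtype ...
print(int_cat.categories.dtype)

# ... but the dense conversion has to represent the missing value,
# so the integers are upcast to float and the gap becomes NaN.
dense = np.asarray(int_cat)
print(dense.dtype)
print(dense)
```

The upcast is the dtype loss in question: the dense result no longer has the dtype of the categories.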
While looking into the deprecation of get_values (#26409), I was wondering: do we want some method to actually get a "dense" version of the array, but with the exact same dtype? (so returning an EA in case the categories have an extension dtype)
And should we deprecate to_dense()?
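To make the "dense, but same dtype" idea concrete, here is a rough sketch of what such a conversion could look like. The helper name dense_same_dtype is hypothetical and not an existing pandas method; it takes from the categories' backing array using the codes, relying on pandas.api.extensions.take with allow_fill=True so that a code of -1 becomes a missing value rather than indexing from the end.

```python
import pandas as pd
from pandas.api.extensions import take


def dense_same_dtype(cat):
    """Hypothetical helper: materialize a Categorical with the categories' dtype."""
    # categories.array is the array backing the categories Index
    # (an ExtensionArray when the categories have an extension dtype).
    # allow_fill=True maps code -1 to the dtype's missing value.
    return take(cat.categories.array, cat.codes, allow_fill=True)


# Timezone-aware datetimes have an extension dtype, and one value is missing.
cat = pd.Categorical(pd.to_datetime(["2012-01-01", "2012-01-02", None], utc=True))

dense = dense_same_dtype(cat)
print(type(dense).__name__)
print(dense.dtype)
```

Whether this should be a new method, or the behavior of an existing one, is exactly the open question; the sketch only shows that the information needed (categories plus codes) is already there.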