BUG: to_numpy on categorical data with tz aware datetime categories returns datetime64 #26406

Closed
@jorisvandenbossche

In [1]: s = pd.Series(pd.Categorical(pd.date_range("2012", periods=3, tz='Europe/Brussels')))

In [2]: s.to_numpy() 
Out[2]: 
array(['2011-12-31T23:00:00.000000000', '2012-01-01T23:00:00.000000000',
       '2012-01-02T23:00:00.000000000'], dtype='datetime64[ns]')

In [3]: s = pd.Series(pd.date_range("2012", periods=3, tz='Europe/Brussels'))

In [4]: s.to_numpy() 
Out[4]: 
array([Timestamp('2012-01-01 00:00:00+0100', tz='Europe/Brussels', freq='D'),
       Timestamp('2012-01-02 00:00:00+0100', tz='Europe/Brussels', freq='D'),
       Timestamp('2012-01-03 00:00:00+0100', tz='Europe/Brussels', freq='D')],
      dtype=object)

We opted for the default behaviour of to_numpy to preserve the values as much as possible, so for a normal datetimetz Series this gives object dtype (an array of tz-aware Timestamps). So we should probably do the same for Categorical?

@TomAugspurger since to_numpy is still new, I suppose we can see this as a bug fix and not a change in behaviour (so we don't have to go through a deprecation cycle)?
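
For illustration, a possible workaround sketch (not the fix itself, and not necessarily what pandas will adopt): cast the categorical Series back to the dtype of its categories before calling to_numpy, so the result goes through the same path as a plain datetimetz Series. The variable names here are made up for the example, and it assumes that astype from a categorical to its tz-aware categories dtype is supported in the pandas version in use.

```python
import pandas as pd

dates = pd.date_range("2012", periods=3, tz="Europe/Brussels")
cat_series = pd.Series(pd.Categorical(dates))
plain_series = pd.Series(dates)

# Dtype of the categories, i.e. the tz-aware datetime dtype
# (assumption: casting a categorical back to this dtype is supported).
categories_dtype = cat_series.cat.categories.dtype

# Cast back to datetimetz first, then convert; this should take the same
# code path as the plain datetimetz Series.
restored = cat_series.astype(categories_dtype)
values = restored.to_numpy()

# Reference result from the non-categorical Series.
expected = plain_series.to_numpy()

print(values.dtype)
print(values[0])
```

With this cast in place, `values` should match `expected` element by element, with the timezone kept on each Timestamp instead of being dropped to naive UTC datetime64 values.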
