Description
Collecting some design thoughts here. I've spent ~4 hours today building a DatetimeTZArray. That's mostly working on its own, but now all of pandas needs to be updated to use it.
I think the most sensible way forward is to implement two new ExtensionArrays (EAs):
- DatetimeArray
- DatetimeTZArray
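A minimal sketch of why the two arrays differ, assuming nothing about the real implementation. The class names below are hypothetical, and they do not subclass `ExtensionArray`. The tz-naive array can be a thin wrapper around NumPy's `datetime64[ns]`. The tz-aware array has to store the timezone next to the UTC instants, because NumPy has no slot for it.

```python
import numpy as np


class NaiveDatetimeArraySketch:
    """Hypothetical stand-in for a tz-naive DatetimeArray."""

    def __init__(self, values):
        # NumPy represents tz-naive datetimes natively, so just wrap them
        self._data = np.asarray(values, dtype="datetime64[ns]")

    @property
    def dtype(self):
        return self._data.dtype

    def __len__(self):
        return len(self._data)


class TZDatetimeArraySketch:
    """Hypothetical stand-in for DatetimeTZArray: UTC instants plus a tz name."""

    def __init__(self, utc_values, tz):
        # NumPy cannot attach a timezone to datetime64 values, so keep the
        # UTC instants in a plain array and carry the tz as separate metadata
        self._data = np.asarray(utc_values, dtype="datetime64[ns]")
        self.tz = tz

    @property
    def dtype(self):
        # Mimics the string form pandas uses, e.g. "datetime64[ns, UTC]"
        return f"datetime64[ns, {self.tz}]"

    def __len__(self):
        return len(self._data)


naive = NaiveDatetimeArraySketch(["2017-01-01", "2017-01-02"])
aware = TZDatetimeArraySketch(["2017-01-01T05:00", "2017-01-02T05:00"], tz="US/Eastern")
print(naive.dtype)  # datetime64[ns]
print(aware.dtype)  # datetime64[ns, US/Eastern]
```

If both kinds of data sit behind the same array interface, internal code can ask either one for `.dtype` or `len()` without first checking whether it holds a bare ndarray.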
Initially I tried only DatetimeTZArray, since NumPy natively supports datetimes without timezones. But that would require a lot of special-casing internal to pandas, so it's better to have both be EAs. One potential issue here is that `DatetimeBlock` can apparently be consolidated (it doesn't inherit from `NonConsolidatableMixin` like `DatetimeTZBlock` does). However, I've been unable to construct a DataFrame with consolidated DatetimeTZ blocks. @jreback, do you know if that's possible?
```python
In [20]: pd.DataFrame({"A": pd.date_range('2017', periods=12, tz='UTC'), 'B': pd.date_range('2017', periods=12, tz="UTC")})._data.consolidate()
Out[20]:
BlockManager
Items: Index(['A', 'B'], dtype='object')
Axis 1: RangeIndex(start=0, stop=12, step=1)
DatetimeTZBlock: slice(0, 1, 1), 1 x 12, dtype: datetime64[ns, UTC]
DatetimeTZBlock: slice(1, 2, 1), 1 x 12, dtype: datetime64[ns, UTC]
```
Note the two separate blocks: the columns were not consolidated, even though both have dtype `datetime64[ns, UTC]`.
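To illustrate what consolidating tz-aware columns would involve, here is a sketch. It assumes a consolidated tz-aware block would store the UTC values as one 2-D array plus a single tz for every row. That is not how `DatetimeTZBlock` works today, and `can_share_block` is a hypothetical helper. Under that assumption, only columns with the same timezone could share a block.

```python
import numpy as np
import pandas as pd

a = pd.date_range("2017", periods=12, tz="UTC")
b = pd.date_range("2017", periods=12, tz="UTC")


def can_share_block(*columns):
    # Hypothetical rule: a block stores one tz, so every column must match it
    return len({str(col.tz) for col in columns}) == 1


if can_share_block(a, b):
    # tz_convert(None) drops the tz and leaves naive UTC instants; stacking
    # them gives one 2 x 12 datetime64[ns] array, one row per column
    block_values = np.stack(
        [col.tz_convert(None).to_numpy(dtype="datetime64[ns]") for col in (a, b)]
    )
    block_tz = a.tz
```

The timezone then lives on the block rather than on each value, which is the same split the tz-aware array makes.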
As far as user-facing changes go, we still haven't settled on the types of:
- `Series[DatetimeDtype].values`
- `Series[DatetimeTZDtype].values`
We can support any of these. Backwards compatibility would make both `ndarray[datetime64[ns]]` (after converting to UTC and dropping the timezone for tz-aware data). Consistency with other EAs would make both EAs. Or we could please nobody and make tz-naive an ndarray and tz-aware an EA. It's not clear to me what's best here.
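The three options can be written as hypothetical `.values` policies using current public pandas API (`Series.dt.tz_convert`, `Series.dt.tz_localize`, `Series.to_numpy`, `Series.array`). The function names are illustrative only. This is a sketch of each behavior, not a proposal for how `.values` would be implemented.

```python
import numpy as np
import pandas as pd

naive = pd.Series(pd.date_range("2017", periods=3))
aware = pd.Series(pd.date_range("2017", periods=3, tz="US/Eastern"))


def values_backcompat(s):
    """Option 1: always ndarray[datetime64[ns]]; tz-aware data goes to UTC, tz dropped."""
    if s.dt.tz is not None:
        s = s.dt.tz_convert("UTC").dt.tz_localize(None)
    return s.to_numpy(dtype="datetime64[ns]")


def values_consistent(s):
    """Option 2: always the extension array backing the Series."""
    return s.array


def values_mixed(s):
    """Option 3: ndarray for tz-naive, extension array for tz-aware."""
    return s.array if s.dt.tz is not None else s.to_numpy()


# Midnight on 2017-01-01 in US/Eastern (UTC-5) becomes 05:00 UTC under option 1
print(values_backcompat(aware)[0])
```

Option 1 loses the timezone, so round-tripping through `.values` would not give back the original Series. Options 2 and 3 keep it but change what existing code receives.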