
dtype for tz-naive DatetimeArray and TimedeltaArray #24662

Open
@TomAugspurger


Right now, DatetimeArray.dtype can be either np.dtype('M8[ns]') or a DatetimeTZDtype, depending on whether the values are tz-naive or tz-aware. This means that while DatetimeArray[tz-naive] is an instance of ExtensionArray, it doesn't satisfy the minimum ExtensionArray API, which requires that array.dtype be an ExtensionDtype.

In [4]: pd.date_range('2000', periods=4)._data.dtype
Out[4]: dtype('<M8[ns]')

In [5]: pd.date_range('2000', periods=4, tz="CET")._data.dtype
Out[5]: datetime64[ns, CET]
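The same inconsistency can be observed through the public Series.array accessor instead of the private ._data attribute. A short sketch, assuming a pandas version in which this issue is still open:

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype

# Series.array returns a DatetimeArray for both tz-naive and tz-aware data.
naive = pd.Series(pd.date_range("2000", periods=4)).array
aware = pd.Series(pd.date_range("2000", periods=4, tz="CET")).array

# Both objects are ExtensionArrays...
print(type(naive).__name__, isinstance(naive, ExtensionArray))
print(type(aware).__name__, isinstance(aware, ExtensionArray))

# ...but only the tz-aware one carries an ExtensionDtype.
print(repr(naive.dtype), isinstance(naive.dtype, ExtensionDtype))
print(repr(aware.dtype), isinstance(aware.dtype, ExtensionDtype))
```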

This causes some type-unsoundness in places that are supposed to return an ExtensionArray, the two most prominent being pd.array and Series.array. For example, the following code isn't necessarily safe:

from typing import Callable
from pandas import Series
def f(ser: Series) -> Callable:
    return ser.array.dtype.construct_array_type

This fails for tz-naive datetime data: its .dtype is a NumPy dtype, which has no construct_array_type attribute.
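Until the dtype itself is fixed, callers have to guard against the NumPy-dtype case. One possible defensive workaround, sketched here and not taken from the issue: check for ExtensionDtype before calling construct_array_type(), and otherwise fall back to type(ser.array). The helper name array_type is hypothetical.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def array_type(ser: pd.Series) -> type:
    """Return the array class backing ``ser`` (hypothetical helper).

    Guarded version of ``f``: tz-naive datetime64 and timedelta64 data
    currently expose a plain NumPy dtype, which has no
    ``construct_array_type``, so fall back to the array's own class.
    """
    dtype = ser.array.dtype
    if isinstance(dtype, ExtensionDtype):
        return dtype.construct_array_type()
    return type(ser.array)


naive = pd.Series(pd.date_range("2000", periods=4))
aware = pd.Series(pd.date_range("2000", periods=4, tz="CET"))

# The unguarded attribute lookup used by ``f`` is missing for tz-naive data.
print(hasattr(naive.array.dtype, "construct_array_type"))
print(array_type(naive), array_type(aware))
```

Both calls resolve to the same DatetimeArray class; the guard only papers over the inconsistent dtype rather than removing it.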


Proposal:

  1. Make a DatetimeDtype (or allow DatetimeTZDtype to have tz=None). Add a corresponding ExtensionDtype for TimedeltaArray.dtype.
  2. Ensure that DatetimeArray.dtype and TimedeltaArray.dtype are always an ExtensionDtype.
  3. Wrap Series.dtype and DatetimeIndex.dtype so that they keep returning the NumPy dtype they return today.

The last step is to avoid breaking code relying on Series[tz-naive].dtype being a NumPy dtype.
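To make steps 1 to 3 concrete, here is a toy sketch that is not pandas code. DatetimeDtype reuses the name floated in step 1 but is a hypothetical local subclass of the public ExtensionDtype base class, and container_dtype is a hypothetical stand-in for what a wrapped Series.dtype / DatetimeIndex.dtype would report.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


class DatetimeDtype(ExtensionDtype):
    """Hypothetical ExtensionDtype for tz-naive datetime64[ns] data (step 1)."""

    name = "datetime64[ns]"
    type = pd.Timestamp  # scalar type of the array's elements
    na_value = pd.NaT

    @classmethod
    def construct_array_type(cls) -> type:
        # The part of the minimum ExtensionDtype API that a NumPy dtype lacks.
        return pd.arrays.DatetimeArray

    @property
    def numpy_dtype(self) -> np.dtype:
        return np.dtype("M8[ns]")


def container_dtype(array_dtype):
    """Hypothetical step 3: the dtype a wrapped Series.dtype would expose."""
    if isinstance(array_dtype, DatetimeDtype):
        # Unwrap tz-naive data so code expecting a NumPy dtype keeps working.
        return array_dtype.numpy_dtype
    # tz-aware data keeps its DatetimeTZDtype, as it does today.
    return array_dtype
```

The array-level dtype becomes uniform (always an ExtensionDtype, so construct_array_type() is always available), while only the container-level .dtype unwraps; that split is what lets step 3 preserve backwards compatibility.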

Labels: Datetime, Dtype Conversions, Enhancement, ExtensionArray, PDEP missing values, Timedelta