Currently, the to_numpy() implementation on the datetimelike arrays is quite simple (basically np.array(EA, dtype), which calls __array__, which in turn returns/casts the underlying numpy array). This means that it basically follows numpy's casting rules, while for astype we have a whole bunch of custom conversion rules.
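To make the delegation concrete, here is a minimal sketch of the pattern described above. ToyDatetimeArray is a hypothetical stand-in written for illustration, not the actual pandas DatetimeArray code; it only assumes that the data is stored as a datetime64[ns] ndarray and that to_numpy() defers entirely to np.array and __array__.

```python
import numpy as np


class ToyDatetimeArray:
    """Hypothetical stand-in for a datetimelike extension array."""

    def __init__(self, values):
        # Underlying storage is a plain datetime64[ns] ndarray.
        self._ndarray = np.asarray(values, dtype="datetime64[ns]")

    def __array__(self, dtype=None, copy=None):
        # Return the underlying data, letting numpy cast it if a dtype is given.
        if dtype is None:
            return self._ndarray
        return self._ndarray.astype(dtype)

    def to_numpy(self, dtype=None):
        # No custom conversion rules here: defer to np.array / __array__.
        return np.array(self, dtype=dtype)


toy = ToyDatetimeArray(["2012-01-01", "2012-01-02", "2012-01-03"])
as_int = toy.to_numpy("int64")  # nanoseconds since the epoch, per numpy's cast
```

Because nothing in this path inspects the target dtype, any cast that numpy allows goes through, which is why the results can differ from a method that applies its own conversion rules.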
Some examples:
In [1]: arr = pd.date_range("2012-01-01", periods=3).array
In [2]: arr
Out[2]:
<DatetimeArray>
['2012-01-01 00:00:00', '2012-01-02 00:00:00', '2012-01-03 00:00:00']
Length: 3, dtype: datetime64[ns]
# conversion to string (-> different formatting)
In [3]: arr.astype(str)
Out[3]: array(['2012-01-01', '2012-01-02', '2012-01-03'], dtype=object)
In [4]: arr.to_numpy(str)
Out[4]:
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-03T00:00:00.000000000'], dtype='<U48')
# conversion to float or timedelta (error vs working cast)
In [5]: arr.astype(float)
...
TypeError: Cannot cast DatetimeArray to dtype float64
In [6]: arr.to_numpy(float)
Out[6]: array([1.3253760e+18, 1.3254624e+18, 1.3255488e+18])
In [7]: arr.astype('timedelta64[ns]')
...
TypeError: Cannot cast DatetimeArray to dtype timedelta64[ns]
In [8]: arr.to_numpy('timedelta64[ns]')
Out[8]:
array([1325376000000000000, 1325462400000000000, 1325548800000000000],
      dtype='timedelta64[ns]')
We might say: to_numpy() is for converting to a numpy array, so in that case it is fine to follow numpy's casting rules. But it would still be good to decide on this explicitly.
On the other hand, it is also strange to have two different sets of rules (and the fact that to_numpy() uses numpy's rules is somewhat of an implementation detail).
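To help decide, it may be useful to tabulate where the two methods currently agree or disagree. The following is a rough sketch: describe is a hypothetical helper name, the dtype list just mirrors the examples above, and the exact results will depend on the installed pandas and numpy versions, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def describe(convert, dtype):
    # Run one conversion and summarise the resulting dtype or the error type.
    try:
        result = convert(dtype)
    except Exception as exc:
        return f"{type(exc).__name__}"
    return f"ok: {np.asarray(result).dtype}"


arr = pd.date_range("2012-01-01", periods=3).array

# Target dtypes taken from the examples in this issue.
targets = {"str": str, "float": float, "timedelta64[ns]": "timedelta64[ns]"}

report = {
    name: (describe(arr.astype, dtype), describe(arr.to_numpy, dtype))
    for name, dtype in targets.items()
}

for name, (via_astype, via_to_numpy) in report.items():
    print(f"{name:>16}  astype -> {via_astype:<20}  to_numpy -> {via_to_numpy}")
```

Extending targets with more dtypes (integers, other datetime units, object) would give a fuller picture of the differences before settling on one set of rules.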