From #26848. When you pass an ndarray to the DataFrame constructor and specify a dtype, this does a "plain" numpy astype, which can have some unwanted side effects (that we avoid in other parts of pandas), such as np.nan -> integer conversion and out-of-bounds timestamps:
In [26]: pd.DataFrame(np.array([[1, np.nan], [2, 3]]), dtype='int64')
Out[26]:
   0                    1
0  1 -9223372036854775808
1  2                    3
In [27]: pd.DataFrame(np.array([['2300-01-01']], dtype='datetime64[D]'), dtype='datetime64[ns]')
Out[27]:
                              0
0 1715-06-13 00:25:26.290448384
Both cases are guarded in DataFrame.astype:
In [29]: pd.DataFrame(np.array([[1, np.nan], [2, 3]])).astype(dtype='int64')
...
~/scipy/pandas/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
678
679 if not np.isfinite(arr).all():
--> 680 raise ValueError('Cannot convert non-finite values (NA or inf) to '
681 'integer')
682
ValueError: Cannot convert non-finite values (NA or inf) to integer
In [30]: pd.DataFrame(np.array([['2300-01-01']], dtype='datetime64[D]')).astype(dtype='datetime64[ns]')
...
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2300-01-01 00:00:00
I suppose we want to do such a safe astype in the DataFrame constructor itself as well?