Description
In pandas>=0.23.0, running
import numpy as np
import pandas as pd
good_input = np.array(['bad_format', '2013-06-07'], dtype='O')
bad_input = np.array([np.string_('bad_format'), '2013-06-07'], dtype='O')
pd.to_datetime(good_input, errors='coerce')
pd.to_datetime(bad_input, errors='coerce')
returns
Out[4]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
Out[5]: array(['bad_format', '2013-06-07'], dtype=object)
whereas with pandas==0.22.0 both calls coerce the unparseable entry to NaT and return
Out[4]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
Out[5]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
More precisely, the behaviour of pandas._libs.tslib.array_to_datetime
changed between 0.22.0 and 0.23.0:
from pandas._libs import tslib
tslib.array_to_datetime(good_input, errors='coerce')
tslib.array_to_datetime(bad_input, errors='coerce')
gives, on pandas>=0.23.0,
Out[7]: array(['NaT', '2013-06-07T00:00:00.000000000'], dtype='datetime64[ns]')
Out[8]: array(['bad_format', '2013-06-07'], dtype=object)
Expected Output
The same as in pandas==0.22.0, i.e. the np.string_ element is coerced to NaT as well:
Out[4]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
Out[5]: DatetimeIndex(['NaT', '2013-06-07'], dtype='datetime64[ns]', freq=None)
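Until the regression is fixed, one possible workaround is to normalise bytes-like elements to plain text before calling pd.to_datetime. The sketch below is written for Python 3 and has not been verified against the affected pandas versions. The helper name decode_bytes_elements and the choice of UTF-8 are assumptions made for illustration; neither is part of pandas.

```python
def decode_bytes_elements(values, encoding="utf-8"):
    """Return a list in which every bytes-like element is decoded to text.

    Hypothetical helper, not part of pandas. np.bytes_ (np.string_ in
    older NumPy releases) subclasses bytes, so the isinstance check
    covers it too. Non-bytes elements are passed through unchanged.
    """
    return [
        v.decode(encoding, errors="replace") if isinstance(v, bytes) else v
        for v in values
    ]


# Mixed bytes / str / missing values, similar in spirit to bad_input above.
mixed = [b"bad_format", "2013-06-07", None]
print(decode_bytes_elements(mixed))  # ['bad_format', '2013-06-07', None]
```

The cleaned list can then be passed on, e.g. pd.to_datetime(decode_bytes_elements(bad_input), errors='coerce'), on the assumption that the problem is limited to how np.string_ elements are recognised.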
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1067-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 39.1.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.7.1
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.5
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.5.0
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None