Description
Code Sample, a copy-pastable example if possible
pd.to_datetime(["31/12/2014", "10/03/2011"])
Out[37]:
DatetimeIndex(['2014-12-31', '2011-10-03'], dtype='datetime64[ns]', freq=None)
Expected Output
DatetimeIndex(['2014-12-31', '2011-03-10'], dtype='datetime64[ns]', freq=None)
_Reason: Expect a default behavior (without an extra format or dayfirst parameter) that parses datetimes consistently within the same column._
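As a workaround until the default changes, the day/month order can be stated explicitly. A minimal sketch, assuming a pandas version in which `pd.to_datetime` accepts the `format` and `dayfirst` keywords mentioned above:

```python
import pandas as pd

values = ["31/12/2014", "10/03/2011"]

# Option 1: give the exact format, so every element is parsed the same way.
explicit = pd.to_datetime(values, format="%d/%m/%Y")

# Option 2: only hint that the day comes before the month.
hinted = pd.to_datetime(values, dayfirst=True)

print(list(explicit.strftime("%Y-%m-%d")))  # ['2014-12-31', '2011-03-10']
print(list(hinted.strftime("%Y-%m-%d")))    # ['2014-12-31', '2011-03-10']
```

Both calls remove the ambiguity, but they require the caller to know the column's format in advance, which is what this report asks the default to handle.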
Comments
Within the internal helper function pd.tseries.tools._to_datetime, the datetime format is first inferred from the first non-NaN element, followed by parsing via tslib.array_strptime (Line 357), which gives the expected result.
pd.tseries.tools._guess_datetime_format_for_array(["31/12/2014", "10/03/2011"])
Out[25]:
'%d/%m/%Y'
pd.tslib.array_strptime(pd.tseries.tools.com._ensure_object(["31/12/2014", "10/03/2011"]), "%d/%m/%Y")
Out[46]:
array(['2014-12-31T08:00:00.000000000+0800',
'2011-03-10T08:00:00.000000000+0800'], dtype='datetime64[ns]')
But the fallback function tslib.array_to_datetime doesn't use the inferred format (Line 373):
pd.tslib.array_to_datetime(pd.tseries.tools.com._ensure_object(["31/12/2014", "10/03/2011"]), format="%d/%m/%Y")
Out[51]:
array(['2014-12-31T08:00:00.000000000+0800',
'2011-10-03T08:00:00.000000000+0800'], dtype='datetime64[ns]')
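To illustrate why element-by-element parsing gives mixed results, here is a standard-library sketch. It is not the pandas or dateutil code; the helper `parse_month_first_with_fallback` is hypothetical and only assumes that each string is tried month-first and retried day-first when that fails:

```python
from datetime import datetime


def parse_month_first_with_fallback(text):
    """Hypothetical per-element parser: month-first, day-first only on failure."""
    try:
        return datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        # Month-first is impossible (e.g. month 31), so swap day and month.
        return datetime.strptime(text, "%d/%m/%Y")


values = ["31/12/2014", "10/03/2011"]
parsed = [parse_month_first_with_fallback(v).date().isoformat() for v in values]
print(parsed)  # ['2014-12-31', '2011-10-03']
```

Because each element is judged on its own, the first string falls back to day-first while the second is silently read as month-first, which matches the inconsistent output reported above.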
_This is caused by the default value (False) of the infer_datetime_format parameter of _to_datetime (Line 285).
Suggestion: make the infer_datetime_format parameter of _to_datetime default to True. It won't solve all problems, but at least half of them, depending on the format of the first element._
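The idea behind the suggestion can be sketched with the standard library alone. This is an illustration, not the pandas implementation: `CANDIDATE_FORMATS`, `guess_format` and `parse_column` are hypothetical names, and only two candidate formats are assumed.

```python
from datetime import datetime

# Hypothetical candidates, tried in order (month-first wins when both fit).
CANDIDATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y")


def guess_format(sample):
    """Return the first candidate format that parses the sample, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_column(values):
    """Infer the format from the first element and apply it to every element."""
    fmt = guess_format(values[0])
    if fmt is None:
        raise ValueError("could not infer a format from %r" % (values[0],))
    return [datetime.strptime(v, fmt) for v in values]


dates = parse_column(["31/12/2014", "10/03/2011"])
print([d.date().isoformat() for d in dates])  # ['2014-12-31', '2011-03-10']
```

With the ambiguous element first (`["10/03/2011", "31/12/2014"]`), the guess is month-first and the second element raises `ValueError` instead of being silently misparsed. That is the "depending on the format of the first element" caveat: the result is at least consistent, though not always what the user intended.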
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.8.13-44.el6uek.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: 1.3.7
pip: 8.0.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: None