
BUG: Unexpected pd.to_datetime result #12583

Closed
@dolaameng

Description


Code Sample, a copy-pastable example if possible

pd.to_datetime(["31/12/2014", "10/03/2011"])
Out[37]:
DatetimeIndex(['2014-12-31', '2011-10-03'], dtype='datetime64[ns]', freq=None)

Expected Output

DatetimeIndex(['2014-12-31', '2011-03-10'], dtype='datetime64[ns]', freq=None)

_Reason: I expect the default behavior (without passing format or dayfirst) to parse all dates within the same column consistently, using a single day/month order._
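The expected output amounts to parsing every string in the column with one shared format. Here is a minimal sketch using only the standard library's datetime.strptime, assuming the column is known to be day-first. In pandas the equivalent is passing format="%d/%m/%Y" to pd.to_datetime. The helper name parse_with_format is hypothetical.

```python
from datetime import datetime


def parse_with_format(values, fmt):
    """Parse every string with the same explicit format (no per-element guessing)."""
    return [datetime.strptime(v, fmt) for v in values]


# Assumption: the whole column is day-first, so one format fits every element.
dates = parse_with_format(["31/12/2014", "10/03/2011"], "%d/%m/%Y")
print([d.date().isoformat() for d in dates])  # ['2014-12-31', '2011-03-10']
```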

Comments

Within the internal helper function pd.tseries.tools._to_datetime, when infer_datetime_format is True, the datetime format is first inferred from the first non-NaN element and the array is then parsed with tslib.array_strptime (line 357), which gives the expected result:
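The infer-then-parse path described above can be sketched with the standard library. This is a minimal illustration, not pandas' implementation: guess_format and parse_inferred are hypothetical names, the candidate list (month-first tried before day-first) is an assumption, and None stands in for NaN.

```python
from datetime import datetime

# Assumed candidate order: month-first is tried before day-first.
# This is a hypothetical stand-in, NOT pandas' real inference logic.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%d/%m/%Y"]


def guess_format(values):
    """Return the first candidate format that parses the first non-missing value."""
    first = next((v for v in values if v is not None), None)  # None plays the role of NaN
    if first is None:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(first, fmt)
            return fmt
        except ValueError:
            continue
    return None


def parse_inferred(values):
    """Infer one format from the first element, then apply it to every element."""
    fmt = guess_format(values)
    if fmt is None:
        raise ValueError("could not infer a format")
    return [None if v is None else datetime.strptime(v, fmt) for v in values]


values = ["31/12/2014", "10/03/2011"]
print(guess_format(values))  # %d/%m/%Y
print([d.date().isoformat() for d in parse_inferred(values)])  # ['2014-12-31', '2011-03-10']
```

Because "31/12/2014" cannot be read month-first, the guess lands on the day-first format and the second date is parsed the same way.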

pd.tseries.tools._guess_datetime_format_for_array(["31/12/2014", "10/03/2011"])
Out[25]:
'%d/%m/%Y'


pd.tslib.array_strptime(pd.tseries.tools.com._ensure_object(["31/12/2014", "10/03/2011"]), "%d/%m/%Y")
Out[46]:
array(['2014-12-31T08:00:00.000000000+0800',
       '2011-03-10T08:00:00.000000000+0800'], dtype='datetime64[ns]')

However, the fallback function tslib.array_to_datetime (line 373) does not use the inferred format. Even when the format is passed explicitly, the second date is still parsed month-first:

pd.tslib.array_to_datetime(pd.tseries.tools.com._ensure_object(["31/12/2014", "10/03/2011"]), format="%d/%m/%Y")
Out[51]:
array(['2014-12-31T08:00:00.000000000+0800',
       '2011-10-03T08:00:00.000000000+0800'], dtype='datetime64[ns]')
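The fallback behaves as if each string were parsed on its own. The following minimal stdlib sketch reproduces the inconsistent result. It assumes a parser that tries month-first before day-first for every element, loosely imitating a dateutil-style default. parse_each_independently is a hypothetical name, not the tslib code.

```python
from datetime import datetime


def parse_each_independently(values):
    """Guess a format separately for every element: month-first, else day-first.

    Assumption: this loosely imitates a dateutil-style parser with
    dayfirst=False; it is not the actual tslib.array_to_datetime code.
    """
    out = []
    for v in values:
        for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
            try:
                out.append(datetime.strptime(v, fmt))
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"cannot parse {v!r}")
    return out


result = parse_each_independently(["31/12/2014", "10/03/2011"])
print([d.date().isoformat() for d in result])  # ['2014-12-31', '2011-10-03']
```

"31/12/2014" only parses day-first, while "10/03/2011" parses month-first. Without a shared format the two orders get mixed within one column.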

_This is caused by the default value (False) of the infer_datetime_format parameter of _to_datetime (line 285).
Suggestion: make infer_datetime_format default to True in _to_datetime. It won't solve every case, only those where the format inferred from the first element fits the whole column (e.g. when the first date's day is greater than 12, so it cannot be read month-first)._
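To see why inference alone only fixes part of the problem, here is a minimal stdlib sketch of infer-then-fall-back. It assumes a hypothetical month-first-then-day-first candidate order in place of pandas' real inference. When the first element is unambiguous, the whole column is parsed consistently. When the first element is ambiguous, the guessed month-first format fails on a later element, and the per-element fallback mixes orders again. All function names are hypothetical.

```python
from datetime import datetime

# Hypothetical candidate order (an assumption, not pandas' real inference code).
CANDIDATES = ("%m/%d/%Y", "%d/%m/%Y")


def parse_one(v):
    """Per-element fallback: the first candidate format that parses wins."""
    for fmt in CANDIDATES:
        try:
            return datetime.strptime(v, fmt)
        except ValueError:
            pass
    raise ValueError(f"cannot parse {v!r}")


def to_datetime_infer(values):
    """Infer a format from the first element; if it fails anywhere, fall back per element."""
    fmt = None
    for cand in CANDIDATES:
        try:
            datetime.strptime(values[0], cand)
            fmt = cand
            break
        except ValueError:
            pass
    if fmt is not None:
        try:
            return [datetime.strptime(v, fmt) for v in values]
        except ValueError:
            pass  # the inferred format does not fit every element
    return [parse_one(v) for v in values]


def iso(dates):
    return [d.date().isoformat() for d in dates]


# First element unambiguous -> one format for the whole column.
print(iso(to_datetime_infer(["31/12/2014", "10/03/2011"])))  # ['2014-12-31', '2011-03-10']
# First element ambiguous -> month-first guess fails, fallback mixes orders.
print(iso(to_datetime_infer(["10/03/2011", "31/12/2014"])))  # ['2011-10-03', '2014-12-31']
```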

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.8.13-44.el6uek.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.0.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: None

Metadata


Labels: Datetime, Duplicate Report
