pd.to_datetime much slower with supplied format than when format is inferred #10178

Closed
@dsimmie

Description

Converting a column of date strings with pd.to_datetime is much slower when a date format is supplied than when the format is inferred. I would have thought there would be less work to do when the format is known (and supplied).

To test

import pandas as pd

df = pd.DataFrame({'date_text': ["2015-05-18" for i in range(10**6)]})

# Format inferred
%timeit pd.to_datetime(df['date_text'], infer_datetime_format=True, box=False).values.view('i8')/10**9
10 loops, best of 3: 115 ms per loop
# Top line from %prun of same command:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
1       0.095    0.095    0.095    0.095    {pandas.tslib.array_to_datetime}

# Format supplied
%timeit pd.to_datetime(df['date_text'], format="%Y-%m-%d", box=False).values.view('i8')/10**9
1 loops, best of 3: 2.27 s per loop
# Top line from %prun of same command:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
1       2.282    2.282    2.282    2.282    {pandas.tslib.array_strptime}
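
For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. It is not the code from the report: the helper names (epoch_seconds, parse_inferred, parse_explicit, compare) are hypothetical, the box and infer_datetime_format arguments are left out on the assumption that a newer pandas may not accept them, and the conversion to epoch seconds goes through datetime64[s] rather than .values.view('i8')/10**9. Timings will depend on the pandas version and machine, so none are quoted here.

```python
import timeit

import pandas as pd


def epoch_seconds(parsed):
    """Convert a parsed datetime Series to int64 seconds since the Unix epoch."""
    # Casting to datetime64[s] first avoids assuming nanosecond resolution.
    return parsed.to_numpy().astype("datetime64[s]").astype("int64")


def parse_inferred(col):
    """Parse without a format, letting pandas work out the layout."""
    return epoch_seconds(pd.to_datetime(col))


def parse_explicit(col):
    """Parse with the format supplied up front."""
    return epoch_seconds(pd.to_datetime(col, format="%Y-%m-%d"))


def compare(n_rows=10**5, repeats=3):
    """Time both code paths on identical data; return best-of-N seconds."""
    df = pd.DataFrame({"date_text": ["2015-05-18"] * n_rows})
    col = df["date_text"]

    # Both paths must agree before the timings mean anything.
    assert (parse_inferred(col) == parse_explicit(col)).all()

    inferred = min(timeit.repeat(lambda: parse_inferred(col), number=1, repeat=repeats))
    explicit = min(timeit.repeat(lambda: parse_explicit(col), number=1, repeat=repeats))
    return {"inferred_s": inferred, "explicit_s": explicit}


if __name__ == "__main__":
    print(compare())
```

Raising n_rows to 10**6 matches the size used above; the smaller default only keeps a quick run short.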

The plot below is taken from a Stack Overflow post; it shows the difference over a larger range of input sizes and compares against other methods.

[Figure: perf_comparison]

INSTALLED VERSIONS

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.2
pytz: 2014.10
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 2.2.0-b1
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: 0.9
apiclient: 1.4.0
rpy2: None
sqlalchemy: 1.0.0
pymysql: None
psycopg2: None

Metadata

    Labels

    Datetime (Datetime data dtype), Performance (Memory or execution speed performance)
