Description
Say you have to parse some nicely ISO-formatted date strings. You can just parse them with to_datetime,
which is very fast. But if you were 'overcautious' and provided format="%Y-%m-%d %H:%M:%S"
for safety, parsing seems to be around 20 times slower.
Would it be possible to provide a fast path for certain provided format strings (as already exists for %Y%m%d,
I think)?
In [129]: s = pd.Series(pd.date_range('2000-01-01', periods=1000, freq='H'))
In [130]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y-%m-%dT%H:%M:%S.%f"))
In [131]: %timeit pd.to_datetime(s_as_dt_strings)
1000 loops, best of 3: 406 µs per loop
In [132]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y-%m-%dT%H:%M:%S.%f")
100 loops, best of 3: 9.73 ms per loop
In [133]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y-%m-%d %H:%M:%S"))
In [134]: %timeit pd.to_datetime(s_as_dt_strings)
1000 loops, best of 3: 361 µs per loop
In [135]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y-%m-%d %H:%M:%S")
100 loops, best of 3: 8.36 ms per loop
For non-standard formats, providing format does give a big improvement:
In [136]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y/%m/%d %H:%M:%S"))
In [137]: %timeit pd.to_datetime(s_as_dt_strings)
10 loops, best of 3: 92.2 ms per loop
In [138]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y/%m/%d %H:%M:%S")
100 loops, best of 3: 9.08 ms per loop
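To make the request concrete, here is a rough sketch of how such a fast path could decide when it applies. It is only an illustration: `is_iso_like_format` and `effective_format` are hypothetical names, not part of pandas, and the pattern only covers the ISO-like formats used in the timings above.

```python
import re

# Hypothetical helper, not pandas API: recognises the ISO-like strftime
# formats from the timings above (date, optional " " or "T" separator,
# optional seconds and fractional seconds).
_ISO_LIKE_FORMAT = re.compile(r"%Y-%m-%d(?:[ T]%H:%M(?::%S(?:\.%f)?)?)?")


def is_iso_like_format(fmt):
    """Return True if fmt describes an ISO 8601-like layout."""
    return fmt is not None and _ISO_LIKE_FORMAT.fullmatch(fmt) is not None


def effective_format(fmt):
    """Return None for ISO-like formats so the caller can take the fast
    default parsing path; otherwise return fmt unchanged."""
    return None if is_iso_like_format(fmt) else fmt


print(effective_format("%Y-%m-%d %H:%M:%S"))  # ISO-like: format is dropped
print(effective_format("%Y/%m/%d %H:%M:%S"))  # non-standard: format is kept
```

As a user-side workaround this could be used as `pd.to_datetime(s, format=effective_format(fmt))`. Note that dropping the format also drops the strict check that every string matches it, which is the 'safety' the explicit format was meant to provide, so a real fast path inside to_datetime would be preferable.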