Fastpath for to_datetime when providing ISO format as keyword? #8154

Closed
@jorisvandenbossche

Description

Say you have to parse some nicely ISO-formatted date strings. You can parse these with to_datetime very fast. But if you were 'overcautious' and provided format="%Y-%m-%d %H:%M:%S" for safety, parsing seems to be around 20 times slower.
Would it be possible to provide a fastpath for certain provided format strings (as already exists for %Y%m%d, I think)?

In [129]: s = pd.Series(pd.date_range('2000-01-01', periods=1000, freq='H'))

In [130]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y-%m-%dT%H:%M:%S.%f"))

In [131]: %timeit pd.to_datetime(s_as_dt_strings)
1000 loops, best of 3: 406 µs per loop

In [132]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y-%m-%dT%H:%M:%S.%f")
100 loops, best of 3: 9.73 ms per loop

In [133]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y-%m-%d %H:%M:%S"))

In [134]: %timeit pd.to_datetime(s_as_dt_strings)
1000 loops, best of 3: 361 µs per loop

In [135]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y-%m-%d %H:%M:%S")
100 loops, best of 3: 8.36 ms per loop
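
Until something like this exists in pandas itself, a user-side stopgap might look roughly like the sketch below. The wrapper name to_datetime_iso_fastpath and the ISO_LIKE_FORMATS set are hypothetical and not part of pandas. The idea is simply to drop the format keyword when it is one of a few known ISO-like formats, so that the default (fast) parsing path is used. Note the trade-off: without format, strings that do not match the expected layout are no longer rejected on that basis, so the 'safety' is lost.

```python
import pandas as pd

# Hypothetical list of formats treated as "ISO-like"; this is an assumption
# made for illustration, not something pandas defines.
ISO_LIKE_FORMATS = {
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S.%f",
}


def to_datetime_iso_fastpath(values, format=None, **kwargs):
    """Hypothetical wrapper around pd.to_datetime.

    If `format` is one of the ISO-like formats above, it is dropped so that
    the default parsing path is used; otherwise it is passed through.
    """
    if format in ISO_LIKE_FORMATS:
        # Default path: no explicit format, so no strict format check either.
        return pd.to_datetime(values, **kwargs)
    return pd.to_datetime(values, format=format, **kwargs)
```

Calling to_datetime_iso_fastpath(s_as_dt_strings, format="%Y-%m-%d %H:%M:%S") would then behave like the call without format in In [134].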

For non-standard formats, providing format does give a big improvement:

In [136]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y/%m/%d %H:%M:%S"))

In [137]: %timeit pd.to_datetime(s_as_dt_strings)
10 loops, best of 3: 92.2 ms per loop

In [138]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y/%m/%d %H:%M:%S")
100 loops, best of 3: 9.08 ms per loop
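
For anyone who wants to reproduce the comparison outside IPython, here is a minimal standalone sketch using the standard timeit module. The helper time_parse is a hypothetical name, and the absolute numbers will depend on the machine and the pandas version, so none are shown here.

```python
import timeit

import pandas as pd


def time_parse(strings, fmt=None, number=10):
    """Hypothetical helper: average seconds per pd.to_datetime call."""
    if fmt is None:
        stmt = lambda: pd.to_datetime(strings)
    else:
        stmt = lambda: pd.to_datetime(strings, format=fmt)
    return timeit.timeit(stmt, number=number) / number


# Same data as above: 1000 hourly timestamps. A Timedelta is used for the
# frequency instead of a string alias to avoid alias differences between
# pandas versions.
s = pd.Series(pd.date_range("2000-01-01", periods=1000, freq=pd.Timedelta(hours=1)))

for fmt in ["%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S"]:
    strings = s.dt.strftime(fmt)
    without_format = time_parse(strings)
    with_format = time_parse(strings, fmt=fmt)
    print(f"{fmt!r}: no format {without_format * 1e3:.3f} ms, "
          f"with format {with_format * 1e3:.3f} ms")
```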
