CLN: to_datetime internals #21702

Merged: 5 commits, Jul 3, 2018

Conversation

mroeschke
Member

  • tests passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

The internals of to_datetime are getting a bit unwieldy, so this PR moves the _convert_listlike logic and the origin-shifting logic out of to_datetime into the standalone functions _convert_listlike_datetime and _adjust_to_origin, respectively. The logic itself was not changed.

@pep8speaks

pep8speaks commented Jul 2, 2018

Hello @mroeschke! Thanks for updating the PR.

Line 243:17: E722 do not use bare 'except'
Line 315:9: E722 do not use bare 'except'

Comment last updated on July 03, 2018 at 05:24 Hours UTC

@mroeschke mroeschke added the Clean label Jul 2, 2018
@codecov

codecov bot commented Jul 2, 2018

Codecov Report

Merging #21702 into master will increase coverage by <.01%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master   #21702      +/-   ##
==========================================
+ Coverage    91.9%   91.91%   +<.01%     
==========================================
  Files         158      158              
  Lines       49690    49695       +5     
==========================================
+ Hits        45670    45675       +5     
  Misses       4020     4020

Flag         Coverage Δ
#multiple    90.28% <88.88%> (ø) ⬆️
#single      41.95% <51.28%> (ø) ⬆️

Impacted Files                    Coverage Δ
pandas/core/tools/datetimes.py    85.22% <88.88%> (+0.23%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7cd2679...e9320f1.

@jreback jreback added this to the 0.24.0 milestone Jul 2, 2018
@jreback
Contributor

jreback commented Jul 2, 2018

lgtm. can you do a perf check to make sure nothing changed (as the caching logic was lightly touched here)

@@ -38,7 +39,7 @@ def _guess_datetime_format_for_array(arr, **kwargs):
return _guess_datetime_format(arr[non_nan_elements[0]], **kwargs)


def _maybe_cache(arg, format, cache, tz, convert_listlike):
Member

The tz here was not used?

Member Author

tz was passed into the convert_listlike function further down, but now I am embedding it into convert_listlike with functools.partial in to_datetime
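
The pattern described in this reply can be sketched without pandas. The sketch below is illustrative only and is not the code from this PR: `convert_listlike`, `maybe_cache`, and `to_datetime_sketch` are hypothetical stand-ins built on the standard library. It shows how binding `tz` with `functools.partial` lets the caching helper drop its own `tz` parameter.

```python
from datetime import datetime, timezone
from functools import partial


def convert_listlike(values, fmt, tz=None):
    """Hypothetical stand-in for the converter: parse strings, then convert to tz."""
    parsed = [datetime.strptime(v, fmt) for v in values]
    if tz is not None:
        parsed = [p.astimezone(tz) for p in parsed]
    return parsed


def maybe_cache(values, cache, convert):
    """Hypothetical cache helper: it receives a ready-made converter, so no tz argument."""
    if not cache:
        return {}
    unique = list(dict.fromkeys(values))  # drop duplicates, keep first-seen order
    return dict(zip(unique, convert(unique)))


def to_datetime_sketch(values, fmt, tz=None, cache=False):
    # Bind fmt and tz once; every later call only passes the values.
    convert = partial(convert_listlike, fmt=fmt, tz=tz)
    lookup = maybe_cache(values, cache, convert)
    if lookup:
        return [lookup[v] for v in values]
    return convert(values)


data = ["2000-02-11 15:00:00-0800"] * 3
fmt = "%Y-%m-%d %H:%M:%S%z"
cached = to_datetime_sketch(data, fmt, tz=timezone.utc, cache=True)
uncached = to_datetime_sketch(data, fmt, tz=timezone.utc, cache=False)
```

With the converter pre-bound, the cached and uncached paths call the same function with the same timezone, so the two results should agree.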

raise e


def _adjust_to_origin(arg, origin, unit):
Member

+1 for separating this out

passed unit from to_datetime, must be 'D'
Returns
-------
ndarray of adjusted dates
Member

Is it necessarily an ndarray? Couldn't it be a Timestamp?

need newline before Returns

Member Author

Good catch, yeah this can be a scalar value. Will fix that tonight.
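
The point raised in this exchange, that an origin-shifting helper may return a scalar as well as an array, can be illustrated with a small standard-library sketch. It is not the pandas implementation: `adjust_to_origin_sketch` is a hypothetical name, it uses plain `datetime` objects instead of Timestamp or ndarray, and it assumes the offsets are whole days.

```python
from datetime import datetime, timedelta


def adjust_to_origin_sketch(arg, origin, unit="D"):
    """Shift day offsets so they are measured from `origin`.

    Returns a single datetime for a scalar input and a list of
    datetimes for a list or tuple input.
    """
    if unit != "D":
        # Assumption made by this sketch: only day offsets are supported.
        raise ValueError("unit must be 'D'")

    def shift(days):
        return origin + timedelta(days=days)

    if isinstance(arg, (list, tuple)):
        return [shift(days) for days in arg]
    return shift(arg)


origin = datetime(1960, 1, 1)
one = adjust_to_origin_sketch(1, origin)          # scalar in, scalar out
many = adjust_to_origin_sketch([0, 31], origin)   # list in, list out
```

Because the return type follows the input type, a docstring for such a helper would need to mention both cases, which is the correction agreed on above.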

@mroeschke
Member Author

My asv setup is still a little broken, but here's a benchmark showing no performance hit (with cache)

In [1]: from pandas import *

In [3]: N = 100

In [4]: dup_string_with_tz = ['2000-02-11 15:00:00-0800'] * N

# this branch
In [5]: %timeit to_datetime(dup_string_with_tz, cache=True)
1.31 ms ± 44.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit to_datetime(dup_string_with_tz, cache=False)
2.3 ms ± 41.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Master
In [4]: %timeit to_datetime(dup_string_with_tz, cache=True)
1.29 ms ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit to_datetime(dup_string_with_tz, cache=False)
2.31 ms ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
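
For anyone repeating this comparison outside IPython, a minimal timing harness based on the standard-library `timeit` module might look like the sketch below. The helper names `best_per_call` and `relative_change` are hypothetical, and the workload shown is a placeholder rather than the `to_datetime` call from the session above.

```python
import timeit


def best_per_call(func, number=100, repeat=5):
    """Return the best observed time per call, in seconds."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


def relative_change(baseline, candidate):
    """Fractional change of candidate relative to baseline (0.1 means 10% slower)."""
    return (candidate - baseline) / baseline


# Placeholder workload; swap in the call under test on each branch.
elapsed = best_per_call(lambda: sorted(range(1000)), number=50, repeat=3)
```

Applied to the cached timings reported above (1.29 ms on master, 1.31 ms on this branch), the difference is about 0.02 ms, which is smaller than the 44.5 µs standard deviation reported for the branch run.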

@jreback jreback merged commit 1de57da into pandas-dev:master Jul 3, 2018
@jreback
Contributor

jreback commented Jul 3, 2018

thanks @mroeschke happily take refactorings to clean up things / make more readable

@mroeschke mroeschke deleted the reorganize_to_datetime branch July 3, 2018 15:13
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
5 participants