-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: lazify IO imports #52421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: lazify IO imports #52421
Conversation
generally in favour - do you have an estimate of how much this helps? |
The measurements with The next big chunk available would be to delay core.groupby import, which is 86ms. Doing that requires some yak-shaving bc core.window imports it, so that needs to be made lazy (and docstrings moved xref #51823). Also dask references pd.core.groupby.groupby.something_i_forget_what at import-time to do a docstring pinning, so we'd need to do something about that. Following that is some smaller-but-easier stuff: stdlib imports like shutil and stuff in pd.compat.compressors that are relatively expensive, dateutil (xref #52659). The big ones are pyarrow (87ms) and numpy (86ms). making numpy import lazy is probably not worth the effort (though asking numpy to lazify parts of its own imports might be worthwhile). I'd like to see the pyarrow import made lazy, especially if we don't make it required. |
# error: Cannot determine type of 'index_col' | ||
kwds["allow_leading_cols"] = ( | ||
self.index_col is not False # type: ignore[has-type] | ||
) | ||
kwds["allow_leading_cols"] = self.index_col is not False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does lazifying imports change how things are type-checked here?
is it that it than mypy
can't follow imports, and so we end up with a lot more Any
s?
If so, then there is a tradeoff to consider...slightly faster import pandas
, but fewer type hints whilst developing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does lazifying imports change how things are type-checked here?
hmm i have no idea. cc @simonjayhawkins ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any idea @Dr-Irv?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some mypy funkyness.
If you insert the line reveal_type(self.index_col)
in main, you get:
pandas\io\parsers\c_parser_wrapper.py:68: error: Cannot determine type of "index_col" [has-type]
pandas\io\parsers\c_parser_wrapper.py:68: note: Revealed type is "Any"
With this PR, the first message about "Cannot determine type" goes away. But it still sees the type of self.index_col
as Any
. So I guess this is a good thing
No clear interest here, closing. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.