PERF: lazify IO imports #52421

jbrockmendel · 2023-04-04T20:23:09Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

pandas/__init__.py

MarcoGorelli · 2023-04-21T18:09:09Z

Generally in favour. Do you have an estimate of how much this helps?

jbrockmendel · 2023-04-21T18:57:05Z

> do you have an estimate of how much this helps?

The measurements with python -X importtime -c "import pandas as pd" are extremely noisy. Eyeballing them, I get roughly 511ms on main vs. 472ms on this branch.
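Because single -X importtime runs are this noisy, one way to get a steadier number is to repeat the measurement in several fresh interpreters and take the median. The sketch below does that by parsing the -X importtime output from stderr; the helper names cumulative_import_us and median_import_ms are hypothetical, and the parsing assumes the standard "import time: self | cumulative | name" line layout.

```python
import statistics
import subprocess
import sys


def cumulative_import_us(module: str) -> int:
    """Cumulative import time of `module`, in microseconds, as reported by
    `python -X importtime` in a fresh interpreter."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -X importtime writes to stderr, one line per module:
    #   import time: <self us> | <cumulative us> | <indented module name>
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = [f.strip() for f in line[len("import time:"):].split("|")]
        # The header line has a non-numeric "cumulative" field and is skipped.
        if len(fields) == 3 and fields[2] == module and fields[1].isdigit():
            return int(fields[1])
    raise LookupError(
        f"{module!r} not in -X importtime output (already loaded at startup?)"
    )


def median_import_ms(module: str, runs: int = 7) -> float:
    """Median over several fresh interpreters, to damp run-to-run noise."""
    samples = [cumulative_import_us(module) for _ in range(runs)]
    return statistics.median(samples) / 1000.0
```

Running median_import_ms("pandas") once on main and once on the branch gives two numbers that are easier to compare than single eyeballed runs.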

The next big chunk available would be to delay the core.groupby import, which is 86ms. Doing that requires some yak-shaving because core.window imports it, so that would need to be made lazy too (and the docstrings moved, xref #51823). Also, dask references pd.core.groupby.groupby.something_i_forget_what at import time to do some docstring pinning, so we'd need to do something about that.

After that comes some smaller-but-easier stuff: relatively expensive stdlib imports such as shutil and the ones in pd.compat.compressors, plus dateutil (xref #52659).
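For cheaper targets like shutil, the usual approach is simply to move the import into the function that needs it. The sketch below checks that behaviour in a fresh interpreter; terminal_width and probe_deferred_import are hypothetical names, and -S is passed only so that site-packages .pth files or sitecustomize cannot import shutil before the probe runs.

```python
import subprocess
import sys

# Hypothetical module body: shutil is imported inside the function that
# needs it rather than at the top of the module.
PROBE = """
import sys

def terminal_width():
    import shutil  # deferred: the cost is paid on the first call only
    return shutil.get_terminal_size().columns

print("shutil" in sys.modules)  # module body ran, function not yet called
terminal_width()
print("shutil" in sys.modules)  # after the first call
"""


def probe_deferred_import() -> list[str]:
    # -S skips the site module, so nothing from site-packages runs first.
    result = subprocess.run(
        [sys.executable, "-S", "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```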

The big ones are pyarrow (87ms) and numpy (86ms). Making the numpy import lazy is probably not worth the effort (though asking numpy to lazify parts of its own imports might be worthwhile). I'd like to see the pyarrow import made lazy, especially if we don't make it required.
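For an optional dependency such as pyarrow, the standard library also offers importlib.util.LazyLoader, which registers a module immediately but runs its body only on first attribute access. This sketch adapts the "Implementing lazy imports" recipe from the importlib documentation; whether it suits pandas is an assumption, and colorsys stands in for pyarrow. One caveat: errors raised while executing the module body surface at first use rather than at import.

```python
import importlib.util
import sys
import types


def lazy_import(name: str) -> types.ModuleType:
    """Return a module whose body runs only on first attribute access."""
    if name in sys.modules:  # already imported: nothing left to defer
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        # A missing optional dependency is still reported up front.
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the module, defers running its body
    return module
```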

MarcoGorelli · 2023-04-22T19:03:24Z

pandas/io/parsers/c_parser_wrapper.py

-        # error: Cannot determine type of 'index_col'
-        kwds["allow_leading_cols"] = (
-            self.index_col is not False  # type: ignore[has-type]
-        )
+        kwds["allow_leading_cols"] = self.index_col is not False


Why does lazifying imports change how things are type-checked here?

Is it that mypy can't follow the imports, and so we end up with a lot more Anys?

If so, then there is a tradeoff to consider: a slightly faster import pandas, but fewer type hints whilst developing?
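On the tradeoff raised here: deferring a runtime import does not have to cost type information. A common pattern imports a name for the type checker only, under typing.TYPE_CHECKING, and uses a string annotation (from __future__ import annotations makes every annotation behave that way) while the runtime import moves into the function body. This is a general sketch, not code from this PR; to_decimal is hypothetical and decimal stands in for a heavy dependency.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by mypy only; this import never executes at runtime.
    from decimal import Decimal


def to_decimal(text: str) -> "Decimal":
    # Runtime import deferred until the function is first called.
    from decimal import Decimal

    return Decimal(text)
```

mypy still resolves the return type to Decimal, while importing the module that defines to_decimal does not import decimal.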

jbrockmendel · Apr 22, 2023

> why does lazifying imports change how things are type-checked here?

Hmm, I have no idea. cc @simonjayhawkins?

jbrockmendel · Apr 29, 2023

Any idea, @Dr-Irv?

Dr-Irv · Apr 30, 2023

This is some mypy funkiness.

If you insert the line reveal_type(self.index_col) on main, you get:

pandas\io\parsers\c_parser_wrapper.py:68: error: Cannot determine type of "index_col"  [has-type]
pandas\io\parsers\c_parser_wrapper.py:68: note: Revealed type is "Any"

With this PR, the first message ("Cannot determine type") goes away, but mypy still sees the type of self.index_col as Any. So I guess this is a good thing.

jbrockmendel · 2023-05-03T17:07:32Z

No clear interest here, closing.

PERF: lazify IO imports

fe18dcb

twoertwein reviewed Apr 4, 2023

pandas/__init__.py

jbrockmendel added 3 commits April 4, 2023 16:43

lint fixup

e8800c1

lint fixup

42a3c8f

mypy fixup

09f1c7b

Merge branch 'main' into perf-import-2

5cc9e64

jbrockmendel added 2 commits April 21, 2023 12:41

lazy io imports in pd.api.typing

a9e1f39

__future__ annotations

1cce118

MarcoGorelli reviewed Apr 22, 2023


jbrockmendel closed this May 3, 2023

jbrockmendel deleted the perf-import-2 branch May 3, 2023 17:07

jbrockmendel commented Apr 4, 2023

MarcoGorelli commented Apr 21, 2023

jbrockmendel commented Apr 21, 2023

MarcoGorelli Apr 22, 2023

jbrockmendel Apr 22, 2023

jbrockmendel Apr 29, 2023

Dr-Irv Apr 30, 2023

jbrockmendel commented May 3, 2023

Uh oh!

Uh oh!

Uh oh!

PERF: lazify IO imports #52421

PERF: lazify IO imports #52421

Uh oh!

Conversation

jbrockmendel commented Apr 4, 2023

Uh oh!

Uh oh!

MarcoGorelli commented Apr 21, 2023

Uh oh!

jbrockmendel commented Apr 21, 2023

Uh oh!

MarcoGorelli Apr 22, 2023

Choose a reason for hiding this comment

Uh oh!

jbrockmendel Apr 22, 2023

Choose a reason for hiding this comment

Uh oh!

jbrockmendel Apr 29, 2023

Choose a reason for hiding this comment

Uh oh!

Dr-Irv Apr 30, 2023

Choose a reason for hiding this comment

Uh oh!

jbrockmendel commented May 3, 2023

Uh oh!

Uh oh!