PERF: 34% faster Series.to_dict #50089
Improve the performance of Series.to_dict() by removing an unnecessary generator. Local tests showed a significant performance improvement.
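To illustrate why dropping a redundant generator can help, here is a minimal pure-Python sketch. It is not the pandas source: `keys`, `values`, `with_generator`, and `without_generator` are hypothetical names, and `zip` stands in for whatever iterable of key/value pairs the method consumes. The idea is only that wrapping an iterable of pairs in a generator expression adds a Python-level step per element that `dict` does not need.

```python
from timeit import timeit

# Hypothetical stand-ins for a Series' index and values.
keys = list(range(1000))
values = [float(i) for i in range(1000)]


def with_generator():
    # Re-yields every pair through a generator expression before dict() sees it.
    return dict((k, v) for k, v in zip(keys, values))


def without_generator():
    # Hands the iterable of pairs to dict() directly.
    return dict(zip(keys, values))


# Both approaches build the same mapping; only the overhead differs.
assert with_generator() == without_generator()

print("with generator:   ", timeit(with_generator, number=1000))
print("without generator:", timeit(without_generator, number=1000))
```

The absolute timings depend on the machine and Python version, so this sketch should not be read as reproducing the 34% figure reported for the PR.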
Nice! Can you post the results from your local test?
Here is the test. Performance improves by 34%.

import pandas as pd
import random
from timeit import timeit
df = pd.Series([random.random() for _ in range(1000)], dtype=float)
print(timeit("df.to_dict()", globals=globals(), number=10000))

Before change:
Note the performance gain only applies when the series uses Python native types. Also, out of curiosity, what's the release timeline for pandas 2.0?
LGTM, sorry, should have noticed that the generator was no longer necessary in my PR
LGTM. No need for an entry in the whatsnew; the whatsnew from the PR linked in the OP here covers this as well.
Thanks @staadecker - very nice!
Thank you all!
Note this builds off the changes of #46487 from @RogerThomas