PERF: sparse to_csv #49066
Improves `to_csv` performance for sparse matrices by casting to dense before initializing `DataFrameFormatter`. This results in many fewer calls to `to_native_types`, which saves time.
This should reduce memory consumption by materializing only one chunk at a time.
Revised asv benchmarks (upstream/main vs pr) after moving materialization to the chunk level to preserve memory. Chunk-level materialization takes longer than all-at-once materialization but is still a significant improvement over upstream/main.
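The chunk-level idea can be approximated from user code. The following is a minimal sketch, not the change made in this PR: the helper name `sparse_to_csv_chunked` and its `chunksize` default are hypothetical, and it assumes every column of the frame is sparse (the `.sparse` accessor on a DataFrame requires that). It densifies one slice of rows at a time and appends each slice to the same buffer.

```python
import io

import numpy as np
import pandas as pd


def sparse_to_csv_chunked(df, buf, chunksize=1000):
    """Write an all-sparse DataFrame to `buf`, densifying one row chunk at a time.

    Hypothetical helper for illustration; not part of pandas.
    """
    for start in range(0, len(df), chunksize):
        # Slice first, so only `chunksize` rows are ever dense at once.
        chunk = df.iloc[start:start + chunksize]
        dense_chunk = chunk.sparse.to_dense()
        # Emit the header only with the first chunk.
        dense_chunk.to_csv(buf, header=(start == 0))


# Small all-sparse frame: a 7x7 identity matrix with fill value 0.0.
df = pd.DataFrame(np.eye(7)).astype(pd.SparseDtype("float", 0.0))
buf = io.StringIO()
sparse_to_csv_chunked(df, buf, chunksize=3)
csv_text = buf.getvalue()
```

Peak dense memory here is bounded by `chunksize` rows, at the cost of one `to_dense` call per chunk, which mirrors the trade-off described above between chunk-level and all-at-once materialization.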
Please don't wait for reviewers to resolve conversations.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.
Improves `NDFrame.to_csv` performance for sparse dataframes by casting to dense before initializing `DataFrameFormatter`. This results in many fewer calls to `to_native_types`, which saves time. Added a new ASV benchmark based on the example provided by OP in #41023. Benchmark results:
`doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.