I have a ~20x speedup of the SAS7BDAT parser ready to submit as a PR. It involves a lot of changes; the goal is to avoid Python-level operations as much as possible. A preview of all of the changes can be found here: jonashaag#7
I want to contribute those changes to pandas. Is it easier to review if I make several small PRs, or do you prefer reviewing one large PR? Splitting would look something like this: 5 PRs with 10% of the changes each, plus one larger PR with the remaining 50%. It will be very difficult to split the large PR any further.
One drawback of multiple small PRs is that it's more work for me, and some of the changes may not seem useful when reviewed in isolation.
PRs:
- SAS7BDAT parser: Fast byteswap (#47403): 10-20% improvement
- SAS7BDAT parser: Faster string parsing (#47404): 10-50% improvement
- SAS7BDAT parser: Speed up RLE/RDC decompression (#47405): 30-50% improvement
- SAS7BDAT parser: Improve subheader lookup performance (#47656): 10% improvement
- Do encoding and blank in Python (upcoming): 30-60% improvement
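
To illustrate the general idea of avoiding per-value Python operations (as in the byteswap item above), here is a minimal, standard-library-only sketch. It is not the code from any of the PRs, which work at the Cython level; the function names `read_doubles_loop` and `read_doubles_bulk` are made up for this example, and it assumes the input is a buffer of big-endian 8-byte floats.

```python
import struct
import sys
from array import array


def read_doubles_loop(buf):
    """Decode big-endian float64 values one at a time (one Python call per value)."""
    return [struct.unpack_from(">d", buf, off)[0] for off in range(0, len(buf), 8)]


def read_doubles_bulk(buf):
    """Decode the same values with one bulk copy and one bulk byteswap."""
    values = array("d")
    values.frombytes(buf)  # raw copy, interpreted in native byte order
    if sys.byteorder == "little":
        values.byteswap()  # swap every item inside the C implementation
    return values.tolist()


# Hypothetical sample data: four big-endian doubles.
sample = struct.pack(">4d", 1.0, -2.5, 3.25, 1e10)
assert read_doubles_loop(sample) == read_doubles_bulk(sample)
```

Both functions return the same values; the second one does the work in a fixed number of calls regardless of buffer size, which is the kind of change that tends to pay off in a parser's hot loop. Actual gains depend on the data and should be measured.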