Skip to content

Performance optimizations for data reading/writing #942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 24, 2024

Conversation

jmschonfeld
Copy link
Contributor

I've determined a few bottlenecks in the data reading/writing path that this PR addresses:

  1. When writing data, cleaning up the temporary file used FileManager.removeItem(atPath:) which was performing a dynamic cast to get the delegate
    • I added more type information to the FileManager internal functions and removed the dynamic cast to avoid any casting machinery here
    • I avoided an additional as? CocoaError dynamic cast when possible
    • I added a quick-check call to rmdir before calling FileManager.removeItem(atPath:) since rmdir is noticeably faster than removefile for known empty directories (the Linux implementation of removeItem(atPath:) already does this, so we can consider that for a future FM improvement for Darwin)
  2. When reading from a file that didn't exist, we spend a lot of time creating the error boxing items in AnyHashable boxes and appending to the dictionary
    • I reorganized the error creation logic to create the dictionary all in one place rather than copying/appending to it in recursive calls
    • I also looked into avoiding the creation of the dictionary until bridging time to only create the user info when required, however when throw-ing a CocoaError, the compiler emits immediate calls to bridge it to an NSError so this proved to be unnecessary
    • In the end, I chose to eagerly create an NSError on Darwin to avoid a lot of bridging overhead (bridging NSString error key constants to String to set in the dictionary just to bridge the dictionary back to ObjC, etc.) while maintaining the current status quo (where bridging isn't relevant) on non-Darwin.

I updated our benchmarks to be a bit more accurate and to try to test some of these things. I was unable to get a benchmark/profile to accurately show me results for issue 1, but for issue 2 the read-nonexistentFile improved by 33% (compared to main) with the error creation reorganization and improved by 77% with the full solution as proposed by creating the error in ObjC to avoid double-bridging costs (compared to main)

@jmschonfeld
Copy link
Contributor Author

@swift-ci please test

@itingliu
Copy link
Contributor

Windows failure looks legitimate. Can you take a look?

@jmschonfeld
Copy link
Contributor Author

@swift-ci please test

@jmschonfeld
Copy link
Contributor Author

Windows failure looks legitimate. Can you take a look?

@itingliu thanks for the catch, fixed!

@jmschonfeld jmschonfeld merged commit 6c0a3e8 into swiftlang:main Sep 24, 2024
3 checks passed
@jmschonfeld jmschonfeld deleted the data-perf branch September 24, 2024 20:17
cthielen pushed a commit to cthielen/swift-foundation that referenced this pull request Nov 8, 2024
* (135743158) Performance optimizations for data reading/writing

* Fix Windows build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants