[5.7] Merge benchmarker improvements and character class bitset optimization #532
Merged: milseman merged 12 commits into swiftlang:swift/release/5.7 from rctcwyvrn:5_7_perf_and_benchmarks on Jul 1, 2022
`^` and `$` should match the start and end of the callee, even if that callee is a substring. Currently, `^` and `$` match the start and end of the callee's base string instead. In addition, when replacing within a subrange, `^` and `$` should still match only the start and end of the callee, not the start and end of the subrange.
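A minimal sketch of the intended `^` semantics, assuming a toy matcher that only handles a literal preceded by `^`. The function `firstAnchoredMatch(of:in:searchBounds:)` is hypothetical and not part of the library. It checks `^` against the start of the subject (the callee, here a `Substring`), not the start of the base string and not the start of the range being searched or replaced.

```swift
// Hypothetical toy matcher for the pattern `^<literal>`.
// `subject` is the callee; `searchBounds` is the range being searched or
// replaced, which may be a subrange of the subject.
func firstAnchoredMatch(
  of literal: String,
  in subject: Substring,
  searchBounds: Range<String.Index>
) -> Range<String.Index>? {
  let start = searchBounds.lowerBound
  // `^` succeeds only at the start of the subject. It is checked neither
  // against the base string's start nor against the search range's start.
  guard start == subject.startIndex else { return nil }
  guard subject[start..<searchBounds.upperBound].hasPrefix(literal) else { return nil }
  return start..<subject.index(start, offsetBy: literal.count)
}

let base = "xxabcabc"
let callee = base.dropFirst(2)  // "abcabc"; its startIndex is not base.startIndex

// Searching the whole callee: `^abc` matches at the callee's start.
let whole = firstAnchoredMatch(
  of: "abc", in: callee, searchBounds: callee.startIndex..<callee.endIndex)

// Replacing only in a subrange that begins mid-callee: `^` must not match there.
let subrangeStart = callee.index(callee.startIndex, offsetBy: 3)
let partial = firstAnchoredMatch(
  of: "abc", in: callee, searchBounds: subrangeStart..<callee.endIndex)
```

With base-string anchoring, `whole` would be `nil` because `callee.startIndex` is two characters into `base`.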
This prepares for adopting an opaque result type for `matches(of:)` and `ranges(of:)`. The old `CollectionConsumer`-based model advances index by index and isn't aware of the regex's semantic level, which produces inaccurate results for regexes that match at a mid-character index.
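To illustrate the mid-character problem, the standalone snippet below (plain standard-library Swift, not the library's implementation) compares the positions visited by a consumer that steps over Unicode scalars with the `Character` boundaries that a grapheme-semantics regex should respect. Positions are expressed as UTF-16 offsets only so they can be compared as integers.

```swift
// "café!" spelled with a decomposed é: "e" followed by U+0301 COMBINING ACUTE ACCENT.
let text = "cafe\u{301}!"

// Candidate match positions at grapheme-cluster (Character) semantics.
let characterStarts = Set(text.indices.map { $0.utf16Offset(in: text) })

// Candidate positions when a consumer steps index by index over Unicode
// scalars, regardless of the regex's semantic level.
let scalarStarts = Set(text.unicodeScalars.indices.map { $0.utf16Offset(in: text) })

// Positions that fall inside a Character: a match reported here would
// start in the middle of "é".
let midCharacterStarts = scalarStarts.subtracting(characterStarts)
print(midCharacterStarts)  // [4]
```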
* Avoid double execution by avoiding `Array` init
* De-genericize processor, engine, etc.
  This provides only modest performance improvements (the code was already being specialized), but it makes it possible to add `String`-specific specializations.
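As a rough sketch of why de-genericizing helps, assuming Unicode-scalar semantics: a processor generic over any `Collection` of scalars can only use `Collection` operations, while one that always holds a `String` can reach for `String`-only views such as `utf8`. Both `GenericProcessor` and `StringProcessor` are hypothetical types for illustration, not the library's `Processor`. The ASCII fast path relies on a UTF-8 property: a byte below 0x80 always encodes exactly that ASCII scalar, and no multi-byte sequence starts with such a byte.

```swift
// Hypothetical generic processor: only Collection operations are available.
struct GenericProcessor<Input: Collection> where Input.Element == Unicode.Scalar {
  let input: Input
  var currentPosition: Input.Index

  init(_ input: Input) {
    self.input = input
    self.currentPosition = input.startIndex
  }

  mutating func match(_ scalar: Unicode.Scalar) -> Bool {
    guard currentPosition < input.endIndex, input[currentPosition] == scalar else { return false }
    currentPosition = input.index(after: currentPosition)
    return true
  }
}

// Hypothetical de-genericized processor: the input is always a String,
// so a String-specific fast path becomes possible.
struct StringProcessor {
  let input: String
  var currentPosition: String.Index  // assumed to be scalar-aligned

  init(_ input: String) {
    self.input = input
    self.currentPosition = input.startIndex
  }

  mutating func match(_ scalar: Unicode.Scalar) -> Bool {
    guard currentPosition < input.endIndex else { return false }
    if scalar.isASCII {
      // Fast path: compare a single UTF-8 byte.
      guard input.utf8[currentPosition] == UInt8(scalar.value) else { return false }
      currentPosition = input.utf8.index(after: currentPosition)
    } else {
      // Slow path: decode the scalar at the current position.
      guard input.unicodeScalars[currentPosition] == scalar else { return false }
      currentPosition = input.unicodeScalars.index(after: currentPosition)
    }
    return true
  }
}
```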
* Allow CustomConsuming types to match with zero width
  We previously asserted if a custom consuming type matched with zero width, but that isn't necessary or desirable. A custom type can implement a lookaround assertion or act as a tracer.
* Rename `Processor.advance(to:)` to `resume(at:)`
  Since the given index doesn't need to advance, this name is less misleading.
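The sketch below shows zero-width custom consumers as ordinary successes. The `CustomConsumer` protocol, the `Literal`, `FollowedByDigit`, and `Tracer` types, and the `run(_:on:)` driver are all hypothetical simplifications, only loosely shaped like the standard library's `CustomConsumingRegexComponent`: a consumer returns the index to resume at, or `nil` on failure, and returning the unchanged index means "matched without consuming input".

```swift
// Hypothetical, simplified stand-in for a custom consuming component.
protocol CustomConsumer {
  func consume(_ input: String, startingAt index: String.Index,
               in bounds: Range<String.Index>) -> String.Index?
}

// Consumes a literal string.
struct Literal: CustomConsumer {
  let text: String
  func consume(_ input: String, startingAt index: String.Index,
               in bounds: Range<String.Index>) -> String.Index? {
    guard input[index..<bounds.upperBound].hasPrefix(text) else { return nil }
    return input.index(index, offsetBy: text.count)
  }
}

// Lookahead-style assertion: succeeds without consuming anything.
struct FollowedByDigit: CustomConsumer {
  func consume(_ input: String, startingAt index: String.Index,
               in bounds: Range<String.Index>) -> String.Index? {
    guard index < bounds.upperBound, input[index].isNumber else { return nil }
    return index  // zero-width success
  }
}

// Tracer: records where it was reached and consumes nothing.
final class Tracer: CustomConsumer {
  private(set) var visitedOffsets: [Int] = []
  func consume(_ input: String, startingAt index: String.Index,
               in bounds: Range<String.Index>) -> String.Index? {
    visitedOffsets.append(input.distance(from: input.startIndex, to: index))
    return index  // zero-width success
  }
}

// Hypothetical driver: runs consumers in sequence and resumes at whatever
// index each one returns. An unchanged index is not treated as an error.
func run(_ consumers: [any CustomConsumer], on input: String) -> String.Index? {
  let bounds = input.startIndex..<input.endIndex
  var position = input.startIndex
  for consumer in consumers {
    guard let next = consumer.consume(input, startingAt: position, in: bounds) else { return nil }
    position = next
  }
  return position
}
```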
This separates the two different ideas for boundaries in the base input:
- `subjectBounds`: These represent the actual subject in the input string. For a `String` callee, this covers the entire bounds, while for a `Substring` these represent the bounds of the substring in the base.
- `searchBounds`: These represent the current search range in the subject. These bounds can be the same as `subjectBounds` or a subrange when searching for subsequent matches or replacing only in a subrange of a string.

* `firstMatch` shouldn't update `searchBounds` on iteration
  When we move forward while searching for the first match, the search bounds should stay the same. Only the `currentPosition` needs to move forward. This will allow us to implement the `\G` start-of-match anchor, with which `/\Gab/` matches "abab" twice, compared with `/^ab/`, which only matches once.
* Make `matches(of:)` and `ranges(of:)` boundary-aware
  With this change, `RegexMatchesCollection` keeps the subject bounds and search bounds separately, modifying the search bounds with each iteration. In addition, the replace methods that operate only on a subrange can specify that subrange explicitly, getting the correct anchor behavior while matching only within a portion of a string.
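A toy model of the subject/search split, assuming literal-only patterns with one of two anchors. `Anchor`, `firstMatch(of:anchor:in:searchStart:)`, and `matchRanges(of:anchor:in:)` are hypothetical names. `searchStart` plays the role of `searchBounds.lowerBound`, and the subject bounds are always the whole `subject`. Because `firstMatch` moves only `currentPosition` and leaves the search start fixed, `\G` reproduces the "abab" behavior described above; if the search start were advanced along with `currentPosition`, `\Gab` would wrongly match "xab" at offset 1.

```swift
// Hypothetical anchors for a toy literal matcher.
enum Anchor {
  case startOfSubject  // `^`
  case startOfSearch   // `\G`: the start of the current search bounds
}

// Only `currentPosition` moves; `searchStart` stays fixed during the search.
// Assumes `literal` is non-empty.
func firstMatch(of literal: String, anchor: Anchor, in subject: Substring,
                searchStart: String.Index) -> Range<String.Index>? {
  var currentPosition = searchStart
  while currentPosition <= subject.endIndex {
    let anchorHolds: Bool
    switch anchor {
    case .startOfSubject: anchorHolds = currentPosition == subject.startIndex
    case .startOfSearch: anchorHolds = currentPosition == searchStart
    }
    if anchorHolds && subject[currentPosition...].hasPrefix(literal) {
      return currentPosition..<subject.index(currentPosition, offsetBy: literal.count)
    }
    if currentPosition == subject.endIndex { break }
    subject.formIndex(after: &currentPosition)
  }
  return nil
}

// Iterating matches narrows the search bounds after each match, while the
// subject bounds never change.
func matchRanges(of literal: String, anchor: Anchor,
                 in subject: Substring) -> [Range<String.Index>] {
  var results: [Range<String.Index>] = []
  var searchStart = subject.startIndex
  while let match = firstMatch(of: literal, anchor: anchor, in: subject,
                               searchStart: searchStart) {
    results.append(match)
    searchStart = match.upperBound
  }
  return results
}
```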
* [benchmark] Add no-capture version of grapheme breaking exercise
* [benchmark] Add cross-engine benchmark helpers
* [benchmark] Hangul Syllable finding benchmark
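One way a cross-engine helper can be useful is to confirm that every engine produces the same output for a workload before timing it. The sketch below is an assumption about that idea, not the benchmarker's actual helpers: `Engine`, `disagreeingEngines(_:on:)`, and the two Hangul finders are hypothetical. The only fixed fact used is the Hangul Syllables block range, U+AC00 through U+D7A3. The two finders agree on precomposed text but diverge when a combining mark follows a syllable, which is the kind of semantic-level mismatch such a check surfaces.

```swift
// Hangul Syllables block: U+AC00 through U+D7A3.
func isHangulSyllable(_ scalar: Unicode.Scalar) -> Bool {
  (0xAC00...0xD7A3).contains(scalar.value)
}

// Hypothetical "engine" 1: scan Unicode scalars.
func hangulByScalar(_ input: String) -> [Unicode.Scalar] {
  input.unicodeScalars.compactMap { isHangulSyllable($0) ? $0 : nil }
}

// Hypothetical "engine" 2: scan Characters, accepting only single-scalar ones.
func hangulByCharacter(_ input: String) -> [Unicode.Scalar] {
  input.compactMap { character -> Unicode.Scalar? in
    guard character.unicodeScalars.count == 1,
          let scalar = character.unicodeScalars.first,
          isHangulSyllable(scalar) else { return nil }
    return scalar
  }
}

struct Engine<Output: Equatable> {
  let name: String
  let run: (String) -> Output
}

// Runs every engine on the same input and returns the names of engines
// whose output differs from the first engine's output.
func disagreeingEngines<Output: Equatable>(_ engines: [Engine<Output>],
                                           on input: String) -> [String] {
  guard let reference = engines.first?.run(input) else { return [] }
  return engines.dropFirst().filter { $0.run(input) != reference }.map(\.name)
}

let engines = [
  Engine(name: "scalar", run: hangulByScalar),
  Engine(name: "character", run: hangulByCharacter),
]
```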
- Adds benchmarks for HTML and email regexes
- Adds support to save and compare benchmarker runs

Co-authored-by: Michael Ilseman
- Space out the names properly instead of relying on tabs
- Add a decimal point to the percentage
- Filter out NS benchmarks from the comparison
- Sort comparisons by amount of improvement/regression (by seconds, not percentage, because we have lots of variance and many low-runtime benchmarks)
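A minimal sketch of comparing two saved runs along the lines listed above, assuming each run is a dictionary from benchmark name to time in seconds and that NS benchmarks are identified by a hypothetical `_NS` name suffix. `BenchmarkComparison`, `compareRuns(old:new:)`, and `formatRow(_:nameWidth:)` are illustrative names, not the benchmarker's API.

```swift
struct BenchmarkComparison {
  let name: String
  let oldTime: Double  // seconds
  let newTime: Double  // seconds
  var diff: Double { newTime - oldTime }  // negative means faster
  var percentChange: Double { diff / oldTime * 100 }
}

func compareRuns(old: [String: Double], new: [String: Double]) -> [BenchmarkComparison] {
  let comparisons = old.compactMap { entry -> BenchmarkComparison? in
    // Assumption: NS benchmarks end in "_NS" and are left out.
    guard !entry.key.hasSuffix("_NS"), let newTime = new[entry.key] else { return nil }
    return BenchmarkComparison(name: entry.key, oldTime: entry.value, newTime: newTime)
  }
  // Sort by absolute change in seconds rather than percentage: a tiny,
  // noisy benchmark can swing by a large percentage that means little.
  return comparisons.sorted { $0.diff < $1.diff }
}

// Pads names with spaces to a fixed width instead of relying on tabs, and
// rounds the percentage to one decimal place.
func formatRow(_ comparison: BenchmarkComparison, nameWidth: Int) -> String {
  let padding = String(repeating: " ", count: max(0, nameWidth - comparison.name.count))
  let percent = (comparison.percentChange * 10).rounded() / 10
  return comparison.name + padding + "  " + "\(percent)%"
}
```

In a quick check with old times `A: 1.0`, `B: 0.001` and new times `A: 0.8`, `B: 0.002`, `B` regresses by 100% but only by a millisecond, so sorting by seconds lists the 20% improvement in `A` first.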
…acter classes in Unicode scalars mode (swiftlang#511)
- Add AsciiBitset as a conditional optimization for custom character classes that only contain ASCII characters
- Add CompileOptions to turn off optimizations
- Add basic testing infrastructure for checking whether compilation emitted certain instructions and whether the optimized regex returned the same result as the unoptimized one

Co-authored-by: Michael Ilseman
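A sketch of the bitset idea under Unicode scalar semantics: 128 bits (two `UInt64` words) are enough to record membership for every ASCII scalar, and construction fails when any member is non-ASCII so the caller can fall back to the general path. `ASCIIBitset` here is a hypothetical illustration, not the `AsciiBitset` type from the PR. The check that follows it compares the bitset with an unoptimized `Set` lookup, in the spirit of testing that optimized and unoptimized regexes agree.

```swift
// Hypothetical 128-bit membership set for ASCII-only character classes.
struct ASCIIBitset {
  private var low: UInt64 = 0   // scalars 0..<64
  private var high: UInt64 = 0  // scalars 64..<128

  // Returns nil if any member is non-ASCII, so the caller can fall back to
  // the unoptimized matching path.
  init?<S: Sequence>(_ members: S) where S.Element == Unicode.Scalar {
    for scalar in members {
      guard scalar.isASCII else { return nil }
      let value = UInt8(scalar.value)
      if value < 64 {
        low |= UInt64(1) << value
      } else {
        high |= UInt64(1) << (value - 64)
      }
    }
  }

  func contains(_ scalar: Unicode.Scalar) -> Bool {
    guard scalar.isASCII else { return false }
    let value = UInt8(scalar.value)
    return value < 64
      ? (low >> value) & 1 == 1
      : (high >> (value - 64)) & 1 == 1
  }
}

// Members of the class [a-z_], expanded into scalars.
var members = (UInt8(ascii: "a")...UInt8(ascii: "z")).map { Unicode.Scalar($0) }
members.append("_")
let bitset = ASCIIBitset(members)!
let reference = Set(members)  // unoptimized membership check
```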
Testing the combination of this on top of #531 in swiftlang/swift#59817
stephentyrone approved these changes on Jul 1, 2022
Cherry-pick of benchmarker improvements #501, #509, and #512, as well as a performance improvement from #511.
Based on #531.