Add a string-specific search algorithm #715
This adds a Boyer-Moore substring search algorithm, and updates the `firstRange(of:)` and `ranges(of:)` methods to use that when both pieces of the search are strings/substrings. Still need to look at availability and switch the "replacing" methods to use this new search algorithm.
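For readers unfamiliar with the technique, here is a minimal sketch of the bad-character idea behind Boyer-Moore, written as the simplified Horspool variant over UTF-8 bytes. It is not the searcher added in this PR: the function name `firstMatchOffset` is hypothetical, and byte-level matching ignores Unicode canonical equivalence and `Character` boundaries, which a real `String` search has to respect.

```swift
/// Hypothetical helper (not this PR's searcher): Boyer-Moore-Horspool over
/// UTF-8 bytes. Returns the byte offset of the first match, or nil.
func firstMatchOffset(of pattern: String, in text: String) -> Int? {
    let p = Array(pattern.utf8)
    let t = Array(text.utf8)
    guard !p.isEmpty else { return 0 }
    guard p.count <= t.count else { return nil }

    // Bad-character table: how far the window may shift, keyed by the text
    // byte currently aligned with the pattern's last position.
    var shift = [Int](repeating: p.count, count: 256)
    for (i, byte) in p.dropLast().enumerated() {
        shift[Int(byte)] = p.count - 1 - i
    }

    var start = 0
    while start <= t.count - p.count {
        // Compare the window against the pattern from right to left.
        var j = p.count - 1
        while j >= 0 && t[start + j] == p[j] {
            j -= 1
        }
        if j < 0 { return start }
        // Mismatch: shift by the table entry for the window's last byte.
        start += shift[Int(t[start + p.count - 1])]
    }
    return nil
}
```

The table lets the search jump up to `pattern.count` positions per mismatch, which is where the speedup over a position-by-position scan comes from.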
For large strings, a recursive search can run out of stack space. This eliminates the issue by looping within the `nextRange` function.
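The stack-space point can be illustrated with a small sketch. The type `ByteSearcher` and the signature of `nextRange(from:)` below are assumptions for illustration; the PR's actual `nextRange` is not shown in this conversation. A recursive version would call itself once per rejected candidate position, so stack depth would grow with the length of the text; the loop keeps it constant.

```swift
/// Hypothetical searcher; only the name `nextRange` comes from the PR text.
struct ByteSearcher {
    let pattern: [UInt8]
    let text: [UInt8]

    /// Returns the first match starting at or after `start` (assumed >= 0).
    /// Candidates are advanced in a loop rather than by recursion, so stack
    /// usage does not depend on how many candidates are rejected.
    func nextRange(from start: Int) -> Range<Int>? {
        guard !pattern.isEmpty, pattern.count <= text.count else { return nil }
        var candidate = start
        while candidate <= text.count - pattern.count {
            let end = candidate + pattern.count
            if text[candidate..<end].elementsEqual(pattern) {
                return candidate..<end
            }
            candidate += 1
        }
        return nil
    }
}
```

With a text of a million bytes and a match only at the very end, this rejects roughly a million candidates without growing the stack.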
@swift-ci Please test
Benchmark metrics of the improvements: (benchmark table not preserved)
Updated the algorithm to skip calculating the bad-character offset table when the pattern is very short, since a short pattern limits how far each mismatch can skip and the table's benefit is reduced. Updated benchmarks: (benchmark table not preserved)
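A rough sketch of that fallback, under stated assumptions: the cutoff `shortPatternThreshold` and all function names here are hypothetical, since the PR's actual threshold is not given in this conversation, and the table-based path is the simplified Horspool variant rather than the PR's searcher.

```swift
/// Assumed cutoff for illustration only; not the value used in the PR.
let shortPatternThreshold = 4

/// Position-by-position scan with no precomputed table.
func naiveOffset(of p: [UInt8], in t: [UInt8]) -> Int? {
    guard !p.isEmpty else { return 0 }
    guard p.count <= t.count else { return nil }
    for start in 0...(t.count - p.count)
    where t[start..<start + p.count].elementsEqual(p) {
        return start
    }
    return nil
}

/// Table-based scan (Horspool variant of the bad-character rule).
func tableOffset(of p: [UInt8], in t: [UInt8]) -> Int? {
    guard !p.isEmpty else { return 0 }
    guard p.count <= t.count else { return nil }
    var shift = [Int](repeating: p.count, count: 256)
    for (i, byte) in p.dropLast().enumerated() {
        shift[Int(byte)] = p.count - 1 - i
    }
    var start = 0
    while start <= t.count - p.count {
        if t[start..<start + p.count].elementsEqual(p) { return start }
        start += shift[Int(t[start + p.count - 1])]
    }
    return nil
}

/// Chooses the strategy from the pattern length: short patterns skip the
/// 256-entry table setup because each skip could cover only a few bytes.
func offset(of pattern: String, in text: String) -> Int? {
    let p = Array(pattern.utf8)
    let t = Array(text.utf8)
    return p.count < shortPatternThreshold
        ? naiveOffset(of: p, in: t)
        : tableOffset(of: p, in: t)
}
```

Both paths return the same offsets; only the setup cost and the skip distance differ.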
Great stuff :)
@@ -135,7 +135,7 @@ extension Collection where Element: Equatable {
  ) -> RangesCollection<ZSearcher<Self>> where C.Element == Element {
    _ranges(of: ZSearcher(pattern: Array(other), by: ==))
  }

  // FIXME: Return `some Collection<Range<Index>>` for SE-0346
I was about to comment on the extra `Array` allocation, then saw there was already a FIXME about it.
      with: replacement,
      maxReplacements: maxReplacements)
    switch (self, other, replacement) {
    case (let str as String, let other as String, let repl as String):
This is a bit painful, but I don't see a better way to do it.
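To show the pattern being discussed, here is a minimal sketch of dispatching from a generic function to a string-specific path using a `switch` over a tuple with `as` patterns, as in the hunk above. Everything other than that `switch` shape is hypothetical: `SearchPath`, `naiveOffset`, and `firstOffset` are illustration-only names, both paths use a naive search to keep the example short, and only the `String`/`String` case is handled.

```swift
/// Records which path ran, so the dispatch can be checked.
enum SearchPath { case stringSpecific, generic }

/// Naive search shared by both paths to keep the sketch short.
func naiveOffset<T: Equatable>(of p: [T], in t: [T]) -> Int? {
    guard !p.isEmpty else { return 0 }
    guard p.count <= t.count else { return nil }
    for start in 0...(t.count - p.count)
    where t[start..<start + p.count].elementsEqual(p) {
        return start
    }
    return nil
}

func firstOffset<C: Collection, D: Collection>(
    of needle: D, in haystack: C
) -> (path: SearchPath, offset: Int?)
where C.Element: Equatable, D.Element == C.Element {
    switch (haystack, needle) {
    case (let text as String, let pattern as String):
        // Both operands are dynamically `String`, so a string-specific
        // searcher could be used here (byte offsets in this sketch).
        let offset = naiveOffset(of: Array(pattern.utf8), in: Array(text.utf8))
        return (.stringSpecific, offset)
    default:
        // Any other pair of collections takes the generic path.
        return (.generic, naiveOffset(of: Array(needle), in: Array(haystack)))
    }
}
```

The casts are resolved at run time, which is why each combination of concrete types needs its own `case`; that is the "painful" part the comment refers to.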
* Add a string-specific search algorithm
  This adds a Boyer-Moore substring search algorithm, and updates the `firstRange(of:)` and `ranges(of:)` methods to use that when both pieces of the search are strings/substrings.
* Substring search: iterative rather than recursive
  For large strings, a recursive search can run out of stack space. This eliminates the issue by looping within the `nextRange` function.
* Dispatch string splitting to new searcher
* Remove generic on SubstringSearcher
* Remove unnecessary inlining annotations
* Update string algorithms tests
* Verify string/substring dispatch in algorithms
* Add tests for string.replacing maxReplacements
* Add fallback to naive search for small patterns
* Improve some comments/formatting in the string search
This adds a Boyer-Moore substring search algorithm, and updates the `firstRange(of:)`, `ranges(of:)`, `split(...)`, and `replacing` methods to use that when both pieces of the search are strings/substrings.