-
Notifications
You must be signed in to change notification settings - Fork 49
Add regex-specific Matches and Ranges collections #460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add regex-specific Matches and Ranges collections #460
Conversation
This prepares for adopting an opaque result type for matches(of:). The old, CollectionConsumer-based model moves index-by-index, and isn't aware of the regex's semantic level, which results in inaccurate results for regexes that match at a mid-character index.
These can be a higher-order operation that replace each range from a given collection instead of doing the searching directly.
@swift-ci Please test |
@swift-ci Please test |
Adds a custom iterator to avoid the heavy indexing cost in many cases
150af35
to
c7dd144
Compare
@swift-ci Please test |
// its `startIndex`, the iterator begins with this current match populated. | ||
// For subsequent calls to `next()`, this value is `nil`, and `nextStart` | ||
// is used to search for the next match. | ||
var currentMatch: Regex<Output>.Match? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we double store it? Is this just a Bool?
|
||
mutating func next() -> Regex<Output>.Match? { | ||
// Initial case with pre-computed first match | ||
if let match = currentMatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we never set currentMatch
, then it's just a bool telling you whether to grab the one from the underlying collection vs computing one yourself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice 👍🏻
@swift-ci Please test |
// iteration. For subsequent calls to `next()`, this value is `false`, and | ||
// `nextStart` is used to search for the next match. | ||
var initialIteration = true | ||
var nextStart: String.Index? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between initialIterator == true
and nextStart == nil
? Do we have 4 cases, 3 cases, or 2? What's the difference between nextStart
is nil vs endIndex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three cases:
initialIteration == true
: We use the match found (or not) that's stored inbase.startIndex
nextStart != nil
: We search for the next match starting atnextStart
nextStart == nil
: End of iteration, no need to search
nextStart == endIndex
is a valid search-starting index, just like any other, so it can't represent the "finished iterating" state.
return input.index(after: match.range.upperBound) | ||
case .unicodeScalar: | ||
return input.unicodeScalars.index(after: match.range.upperBound) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this code in a few places. I wonder if it should be an underscored helper method somewhere that takes the semantic level as a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'll add it as part of #435.
@swift-ci Please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after you fix up the try!
.
@swift-ci Please test |
This moves a call to `prefix(_:)` to before a for-in loop to work around the issue described in swiftlang/swift#59522.
@swift-ci Please test |
@swift-ci Please test |
@swift-ci Please test |
This prepares for adopting an opaque result type for matches(of:) and ranges(of:). The old, CollectionConsumer-based model moves index-by-index, and isn't aware of the regex's semantic level, which results in inaccurate results for regexes that match at a mid-character index.
This prepares for adopting an opaque result type for
matches(of:)
. The old,CollectionConsumer
-based model moves index-by-index, and isn't aware of the regex's semantic level, which results in inaccurate results for regexes that match at a mid-character index.The lazy range collection had the same issue, so this includes a wrapper for the new
RegexMatchesCollection
that just returns ranges, and modifies thereplace
/replacing
methods to operate over a collection of ranges instead of a collection searcher.