Skip to content

Add regex-specific Matches and Ranges collections #460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jun 17, 2022

Conversation

natecook1000
Copy link
Member

@natecook1000 natecook1000 commented May 30, 2022

This prepares for adopting an opaque result type for matches(of:). The old, CollectionConsumer-based model moves index-by-index, and isn't aware of the regex's semantic level, which results in inaccurate results for regexes that match at a mid-character index.

The lazy range collection had the same issue, so this includes a wrapper for the new RegexMatchesCollection that just returns ranges, and modifies the replace/replacing methods to operate over a collection of ranges instead of a collection searcher.

This prepares for adopting an opaque result type for matches(of:).
The old, CollectionConsumer-based model moves index-by-index, and
isn't aware of the regex's semantic level, which results in inaccurate
results for regexes that match at a mid-character index.
These can be a higher-order operation that replace each range from
a given collection instead of doing the searching directly.
@natecook1000 natecook1000 changed the title Add a regex-match-specific collection Add regex-specific Matches and Ranges collections May 30, 2022
@natecook1000 natecook1000 marked this pull request as ready for review May 30, 2022 19:08
@natecook1000 natecook1000 requested review from milseman and itingliu May 30, 2022 19:08
@natecook1000
Copy link
Member Author

@swift-ci Please test

@natecook1000
Copy link
Member Author

@swift-ci Please test

@natecook1000 natecook1000 force-pushed the regex_match_collection branch from 150af35 to c7dd144 Compare June 1, 2022 16:11
@natecook1000
Copy link
Member Author

@swift-ci Please test

// its `startIndex`, the iterator begins with this current match populated.
// For subsequent calls to `next()`, this value is `nil`, and `nextStart`
// is used to search for the next match.
var currentMatch: Regex<Output>.Match?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we double store it? Is this just a Bool?


mutating func next() -> Regex<Output>.Match? {
// Initial case with pre-computed first match
if let match = currentMatch {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we never set currentMatch, then it's just a bool telling you whether to grab the one from the underlying collection vs computing one yourself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice 👍🏻

@natecook1000
Copy link
Member Author

@swift-ci Please test

// iteration. For subsequent calls to `next()`, this value is `false`, and
// `nextStart` is used to search for the next match.
var initialIteration = true
var nextStart: String.Index?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between initialIterator == true and nextStart == nil? Do we have 4 cases, 3 cases, or 2? What's the difference between nextStart is nil vs endIndex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three cases:

  1. initialIteration == true: We use the match found (or not) that's stored in base.startIndex
  2. nextStart != nil: We search for the next match starting at nextStart
  3. nextStart == nil: End of iteration, no need to search

nextStart == endIndex is a valid search-starting index, just like any other, so it can't represent the "finished iterating" state.

return input.index(after: match.range.upperBound)
case .unicodeScalar:
return input.unicodeScalars.index(after: match.range.upperBound)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this code in a few places. I wonder if it should be an underscored helper method somewhere that takes the semantic level as a parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll add it as part of #435.

@natecook1000
Copy link
Member Author

@swift-ci Please test

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after you fix up the try!.

@natecook1000
Copy link
Member Author

@swift-ci Please test

This moves a call to `prefix(_:)` to before a for-in loop to work around
the issue described in swiftlang/swift#59522.
@natecook1000
Copy link
Member Author

@swift-ci Please test

@natecook1000
Copy link
Member Author

@swift-ci Please test

@natecook1000
Copy link
Member Author

@swift-ci Please test

@natecook1000 natecook1000 merged commit 26a2dc9 into swiftlang:main Jun 17, 2022
@natecook1000 natecook1000 deleted the regex_match_collection branch June 17, 2022 13:44
milseman pushed a commit to milseman/swift-experimental-string-processing that referenced this pull request Jun 30, 2022
This prepares for adopting an opaque result type for matches(of:)
and ranges(of:). The old, CollectionConsumer-based model moves 
index-by-index, and isn't aware of the regex's semantic level, 
which results in inaccurate results for regexes that match at a 
mid-character index.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants