
[Integration] main (e87149a) -> swift/main #521

Merged: hamishknight merged 30 commits into swift/main from main-merge on Jun 29, 2022


hamishknight
Contributor

No description provided.

hamishknight and others added 30 commits June 16, 2022 18:27
This shouldn't include e.g. `namedCapturesOnly`.

This should always be set in a multi-line literal,
with extended syntax potentially being set and
unset as we parse.

Relax the ban on unsetting extended syntax in a
multi-line literal such that it does not apply to
a scoped unset, e.g. `(?-x:...)`, as long as it does
not span multiple lines.

This commit also bans the use of `(?^)` in a
multi-line literal, unless it is scoped and does
not span multiple lines. Instead, `(?^x)` must be
written, as PCRE defines `(?^)` to be equivalent to
`(?-imnsx)`.
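A minimal sketch of the option rules above, assuming a hypothetical `MatchingOptions` option set and helper functions (none of these names come from the library). A caret group `(?^flags)` is modeled as first unsetting `imnsx`, as PCRE specifies, and then setting `flags`, which is why only `(?^x)` keeps extended syntax on inside a multi-line literal.

```swift
// Hypothetical model of the inline matching options; not the library's types.
struct MatchingOptions: OptionSet {
  let rawValue: UInt8
  static let caseInsensitive   = MatchingOptions(rawValue: 1 << 0)  // i
  static let multiline         = MatchingOptions(rawValue: 1 << 1)  // m
  static let namedCapturesOnly = MatchingOptions(rawValue: 1 << 2)  // n
  static let singleLine        = MatchingOptions(rawValue: 1 << 3)  // s
  static let extended          = MatchingOptions(rawValue: 1 << 4)  // x

  /// The options a caret group resets: `(?^)` is equivalent to `(?-imnsx)`.
  static let caretResets: MatchingOptions =
    [.caseInsensitive, .multiline, .namedCapturesOnly, .singleLine, .extended]
}

/// Options in effect after `(?^flags)`: reset `imnsx`, then apply `flags`.
func applyingCaretGroup(_ flags: MatchingOptions,
                        to current: MatchingOptions) -> MatchingOptions {
  current.subtracting(.caretResets).union(flags)
}

/// Sketch of the multi-line literal rule: a group that unsets extended
/// syntax, e.g. `(?-x:...)` or `(?^:...)`, is only allowed when it is
/// scoped and does not span multiple lines.
func isAllowedInMultilineLiteral(unsetsExtended: Bool,
                                 isScoped: Bool,
                                 spansMultipleLines: Bool) -> Bool {
  !unsetsExtended || (isScoped && !spansMultipleLines)
}

// In a multi-line literal, extended syntax starts out enabled.
let literalDefault: MatchingOptions = [.extended]
let afterBareCaret = applyingCaretGroup([], to: literalDefault)         // `(?^)`
let afterCaretX    = applyingCaretGroup(.extended, to: literalDefault)  // `(?^x)`
```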
* Add a `clearThrough` instruction

This will let us fix lookahead assertions that have leftover save
points in the subpattern on success, and also allow us to implement
atomic groups.

* Fix lookaheads with quantifiers

On success, the subpatterns in lookaheads like `(?=.*e)` had a save
point that persisted, causing the logic in the lookahead group to
be invalid.

* Implement atomic non-capturing group support

In addition to the `(?>...)` syntax, this is what's underneath `Local`.
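To illustrate what `clearThrough` buys us, here is a toy backtracking VM, not the actual `Processor`. Apart from the `clearThrough` idea itself, the instruction set, the `SavePoint` layout, and `runProgram` are hypothetical. An atomic group pushes a marker save point on entry and, on exit, clears every save point back through that marker, so the group's quantifier can no longer give characters back.

```swift
// Toy backtracking VM (hypothetical; not the library's Processor).
enum Instruction {
  case char(Character)  // match one character, or fail
  case split(Int)       // push a save point resuming at the given pc; continue at pc + 1
  case jump(Int)        // continue at the given pc
  case saveFail         // push a marker save point that simply fails when resumed
  case clearThrough     // pop save points down to and including the newest marker
  case accept           // succeed only if all input has been consumed
}

struct SavePoint {
  let pc: Int?       // nil marks a "fail" save point pushed by `saveFail`
  let position: Int
}

func runProgram(_ program: [Instruction], on input: String) -> Bool {
  let chars = Array(input)
  var pc = 0
  var position = 0
  var savePoints: [SavePoint] = []

  while true {
    var failed = false
    switch program[pc] {
    case .char(let expected):
      if position < chars.count && chars[position] == expected {
        position += 1
        pc += 1
      } else {
        failed = true
      }
    case .split(let resumeAt):
      savePoints.append(SavePoint(pc: resumeAt, position: position))
      pc += 1
    case .jump(let target):
      pc = target
    case .saveFail:
      savePoints.append(SavePoint(pc: nil, position: position))
      pc += 1
    case .clearThrough:
      // Keep popping until the marker itself has been removed.
      while let top = savePoints.popLast(), top.pc != nil {}
      pc += 1
    case .accept:
      if position == chars.count { return true }
      failed = true
    }

    if failed {
      // Backtrack to the newest save point; a marker just fails again.
      while true {
        guard let top = savePoints.popLast() else { return false }
        if let resumeAt = top.pc {
          pc = resumeAt
          position = top.position
          break
        }
      }
    }
  }
}

// a*a: the loop's save points let it give back an "a".
let greedyProgram: [Instruction] = [
  .split(3), .char("a"), .jump(0), .char("a"), .accept,
]
// (?>a*)a: clearThrough discards the loop's save points when the group ends.
let atomicProgram: [Instruction] = [
  .saveFail, .split(4), .char("a"), .jump(1), .clearThrough, .char("a"), .accept,
]
```

On `"aaa"`, `a*a` succeeds because the loop backs off one `a`, while `(?>a*)a` fails because the loop's save points are gone once the group exits. Dropping leftover save points after a successful lookahead subpattern follows the same pattern.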
* Allow CustomConsuming types to match w/ zero width

We previously asserted if a custom consuming type matches with zero
width, but that isn't necessary or good. A custom type can implement
a lookaround assertion or act as a tracer.

* Rename `Processor.advance(to:)` to `resume(at:)`

Since the given index doesn’t need to advance, this name is less
misleading.

This prepares for adopting an opaque result type for `matches(of:)`
and `ranges(of:)`. The old `CollectionConsumer`-based model moves
index-by-index and isn't aware of the regex's semantic level, which
produces incorrect results for regexes that match at a
mid-character index.
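The semantic-level problem can be seen with the standard library alone. This sketch does not use the regex engine; it only contrasts Unicode-scalar and `Character` iteration over the same string, assuming a decomposed `é` (`e` followed by U+0301).

```swift
// Standard-library-only illustration of semantic levels; not the regex engine.
let text = "cafe\u{301} e"   // "café e" with a decomposed é (e + U+0301)
let scalars = text.unicodeScalars

// Unicode-scalar level: the "e" inside the decomposed é is visible on its own.
let scalarHits = scalars.indices.filter { scalars[$0] == "e" }
// Grapheme-cluster (Character) level: "e\u{301}" is one Character, not "e".
let characterHits = text.indices.filter { text[$0] == "e" }

// The end of the first scalar-level hit sits between "e" and U+0301,
// i.e. mid-character, so it is not one of the String's Character indices.
let firstHitEnd = scalars.index(after: scalarHits[0])
let endsOnCharacterBoundary = text.indices.contains(firstHitEnd)
```

A consumer that walks indices without knowing which of these views the regex uses can report the scalar-level hit, whose end lands inside a `Character`.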
20x perf speedup in the "BasicBacktrack" benchmarks.

* Re-use the same executor, remember semantic mode.

Gives around a 20% perf improvement to first-match style benchmarks.

* Remove history preservation

Cuts down on memory usage and avoids some ARC overhead. ~20% gains
on "AllMatches" and related benchmarks.

* Lower-level matchSeq

Avoid collection algorithms inside `matchSeq`, which can introduce ARC traffic and other overhead. Results in a 3x improvement to ReluctantQuantWithTerminal.

Gives a 7x improvement to firstMatch-style benchmarks like "FirstMatch", and 2-3x to CSS and basic backtracking benchmarks.

Thanks to @rctcwyvrn for the original code.

Currently, the unary regex component builder simply forwards the component's base type. However, this is inconsistent with non-unary builder results, and may lead to surprising results when the user marks a property with `@RegexComponentBuilder`.

This patch makes `RegexComponentBuilder.buildPartialBlock<R>(first: R)` return a `Regex<R.RegexOutput>` rather than `R` itself.

---

Before:

```swift
import RegexBuilder

// error: cannot convert value of type 'OneOrMore<Substring>' to specified type 'Regex<Substring>'
@RegexComponentBuilder
var r: Regex<Substring> {
  OneOrMore("a")
  // Adding other components below will make the error go away.
}

struct MyCustomRegex: RegexComponent {
  // error: cannot convert value of type 'OneOrMore<Substring>' to specified type 'Regex<Substring>'
  var regex: Regex<Substring> {
    OneOrMore("a")
  }
}
```

After: No errors.
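The design choice can be shown with a toy result builder that is unrelated to RegexBuilder (`Pattern`, `PatternComponent`, `Literal`, and `PatternBuilder` are all hypothetical names; `buildPartialBlock` needs Swift 5.7). Because `buildPartialBlock(first:)` returns the common `Pattern` type instead of the component itself, a one-component block type-checks the same way a multi-component block does.

```swift
// Toy result builder, unrelated to RegexBuilder, mirroring the change above.
struct Pattern {
  var parts: [String]
}

protocol PatternComponent {
  var pattern: Pattern { get }
}

struct Literal: PatternComponent {
  let text: String
  var pattern: Pattern { Pattern(parts: [text]) }
}

@resultBuilder
enum PatternBuilder {
  // Returning `Pattern` (not `C`) gives a one-component block the same
  // type as a multi-component one.
  static func buildPartialBlock<C: PatternComponent>(first: C) -> Pattern {
    first.pattern
  }

  static func buildPartialBlock<C: PatternComponent>(accumulated: Pattern, next: C) -> Pattern {
    Pattern(parts: accumulated.parts + next.pattern.parts)
  }
}

struct Examples {
  // Type-checks only because the unary case also produces `Pattern`.
  @PatternBuilder var single: Pattern {
    Literal(text: "a")
  }

  @PatternBuilder var pair: Pattern {
    Literal(text: "a")
    Literal(text: "b")
  }
}
```

If `buildPartialBlock(first:)` returned `C` instead, `single` would produce a `Literal` and fail to convert to `Pattern`, which is the same shape as the "Before" error.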
Make unary builder return `Regex` type consistently

* [benchmark] Add no-capture version of grapheme breaking exercise

* [benchmark] Add cross-engine benchmark helpers

* [benchmark] Hangul Syllable finding benchmark

* Avoid double execution by avoiding Array init

* De-genericize processor, engine, etc.

Provides only modest performance improvements (it was already getting
specialized), but makes it possible to add String-specific specializations.

* Add debug mode

* Fix typo in css regex

* Add HTML benchmark

* Add email regex benchmarks

* Add save/compare functionality to the benchmarker

* Clean up compare and add CLI flags

oops

Repeat does not get to participate in inline

fix tests

Handle atoms as things to be wrapped in `One`

[Printer] Unconditionally print a regex block for concatenations

This separates the two different ideas for boundaries in
the base input:

- subjectBounds: These represent the actual subject in the input
  string. When the method is called on a `String`, these cover the
  entire string, while for a `Substring` they represent the bounds
  of the substring in its base.
- searchBounds: These represent the current search range in the
  subject. These bounds can be the same as `subjectBounds` or a
  subrange when searching for subsequent matches or replacing only
  in a subrange of a string.
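A sketch of the two boundaries, using a hypothetical `InputBounds` struct whose property names follow this commit message (it is not the library's internal type). Because a `Substring` shares its base string's indices, its subject bounds are a subrange of the base.

```swift
// Hypothetical model of the two boundaries; the names follow the commit
// message, but this is not the library's internal input type.
struct InputBounds {
  /// The whole `String`, or a `Substring`'s range within its base.
  let subjectBounds: Range<String.Index>
  /// Where the current search may look; starts equal to `subjectBounds`.
  var searchBounds: Range<String.Index>

  init(subject: Substring) {
    subjectBounds = subject.startIndex..<subject.endIndex
    searchBounds = subjectBounds
  }
}

let base = "xx-abab"
let subject = base.dropFirst(3)   // Substring "abab", sharing base's indices
var bounds = InputBounds(subject: subject)

// Searching for a later match narrows only the search bounds.
bounds.searchBounds = subject.index(subject.startIndex, offsetBy: 2)..<subject.endIndex

// A `String` subject covers the entire string.
let whole = InputBounds(subject: base[...])
```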

* firstMatch shouldn't update searchBounds on iteration

When we move forward while searching for the first match, the search
bounds should stay the same. Only the `currentPosition` needs to move
forward. This will allow us to implement the `\G` start-of-match anchor,
with which `/\Gab/` matches "abab" twice, compared with `/^ab/`, which
only matches once.

* Make matches(of:) and ranges(of:) boundary-aware

With this change, RegexMatchesCollection keeps the subject bounds
and search bounds separately, modifying the search bounds with each
iteration. In addition, the replace methods that only operate on a
subrange can specify that specifically, getting the correct anchor
behavior while only matching within a portion of a string.
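The `\G` versus `^` behavior described in these commits can be sketched with a toy literal matcher (hypothetical; it finds non-overlapping occurrences of a plain string, not a regex). `searchLowerBound` only moves between matches, while `position` plays the role of the current position within one search.

```swift
// Toy literal matcher (hypothetical; not the regex engine) contrasting an
// anchor tied to the subject's start with one tied to the search start.
enum Anchor {
  case startOfSubject  // like `^` without multiline mode
  case startOfSearch   // like `\G`
}

func countMatches(of needle: String, anchor: Anchor, in subject: String) -> Int {
  precondition(!needle.isEmpty, "an empty needle would never advance")
  var searchLowerBound = subject.startIndex  // moves only between matches
  var count = 0
  while true {
    var position = searchLowerBound          // moves while looking for one match
    var found: Range<String.Index>? = nil
    while found == nil {
      let anchorHolds = anchor == .startOfSubject
        ? position == subject.startIndex
        : position == searchLowerBound
      if anchorHolds && subject[position...].hasPrefix(needle) {
        found = position..<subject.index(position, offsetBy: needle.count)
      } else if position == subject.endIndex {
        break
      } else {
        position = subject.index(after: position)
      }
    }
    guard let match = found else { return count }
    count += 1
    searchLowerBound = match.upperBound      // the next search starts after this match
  }
}
```

With these rules, `\G`-style anchoring finds both occurrences of `ab` in `abab`, while `^`-style anchoring finds only the first.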
* [benchmark] Add no-capture version of grapheme breaking exercise

* [benchmark] Add cross-engine benchmark helpers

* [benchmark] Hangul Syllable finding benchmark

* Add debug mode

* Fix typo in css regex

* Add HTML benchmark

* Add email regex benchmarks

* Add save/compare functionality to the benchmarker

* Clean up compare and add cli flags

* Make fixes

* oops, remove some leftover code

* Fix Linux build issue and add CLI option for specifying compare file

* Add benchmarks

Co-authored-by: Michael Ilseman <[email protected]>

- Space out the names properly instead of relying on tabs
- Add a decimal point to the percentage
- Filter out NS benchmarks from the comparison
- Sort comparisons by amount of improvement/regression
  (by seconds, not %, because we have lots of variance and low-runtime benchmarks)
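The report changes in this list can be sketched as follows. The `BenchmarkComparison` type, the `_NS` suffix used to spot NSRegularExpression benchmarks, and the timings are all made up for illustration; only the padding, decimal, filtering, and ordering rules come from the list.

```swift
// Hypothetical comparison report; names, suffix convention, and data are made up.
struct BenchmarkComparison {
  let name: String
  let baseline: Double  // seconds
  let latest: Double    // seconds

  /// Absolute change in seconds; negative means an improvement.
  var diff: Double { latest - baseline }
  /// Relative change in percent, rounded to one decimal place.
  var percentChange: Double { ((diff / baseline) * 1000).rounded() / 10 }

  /// Name padded with spaces to a fixed width instead of tab-aligned.
  func row(nameWidth: Int) -> String {
    let padding = String(repeating: " ", count: max(0, nameWidth - name.count))
    return "\(name)\(padding) \(percentChange)%"
  }
}

let results = [
  BenchmarkComparison(name: "EmailRFC", baseline: 2.0, latest: 1.5),
  BenchmarkComparison(name: "EmailRFC_NS", baseline: 3.0, latest: 3.1),
  BenchmarkComparison(name: "HangulSyllable", baseline: 0.010, latest: 0.005),
  BenchmarkComparison(name: "CSS", baseline: 1.0, latest: 1.2),
]

let report = results
  .filter { !$0.name.hasSuffix("_NS") }  // drop NSRegularExpression runs
  .sorted { $0.diff < $1.diff }          // largest absolute improvement first
```

Sorting by percentage would instead put `HangulSyllable` (a 5 ms change, -50%) ahead of `EmailRFC` (a 0.5 s change, -25%), which is the kind of noise from short-running benchmarks that sorting by seconds avoids.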
We can do the semantic members check up-front.

Tighten up validation of character class range
operands such that we reject quotes and custom
character classes. This includes rejecting syntax
that would be a subtraction in .NET. We throw a
custom error that suggests using `--` for subtraction instead.
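A toy version of this validation, assuming a hypothetical `ClassMember` AST and `RangeOperandError` type (the library's real AST and diagnostics differ). Only single characters are accepted as range operands, and a custom character class operand gets its own error so a diagnostic can point at `--`.

```swift
// Toy model of range validation in a custom character class; the types and
// error cases are hypothetical, not the library's AST or diagnostics.
enum ClassMember {
  case literal(Character)
  case quote(String)               // e.g. a \Q...\E quote inside [...]
  case customClass([ClassMember])  // e.g. the [b] in [a-[b]]
}

enum RangeOperandError: Error, Equatable {
  case quotedOperand
  case customClassOperand  // would be a subtraction in .NET; suggest `--`
  case reversedBounds      // toy-only check so `lower...upper` cannot trap
}

func makeRange(_ lhs: ClassMember, _ rhs: ClassMember) throws -> ClosedRange<Character> {
  func operand(_ member: ClassMember) throws -> Character {
    switch member {
    case .literal(let c): return c
    case .quote: throw RangeOperandError.quotedOperand
    case .customClass: throw RangeOperandError.customClassOperand
    }
  }
  let lower = try operand(lhs)
  let upper = try operand(rhs)
  guard lower <= upper else { throw RangeOperandError.reversedBounds }
  return lower...upper
}

/// The validation error for a pair of operands, or nil if the range is valid.
func validationError(_ lhs: ClassMember, _ rhs: ClassMember) -> RangeOperandError? {
  do {
    _ = try makeRange(lhs, rhs)
    return nil
  } catch let error as RangeOperandError {
    return error
  } catch {
    return nil
  }
}
```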
@hamishknight
Contributor Author

@swift-ci please test

@hamishknight hamishknight merged commit cff565f into swiftlang:swift/main Jun 29, 2022
@hamishknight hamishknight deleted the main-merge branch June 29, 2022 15:10
6 participants