[Integration] main (e87149a) -> swift/main #521

hamishknight · 2022-06-29T10:37:42Z

No description provided.

This shouldn't include e.g. `namedCapturesOnly`.

This should always be set in a multi-line literal, with extended syntax potentially being set and unset as we parse.

Relax the ban on unsetting extended syntax in a multi-line literal such that it does not apply to a scoped unset, e.g. `(?-x:...)`, as long as it does not span multiple lines. This commit also bans the use of `(?^)` in a multi-line literal, unless it is scoped and does not span multiple lines. Instead, `(?^x)` must be written, as PCRE defines `(?^)` to be equivalent to `(?-imnsx)`.

* Add a `clearThrough` instruction

  This will let us fix lookahead assertions that have leftover save points in the subpattern on success, and also allow us to implement atomic groups.

* Fix lookaheads with quantifiers

  On success, the subpatterns in lookaheads like `(?=.*e)` had a save point that persisted, causing the logic in the lookahead group to be invalid.

* Implement atomic non-capturing group support

  In addition to the `(?>...)` syntax, this is what's underneath `Local`.
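The behavior these three changes target can be sketched from the user's side. This is a minimal illustration, assuming a toolchain that ships Swift Regex (Swift 5.7 or later); the patterns and inputs are made up for the example and are not taken from the PR's tests.

```swift
// Sketch only: illustrative patterns, not the PR's test cases.

// Lookahead containing a quantifier: the input must contain an "e"
// somewhere, and the lookahead itself consumes nothing.
let containsE = try Regex(#"(?=.*e)[a-z]+"#)
let lookaheadHit = "hello".wholeMatch(of: containsE) != nil
let lookaheadMiss = "bonjour".wholeMatch(of: containsE) != nil

// Atomic group: once `a+` has matched inside `(?>...)`, it cannot give
// characters back, so the trailing `ab` has nothing left to match.
let atomic = try Regex(#"(?>a+)ab"#)
let plain = try Regex(#"a+ab"#)
let atomicHit = "aaab".firstMatch(of: atomic) != nil
let plainHit = "aaab".firstMatch(of: plain) != nil
```

The non-atomic `a+ab` succeeds only because the engine backtracks into `a+`; discarding those save points is what makes the group atomic.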

* Allow CustomConsuming types to match w/ zero width

  We previously asserted if a custom consuming type matches with zero width, but that isn't necessary or good. A custom type can implement a lookaround assertion or act as a tracer.

* Rename Processor.advance(to:) to resume(at:)

  Since the given index doesn’t need to advance, this name is less misleading.

This prepares for adopting an opaque result type for matches(of:) and ranges(of:). The old, CollectionConsumer-based model moves index-by-index, and isn't aware of the regex's semantic level, which results in inaccurate results for regexes that match at a mid-character index.

20x perf speedup in the "BasicBacktrack" benchmarks.

* Re-use the same executor, remember semantic mode.

  Gives around a 20% perf improvement to first-match style benchmarks.

* Remove history preservation

  Cuts down on memory usage and avoids some ARC overhead. ~20% gains on "AllMatches" and related benchmarks.

* Lower-level matchSeq

  Avoid collection algorithms inside matchSeq, which are liable to add ARC and inefficiencies. Results in a 3x improvement to ReluctantQuantWithTerminal.

@rctcwyvrn

Gives a 7x improvement to firstMatch-style benchmarks like "FirstMatch", 2-3x to CSS and basic backtracking benchmarks. Thanks to @rctcwyvrn for the original code.

Currently, the unary regex component builder simply forwards the component's base type. However, this is inconsistent with non-unary builder results. The current behavior may lead to surprising results when the user marks a property with `@RegexComponentBuilder`.

This patch makes `RegexComponentBuilder.buildPartialBlock<R>(first: R)` return a `Regex<R.RegexOutput>` rather than `R` itself.

---

Before:

```swift
// error: cannot convert value of type 'OneOrMore<Substring>' to specified type 'Regex<Substring>'
@RegexComponentBuilder
var r: Regex<Substring> {
  OneOrMore("a")
  // Adding other components below will make the error go away.
}

struct MyCustomRegex: RegexComponent {
  // error: cannot convert value of type 'OneOrMore<Substring>' to specified type 'Regex<Substring>'
  var regex: Regex<Substring> {
    OneOrMore("a")
  }
}
```

After: No errors.

Make unary builder return `Regex` type consistently

* [benchmark] Add no-capture version of grapheme breaking exercise
* [benchmark] Add cross-engine benchmark helpers
* [benchmark] Hangul Syllable finding benchmark

* Avoid double execution by avoiding Array init
* De-genericize processor, engine, etc.

  Provides only modest performance improvements (it was already getting specialized), but makes it possible to add String-specific specializations.

* Add debug mode
* Fix typo in css regex
* Add HTML benchmark
* Add email regex benchmarks
* Add save/compare functionality to the benchmarker
* Clean up compare and add cli flags

This reverts commit e0af639.

oops Repeat does not get to participate in inline fix tests

Handle atoms as things to be wrapped in One

fix test

[Printer] Unconditionally print a regex block for concatenations

This separates the two different ideas for boundaries in the base input:

- subjectBounds: These represent the actual subject in the input string. For a `String` callee, this will cover the entire bounds, while for a `Substring` these will represent the bounds of the substring in the base.
- searchBounds: These represent the current search range in the subject. These bounds can be the same as `subjectBounds` or a subrange when searching for subsequent matches or replacing only in a subrange of a string.

* firstMatch shouldn't update searchBounds on iteration

  When we move forward while searching for the first match, the search bounds should stay the same. Only the currentPosition needs to move forward. This will allow us to implement the `\G` start of match anchor, with which `/\Gab/` matches "abab" twice, compared with `/^ab/`, which only matches once.

* Make matches(of:) and ranges(of:) boundary-aware

  With this change, RegexMatchesCollection keeps the subject bounds and search bounds separately, modifying the search bounds with each iteration. In addition, the replace methods that only operate on a subrange can specify that subrange explicitly, getting the correct anchor behavior while only matching within a portion of a string.
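The subject-bounds idea can be illustrated from the caller's side. This is a hedged sketch, assuming a toolchain with Swift Regex (Swift 5.7 or later) and assuming the anchor behavior described above; the strings and variable names are invented for the example.

```swift
// Sketch only: illustrative input, not the PR's test cases.
let base = "xxabab"
let subject = base.dropFirst(2)   // Substring "abab" inside the base string

let startAnchored = try Regex("^ab")
let unanchored = try Regex("ab")

// For a Substring callee the subject bounds are the substring's bounds,
// so `^` is expected to match at the substring's start, not the base's.
let anchoredInSubstring = subject.firstMatch(of: startAnchored) != nil
let anchoredInBase = base.firstMatch(of: startAnchored) != nil

// Later iterations narrow only the search bounds; the subject start stays
// put, so `^ab` matches once while plain `ab` matches twice.
let anchoredCount = subject.matches(of: startAnchored).count
let unanchoredCount = subject.matches(of: unanchored).count
```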

* [benchmark] Add no-capture version of grapheme breaking exercise
* [benchmark] Add cross-engine benchmark helpers
* [benchmark] Hangul Syllable finding benchmark
* Add debug mode
* Fix typo in css regex
* Add HTML benchmark
* Add email regex benchmarks
* Add save/compare functionality to the benchmarker
* Clean up compare and add cli flags
* Make fixes
* oops, remove some leftover code
* Fix linux build issue + add cli option for specifying compare file
* Add benchmarks

Co-authored-by: Michael Ilseman <[email protected]>

- Space out the names properly instead of relying on tabs
- Add a decimal point to the percentage
- Filter out NS benchmarks from the comparison
- Sort comparisons by amount of improvement/regression (by s, not %, because we have lots of variance + low runtime benchmarks)

We can do the semantic members check up-front.

Tighten up validation of character class range operands such that we reject quotes and custom character classes. This includes rejecting syntax that would be a subtraction in .NET. We throw a custom error that suggests using `--` instead.
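For reference, the `--` subtraction operator that the new error message points to can be sketched as follows. This assumes a toolchain with Swift Regex (Swift 5.7 or later); the pattern and inputs are illustrative only.

```swift
// Sketch only: illustrative pattern, not the PR's test cases.

// `--` subtracts the right-hand class from the left-hand one:
// lowercase ASCII letters minus the vowels.
let consonants = try Regex("[a-z--[aeiou]]+")
let allConsonants = "xyz".wholeMatch(of: consonants) != nil
let hasVowel = "hello".wholeMatch(of: consonants) != nil

// The .NET-style spelling of the same subtraction. Per the description
// above it should now be rejected at parse time, so `try?` is used here
// and the result is deliberately left unchecked.
let dotNetStyle = try? Regex("[a-z-[aeiou]]")
```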

hamishknight · 2022-06-29T10:37:54Z

@swift-ci please test

hamishknight and others added 30 commits June 16, 2022 18:27

Add multi-line escaped newline test

0439e4f

Fix the definition of SyntaxOptions.experimental

76b5605

This shouldn't include e.g. `namedCapturesOnly`.

Clarify the .multilineCompilerLiteral syntax option

31eb417

This should always be set in a multi-line literal, with extended syntax potentially being set and unset as we parse.

Merge pull request swiftlang#484 from hamishknight/limited-run-syntax

3076eba

Remove linear factor from Engine's consume (swiftlang#494)

b4f12bb

20x perf speedup in the "BasicBacktrack" benchmarks.

Share the same processor in firstMatch (swiftlang#497)

fcd0b59

Gives a 7x improvement to firstMatch-style benchmarks like "FirstMatch", 2-3x to CSS and basic backtracking benchmarks. Thanks to @rctcwyvrn for the original code.

Merge pull request swiftlang#503 from rxwei/unary-buildpartialblock

cc9efb9

Make unary builder return `Regex` type consistently

[benchmark] Simplify and add more benchmarks (swiftlang#501)

4cea05a

* [benchmark] Add no-capture version of grapheme breaking exercise
* [benchmark] Add cross-engine benchmark helpers
* [benchmark] Hangul Syllable finding benchmark

Add more benchmarks and benchmarker functionality (swiftlang#505)

e0af639

* Add debug mode
* Fix typo in css regex
* Add HTML benchmark
* Add email regex benchmarks
* Add save/compare functionality to the benchmarker
* Clean up compare and add cli flags

Revert "Add more benchmarks and benchmarker functionality (swiftlang#505)" (swiftlang#507)

bb558ea

This reverts commit e0af639.

Handle atoms as things to be wrapped

06e6e02

oops Repeat does not get to participate in inline fix tests

Remove unused registers, opcodes (swiftlang#506)

e6a4032

Merge pull request swiftlang#413 from Azoy/oops-i-did-it-agan

6803dbf

Handle atoms as things to be wrapped in One

Unconditionally print a regex block for concatenations

16a25f2

fix test

Merge pull request swiftlang#508 from Azoy/fix-concat-printing

61e979c

[Printer] Unconditionally print a regex block for concatenations

Small parseCustomCharacterClass cleanup

f2e7433

We can do the semantic members check up-front.

Factor out parsePotentialCCRange

aa7c37b

Merge pull request swiftlang#517 from hamishknight/closed-range

e87149a

Merge branch 'main' into main-merge

8dd3886

hamishknight mentioned this pull request Jun 29, 2022

[DNM] Null PR swiftlang/swift#58827

Draft

hamishknight merged commit cff565f into swiftlang:swift/main Jun 29, 2022

hamishknight deleted the main-merge branch June 29, 2022 15:10
