[Integration] main (005e0fb) -> swift/main #492

Merged
merged 34 commits into from
Jun 17, 2022

Conversation

hamishknight
Contributor

No description provided.

hamishknight and others added 30 commits May 26, 2022 17:49
PCRE and ICU both support quoted sequences that
don't have a terminating `\E`. Update the parsing
to allow this.

Additionally, allow empty quoted sequences outside
of custom character classes, which is consistent
with ICU.

Finally, don't allow quoted sequences to span
multiple lines in extended syntax literals.
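
A minimal sketch of the unterminated quoted sequence described above, assuming a Swift 5.7 or later toolchain in which this change has landed. The pattern and the sample inputs are illustrative only; inside `\Q`, the `.` is expected to be treated as a literal character rather than as "any character".

```swift
// Assumption: the run-time `Regex(_:)` initializer accepts a quoted
// sequence with no terminating `\E`, as the commit message describes.
let quoted = try! Regex(#"\Qa.b"#)

// The quoted "." should only match a literal dot.
let matchesLiteral = "a.b".wholeMatch(of: quoted) != nil
let matchesOther = "axb".wholeMatch(of: quoted) != nil

print(matchesLiteral, matchesOther)
```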
rdar://92459215 has been fixed.
Use inits instead of as methods

Add ARO tests
This change makes `Regex`, `RegexComponent`, and its component types `Sendable`.

Regex stores a `Program` instance, which lazily lowers the DSLTree into a 
compiled program. Without synchronization, this lazy compilation is unsafe 
under concurrency. This change uses atomic initialization for the compiled 
program.
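
The hazard described above can be sketched as follows. This is not the commit's implementation: the commit uses atomic initialization, while this sketch swaps in a plain `NSLock` because it is simpler to show. `LazyProgram`, `instructions()`, and `compileCount` are hypothetical names, and the "compilation" step is a stand-in for lowering a DSLTree.

```swift
import Foundation

// Hypothetical stand-in for a lazily compiled program.
// The lock guarantees the expensive step runs at most once,
// even when several threads ask for the result at the same time.
final class LazyProgram: @unchecked Sendable {
  private let lock = NSLock()
  private let source: String
  private var compiled: [String]?
  private var compilations = 0

  init(source: String) { self.source = source }

  // Returns the cached program, compiling it on first use.
  func instructions() -> [String] {
    lock.lock()
    defer { lock.unlock() }
    if let cached = compiled { return cached }
    let result = source.map { "match \($0)" } // stand-in for lowering
    compilations += 1
    compiled = result
    return result
  }

  // Number of times the compile step actually ran.
  var compileCount: Int {
    lock.lock()
    defer { lock.unlock() }
    return compilations
  }
}

let program = LazyProgram(source: "ab")
DispatchQueue.concurrentPerform(iterations: 16) { _ in
  _ = program.instructions()
}
print(program.compileCount)
```

A lock serializes every access, including reads after initialization. An atomic initialization, as used in the commit, avoids that cost once the program has been published.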
Obtain match output elements without materializing the output.
- Add a test where the capture transform produces a `Substring` from a `Substring`.
- Add a test where the capture transform wraps a `Substring` in an `Optional`.
Disable Prototypes to work around a CI failure
…custom types

* Track the whole match as an element of the "capture list" in the matching engine. Do so by emitting code as an implicit `capture` around the root node.
* No longer handle `matcher` as a special case within `capture` lowering, because the matcher can be arbitrarily nested within "output-forwarding" nodes, such as a `changeMatchingOptions` non-capturing group. Instead, make the bytecode emitter carry a result value so that a custom output can be propagated through any forwarding nodes.
  ```swift
  Regex {
    Capture(
      SemanticVersionParser()
        .ignoringCase()
        .matchingSemantics(.unicodeScalar)
    ) // This would not work previously.
  }
  ```
* Collapse DSLTree node `transform` into `capture`, because a transform can never be standalone (without a `capture` parent). This greatly simplifies `capture` lowering.
* Make the bytecode's capture transform use type `(Input, _StoredCapture) -> Any` so that it can transform any whole match, not just `Substring`. This means you can now transform any captured value, including a custom-consuming regex component's result!
  ```swift
  Regex {
    "version:"
    OneOrMore(.whitespace)
    Capture {
      SemanticVersionParser() // Regex<SemanticVersion>
    } transform: {
      // (SemanticVersion) -> SomethingElse
    }
  }
  ```
  The transforms of `Capture` and `TryCapture` are now generalized from taking `Substring` to taking generic parameter `W` (the whole match).
* Fix an issue where initial options were applied based solely on whether the bytecode had any instructions, failing examples such as `((?i:.))`. It now checks whether the first matchable atom has been emitted.
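
A self-contained sketch of the generalized transform described in the list above, assuming a toolchain and platform where `RegexBuilder` is available (Swift 5.7 or later). `DigitsParser` is a hypothetical custom-consuming component used in place of `SemanticVersionParser`, which is not defined in this pull request text. Its output type is `Int`, so the transform closure receives an `Int` rather than a `Substring`.

```swift
import RegexBuilder

// Hypothetical component: consumes a run of ASCII digits and
// produces an Int as its output.
struct DigitsParser: CustomConsumingRegexComponent {
  typealias RegexOutput = Int

  func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: Int)? {
    let digits = UInt8(ascii: "0")...UInt8(ascii: "9")
    var end = index
    while end < bounds.upperBound,
          let ascii = input[end].asciiValue,
          digits.contains(ascii) {
      input.formIndex(after: &end)
    }
    // Fail the match if no digits were consumed or the value overflows.
    guard end > index, let value = Int(input[index..<end]) else {
      return nil
    }
    return (end, value)
  }
}

let versionRegex = Regex {
  "version:"
  OneOrMore(.whitespace)
  Capture {
    DigitsParser() // Regex<Int>
  } transform: {
    $0 + 1 // (Int) -> Int: the transform sees the custom output
  }
}

let captured = "version: 41".firstMatch(of: versionRegex)?.1
print(captured as Any)
```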
Fully generalize "whole match" in the engine and enable transforming custom types
This change preserves the lazy atomic initialization, so using
Regex will still be thread-safe by default, even without the
annotation.
Add additional capture transform tests.
`buildEither` was removed from the regex builder DSL proposal. See swiftlang/swift-evolution#1634.
Parse, but diagnose in Sema
top level code is real weird, let's not talk about it
^ and $ should match the start and end of the callee, even if that
callee is a substring. Right now ^ and $ match the start and end of
the callee's base string, instead. In addition, ^ and $ should only
match the start and end of the callee when replacing a subrange, not
the start and end of the subrange.
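
A short sketch of the intended anchor behavior on a substring, assuming a Swift 5.7 or later toolchain with this fix applied. The strings are illustrative only.

```swift
let base = "abc def ghi"
let start = base.index(base.startIndex, offsetBy: 4)
let end = base.index(start, offsetBy: 3)
let sub = base[start..<end] // "def"

let anchored = try! Regex(#"^def$"#)

// With the fix, ^ and $ refer to the bounds of `sub`,
// not to the bounds of its base string.
let matchesInSubstring = sub.firstMatch(of: anchored) != nil

// In the base string, "def" sits in the middle, so the anchors do not hold.
let matchesInBase = base.firstMatch(of: anchored) != nil

print(matchesInSubstring, matchesInBase)
```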
This is accepted by PCRE, and forms an empty
option change sequence. We probably ought to warn
on it though as it's a no-op.
hamishknight and others added 4 commits June 16, 2022 17:33
This was caused by the fact that we'd walk into
`expectUnicodeScalar` if we saw `\o`, but we only
want to parse `\o{`. Instead, change it to be a `lex..`
method, and bail if we don't lex a scalar.
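
For reference, a minimal sketch of the braced form `\o{...}` that the parser is meant to accept, assuming a Swift 5.7 or later toolchain. The sketch does not show how a bare `\o` is handled after this fix, since the commit message does not say. Octal 101 is decimal 65, the scalar "A".

```swift
// `\o{101}` names the Unicode scalar with octal value 101 (U+0041).
let octalScalar = try! Regex(#"\o{101}"#)

let matchesA = "A".wholeMatch(of: octalScalar) != nil
let matchesB = "B".wholeMatch(of: octalScalar) != nil

print(matchesA, matchesB)
```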
@hamishknight
Contributor Author

@swift-ci please test

@hamishknight hamishknight merged commit 6e65ecb into swiftlang:swift/main Jun 17, 2022
@hamishknight hamishknight deleted the main-merge branch June 17, 2022 09:39
6 participants