
Sync 5.7 branch with main #409

Merged
natecook1000 merged 188 commits into swiftlang:swift/release/5.7 on May 15, 2022

Conversation

natecook1000
Member

@natecook1000 natecook1000 commented May 12, 2022

Catches our 5.7 branch up with recent fixes and addresses rdar://92876793.

hamishknight and others added 30 commits March 18, 2022 17:57
A quick pass to flip `/.../` out of the alternatives
and into the main syntax. Still needs a bunch of
work.

Also add some commentary on a regex with `]` as the
starting character.
…n. Split out Proposed solution from Detailed design. Parallelize the structure a bit better.
This isn't actually used, as we convert to a DSL
custom character class, and then use that consumer
logic.
Convert AST escape sequences that represent a
scalar value (e.g. `\f`, `\n`, `\a`) into scalars in
the DSL tree. This allows the matching engine to
match against them.
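
As a rough illustration of the conversion this commit describes, the sketch below maps a few single-letter escapes to the scalar values they denote. The function name `scalarValue(forEscape:)` and its small table are hypothetical, chosen only for this example; they are not the parser's actual code.

```swift
// Hypothetical sketch: map the letter that follows a backslash to the
// Unicode scalar it denotes. This is not the _RegexParser implementation.
func scalarValue(forEscape letter: Character) -> Unicode.Scalar? {
    switch letter {
    case "a": return Unicode.Scalar(UInt8(0x07)) // bell
    case "f": return Unicode.Scalar(UInt8(0x0C)) // form feed
    case "n": return Unicode.Scalar(UInt8(0x0A)) // line feed
    case "r": return Unicode.Scalar(UInt8(0x0D)) // carriage return
    case "t": return Unicode.Scalar(UInt8(0x09)) // horizontal tab
    default:  return nil // not treated as a scalar-valued escape here
    }
}
```

Once an escape has been reduced to a plain scalar like this, a matcher can compare it against input the same way it compares any other literal scalar.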
natecook1000 and others added 15 commits May 9, 2022 18:16
Removing this deprecation warning, as it's just generating noise.
We may still eventually want to remove _CharacterClassModel.matchLevel
along with other refactoring in the future.
I wasn't aware of this Unicode property when
initially implementing this. It's a more restricted
set of whitespace that Unicode recommends for
parsing patterns. It's the same set of whitespace
used for extended syntax.

UAX44-LM3 itself doesn't appear to specify the
exact set of whitespace to match against, but this
is no more restrictive than the engines I'm aware
of.
This allows us to store the source location of the
inner scalar value.
Allow a whitespace-separated list of scalars within
the `\u{...}` syntax. This is syntactic sugar that
gets implicitly splatted out, for example `\u{A B C}`
becomes `\u{A}\u{B}\u{C}`.
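
The splatting can be pictured with a small string-level sketch. The helper `expandScalarSequence(_:)` is a hypothetical name used only for illustration; the real parser works on its AST rather than on strings, so treat this as a model of the behavior and not as the implementation.

```swift
// Hypothetical sketch: given the text between the braces of `\u{...}`,
// produce the equivalent sequence of single-scalar escapes.
func expandScalarSequence(_ body: String) -> String {
    body
        .split(whereSeparator: { $0.isWhitespace }) // drop the separators
        .map { "\\u{\($0)}" }                       // one escape per scalar
        .joined()
}
```

Under this model, `expandScalarSequence("A B C")` yields the text `\u{A}\u{B}\u{C}`, matching the example in the commit message.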
`curIdx` is an index of `astChildren`, not
`children`.
The `predicate` may independently advance the
location before bailing, and we don't want that
to affect the recorded location of the result. We
probably ought to replace `lexUntil` with a better
API.
…tlang#392)

- Explicitly ask the compiler not to implicitly import _StringProcessing. This is to avoid a circular dependency when `-enable-experimental-string-processing` is enabled by default.
- Unify the build flags for modules that are built in the compiler repo into a `stdlibSettings` value.
- Disable the implicit _Concurrency import as well, since that is how it's built in the compiler repo. This helps us catch errors before we integrate with the compiler repo.
- Remove `-enable-experimental-pairwise-build-block` since SE-0348 has been implemented and enabled.
- Update the minimum toolchain requirement to 2022-04-20.
This adds start/end anchors ^ and $, groups that form zero-width
assertions, and option-changing groups without content `(?i)...`
_RegexParser does not need resilience as it's only ever going to be used by _StringProcessing and RegexBuilder.
`One` is a lightweight component that allows the use of the leading dot syntax to reference `RegexComponent` static members, such as character classes, as a non-first expression in a regex builder block.

---

Before:

```swift
Regex {
    .digit // works today but brittle; inserting anything above this line will break this

    OneOrMore(.whitespace)

    .word // ❌ error: 'OneOrMore' has no member named 'word' (because this is parsed as a member reference on the preceding expression)
}
```

After:

```swift
Regex {
    One(.digit)              // recommended even though `.digit` works today
    OneOrMore(.whitespace)
    One(.word)
} // ✅
```

In a follow-up patch, we will propose adding an additional protocol inheriting from `RegexComponent` that will ban the use of the leading dot syntax even on the first line of `Regex { ... }`, as this will enforce the recommended style (use of `One`), and prevent surprises when the user inserts a pattern above the leading dot line.
PCRE, Oniguruma, and ICU allow `]` to appear as
the first member of a custom character class, and
treat it as literal, due to empty character classes
being invalid.

However this behavior isn't particularly intuitive,
and makes lexing heuristics harder to implement
properly. Instead, reject such character classes
as being empty, and require escaping if `]` is
meant as the first character.
@natecook1000 natecook1000 marked this pull request as draft May 12, 2022 18:14
@natecook1000 natecook1000 marked this pull request as ready for review May 12, 2022 18:54
@natecook1000
Member Author

@swift-ci Please test

@natecook1000 natecook1000 changed the title [DO NOT MERGE] Branch comparison Sync 5.7 branch with main May 12, 2022
@natecook1000 natecook1000 merged commit a0f2a44 into swiftlang:swift/release/5.7 May 15, 2022
10 participants