Update proposals #248

Merged 1 commit on Apr 4, 2022
55 changes: 55 additions & 0 deletions Documentation/Evolution/ProposalOverview.md
@@ -0,0 +1,55 @@

# Regex Proposals

## Regex Type and Overview

- [Pitch](https://forums.swift.org/t/pitch-regex-type-and-overview/56029)
- Proposal: To-be-scheduled

Presents the basic `Regex` type and gives an overview of how everything fits into the overall story.


## Regex Builder DSL

- [Pitch thread](https://forums.swift.org/t/pitch-regex-builder-dsl/56007)

Covers the result builder approach and basic API.
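
As a rough illustration of the result builder approach, the sketch below uses the `RegexBuilder` module as it later shipped in Swift 5.7; the API spelling in the pitch thread may differ, and the `keyValue` name and the sample input are made up for this example.

```swift
import RegexBuilder

// A regex declared with the result-builder DSL: word characters, "=", then digits.
let keyValue = Regex {
    Capture { OneOrMore(.word) }
    "="
    Capture { OneOrMore(.digit) }
}

// Captures are statically typed as Substring and read by tuple position.
if let match = "width=42".firstMatch(of: keyValue) {
    print(match.1, match.2)
}
```

Each `Capture` adds one element to the regex's output tuple, so the compiler knows the number and type of captures.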


## Run-time Regex Construction

- Pitch thread: [Regex Syntax](https://forums.swift.org/t/pitch-regex-syntax/55711)
  + Brief: Syntactic superset of PCRE2, Oniguruma, ICU, UTS\#18, etc.

Covers the "interior" syntax, extended syntaxes, run-time construction of a regex from a string, and details of `AnyRegexOutput`.

Note: The above pitch drills into the syntax; the revised pitch, which includes two initializers and existential details, is still under development.

## Regex Literals

- [Draft](https://github.com/apple/swift-experimental-string-processing/pull/187)
- (Old) original pitch:
  + [Thread](https://forums.swift.org/t/pitch-regular-expression-literals/52820)
  + [Update](https://forums.swift.org/t/pitch-regular-expression-literals/52820/90)


## String Processing Algorithms

- [Pitch thread](https://forums.swift.org/t/pitch-regex-powered-string-processing-algorithms/55969)

Proposes a slew of Regex-powered algorithms.

Introduces `CustomMatchingRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex.
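
To give a feel for what a "monadic-parser style interface" means, here is a minimal, self-contained sketch. The `MiniParser` protocol and `UnsignedInteger` type are hypothetical stand-ins invented for this example, not the pitched `CustomMatchingRegexComponent` protocol. The idea is only that a parser receives the input and a starting position, and either fails or returns where it stopped together with the value it produced.

```swift
// Hypothetical stand-in for illustration; this is NOT the pitched protocol.
protocol MiniParser {
    associatedtype Output
    /// Try to parse at `index`; return where parsing stopped and the parsed value, or nil.
    func parse(_ input: String, startingAt index: String.Index)
        -> (upperBound: String.Index, output: Output)?
}

struct UnsignedInteger: MiniParser {
    func parse(_ input: String, startingAt index: String.Index)
        -> (upperBound: String.Index, output: Int)? {
        var end = index
        // Consume a run of ASCII digits.
        while end < input.endIndex, input[end].isASCII, input[end].isNumber {
            input.formIndex(after: &end)
        }
        // Fail if nothing was consumed or the digits overflow Int.
        guard end > index, let value = Int(input[index..<end]) else { return nil }
        return (upperBound: end, output: value)
    }
}

let text = "id42;"
let digitsStart = text.index(text.startIndex, offsetBy: 2)
let parsed = UnsignedInteger().parse(text, startingAt: digitsStart)
```

Because the parser reports the position where it stopped, a surrounding regex could resume matching from there, which is what lets an external parser act as one component among several.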

## Unicode for String Processing

- Draft: TBD
- (Old) [Character class definitions](https://forums.swift.org/t/pitch-character-classes-for-string-processing/52920)

Covers three topics:

- Proposes literal and DSL API for library-defined character classes, Unicode scripts and properties, and custom character classes.
- Proposes literal and DSL API for options that affect matching behavior.
- Defines how Unicode scalar-based classes are extended to grapheme clusters in the different semantic and other matching modes.


57 changes: 46 additions & 11 deletions Documentation/Evolution/RegexSyntax.md
@@ -10,27 +10,56 @@ Hello, we want to issue an update to [Regular Expression Literals](https://forum

A regex declares a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. We propose the ability to create a regex at run time from a string containing regex syntax (detailed here), API for accessing the match and captures, and a means to convert between an existential capture representation and concrete types.

The overall story is laid out in [Regex Type and Overview](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/RegexTypeOverview.md) and each individual component is tracked in [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107).
The overall story is laid out in [Regex Type and Overview][overview] and each individual component is tracked in [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107).

## Motivation

Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities.

<!--
... tools need run time construction
... ns regular expression operates over a fundamentally different model and has limited syntactic and semantic support
... we propose a best-in-class treatment of familiar regex syntax
-->
`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage.

```swift
import Foundation

// `Transaction`, `dateParser`, and `decimalParser` are assumed to be defined elsewhere.
let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
let nsRegEx = try! NSRegularExpression(pattern: pattern)

func processEntry(_ line: String) -> Transaction? {
  // NSRegularExpression reports NSRange values, so each capture must be
  // converted back to a Range<String.Index> before slicing `line`.
  let range = NSRange(line.startIndex..<line.endIndex, in: line)
  guard let result = nsRegEx.firstMatch(in: line, range: range),
        let kindRange = Range(result.range(at: 1), in: line),
        let kind = Transaction.Kind(line[kindRange]),
        let dateRange = Range(result.range(at: 2), in: line),
        let date = try? Date(String(line[dateRange]), strategy: dateParser),
        let accountRange = Range(result.range(at: 3), in: line),
        let amountRange = Range(result.range(at: 4), in: line),
        let amount = try? Decimal(
          String(line[amountRange]), format: decimalParser)
  else {
    return nil
  }

  return Transaction(
    kind: kind, date: date, account: String(line[accountRange]), amount: amount)
}
```

Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic mismatch between ICU and Swift's `String` are discussed in [Unicode for String Processing][pitches].

Run-time construction is important for tools and editors. For example, SwiftPM allows the user to provide a regular expression to filter tests via `swift test --filter`.

The full string processing effort includes a regex type with strongly typed captures, the ability to create a regex from a string at runtime, a compile-time literal, a result builder DSL, protocols for intermixing 3rd party industrial-strength parsers with regex declarations, and a slew of regex-powered algorithms over strings.

This proposal specifically homes in on the _familiarity_ aspect by providing a best-in-class treatment of familiar regex syntax.
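
The tooling use case mentioned above (filtering tests by a user-supplied pattern) can be sketched as follows. This is only an illustration under stated assumptions: `filterTests` and the test names are hypothetical, this is not how SwiftPM implements `--filter`, and the initializer is spelled `Regex(_:)` as in the Swift 5.7 standard library rather than the `Regex(compiling:)` spelling used in this pitch.

```swift
// Hypothetical helper: keep only the names that match a pattern known only at run time.
func filterTests(_ names: [String], matching pattern: String) throws -> [String] {
    // Construction throws if the user-supplied pattern is malformed.
    let regex = try Regex(pattern)
    return names.filter { $0.contains(regex) }
}

let tests = ["ParserTests.testEmpty", "ParserTests.testNested", "LexerTests.testEmpty"]
let selected = try filterTests(tests, matching: #"^Parser.*Nested$"#)
```

Because the pattern arrives as a string, syntax errors can only be reported at run time, which is why construction is a throwing operation.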

## Proposed Solution

<!--
... regex compiling and existential match type
-->
We propose run-time construction of `Regex` from a best-in-class treatment of familiar regular expression syntax. A `Regex` is generic over its `Output`, which includes capture information. This may be an existential `AnyRegexOutput`, or a concrete type provided by the user.

```swift
let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
let regex = try! Regex(compiling: pattern)
// regex: Regex<AnyRegexOutput>

// Alternatively, with a concrete output type provided by the user:
let typedRegex: Regex<(Substring, Substring, Substring, Substring, Substring)> =
  try! Regex(compiling: pattern)
```
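
For comparison, here is a small runnable sketch of how captures are read from each kind of output. It assumes the API as it later shipped in Swift 5.7, where the initializers are spelled `Regex(_:)` and `Regex(_:as:)` instead of the `Regex(compiling:)` pitched here; the pattern and input are made up for the example.

```swift
let kvPattern = #"(\w+)=(\d+)"#
let input = "width=42"

// Type-erased output: captures are looked up by position at run time.
let erased = try Regex(kvPattern)   // Regex<AnyRegexOutput>
let erasedKey = input.firstMatch(of: erased)?.output[1].substring

// Concrete output: the tuple type is supplied by the user and checked
// against the pattern when the regex is constructed.
let typed = try Regex(kvPattern, as: (Substring, Substring, Substring).self)
let typedValue = input.firstMatch(of: typed)?.2
```

The first tuple element (position 0) is the whole match, so a pattern with two capture groups needs a three-element tuple.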


### Syntax

@@ -866,3 +895,9 @@ This proposal regards _syntactic_ support, and does not necessarily mean that ev
[unicode-scripts]: https://www.unicode.org/reports/tr24/#Script
[unicode-script-extensions]: https://www.unicode.org/reports/tr24/#Script_Extensions
[balancing-groups]: https://docs.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#balancing-group-definitions
[overview]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/RegexTypeOverview.md
[pitches]: https://github.com/apple/swift-experimental-string-processing/issues/107




2 changes: 1 addition & 1 deletion Documentation/Evolution/RegexTypeOverview.md
@@ -14,7 +14,7 @@ We propose addressing this basic shortcoming through an effort we are calling re
3. A literal for compile-time construction of a regex with statically-typed captures, enabling powerful source tools.
4. An expressive and composable result-builder DSL, with support for capturing strongly-typed values.
5. A modern treatment of Unicode semantics and string processing.
6. A treasure trove of string processing algorithms, along with library-extensible protocols enabling industrial-strength parsers to be used seamlessly as regex components.
6. A slew of regex-powered string processing algorithms, along with library-extensible protocols enabling industrial-strength parsers to be used seamlessly as regex components.

This proposal provides details on \#1, the `Regex` type and captures, and gives an overview of how each of the other proposals fits into regex in Swift.
