Commit afcc40b

Update proposals (#248)
1 parent a0999f3 commit afcc40b

3 files changed, +102 -12 lines changed

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+
+# Regex Proposals
+
+## Regex Type and Overview
+
+- [Pitch](https://forums.swift.org/t/pitch-regex-type-and-overview/56029)
+- Proposal: To-be-scheduled
+
+Presents the basic `Regex` type and gives an overview of how everything fits into the overall story.
+
+
+## Regex Builder DSL
+
+- [Pitch thread](https://forums.swift.org/t/pitch-regex-builder-dsl/56007)
+
+Covers the result builder approach and basic API.
+
+
+## Run-time Regex Construction
+
+- Pitch thread: [Regex Syntax](https://forums.swift.org/t/pitch-regex-syntax/55711)
+  + Brief: Syntactic superset of PCRE2, Oniguruma, ICU, UTS#18, etc.
+
+Covers the "interior" syntax, extended syntaxes, run-time construction of a regex from a string, and details of `AnyRegexOutput`.
+
+Note: The above pitch drills into the syntax; the revised pitch, including two initializers and existential details, is still under development.
+
+## Regex Literals
+
+- [Draft](https://github.com/apple/swift-experimental-string-processing/pull/187)
+- (Old) original pitch:
+  + [Thread](https://forums.swift.org/t/pitch-regular-expression-literals/52820)
+  + [Update](https://forums.swift.org/t/pitch-regular-expression-literals/52820/90)
+
+
+## String processing algorithms
+
+- [Pitch thread](https://forums.swift.org/t/pitch-regex-powered-string-processing-algorithms/55969)
+
+Proposes a slew of Regex-powered algorithms.
+
+Introduces `CustomMatchingRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex.
+
+## Unicode for String Processing
+
+- Draft: TBD
+- (Old) [Character class definitions](https://forums.swift.org/t/pitch-character-classes-for-string-processing/52920)
+
+Covers three topics:
+
+- Proposes literal and DSL API for library-defined character classes, Unicode scripts and properties, and custom character classes.
+- Proposes literal and DSL API for options that affect matching behavior.
+- Defines how Unicode scalar-based classes are extended to grapheme clusters in the different semantic and other matching modes.
+
+

Documentation/Evolution/RegexSyntax.md

Lines changed: 46 additions & 11 deletions
@@ -10,27 +10,56 @@ Hello, we want to issue an update to [Regular Expression Literals](https://forum
 
 A regex declares a string processing algorithm using syntax familiar across a variety of languages and tools throughout programming history. We propose the ability to create a regex at run time from a string containing regex syntax (detailed here), API for accessing the match and captures, and a means to convert between an existential capture representation and concrete types.
 
-The overall story is laid out in [Regex Type and Overview](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/RegexTypeOverview.md) and each individual component is tracked in [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107).
+The overall story is laid out in [Regex Type and Overview][overview] and each individual component is tracked in [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107).
 
 ## Motivation
 
 Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities.
 
-<!--
-... tools need run time construction
-... ns regular expression operates over a fundamentally different model and has limited syntactic and semantic support
-... we propose a best-in-class treatment of familiar regex syntax
--->
+`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage.
+
+```swift
+let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
+let nsRegEx = try! NSRegularExpression(pattern: pattern)
 
-The full string processing effort includes a regex type with strongly typed captures, the ability to create a regex from a string at runtime, a compile-time literal, a result builder DSL, protocols for intermixing 3rd party industrial-strength parsers with regex declarations, and a slew of regex-powered algorithms over strings.
+func processEntry(_ line: String) -> Transaction? {
+  let range = NSRange(line.startIndex..<line.endIndex, in: line)
+  guard let result = nsRegEx.firstMatch(in: line, range: range),
+        let kindRange = Range(result.range(at: 1), in: line),
+        let kind = Transaction.Kind(line[kindRange]),
+        let dateRange = Range(result.range(at: 2), in: line),
+        let date = try? Date(String(line[dateRange]), strategy: dateParser),
+        let accountRange = Range(result.range(at: 3), in: line),
+        let amountRange = Range(result.range(at: 4), in: line),
+        let amount = try? Decimal(
+          String(line[amountRange]), format: decimalParser)
+  else {
+    return nil
+  }
+
+  return Transaction(
+    kind: kind, date: date, account: String(line[accountRange]), amount: amount)
+}
+```
+
+Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic mismatch between ICU and Swift's `String` are discussed in [Unicode for String Processing][pitches].
+
+Run-time construction is important for tools and editors. For example, SwiftPM allows the user to provide a regular expression to filter tests via `swift test --filter`.
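
As a rough illustration of why run-time construction matters for tools, the sketch below filters a list of test names with a user-supplied pattern, loosely in the spirit of `swift test --filter`. It assumes the throwing `Regex(_:)` initializer that shipped in Swift 5.7 rather than the `Regex(compiling:)` spelling used in this pitch. `filterTests` and the test names are hypothetical, and this is not how SwiftPM implements filtering.

```swift
// Hypothetical helper: keep only the test names matched by a pattern that is
// known only at run time (for example, one typed on the command line).
// Assumes Swift 5.7+, where `Regex(_:)` parses a pattern string and throws on
// invalid syntax. The pitch text spells this initializer `Regex(compiling:)`.
func filterTests(_ names: [String], matching pattern: String) throws -> [String] {
    let regex = try Regex(pattern)            // Regex<AnyRegexOutput>
    return names.filter { $0.firstMatch(of: regex) != nil }
}

// Made-up test identifiers, used only for this example.
let allTests = [
    "LexerTests.testIdentifiers",
    "ParserTests.testGroups",
    "ParserTests.testQuantifiers",
]

do {
    let selected = try filterTests(allTests, matching: #"^ParserTests\.test\w+$"#)
    print(selected)
} catch {
    // A malformed pattern surfaces as a thrown error instead of a crash.
    print("Invalid filter pattern: \(error)")
}
```

Because the pattern is parsed at run time, a syntax error can only be reported when the tool runs, which is why the initializer throws.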
 
-This proposal specifically hones in on the _familiarity_ aspect by providing a best-in-class treatment of familiar regex syntax.
 
 ## Proposed Solution
 
-<!--
-... regex compiling and existential match type
--->
+We propose run-time construction of `Regex` from a best-in-class treatment of familiar regular expression syntax. A `Regex` is generic over its `Output`, which includes capture information. This may be an existential `AnyRegexOutput`, or a concrete type provided by the user.
+
+```swift
+let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
+let regex = try! Regex(compiling: pattern)
+// regex: Regex<AnyRegexOutput>
+
+let regex: Regex<(Substring, Substring, Substring, Substring, Substring)> =
+  try! Regex(compiling: pattern)
+```
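
The two output flavors can be compared side by side. The following sketch assumes the Swift 5.7 spellings (`Regex(_:)` and `Regex(_:as:)`) instead of the pitch's `Regex(compiling:)`, and uses a shorter, made-up pattern and input line. It illustrates existential versus concrete captures, not the final API of this proposal.

```swift
// Made-up pattern and input: a word followed by a run of digits.
let pattern = #"(\w+)\s+(\d+)"#
let line = "CREDIT 4200"

do {
    // Existential form: the number and types of captures are discovered at run time.
    let dynamicRegex = try Regex(pattern)     // Regex<AnyRegexOutput>
    if let match = line.wholeMatch(of: dynamicRegex) {
        // Index 0 is the whole match; captures follow in order.
        for (index, capture) in match.output.enumerated() {
            print(index, capture.substring ?? "<no match>")
        }
    }

    // Concrete form: the caller states the expected tuple type up front.
    // Construction throws if the pattern's captures do not fit that type.
    let typedRegex = try Regex(pattern, as: (Substring, Substring, Substring).self)
    if let match = line.wholeMatch(of: typedRegex) {
        let (_, kind, amount) = match.output
        print(kind, amount)
    }
} catch {
    print("Could not construct regex: \(error)")
}
```

The existential form suits tools that accept arbitrary patterns, while the concrete form gives the compiler enough information to type-check uses of each capture.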
+
 
 ### Syntax
 
@@ -866,3 +895,9 @@ This proposal regards _syntactic_ support, and does not necessarily mean that ev
 [unicode-scripts]: https://www.unicode.org/reports/tr24/#Script
 [unicode-script-extensions]: https://www.unicode.org/reports/tr24/#Script_Extensions
 [balancing-groups]: https://docs.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#balancing-group-definitions
+[overview]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/RegexTypeOverview.md
+[pitches]: https://github.com/apple/swift-experimental-string-processing/issues/107
+
+
+
+

Documentation/Evolution/RegexTypeOverview.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ We propose addressing this basic shortcoming through an effort we are calling re
 3. A literal for compile-time construction of a regex with statically-typed captures, enabling powerful source tools.
 4. An expressive and composable result-builder DSL, with support for capturing strongly-typed values.
 5. A modern treatment of Unicode semantics and string processing.
-6. A treasure trove of string processing algorithms, along with library-extensible protocols enabling industrial-strength parsers to be used seamlessly as regex components.
+6. A slew of regex-powered string processing algorithms, along with library-extensible protocols enabling industrial-strength parsers to be used seamlessly as regex components.
 
 This proposal provides details on #1, the `Regex` type and captures, and gives an overview of how each of the other proposals fits into regex in Swift.
 