
Commit 0c2ed87

Update regex syntax pitch (#258)
* Update regex syntax pitch
* Rename file
1 parent 93abfcb commit 0c2ed87


2 files changed: +118, -8 lines changed


Documentation/Evolution/RegexSyntax.md renamed to Documentation/Evolution/RegexSyntaxRunTimeConstruction.md

Lines changed: 89 additions & 8 deletions
```diff
@@ -2,7 +2,7 @@
 Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal. Additionally, this is the syntax accepted from a string used for run-time regex construction, so we're devoting an entire pitch/proposal to the topic of _regex syntax_, distinct from the result builder DSL or the choice of delimiters for literals.
 -->
 
-# Run-time Regex Construction
+# Regex Syntax and Run-time Construction
 
 - Authors: [Hamish Knight](https://github.com/hamishknight), [Michael Ilseman](https://github.com/milseman)
 
```
````diff
@@ -16,7 +16,7 @@ The overall story is laid out in [Regex Type and Overview][overview] and each in
 
 Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities.
 
-`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage.
+`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage, such as the need to translate between `NSRange` and `Range`.
 
 ```swift
 let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
@@ -42,7 +42,7 @@ func processEntry(_ line: String) -> Transaction? {
 }
 ```
 
-Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic mismatch between ICU and Swift's `String` is discussed in [Unicode for String Processing][pitches].
+Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic differences between ICU's string model and Swift's `String` are discussed in [Unicode for String Processing][pitches].
 
 Run-time construction is important for tools and editors. For example, SwiftPM allows the user to provide a regular expression to filter tests via `swift test --filter`.
 
@@ -60,7 +60,6 @@ let regex: Regex<(Substring, Substring, Substring, Substring, Substring)> =
   try! Regex(compiling: pattern)
 ```
 
-
 ### Syntax
 
 We propose accepting a syntactic "superset" of the following existing regular expression engines:
@@ -80,11 +79,87 @@ Regex syntax will be part of Swift's source-compatibility story as well as its b
 
 ## Detailed Design
 
-<!--
-... init, dynamic match, conversion to static
--->
+We propose initializers to declare and compile a regex from syntax. Upon failure, these initializers throw compilation errors, such as syntax or type errors. API for retrieving error information is future work.
+
+```swift
+extension Regex {
+  /// Parse and compile `pattern`, resulting in a strongly-typed capture list.
+  public init(compiling pattern: String, as: Output.Type = Output.self) throws
+}
+extension Regex where Output == AnyRegexOutput {
+  /// Parse and compile `pattern`, resulting in an existentially-typed capture list.
+  public init(compiling pattern: String) throws
+}
+```
+
+We propose `AnyRegexOutput` for capture types not known at compilation time, alongside a casting API to convert to a strongly-typed capture list.
+
+```swift
+/// A type-erased regex output
+public struct AnyRegexOutput {
+  /// Creates a type-erased regex output from an existing output.
+  ///
+  /// Use this initializer to fit a regex with strongly typed captures into the
+  /// use site of a dynamic regex, i.e. one that was created from a string.
+  public init<Output>(_ match: Regex<Output>.Match)
+
+  /// Returns a typed output by converting the underlying value to the specified
+  /// type.
+  ///
+  /// - Parameter type: The expected output type.
+  /// - Returns: The output, if the underlying value can be converted to the
+  /// output type, or nil otherwise.
+  public func `as`<Output>(_ type: Output.Type) -> Output?
+}
+extension AnyRegexOutput: RandomAccessCollection {
+  public struct Element {
+    /// The range over which a value was captured. `nil` for no-capture.
+    public var range: Range<String.Index>?
+
+    /// The slice of the input over which a value was captured. `nil` for no-capture.
+    public var substring: Substring?
+
+    /// The captured value. `nil` for no-capture.
+    public var value: Any?
+  }
+
+  // Trivial collection conformance requirements
 
-We propose the following syntax for regex.
+  public var startIndex: Int { get }
+
+  public var endIndex: Int { get }
+
+  public var count: Int { get }
+
+  public func index(after i: Int) -> Int
+
+  public func index(before i: Int) -> Int
+
+  public subscript(position: Int) -> Element
+}
+```
+
+We propose adding an API to `Regex<AnyRegexOutput>.Match` to cast the output type to a concrete one. A regex match will lazily create a `Substring` on demand, so casting the match itself saves ARC traffic versus extracting and casting the output.
+
+```swift
+extension Regex.Match where Output == AnyRegexOutput {
+  /// Creates a type-erased regex match from an existing match.
+  ///
+  /// Use this initializer to fit a regex match with strongly typed captures into the
+  /// use site of a dynamic regex match, i.e. one that was created from a string.
+  public init<Output>(_ match: Regex<Output>.Match)
+
+  /// Returns a typed match by converting the underlying values to the specified
+  /// types.
+  ///
+  /// - Parameter type: The expected output type.
+  /// - Returns: A match generic over the output type if the underlying values can be converted to the
+  /// output type. Returns `nil` otherwise.
+  public func `as`<Output>(_ type: Output.Type) -> Regex<Output>.Match?
+}
+```
+
+The rest of this proposal will be a detailed and exhaustive definition of our proposed regex syntax.
 
 <details><summary>Grammar Notation</summary>
 
````
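To illustrate the type-erased output design in the Detailed Design hunk above, the following toy model is written in plain Swift rather than against `_StringProcessing`. `ToyAnyOutput` and its initializer are hypothetical names, and the real type's storage (such as `ElementRepresentation`) is not modeled. It is a sketch, under those assumptions, of two ideas the pitch relies on: a random-access collection of captures whose `substring` is derived from `range` only when it is read, and an `as(_:)` cast that recovers the strongly typed output or returns `nil` when the requested type does not match.

```swift
// Toy model of a type-erased regex output (hypothetical; not the real AnyRegexOutput).
struct ToyAnyOutput: RandomAccessCollection {
  struct Element {
    let input: String
    /// The range over which a value was captured; `nil` if the group did not participate.
    let range: Range<String.Index>?
    /// The captured value as an existential; `nil` for no capture.
    let value: Any?

    /// Computed from `range` on each access rather than stored.
    var substring: Substring? { range.map { input[$0] } }
  }

  private let elements: [Element]
  private let typed: Any  // the strongly typed output this value erases

  init(input: String, ranges: [Range<String.Index>?], typed: Any) {
    // In this toy, each capture's value is simply its matched text.
    elements = ranges.map { range in
      Element(input: input, range: range, value: range.map { input[$0] })
    }
    self.typed = typed
  }

  var startIndex: Int { elements.startIndex }
  var endIndex: Int { elements.endIndex }
  func index(after i: Int) -> Int { i + 1 }
  func index(before i: Int) -> Int { i - 1 }
  subscript(position: Int) -> Element { elements[position] }

  /// Returns the typed output if the erased value has type `Output`, else `nil`.
  func `as`<Output>(_ type: Output.Type) -> Output? {
    typed as? Output
  }
}

// Usage: erase a three-element output, then try to recover it.
let input = "swift 5"
let word = input.startIndex..<input.index(input.startIndex, offsetBy: 5)
let number = input.index(after: word.upperBound)..<input.endIndex
let typed: (Substring, Substring, Substring) = (input[...], input[word], input[number])
let erased = ToyAnyOutput(
  input: input, ranges: [input.startIndex..<input.endIndex, word, number], typed: typed)

let recovered = erased.`as`((Substring, Substring, Substring).self)  // succeeds
let mismatched = erased.`as`((Substring, Substring).self)            // nil: wrong arity
```

This toy keeps the original typed output boxed as `Any`, so the cast is a single `as?` check; it says nothing about how the proposed type actually stores or converts its captures.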
```diff
@@ -856,6 +931,12 @@ We are deferring runtime support for callouts from regex literals as future work
 
 ## Alternatives Considered
 
+### Failable inits
+
+There are many ways for compilation to fail, from syntactic errors to unsupported features to type mismatches. In the general case, run-time compilation errors are not recoverable by a tool without modifying the user's input. Even so, the thrown errors contain valuable information about why compilation failed. For example, SwiftPM presents any errors directly to the user.
+
+As proposed, the errors thrown will be the same errors presented to the Swift compiler, tracking fine-grained source locations with specific reasons why compilation failed. Defining a rich error API is future work, as these errors are rapidly evolving and it is too early to lock in the ABI.
+
 
 ### Skip the syntax
 
```
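The *Failable inits* alternative above argues for throwing initializers. The toy sketch below makes that trade-off concrete. `ToyPattern`, `ToyCompileError`, and the offset-based error are hypothetical stand-ins (the pitch leaves the real error API as future work), and the only "compilation" check performed is parenthesis balance. The throwing initializer reports where and why it failed, which a tool such as a `swift test --filter` front end could show to its user, while the failable variant can only return `nil`.

```swift
// Toy pattern "compiler" (hypothetical; the real error API is future work).
struct ToyCompileError: Error, CustomStringConvertible {
  let offset: Int     // character offset in the pattern where compilation failed
  let reason: String
  var description: String { "error at offset \(offset): \(reason)" }
}

struct ToyPattern {
  let source: String

  /// Throwing design: the error says where and why compilation failed.
  init(compiling pattern: String) throws {
    var depth = 0
    for (offset, ch) in pattern.enumerated() {
      if ch == "(" { depth += 1 }
      if ch == ")" {
        depth -= 1
        if depth < 0 { throw ToyCompileError(offset: offset, reason: "unbalanced ')'") }
      }
    }
    if depth > 0 { throw ToyCompileError(offset: pattern.count, reason: "missing ')'") }
    source = pattern
  }

  /// Failable design: the same check, but the caller only sees `nil`.
  init?(failableCompiling pattern: String) {
    guard let compiled = try? ToyPattern(compiling: pattern) else { return nil }
    self = compiled
  }
}

// A tool (such as a test filter) can surface the thrown error to its user.
var message = ""
do {
  _ = try ToyPattern(compiling: "a(b")
} catch {
  message = "\(error)"
}
print(message)  // error at offset 3: missing ')'

// The failable variant rejects "a)b" too, but cannot say why.
let failed = ToyPattern(failableCompiling: "a)b")
```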
Sources/_StringProcessing/Regex/AnyRegexOutput.swift

Lines changed: 29 additions & 0 deletions
```diff
@@ -37,6 +37,7 @@ extension Regex.Match where Output == AnyRegexOutput {
   }
 }
 
+/// A type-erased regex output
 public struct AnyRegexOutput {
   let input: String
   fileprivate let _elements: [ElementRepresentation]
```
```diff
@@ -70,6 +71,7 @@ extension AnyRegexOutput {
 
   /// Returns a typed output by converting the underlying value to the specified
   /// type.
+  ///
   /// - Parameter type: The expected output type.
   /// - Returns: The output, if the underlying value can be converted to the
   /// output type, or nil otherwise.
```
```diff
@@ -119,13 +121,20 @@ extension AnyRegexOutput: RandomAccessCollection {
     fileprivate let representation: ElementRepresentation
     let input: String
 
+    /// The range over which a value was captured. `nil` for no-capture.
     public var range: Range<String.Index>? {
       representation.bounds
     }
 
+    /// The slice of the input over which a value was captured. `nil` for no-capture.
     public var substring: Substring? {
       range.map { input[$0] }
     }
+
+    /// The captured value. `nil` for no-capture.
+    public var value: Any? {
+      fatalError()
+    }
   }
 
   public var startIndex: Int {
```
```diff
@@ -152,3 +161,23 @@ extension AnyRegexOutput: RandomAccessCollection {
     .init(representation: _elements[position], input: input)
   }
 }
+
+extension Regex.Match where Output == AnyRegexOutput {
+  /// Creates a type-erased regex match from an existing match.
+  ///
+  /// Use this initializer to fit a regex match with strongly typed captures into the
+  /// use site of a dynamic regex match, i.e. one that was created from a string.
+  public init<Output>(_ match: Regex<Output>.Match) {
+    fatalError("FIXME: Not implemented")
+  }
+
+  /// Returns a typed match by converting the underlying values to the specified
+  /// types.
+  ///
+  /// - Parameter type: The expected output type.
+  /// - Returns: A match generic over the output type if the underlying values can be converted to the
+  /// output type. Returns `nil` otherwise.
+  public func `as`<Output>(_ type: Output.Type) -> Regex<Output>.Match? {
+    fatalError("FIXME: Not implemented")
+  }
+}
```
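In this commit, `Regex.Match`'s type-erasing `init(_:)` and its `as(_:)` cast are still stubs that call `fatalError("FIXME: Not implemented")`. A minimal sketch of the intended round trip follows, assuming that erasing a match only needs to keep the overall matched range and box the output. `ToyMatch` and `ErasedOutput` are hypothetical names, and the sketch ignores the per-capture elements and lazy `Substring` creation of the real types.

```swift
// Toy model (hypothetical names) of erasing a typed match and casting it back.
struct ToyMatch<Output> {
  let range: Range<String.Index>  // overall matched range, kept across erasure
  let output: Output
}

/// Stand-in for a type-erased output: it just boxes the typed value.
struct ErasedOutput {
  let base: Any
}

extension ToyMatch where Output == ErasedOutput {
  /// Erase a strongly typed match, keeping its range.
  init<Typed>(_ match: ToyMatch<Typed>) {
    self.init(range: match.range, output: ErasedOutput(base: match.output))
  }

  /// Cast back; returns `nil` unless the boxed output really is a `Typed`.
  func `as`<Typed>(_ type: Typed.Type) -> ToyMatch<Typed>? {
    guard let typed = output.base as? Typed else { return nil }
    return ToyMatch<Typed>(range: range, output: typed)
  }
}

let text = "id: 42"
let digits = text.index(text.startIndex, offsetBy: 4)..<text.endIndex
let typedMatch = ToyMatch(range: digits, output: (text[digits], 42))
let erasedMatch = ToyMatch<ErasedOutput>(typedMatch)

let roundTripped = erasedMatch.`as`((Substring, Int).self)  // same range and values
let wrongType = erasedMatch.`as`(String.self)               // nil
```

Casting the whole match, rather than only its output, lets the typed result carry its range along, which mirrors the shape of the proposed `as(_:)` returning `Regex<Output>.Match?`.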
