Documentation/Evolution/RegexSyntaxRunTimeConstruction.md (+89 -8: 89 additions & 8 deletions)
@@ -2,7 +2,7 @@
Hello, we want to issue an update to [Regular Expression Literals](https://forums.swift.org/t/pitch-regular-expression-literals/52820) and prepare for a formal proposal. The great delimiter deliberation continues to unfold, so in the meantime, we have a significant amount of surface area to present for review/feedback: the syntax _inside_ a regex literal. Additionally, this is the syntax accepted from a string used for run-time regex construction, so we're devoting an entire pitch/proposal to the topic of _regex syntax_, distinct from the result builder DSL or the choice of delimiters for literals.
@@ -16,7 +16,7 @@ The overall story is laid out in [Regex Type and Overview][overview] and each in
Swift aims to be a pragmatic programming language, striking a balance between familiarity, interoperability, and advancing the art. Swift's `String` presents a uniquely Unicode-forward model of string, but currently suffers from limited processing facilities.
-`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage.
+`NSRegularExpression` can construct a processing pipeline from a string containing [ICU regular expression syntax][icu-syntax]. However, it is inherently tied to ICU's engine and thus it operates over a fundamentally different model of string than Swift's `String`. It is also limited in features and carries a fair amount of Objective-C baggage, such as the need to translate between `NSRange` and `Range`.
```swift
let pattern = #"(\w+)\s\s+(\S+)\s\s+((?:(?!\s\s).)*)\s\s+(.*)"#
```
-Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic mismatch between ICU and Swift's `String` are discussed in [Unicode for String Processing][pitches].
+Fixing these fundamental limitations requires migrating to a completely different engine and type system representation. This is the path we're proposing with `Regex`, outlined in [Regex Type and Overview][overview]. Details on the semantic differences between ICU's string model and Swift's `String` are discussed in [Unicode for String Processing][pitches].
Run-time construction is important for tools and editors. For example, SwiftPM allows the user to provide a regular expression to filter tests via `swift test --filter`.
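As a rough illustration of the tooling use case, the sketch below filters a list of test names with a pattern that is only known at run time. It assumes the spelling that later shipped in Swift 5.7 (`try Regex(_:)`, producing a `Regex<AnyRegexOutput>`, and `contains(_:)` on strings); on Apple platforms that API requires a recent OS, such as macOS 13. The helper `filterTests` and the test names are hypothetical and are not part of SwiftPM.

```swift
// Sketch only: `filterTests` and the test names below are hypothetical.
// Assumes Swift 5.7+, where `Regex(_:)` compiles a pattern string at run time.

/// Returns the names that contain a match for the user-supplied pattern.
/// Throws if the pattern fails to compile.
func filterTests(_ names: [String], matching pattern: String) throws -> [String] {
  // The pattern is only known at run time, so its captures are type-erased.
  let regex: Regex<AnyRegexOutput> = try Regex(pattern)
  return names.filter { $0.contains(regex) }
}

let allTests = [
  "ParserTests.testLiteral",
  "LexerTests.testDelimiter",
  "ParserTests.testGroup",
]

// The pattern would normally come from the command line.
let selected = try filterTests(allTests, matching: #"^Parser\w*\.test"#)
print(selected)
```

Because the pattern arrives as a string, no capture types are known at compile time, which is why the result is modeled as a type-erased regex here.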
We propose accepting a syntactic "superset" of the following existing regular expression engines:
@@ -80,11 +79,87 @@ Regex syntax will be part of Swift's source-compatibility story as well as its b
## Detailed Design
-<!--
-... init, dynamic match, conversion to static
--->
+We propose initializers to declare and compile a regex from syntax. Upon failure, these initializers throw compilation errors, such as for syntax or type errors. API for retrieving error information is future work.
+
+```swift
+extension Regex {
+  /// Parse and compile `pattern`, resulting in a strongly-typed capture list.
+    /// The range over which a value was captured. `nil` for no-capture.
+    public var range: Range<String.Index>?
+
+    /// The slice of the input over which a value was captured. `nil` for no-capture.
+    public var substring: Substring?
+
+    /// The captured value. `nil` for no-capture.
+    public var value: Any?
+  }
+
+  // Trivial collection conformance requirements

-We propose the following syntax for regex.
+  public var startIndex: Int { get }
+
+  public var endIndex: Int { get }
+
+  public var count: Int { get }
+
+  public func index(after i: Int) -> Int
+
+  public func index(before i: Int) -> Int
+
+  public subscript(position: Int) -> Element
+}
+```
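To make the type-erased output more concrete, here is a minimal sketch that walks the captures of a dynamically constructed regex and reads each element's `substring`. It assumes the API as it later shipped in Swift 5.7 (`Regex(_:)`, `firstMatch(of:)`, and `AnyRegexOutput` as a collection of elements); the helper `describeCaptures` and the `"<no capture>"` placeholder are hypothetical names chosen for illustration.

```swift
// Sketch only: `describeCaptures` is a hypothetical helper.
// Assumes Swift 5.7+, where a regex built from a string yields `AnyRegexOutput`.

/// Lists every capture of the first match, including the whole match at index 0.
/// Captures that did not participate in the match are reported with a placeholder.
func describeCaptures(of pattern: String, in input: String) throws -> [String] {
  let regex = try Regex(pattern)
  guard let match = input.firstMatch(of: regex) else { return [] }
  // `match.output` is an `AnyRegexOutput`, a collection of capture elements.
  return match.output.map { capture in
    capture.substring.map { String($0) } ?? "<no capture>"
  }
}

let captures = try describeCaptures(
  of: #"(\w+)@(\w+)(\.org)?"#,
  in: "mail: user@example"
)
for (index, text) in captures.enumerated() {
  print(index, text)
}
```

The optional third group does not participate in this match, so its `substring` is `nil`, which corresponds to the "`nil` for no-capture" wording in the declarations above.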
+
+We propose adding an API to `Regex<AnyRegexOutput>.Match` to cast the output type to a concrete one. A regex match will lazily create a `Substring` on demand, so casting the match itself saves ARC traffic vs extracting and casting the output.
+
+```swift
+extension Regex.Match where Output == AnyRegexOutput {
+  /// Creates a type-erased regex match from an existing match.
+  ///
+  /// Use this initializer to fit a regex match with strongly typed captures into the
+  /// use site of a dynamic regex match, i.e. one that was created from a string.
+  public init<Output>(_ match: Regex<Output>.Match)
+
+  /// Returns a typed match by converting the underlying values to the specified
+  /// types.
+  ///
+  /// - Parameter type: The expected output type.
+  /// - Returns: A match generic over the output type if the underlying values can be converted to the
+The rest of this proposal will be a detailed and exhaustive definition of our proposed regex syntax.
<details><summary>Grammar Notation</summary>
@@ -856,6 +931,12 @@ We are deferring runtime support for callouts from regex literals as future work
## Alternatives Considered
+### Failable inits
+
+There are many ways for compilation to fail, from syntactic errors to unsupported features to type mismatches. In the general case, run-time compilation errors are not recoverable by a tool without modifying the user's input. Even then, the thrown errors contain valuable information as to why compilation failed. For example, SwiftPM presents any errors directly to the user.
+
+As proposed, the errors thrown will be the same errors presented to the Swift compiler, tracking fine-grained source locations with specific reasons why compilation failed. Defining a rich error API is future work, as these errors are rapidly evolving and it is too early to lock in the ABI.
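The sketch below shows how a tool might surface a thrown compilation error to its user instead of trying to recover from it. It assumes the Swift 5.7 spelling `try Regex(_:)`; the helper `compileUserPattern` and the message format are hypothetical, and the exact text of the printed error is left unspecified because no error API is proposed here.

```swift
// Sketch only: `compileUserPattern` is a hypothetical helper.
// Assumes Swift 5.7+, where `Regex(_:)` throws when the pattern fails to compile.

/// Compiles a user-supplied pattern, reporting any failure to the user.
/// Returns `nil` when compilation fails.
func compileUserPattern(_ pattern: String) -> Regex<AnyRegexOutput>? {
  do {
    return try Regex(pattern)
  } catch {
    // The tool cannot repair the pattern itself, so it forwards the
    // diagnostic and lets the user correct the input.
    print("error: invalid regular expression '\(pattern)': \(error)")
    return nil
  }
}

let valid = compileUserPattern(#"\d+-\d+"#)
let invalid = compileUserPattern("(unclosed")
```

A throwing initializer keeps the diagnostic available at the call site, whereas a failable initializer returning `nil` would discard the reason for the failure.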