Skip to content

Commit b0cd8f1

Browse files
authored
Provide literal pattern for convertible regexes (#670)
This adds an (SPI) API for converting a `Regex` instance to a regex literal when possible, whether the regex starts as a literal, a run-time-constructed regex, or using the RegexBuilder syntax. The `_literalPattern` is round-trippable, so a regex created from a `_literalPattern` will have that same `_literalPattern`, but it could be different from the original pattern used to create a regex. This adds another confirmation test, as part of the parse testing. When a parsing test is expected to succeed, and contains supported syntax, this checks that a round-tripped literal pattern maintains its identity.
1 parent 4e742b4 commit b0cd8f1

File tree

10 files changed

+801
-24
lines changed

10 files changed

+801
-24
lines changed

Sources/_RegexParser/Regex/Parse/LexicalAnalysis.swift

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2091,9 +2091,7 @@ extension Parser {
20912091
// multiple scalars. These may be confusable for metacharacters, e.g
20922092
// `[\u{301}]` wouldn't be interpreted as a custom character class due
20932093
// to the combining accent (assuming it is literal, not `\u{...}`).
2094-
let scalars = char.unicodeScalars
2095-
if scalars.count > 1 && scalars.first!.isASCII && char != "\r\n" &&
2096-
!char.isLetter && !char.isNumber {
2094+
if char.isConfusable {
20972095
p.error(.confusableCharacter(char), at: charLoc.location)
20982096
}
20992097
break

Sources/_RegexParser/Utility/Misc.swift

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,19 @@ extension Character {
3232
let str = String(self)
3333
return str._nfcCodeUnits.elementsEqual(str.utf8)
3434
}
35+
36+
/// Whether this character could be confusable with a metacharacter in a
37+
/// regex literal.
38+
///
39+
/// A "confusable" character is one that starts with a non-alphanumeric ASCII
40+
/// character and includes other combining Unicode scalars. For example,
41+
/// `"[́"` (aka `"[\u{301}"`) is confusable, since it looks just like the
42+
/// `"["` metacharacter, but doesn't parse as one.
43+
public var isConfusable: Bool {
44+
let scalars = self.unicodeScalars
45+
return scalars.count > 1 && scalars.first!.isASCII && self != "\r\n" &&
46+
!self.isLetter && !self.isNumber
47+
}
3548
}
3649

3750
extension CustomStringConvertible {

Sources/_StringProcessing/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,7 @@ add_library(_StringProcessing
7676
Compiler.swift
7777
ConsumerInterface.swift
7878
Executor.swift
79+
LiteralPrinter.swift
7980
MatchingOptions.swift
8081
PrintAsPattern.swift)
8182
target_compile_options(_StringProcessing PRIVATE

0 commit comments

Comments
 (0)