Sync 5.7 branch with main #409

Merged 188 commits on May 15, 2022

Commits
381a3c7
Add DelimiterSyntax.md
hamishknight Mar 18, 2022
c79f457
Expand on parsing issues with `/` as delimited. Add a note about edit…
DaveEwing Mar 20, 2022
731292e
Rewrite that Editor Considerations paragraph.
DaveEwing Mar 21, 2022
1049276
Change single quote constructs to be invalid
hamishknight Mar 21, 2022
c7d556c
Elaborate on starting character limitations
hamishknight Mar 21, 2022
7074cfb
Elaborate on comma case
hamishknight Mar 21, 2022
945219e
grammar
hamishknight Mar 21, 2022
d1d0d57
Add comma disambiguation
hamishknight Mar 21, 2022
bcebfc6
Update comma disambiguation
hamishknight Mar 21, 2022
1c2b7ad
Flip pitch to `/.../` as the main syntax
hamishknight Mar 22, 2022
e841184
Update alternatives considered
hamishknight Mar 23, 2022
be7a802
Fix headings
hamishknight Mar 23, 2022
06c2b28
Tweak phrasing
hamishknight Mar 23, 2022
91a93a8
Small tweaks
hamishknight Mar 23, 2022
c0e3bef
Tweak
hamishknight Mar 23, 2022
298092c
Flesh things out a bit more. Initial bits for the Intro and Motivatio…
DaveEwing Mar 24, 2022
1ebefa1
Expand out disclosure triangles, and other tweaks
hamishknight Mar 24, 2022
811bfcb
Generalize discussion on language mode
hamishknight Mar 24, 2022
70be006
Expand some prose
hamishknight Mar 25, 2022
4bb25b3
Update to also pitch `#/.../#`
hamishknight Mar 28, 2022
35d9132
Add DSL example
hamishknight Mar 28, 2022
f99dadb
Rejig motivation/solution
hamishknight Mar 29, 2022
eed1b24
Expand on typed captures
hamishknight Mar 29, 2022
7ad4037
Generalize language mode
hamishknight Mar 29, 2022
f4ef0c2
Add multi-line mode
hamishknight Mar 31, 2022
b25a0b8
Remove Normalization and Grapheme data for SPI
Azoy Mar 31, 2022
addfbfd
Update pitch
hamishknight Apr 1, 2022
bb819f6
Clarify upgrade path
hamishknight Apr 1, 2022
36f7160
Remove AST CustomCharacterClass consumer generation
hamishknight Apr 4, 2022
ed9f72c
Convert scalar escape sequences to DSL scalars
hamishknight Apr 4, 2022
afcc40b
Update proposals (#248)
milseman Apr 4, 2022
d56d706
Update status link (#249)
milseman Apr 4, 2022
d57107b
Update DSL proposal.
rxwei Apr 5, 2022
c8c2001
Merge pull request #250 from rxwei/dsl-update
rxwei Apr 5, 2022
5ec5001
Complete list of authors
rxwei Apr 5, 2022
38aedc0
Merge pull request #251 from rxwei/dsl-update
rxwei Apr 5, 2022
b78d7d0
Add `Regex<Match, Captures>` alternative to regex proposal
rxwei Apr 5, 2022
1310985
Merge pull request #252 from rxwei/dsl-update
rxwei Apr 5, 2022
45e8a1f
Fix HexDigit definition in RegexSyntax.md
hamishknight Apr 5, 2022
189d329
Merge pull request #253 from hamishknight/fix-hex-syntax
hamishknight Apr 5, 2022
ebfcdb3
Merge pull request #240 from Azoy/spi-unicode
Azoy Apr 5, 2022
0a9447a
Update typed captures section + other tweaks
hamishknight Apr 6, 2022
5d4d136
Merge pull request #245 from hamishknight/to-scale
hamishknight Apr 7, 2022
52bc932
Clean up based on the String Processing Algorithms proposal (#247)
itingliu Apr 7, 2022
3b77fe4
Fill out remainder of options API (#246)
natecook1000 Apr 7, 2022
93abfcb
Move `CharacterClass` API into RegexBuilder (#254)
natecook1000 Apr 8, 2022
b41edbd
Author links
hamishknight Apr 8, 2022
0c2ed87
Update regex syntax pitch (#258)
milseman Apr 8, 2022
e0cea6c
Typo (#259)
milseman Apr 8, 2022
2e80ced
Proposal cleanup (#260)
milseman Apr 8, 2022
c5db717
Update ProposalOverview.md
milseman Apr 8, 2022
8da68d3
Updated extended delimiter section
hamishknight Apr 8, 2022
5af9ca5
Eliminate extra public API (#256)
natecook1000 Apr 8, 2022
9d0cf04
Update delimiter proposal
milseman Apr 9, 2022
fed4c53
Throwing customization hooks (#261)
milseman Apr 10, 2022
720c10c
Update Documentation/Evolution/DelimiterSyntax.md
hamishknight Apr 11, 2022
c045e21
Clarify backslash rule
hamishknight Apr 11, 2022
ff6b443
Remove docc-plugin dependency (#263)
natecook1000 Apr 11, 2022
969bd91
Added a Trying it out section
milseman Apr 11, 2022
e31262d
Minor tweaks
hamishknight Apr 11, 2022
0454f13
Allow custom character classes to begin with `:`
hamishknight Apr 12, 2022
19adde4
Merge pull request #269 from hamishknight/posixbly
hamishknight Apr 12, 2022
bf7702f
Update pitch
hamishknight Apr 12, 2022
9c8a116
Rename DelimiterSyntax.md -> RegexLiterals.md
hamishknight Apr 12, 2022
ce458eb
Nominalize API names (#271)
milseman Apr 12, 2022
657c4a6
Allow POSIX character properties outside of custom character classes
hamishknight Apr 13, 2022
0338178
Merge pull request #187 from hamishknight/delimiter-syntax
hamishknight Apr 13, 2022
81ce4ef
Update RegexLiterals.md
hamishknight Apr 13, 2022
fad4dd9
Merge pull request #272 from hamishknight/posix-quirks
hamishknight Apr 13, 2022
39cb22d
Merge pull request #274 from hamishknight/tweak-regex-literal-pitch
hamishknight Apr 13, 2022
859c3d5
Update RegexLiterals.md
hamishknight Apr 14, 2022
935d748
Merge pull request #277 from hamishknight/tweak-regex-literal-pitch
hamishknight Apr 14, 2022
33b0b4c
Fix character class trivia matching
hamishknight Apr 14, 2022
70ecb27
Fix trivia parsing for set operations and initial `]` cases
hamishknight Apr 14, 2022
63ab0a9
Merge pull request #278 from hamishknight/trivagone
hamishknight Apr 14, 2022
a487e94
Add SwiftStdlib 5.7 availability (#276)
rxwei Apr 14, 2022
6f7ab96
Throw error if we encounter stray opening '('
hamishknight Apr 14, 2022
aede1f7
Change matching option scoping behavior to match PCRE
hamishknight Apr 14, 2022
4428e7f
Move RegexComponent conformances to RegexBuilder (#279)
natecook1000 Apr 14, 2022
34da2b6
Merge pull request #237 from hamishknight/mix-n-match
hamishknight Apr 14, 2022
89da9f8
Error on unknown character properties
hamishknight Apr 14, 2022
da89bf7
Rename RegexComponent.Output (#281)
natecook1000 Apr 14, 2022
b3b8fee
Add remaining availability annotations.
rxwei Apr 15, 2022
a0ed7e1
Merge pull request #283 from rxwei/fix-availability
rxwei Apr 15, 2022
dfec8fb
Import _RegexParser as implementation only
rxwei Apr 15, 2022
59ad177
Merge pull request #287 from apple/impl-import
rxwei Apr 15, 2022
3293905
Fix release build.
rxwei Apr 16, 2022
c6cdf6c
Throwing matches and update to CustomMatchingRegexComponent
itingliu Apr 13, 2022
15a20b2
Merge pull request #292 from rxwei/fix-288
rxwei Apr 16, 2022
a342405
Add Substring algorithms tests (#289)
natecook1000 Apr 18, 2022
b959d0a
Merge pull request #273 from itingliu/throwing-hooks
rxwei Apr 18, 2022
fea6fe2
RegexBuilder quantifiers take an optional behavior (#293)
natecook1000 Apr 18, 2022
42641da
Nominalize option methods (#295)
natecook1000 Apr 18, 2022
e1604a6
Merge pull request #280 from hamishknight/error-on-unknown-props
hamishknight Apr 19, 2022
3f16170
Don't parse a character property containing a backslash
hamishknight Apr 19, 2022
fa5f2f1
Update Regex Syntax document for `[:...:]` changes
hamishknight Apr 19, 2022
9ccde19
Support obtaining captures by name on `AnyRegexOutput` (#300)
rxwei Apr 19, 2022
182da3b
Untangle `_RegexParser` from `RegexBuilder` (#299)
natecook1000 Apr 19, 2022
8068ea1
Merge pull request #301 from hamishknight/yet-more-posix-quirks
hamishknight Apr 19, 2022
08b7808
Merge pull request #302 from hamishknight/update-syntax
hamishknight Apr 19, 2022
00aa315
Expose `matches`, `ranges` and `split` (#304)
itingliu Apr 19, 2022
15355bf
Convenience quoting (#305)
milseman Apr 19, 2022
46b9a0f
Remove compiling argument label (#306)
milseman Apr 20, 2022
b24d3ea
Move the closure argument to the end of the arg list (#307)
itingliu Apr 21, 2022
f9a4675
Adds RegexBuilder.CharacterClass.anyUnicodeScalar (#315)
natecook1000 Apr 21, 2022
4857bc7
Allow setting any of the three quant behaviors (#311)
natecook1000 Apr 21, 2022
73a5ccf
Add `wholeMatch` and `prefixMatch` (#286)
itingliu Apr 22, 2022
3e2160c
Update local proposal copies (#317)
milseman Apr 22, 2022
53acbb2
Update ProposalOverview.md
milseman Apr 22, 2022
b057c4e
Update ProposalOverview.md
milseman Apr 22, 2022
8dd8470
Unicode for String Processing proposal (#257)
natecook1000 Apr 22, 2022
81bc5d0
Updates for algorithms proposal (#319)
milseman Apr 22, 2022
89b80bf
Preparation for location aware diagnostics in the compiler.
rintaro Apr 11, 2022
06f40f6
Merge pull request #321 from rintaro/diagnostic-swiftcompiler
rintaro Apr 22, 2022
1f99047
Rename CustomPrefixMatchRegexComponent to CustomConsumingRegexCompone…
itingliu Apr 22, 2022
563d6c2
Remove String.Index evils
Azoy Apr 23, 2022
70c0756
Merge pull request #331 from Azoy/remove-string-index-init
Azoy Apr 23, 2022
571e259
More updates for algorithms proposal (#324)
natecook1000 Apr 23, 2022
2df0f24
Add ~= overloads (#335)
milseman Apr 23, 2022
dfa4ea1
Fix a missed type name change
natecook1000 Apr 23, 2022
6fab471
Add a default arity and an flag for silencing logs
natecook1000 Apr 23, 2022
7b737e3
Add `@RegexComponentBuilder` overloads for string processing algorith…
itingliu Apr 24, 2022
c56dc76
Remove @testable annotations where possible
natecook1000 Apr 22, 2022
59a34cb
Switch FixedPatternConsumer to be over Sequence
natecook1000 Apr 23, 2022
caad657
Update Sequence/Collection constraints
natecook1000 Apr 23, 2022
b7a021f
Update trim(while:) - rethrowing and nonescaping
natecook1000 Apr 23, 2022
e0b4d5e
Add tests for trim methods
natecook1000 Apr 23, 2022
d6a01e7
Add maxSplits and omitEmpty to split methods
natecook1000 Apr 23, 2022
e81a8bd
Add tests / fixes for contains / firstRange(of:)
natecook1000 Apr 23, 2022
9e09bf8
Test to ensure stdlib `split` is still accessible
natecook1000 Apr 24, 2022
433740b
Fix stale links
milseman Apr 25, 2022
8dce8c2
Mention API naming consistency (#341)
milseman Apr 25, 2022
1fd5115
Generic `~=` operator
rxwei Apr 23, 2022
5af1427
Add `@RegexComponentBuilder` overloads for collection algorithms (#342)
itingliu Apr 25, 2022
882c2be
Merge pull request #339 from rxwei/generic-pattern-matching-operator
rxwei Apr 25, 2022
ab308ee
Revise doc comments for API reference style.
amartini51 Apr 21, 2022
1467470
Algorithm cleanup (#351)
milseman Apr 26, 2022
e0922ec
API stubs for casting and named captures (#349)
milseman Apr 26, 2022
bef2092
Mention language level pattern matching (#354)
milseman Apr 26, 2022
9ff87db
Fix empty.split w/ empty separator (#353)
natecook1000 Apr 26, 2022
2401a58
Add a section describing 'find empty' behavior (#352)
natecook1000 Apr 26, 2022
d0598b7
Add example from RegexBuilder proposal as a test (#344)
natecook1000 Apr 26, 2022
435090d
Refactor generator script (#356)
milseman Apr 26, 2022
96df14d
Fix cut-off sentence.
amartini51 Apr 26, 2022
12ef33b
Update PatternConverter
Azoy Apr 26, 2022
0e59bdf
Update Sources/_StringProcessing/Regex/Match.swift
milseman Apr 27, 2022
ac618f6
Back out an accidental source change.
amartini51 Apr 27, 2022
6c5c082
Merge pull request #350 from apple/amartini/docs_91301229
amartini51 Apr 27, 2022
7d03a1e
Introduce new compiler interface
hamishknight Apr 27, 2022
b7a03c9
Merge pull request #364 from hamishknight/compiler-interface
hamishknight Apr 28, 2022
2c386ef
Add some comments
Azoy Apr 28, 2022
40c177f
Merge pull request #359 from Azoy/update-patternconverter
Azoy Apr 28, 2022
b7fb965
Replace opaque results with generic parameters in Algorithms.swift (#…
hamishknight Apr 28, 2022
838bdfe
Simplify capture representations (#360)
milseman Apr 28, 2022
e748aea
Add NegativeLookahead and Anchor comments (#372)
natecook1000 May 2, 2022
13342eb
Add matching support for `\p{Lc}`
hamishknight May 3, 2022
925f51b
Add parser support for `\p{L&}`
hamishknight May 3, 2022
ade8f01
Merge pull request #373 from hamishknight/case-in-prop
hamishknight May 3, 2022
c44efeb
Update ProposalOverview.md
milseman May 3, 2022
9801855
Add tests for AnyRegexOutput (#371)
milseman May 3, 2022
0e5cfa8
Rename noAutoCapture -> namedCapturesOnly
hamishknight May 4, 2022
2a4b3a6
Implement the `(?n)` option
hamishknight May 4, 2022
f22cb4f
Merge pull request #377 from hamishknight/named-captures-only
hamishknight May 4, 2022
6d833aa
Improve Unicode/UTS18 and semantic level support (#268)
natecook1000 May 5, 2022
09a385b
Support Unicode scalar names in `\p{name=...}` (#382)
natecook1000 May 6, 2022
39c0ed5
Modify DSL test to test for uncaptured backreference (#355)
natecook1000 May 6, 2022
9740416
Introduce ASTStage parameter to `parse`
hamishknight May 9, 2022
4b31736
Implement semantic diagnostics
hamishknight May 9, 2022
466b375
Validate capture lists
hamishknight May 9, 2022
c95e862
Address review feedback
hamishknight May 9, 2022
7f068dc
Merge pull request #379 from hamishknight/sema
hamishknight May 9, 2022
c16e389
Implement \R, \v, \h for character/scalar modes (#384)
natecook1000 May 9, 2022
c13980f
De-deprecate MatchingOptions.matchLevel (#390)
natecook1000 May 9, 2022
61965c3
Restrict character property fuzzy matching to "pattern whitespace"
hamishknight May 10, 2022
05e610a
Improve the wording of a diagnostic
hamishknight May 10, 2022
7752015
Introduce AST.Atom.Scalar
hamishknight May 10, 2022
f436cca
Introduce scalar sequences `\u{AA BB CC}`
hamishknight May 10, 2022
0597164
Fix invalid indexing
hamishknight May 10, 2022
0872d16
Fix source location tracking in `lexUntil`
hamishknight May 10, 2022
5b30c5b
Merge pull request #386 from hamishknight/multiscalar
hamishknight May 10, 2022
b209e4f
Tidy up build flags and fix implicit import circular dependency (#392)
rxwei May 10, 2022
f779459
Catch more unquantifiable elements (#391)
natecook1000 May 10, 2022
87ea119
Disable resilience on _RegexParser (#397)
rxwei May 11, 2022
baf9f22
Introduce `One`
rxwei May 12, 2022
d9d02c1
Ban `]` as literal first character of custom character class
hamishknight May 12, 2022
d3ea692
Merge pull request #404 from hamishknight/ban-empty-cc
hamishknight May 12, 2022
b8178c2
Merge pull request #403 from rxwei/1
rxwei May 12, 2022
1a65e1e
Merge branch 'swift/release/5.7' into main_as_of_12_may
natecook1000 May 12, 2022
20 changes: 14 additions & 6 deletions Package.swift
@@ -7,10 +7,18 @@ let availabilityDefinition = PackageDescription.SwiftSetting.unsafeFlags([
"-Xfrontend",
"-define-availability",
"-Xfrontend",
#"SwiftStdlib 5.7:macOS 9999, iOS 9999, watchOS 9999, tvOS 9999"#,
"SwiftStdlib 5.7:macOS 9999, iOS 9999, watchOS 9999, tvOS 9999",
])

let stdlibSettings: [PackageDescription.SwiftSetting] = [
/// Swift settings for building a private stdlib-like module that is to be used
/// by other stdlib-like modules only.
let privateStdlibSettings: [PackageDescription.SwiftSetting] = [
.unsafeFlags(["-Xfrontend", "-disable-implicit-concurrency-module-import"]),
.unsafeFlags(["-Xfrontend", "-disable-implicit-string-processing-module-import"]),
]

/// Swift settings for building a user-facing stdlib-like module.
let publicStdlibSettings: [PackageDescription.SwiftSetting] = [
.unsafeFlags(["-enable-library-evolution"]),
.unsafeFlags(["-Xfrontend", "-disable-implicit-concurrency-module-import"]),
.unsafeFlags(["-Xfrontend", "-disable-implicit-string-processing-module-import"]),
@@ -43,7 +51,7 @@ let package = Package(
.target(
name: "_RegexParser",
dependencies: [],
swiftSettings: stdlibSettings),
swiftSettings: privateStdlibSettings),
.testTarget(
name: "MatchingEngineTests",
dependencies: [
@@ -55,16 +63,16 @@ let package = Package(
.target(
name: "_StringProcessing",
dependencies: ["_RegexParser", "_CUnicode"],
swiftSettings: stdlibSettings),
swiftSettings: publicStdlibSettings),
.target(
name: "RegexBuilder",
dependencies: ["_StringProcessing", "_RegexParser"],
swiftSettings: stdlibSettings),
swiftSettings: publicStdlibSettings),
.testTarget(
name: "RegexTests",
dependencies: ["_StringProcessing"],
swiftSettings: [
.unsafeFlags(["-Xfrontend", "-disable-availability-checking"])
.unsafeFlags(["-Xfrontend", "-disable-availability-checking"]),
]),
.testTarget(
name: "RegexBuilderTests",
2 changes: 1 addition & 1 deletion Sources/PatternConverter/PatternConverter.swift
@@ -50,7 +50,7 @@ struct PatternConverter: ParsableCommand {
print("Converting '\(delim)\(regex)\(delim)'")

let ast = try _RegexParser.parse(
regex,
regex, .semantic,
experimentalSyntax ? .experimental : .traditional)

// Show rendered source ranges
6 changes: 3 additions & 3 deletions Sources/_RegexParser/Regex/AST/AST.swift
@@ -29,7 +29,6 @@ extension AST {

extension AST {
/// A node in the regex AST.
@frozen
public indirect enum Node:
Hashable, _TreeNode //, _ASTPrintable ASTValue, ASTAction
{
@@ -125,7 +124,9 @@ extension AST.Node {
switch self {
case .atom(let a):
return a.isQuantifiable
case .group, .conditional, .customCharacterClass, .absentFunction:
case .group(let g):
return g.isQuantifiable
case .conditional, .customCharacterClass, .absentFunction:
return true
case .alternation, .concatenation, .quantification, .quote, .trivia,
.empty:
@@ -247,7 +248,6 @@ extension AST {
}

public struct Reference: Hashable {
@frozen
public enum Kind: Hashable {
// \n \gn \g{n} \g<n> \g'n' (?n) (?(n)...
// Oniguruma: \k<n>, \k'n'
81 changes: 70 additions & 11 deletions Sources/_RegexParser/Regex/AST/Atom.swift
@@ -19,7 +19,6 @@ extension AST {
self.location = loc
}

@frozen
public enum Kind: Hashable {
/// Just a character
///
@@ -29,7 +28,13 @@ extension AST {
/// A Unicode scalar value written as a literal
///
/// \u{...}, \0dd, \x{...}, ...
case scalar(Unicode.Scalar)
case scalar(Scalar)

/// A whitespace-separated sequence of Unicode scalar values which are
/// implicitly splatted out.
///
/// `\u{A B C}` -> `\u{A}\u{B}\u{C}`
case scalarSequence(ScalarSequence)

/// A Unicode property, category, or script, including those written using
/// POSIX syntax.
@@ -84,6 +89,7 @@ extension AST.Atom {
switch kind {
case .char(let v): return v
case .scalar(let v): return v
case .scalarSequence(let v): return v
case .property(let v): return v
case .escaped(let v): return v
case .keyboardControl(let v): return v
@@ -106,6 +112,30 @@ extension AST.Atom {
}
}

extension AST.Atom {
public struct Scalar: Hashable {
public var value: UnicodeScalar
public var location: SourceLocation

public init(_ value: UnicodeScalar, _ location: SourceLocation) {
self.value = value
self.location = location
}
}

public struct ScalarSequence: Hashable {
public var scalars: [Scalar]
public var trivia: [AST.Trivia]

public init(_ scalars: [Scalar], trivia: [AST.Trivia]) {
precondition(scalars.count > 1, "Expected multiple scalars")
self.scalars = scalars
self.trivia = trivia
}
public var scalarValues: [Unicode.Scalar] { scalars.map(\.value) }
}
}

extension AST.Atom {

// TODO: We might scrap this and break out a few categories so
@@ -115,7 +145,6 @@ extension AST.Atom {

// Characters, character types, literals, etc., derived from
// an escape sequence.
@frozen
public enum EscapedBuiltin: Hashable {
// TODO: better doc comments

@@ -368,7 +397,6 @@ extension AST.Atom {
}

extension AST.Atom.CharacterProperty {
@frozen
public enum Kind: Hashable {
/// Matches any character, equivalent to Oniguruma's '\O'.
case any
@@ -396,6 +424,9 @@ extension AST.Atom.CharacterProperty {
case script(Unicode.Script)
case scriptExtension(Unicode.Script)

/// Character name in the form `\p{name=...}`
case named(String)

case posix(Unicode.POSIXProperty)

/// Some special properties implemented by PCRE and Oniguruma.
@@ -404,7 +435,6 @@ extension AST.Atom.CharacterProperty {
}

// TODO: erm, separate out or fold into something? splat it in?
@frozen
public enum PCRESpecialCategory: String, Hashable {
case alphanumeric = "Xan"
case posixSpace = "Xps"
@@ -416,7 +446,6 @@ extension AST.Atom.CharacterProperty {

extension AST.Atom {
/// Anchors and other built-in zero-width assertions.
@frozen
public enum AssertionKind: String {
/// \A
case startOfSubject = #"\A"#
@@ -665,6 +694,23 @@ extension AST.Atom.EscapedBuiltin {
return nil
}
}

public var isQuantifiable: Bool {
switch self {
case .alarm, .escape, .formfeed, .newline, .carriageReturn, .tab,
.singleDataUnit, .decimalDigit, .notDecimalDigit, .horizontalWhitespace,
.notHorizontalWhitespace, .notNewline, .newlineSequence, .whitespace,
.notWhitespace, .verticalTab, .notVerticalTab, .wordCharacter,
.notWordCharacter, .backspace, .graphemeCluster, .trueAnychar:
return true

case .wordBoundary, .notWordBoundary, .startOfSubject,
.endOfSubjectBeforeNewline, .endOfSubject,
.firstMatchingPositionInSubject, .resetStartOfMatch, .textSegment,
.notTextSegment:
return false
}
}
}

extension AST.Atom {
@@ -677,7 +723,7 @@ extension AST.Atom {
case .char(let c):
return c
case .scalar(let s):
return Character(s)
return Character(s.value)

case .escaped(let c):
return c.scalarValue.map(Character.init)
@@ -693,8 +739,9 @@ extension AST.Atom {
// the AST? Or defer for the matching engine?
return nil

case .property, .any, .startOfLine, .endOfLine, .backreference, .subpattern,
.callout, .backtrackingDirective, .changeMatchingOptions:
case .scalarSequence, .property, .any, .startOfLine, .endOfLine,
.backreference, .subpattern, .callout, .backtrackingDirective,
.changeMatchingOptions:
return nil
}
}
@@ -716,13 +763,21 @@ extension AST.Atom {
/// A string literal representation of the atom, if possible.
///
/// Individual characters are returned as-is, and Unicode scalars are
/// presented using "\u{nnnn}" syntax.
/// presented using "\u{nn nn ...}" syntax.
public var literalStringValue: String? {
func scalarLiteral(_ u: [UnicodeScalar]) -> String {
let digits = u.map { String($0.value, radix: 16, uppercase: true) }
.joined(separator: " ")
return "\\u{\(digits)}"
}
switch kind {
case .char(let c):
return String(c)
case .scalar(let s):
return "\\u{\(String(s.value, radix: 16, uppercase: true))}"
return scalarLiteral([s.value])

case .scalarSequence(let s):
return scalarLiteral(s.scalarValues)

case .keyboardControl(let x):
return "\\C-\(x)"
@@ -746,6 +801,10 @@ extension AST.Atom {
case .changeMatchingOptions:
return false
// TODO: Are callouts quantifiable?
case .escaped(let esc):
return esc.isQuantifiable
case .startOfLine, .endOfLine:
return false
default:
return true
}
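
The scalar-sequence changes in Atom.swift can be illustrated outside the parser. The sketch below is not the package's implementation: `parseScalarSequence` is a hypothetical helper that splats the body of a `\u{AA BB CC}` escape into individual scalars, and `scalarLiteral` mirrors the nested helper added to `literalStringValue` in the diff above. Both use only the standard library.

```swift
/// Hypothetical helper (not part of _RegexParser): splits the body of a
/// `\u{...}` escape on whitespace and converts each hex run to a scalar.
/// Returns nil if the body is empty or any piece is not a valid scalar value.
func parseScalarSequence(_ body: String) -> [Unicode.Scalar]? {
  var result: [Unicode.Scalar] = []
  for piece in body.split(whereSeparator: { $0.isWhitespace }) {
    guard let value = UInt32(piece, radix: 16),
          let scalar = Unicode.Scalar(value) else {
      return nil
    }
    result.append(scalar)
  }
  return result.isEmpty ? nil : result
}

/// Mirrors the `scalarLiteral` helper from the diff: renders scalars as
/// uppercase hex, separated by single spaces, inside `\u{...}`.
func scalarLiteral(_ scalars: [Unicode.Scalar]) -> String {
  let digits = scalars
    .map { String($0.value, radix: 16, uppercase: true) }
    .joined(separator: " ")
  return "\\u{\(digits)}"
}

let parsed = parseScalarSequence("AA BB CC")
let rendered = parsed.map(scalarLiteral)
```

With this rendering, a single scalar and a sequence share one code path, which is presumably why the diff routes the `.scalar` case through `scalarLiteral([s.value])` as well.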
3 changes: 0 additions & 3 deletions Sources/_RegexParser/Regex/AST/CustomCharClass.swift
@@ -27,7 +27,6 @@ extension AST {
self.location = sr
}

@frozen
public enum Member: Hashable {
/// A nested custom character class `[[ab][cd]]`
case custom(CustomCharacterClass)
@@ -59,13 +58,11 @@ extension AST {
self.rhs = rhs
}
}
@frozen
public enum SetOp: String, Hashable {
case subtraction = "--"
case intersection = "&&"
case symmetricDifference = "~~"
}
@frozen
public enum Start: String {
case normal = "["
case inverted = "[^"
15 changes: 15 additions & 0 deletions Sources/_RegexParser/Regex/AST/Group.swift
@@ -136,3 +136,18 @@ extension AST.Group {
}
}
}

extension AST.Group {
var isQuantifiable: Bool {
switch kind.value {
case .capture, .namedCapture, .balancedCapture, .nonCapture,
.nonCaptureReset, .atomicNonCapturing, .scriptRun, .atomicScriptRun,
.changeMatchingOptions:
return true

case .lookahead, .negativeLookahead, .nonAtomicLookahead,
.lookbehind, .negativeLookbehind, .nonAtomicLookbehind:
return false
}
}
}
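
The quantifiability rule introduced here (capturing and non-capturing groups can take a quantifier, lookarounds cannot) can be sketched with simplified stand-in types. `GroupKind` and `Node` below are illustrative assumptions covering only a few cases; they are not the real `AST.Group.Kind` or `AST.Node`.

```swift
// Simplified stand-in for AST.Group.Kind (illustrative subset only).
enum GroupKind {
  case capture, nonCapture, lookahead, negativeLookahead, lookbehind

  var isQuantifiable: Bool {
    switch self {
    case .capture, .nonCapture:
      return true
    case .lookahead, .negativeLookahead, .lookbehind:
      // Zero-width assertions: repeating them has no useful meaning.
      return false
    }
  }
}

// Simplified stand-in for AST.Node (illustrative subset only).
indirect enum Node {
  case char(Character)
  case startOfLine
  case group(GroupKind, Node)

  var isQuantifiable: Bool {
    switch self {
    case .char:
      return true
    case .startOfLine:
      return false
    case .group(let kind, _):
      // Delegate to the group kind, as the diff does with `g.isQuantifiable`.
      return kind.isQuantifiable
    }
  }
}

let capture = Node.group(.capture, .char("a"))
let lookahead = Node.group(.lookahead, .char("a"))
```

Exhaustive switches without a `default` case mean that adding a new group kind forces a decision about its quantifiability at compile time.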
2 changes: 0 additions & 2 deletions Sources/_RegexParser/Regex/AST/Quantification.swift
@@ -36,7 +36,6 @@ extension AST {
self.trivia = trivia
}

@frozen
public enum Amount: Hashable {
case zeroOrMore // *
case oneOrMore // +
@@ -47,7 +46,6 @@ extension AST {
case range(Located<Int>, Located<Int>) // {n,m}
}

@frozen
public enum Kind: String, Hashable {
case eager = ""
case reluctant = "?"
15 changes: 10 additions & 5 deletions Sources/_RegexParser/Regex/Parse/CaptureList.swift
@@ -26,15 +26,18 @@ extension CaptureList {
public var name: String?
public var type: Any.Type?
public var optionalDepth: Int
public var location: SourceLocation

public init(
name: String? = nil,
type: Any.Type? = nil,
optionalDepth: Int
optionalDepth: Int,
_ location: SourceLocation
) {
self.name = name
self.type = type
self.optionalDepth = optionalDepth
self.location = location
}
}
}
@@ -61,13 +64,14 @@ extension AST.Node {
case let .group(g):
switch g.kind.value {
case .capture:
list.append(.init(optionalDepth: nesting))
list.append(.init(optionalDepth: nesting, g.location))

case .namedCapture(let name):
list.append(.init(name: name.value, optionalDepth: nesting))
list.append(.init(name: name.value, optionalDepth: nesting, g.location))

case .balancedCapture(let b):
list.append(.init(name: b.name?.value, optionalDepth: nesting))
list.append(.init(name: b.name?.value, optionalDepth: nesting,
g.location))

default: break
}
@@ -124,7 +128,8 @@ extension CaptureList.Capture: Equatable {
public static func == (lhs: Self, rhs: Self) -> Bool {
lhs.name == rhs.name &&
lhs.optionalDepth == rhs.optionalDepth &&
lhs.type == rhs.type
lhs.type == rhs.type &&
lhs.location == rhs.location
}
}
extension CaptureList: Equatable {}
@@ -18,7 +18,7 @@ extension Source {
// This follows the rules provided by UAX44-LM3, including trying to drop an
// "is" prefix, which isn't required by UTS#18 RL1.2, but is nice for
// consistency with other engines and the Unicode.Scalar.Properties names.
let str = str.filter { !$0.isWhitespace && $0 != "_" && $0 != "-" }
let str = str.filter { !$0.isPatternWhitespace && $0 != "_" && $0 != "-" }
.lowercased()
if let m = match(str) {
return m
@@ -428,6 +428,8 @@ extension Source {
if let cat = classifyGeneralCategory(value) {
return .generalCategory(cat)
}
case "name", "na":
return .named(value)
default:
break
}
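
The switch from `isWhitespace` to `isPatternWhitespace` narrows which characters are ignored when a property name is matched loosely. A minimal sketch of that normalization follows, assuming a hypothetical free function `normalizedPropertyName` and using the standard library's `Unicode.Scalar.Properties.isPatternWhitespace` in place of the package's own `Character` helper. It does not attempt the "is" prefix handling mentioned in the source comment.

```swift
/// Hypothetical helper: drops pattern whitespace, "_" and "-", then
/// lowercases, in the spirit of the UAX44-LM3 filter shown in the diff.
func normalizedPropertyName(_ raw: String) -> String {
  raw.filter { ch in
    // Treat a Character as pattern whitespace only if all of its scalars
    // carry the Pattern_White_Space property (an assumption of this sketch).
    let isPatternWhitespace = ch.unicodeScalars.allSatisfy {
      $0.properties.isPatternWhitespace
    }
    return !isPatternWhitespace && ch != "_" && ch != "-"
  }
  .lowercased()
}

let a = normalizedPropertyName("Script_Extensions")
let b = normalizedPropertyName(" general-category ")
```

Under this filter an ordinary space is dropped, while a character such as U+00A0 (no-break space), which is whitespace but not pattern whitespace, is kept and so no longer matches loosely.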