[Integration] main (4d04019) -> swift/main #442

Merged
merged 46 commits into from
May 27, 2022
Commits
3f54941
Implement .as for Regex
Azoy May 3, 2022
7e1ab7d
Unify Match and AnyRegexOutput
Azoy May 3, 2022
bc51e91
Ban numeric escapes in custom character classes
hamishknight May 10, 2022
a4a4a9a
Ban confusable multi-scalar ASCII characters
hamishknight May 10, 2022
db58c1b
Reserve `<{...}>` for interpolation syntax
hamishknight May 10, 2022
a53a40b
Remove the namedCaptureOffset and StructuredCapture
Azoy May 10, 2022
87ea119
Disable resilience on _RegexParser (#397)
rxwei May 11, 2022
baf9f22
Introduce `One`
rxwei May 12, 2022
d9d02c1
Ban `]` as literal first character of custom character class
hamishknight May 12, 2022
d3ea692
Merge pull request #404 from hamishknight/ban-empty-cc
hamishknight May 12, 2022
21f7910
Subsume referencedCaptureOffsets
Azoy May 12, 2022
c7b70a4
Add optional tests
Azoy May 12, 2022
b8178c2
Merge pull request #403 from rxwei/1
rxwei May 12, 2022
9d86c21
Wrap character classes around One
Azoy May 12, 2022
24c139a
fix intersection, subtraction, symmetricDiference
Azoy May 12, 2022
489c63c
Merge pull request #410 from Azoy/more-patternconverter-updates
Azoy May 13, 2022
9cf3cfc
Merge pull request #393 from hamishknight/stricter-syntax
hamishknight May 13, 2022
adf5688
Don't get stuck on empty matches (#415)
natecook1000 May 15, 2022
4f1e0ee
Underscore internal algorithms methods (#414)
natecook1000 May 15, 2022
4f8f67a
Remove the last SPI use of _RegexParser symbols (#416)
natecook1000 May 15, 2022
a4d7be0
Keep track of initial options in compiled program (#412)
natecook1000 May 16, 2022
c000596
More unicode properties (#385)
natecook1000 May 16, 2022
812c394
Keep substring bounds when searching in Regex.wholeMatch
natecook1000 May 17, 2022
ba33c0d
Merge pull request #421 from natecook1000/fix_wholematch_substring
natecook1000 May 17, 2022
7969272
Merge pull request #376 from Azoy/types-types-and-more-types
Azoy May 18, 2022
74f3b99
Add test fixtures for renderAsBuilderDSL (#423)
natecook1000 May 19, 2022
88dc9dd
Fix algorithms overload resolution issues (#402)
natecook1000 May 19, 2022
06dbc16
Introduce Source.lookahead
hamishknight May 24, 2022
8242df6
Remove `throws` from a couple of lexing methods
hamishknight May 24, 2022
e80322b
Add ASTBuilder helper for char class set operations
hamishknight May 24, 2022
1e57c5a
Simplify character class parsing a little
hamishknight May 24, 2022
95dc487
Dump the inverted bit of a custom character class
hamishknight May 24, 2022
9d84967
Allow empty comments
hamishknight May 24, 2022
24b64cd
Lex whitespace in range quantifiers
hamishknight May 24, 2022
8388d0f
Parse end-of-line comments in custom character classes
hamishknight May 24, 2022
5b0524a
Allow trivia between character class range operands
hamishknight May 24, 2022
bd9bf23
Merge pull request #431 from hamishknight/trivia-pursuit
hamishknight May 25, 2022
720ddd2
Implement named backreferences
hamishknight May 25, 2022
4b7d534
Remove `namedCaptureOffsets` from MECaptureList
hamishknight May 25, 2022
471e073
Merge pull request #433 from hamishknight/named-refs
hamishknight May 25, 2022
5495a75
Make `RegexCompilationError` internal
rxwei May 26, 2022
a936e9e
Merge pull request #438 from rxwei/internal-regex-compilation-error
rxwei May 26, 2022
f1b8581
Formalize Unicode block properties
hamishknight May 26, 2022
05f73db
Parse Java character properties
hamishknight May 26, 2022
4d04019
Merge pull request #440 from hamishknight/chunk-loader
hamishknight May 27, 2022
6d1d146
Merge branch 'main' into main-merge
hamishknight May 27, 2022
18 changes: 13 additions & 5 deletions Package.swift
@@ -7,10 +7,18 @@ let availabilityDefinition = PackageDescription.SwiftSetting.unsafeFlags([
"-Xfrontend",
"-define-availability",
"-Xfrontend",
#"SwiftStdlib 5.7:macOS 9999, iOS 9999, watchOS 9999, tvOS 9999"#,
"SwiftStdlib 5.7:macOS 9999, iOS 9999, watchOS 9999, tvOS 9999",
])

let stdlibSettings: [PackageDescription.SwiftSetting] = [
/// Swift settings for building a private stdlib-like module that is to be used
/// by other stdlib-like modules only.
let privateStdlibSettings: [PackageDescription.SwiftSetting] = [
.unsafeFlags(["-Xfrontend", "-disable-implicit-concurrency-module-import"]),
.unsafeFlags(["-Xfrontend", "-disable-implicit-string-processing-module-import"]),
]

/// Swift settings for building a user-facing stdlib-like module.
let publicStdlibSettings: [PackageDescription.SwiftSetting] = [
.unsafeFlags(["-enable-library-evolution"]),
.unsafeFlags(["-Xfrontend", "-disable-implicit-concurrency-module-import"]),
.unsafeFlags(["-Xfrontend", "-disable-implicit-string-processing-module-import"]),
@@ -43,7 +51,7 @@ let package = Package(
.target(
name: "_RegexParser",
dependencies: [],
swiftSettings: stdlibSettings),
swiftSettings: privateStdlibSettings),
.testTarget(
name: "MatchingEngineTests",
dependencies: [
@@ -55,11 +63,11 @@ let package = Package(
.target(
name: "_StringProcessing",
dependencies: ["_RegexParser", "_CUnicode"],
swiftSettings: stdlibSettings),
swiftSettings: publicStdlibSettings),
.target(
name: "RegexBuilder",
dependencies: ["_StringProcessing", "_RegexParser"],
swiftSettings: stdlibSettings),
swiftSettings: publicStdlibSettings),
.testTarget(
name: "RegexTests",
dependencies: ["_StringProcessing"],
3 changes: 2 additions & 1 deletion Sources/PatternConverter/PatternConverter.swift
@@ -70,7 +70,8 @@ struct PatternConverter: ParsableCommand {

print()
if !skipDSL {
let render = ast.renderAsBuilderDSL(
let render = renderAsBuilderDSL(
ast: ast,
maxTopDownLevels: topDownConversionLimit,
minBottomUpLevels: bottomUpConversionLimit
)
30 changes: 29 additions & 1 deletion Sources/RegexBuilder/Algorithms.swift
@@ -9,7 +9,7 @@
//
//===----------------------------------------------------------------------===//

import _StringProcessing
@_spi(RegexBuilder) import _StringProcessing

// FIXME(rdar://92459215): We should be using 'some RegexComponent' instead of
// <R: RegexComponent> for the methods below that don't impose any additional
@@ -313,3 +313,31 @@ where Self: BidirectionalCollection, SubSequence == Substring {
try replace(content(), maxReplacements: maxReplacements, with: replacement)
}
}

// String split overload breakers

extension StringProtocol where SubSequence == Substring {
@available(SwiftStdlib 5.7, *)
public func split(
separator: String,
maxSplits: Int = .max,
omittingEmptySubsequences: Bool = true
) -> [Substring] {
return _split(
separator: separator,
maxSplits: maxSplits,
omittingEmptySubsequences: omittingEmptySubsequences)
}

@available(SwiftStdlib 5.7, *)
public func split(
separator: Substring,
maxSplits: Int = .max,
omittingEmptySubsequences: Bool = true
) -> [Substring] {
return _split(
separator: separator,
maxSplits: maxSplits,
omittingEmptySubsequences: omittingEmptySubsequences)
}
}
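
A minimal usage sketch of the two overloads above. It assumes a Swift 5.7 or later toolchain (and a matching OS runtime on Apple platforms) in which splitting on a multi-character separator is available; the variable names are illustrative only. The separators are given explicit types so that the call cannot be read as splitting on a `Character` literal.

```swift
let line = "alpha, beta, gamma"

// An explicitly typed String separator selects a string-based split.
let separator: String = ", "
let fields = line.split(separator: separator)

// A Substring separator works the same way; maxSplits caps the number of cuts.
let subSeparator: Substring = separator[...]
let limited = line.split(separator: subSeparator, maxSplits: 1)

print(fields)
print(limited)
```

With `maxSplits: 1` only the first separator is consumed, so the remainder of the string stays in the last element.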
13 changes: 13 additions & 0 deletions Sources/RegexBuilder/DSL.swift
@@ -127,6 +127,19 @@ extension DSLTree.Node {
}
}

/// A regex component that matches exactly one occurrence of its underlying
/// component.
@available(SwiftStdlib 5.7, *)
public struct One<Output>: RegexComponent {
public var regex: Regex<Output>

public init<Component: RegexComponent>(
_ component: Component
) where Component.RegexOutput == Output {
self.regex = component.regex
}
}

@available(SwiftStdlib 5.7, *)
public struct OneOrMore<Output>: _BuiltinRegexComponent {
public var regex: Regex<Output>
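
A short sketch of how the new `One` component might read in a builder block, assuming the shipped `RegexBuilder` module from Swift 5.7 or later. The `twoDigits` name is illustrative and not part of this change.

```swift
import RegexBuilder

// Each `One` matches exactly one occurrence of the wrapped component.
let twoDigits = Regex {
    One(.digit)
    One(.digit)
}

let exact = "42".wholeMatch(of: twoDigits) != nil
let tooShort = "4".wholeMatch(of: twoDigits) != nil
let tooLong = "423".wholeMatch(of: twoDigits) != nil

print(exact, tooShort, tooLong)
```

`One` adds no repetition of its own; it mainly gives a character class or other component an explicit wrapper next to quantifiers such as `OneOrMore`.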
18 changes: 15 additions & 3 deletions Sources/_RegexParser/Regex/AST/AST.swift
@@ -29,7 +29,6 @@ extension AST {

extension AST {
/// A node in the regex AST.
@frozen
public indirect enum Node:
Hashable, _TreeNode //, _ASTPrintable ASTValue, ASTAction
{
@@ -53,6 +52,9 @@ extension AST {
/// Comments, non-semantic whitespace, etc
case trivia(Trivia)

/// Interpolation `<{...}>`, currently reserved for future use.
case interpolation(Interpolation)

case atom(Atom)

case customCharacterClass(CustomCharacterClass)
@@ -78,6 +80,7 @@ extension AST.Node {
case let .quantification(v): return v
case let .quote(v): return v
case let .trivia(v): return v
case let .interpolation(v): return v
case let .atom(v): return v
case let .customCharacterClass(v): return v
case let .empty(v): return v
@@ -130,7 +133,7 @@ extension AST.Node {
case .conditional, .customCharacterClass, .absentFunction:
return true
case .alternation, .concatenation, .quantification, .quote, .trivia,
.empty:
.empty, .interpolation:
return false
}
}
@@ -194,6 +197,16 @@ extension AST {
}
}

public struct Interpolation: Hashable, _ASTNode {
public let contents: String
public let location: SourceLocation

public init(_ contents: String, _ location: SourceLocation) {
self.contents = contents
self.location = location
}
}

public struct Empty: Hashable, _ASTNode {
public let location: SourceLocation

@@ -249,7 +262,6 @@ extension AST {
}

public struct Reference: Hashable {
@frozen
public enum Kind: Hashable {
// \n \gn \g{n} \g<n> \g'n' (?n) (?(n)...
// Oniguruma: \k<n>, \k'n'
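
The new `interpolation` case is why several exhaustive `switch` statements in this change gain a `.interpolation` entry. A hypothetical miniature of the pattern follows; the `Node` enum and the `isQuantifiable` property are stand-ins for illustration and are not the real `AST.Node` API.

```swift
// Hypothetical, trimmed-down node type; the real AST has many more cases.
enum Node {
    case atom(Character)
    case trivia(String)
    case interpolation(String)  // contents of a reserved `<{...}>`

    // Without a `default`, the compiler rejects this switch until every
    // case, including a newly added one, is listed.
    var isQuantifiable: Bool {
        switch self {
        case .atom:
            return true
        case .trivia, .interpolation:
            return false
        }
    }
}

let reserved = Node.interpolation("name")
print(reserved.isQuantifiable)
```

Leaving out `default` is a deliberate choice here: adding a case then produces a compile error at every switch that needs a decision.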
59 changes: 51 additions & 8 deletions Sources/_RegexParser/Regex/AST/Atom.swift
@@ -19,7 +19,6 @@ extension AST {
self.location = loc
}

@frozen
public enum Kind: Hashable {
/// Just a character
///
@@ -146,7 +145,6 @@ extension AST.Atom {

// Characters, character types, literals, etc., derived from
// an escape sequence.
@frozen
public enum EscapedBuiltin: Hashable {
// TODO: better doc comments

@@ -399,7 +397,6 @@ extension AST.Atom {
}

extension AST.Atom.CharacterProperty {
@frozen
public enum Kind: Hashable {
/// Matches any character, equivalent to Oniguruma's '\O'.
case any
@@ -430,27 +427,73 @@ extension AST.Atom.CharacterProperty {
/// Character name in the form `\p{name=...}`
case named(String)

/// Numeric type.
case numericType(Unicode.NumericType)

/// Numeric value.
case numericValue(Double)

/// Case mapping.
case mapping(MapKind, String)

/// Canonical Combining Class.
case ccc(Unicode.CanonicalCombiningClass)

/// Character age, as per UnicodeScalar.Properties.age.
case age(major: Int, minor: Int)

/// A block property.
case block(Unicode.Block)

case posix(Unicode.POSIXProperty)

/// Some special properties implemented by PCRE and Oniguruma.
case pcreSpecial(PCRESpecialCategory)
case onigurumaSpecial(OnigurumaSpecialProperty)

/// Some special properties implemented by Java.
case javaSpecial(JavaSpecial)

public enum MapKind: Hashable {
case lowercase
case uppercase
case titlecase
}
}

// TODO: erm, separate out or fold into something? splat it in?
@frozen
public enum PCRESpecialCategory: String, Hashable {
case alphanumeric = "Xan"
case posixSpace = "Xps"
case perlSpace = "Xsp"
case universallyNamed = "Xuc"
case perlWord = "Xwd"
}

/// Special Java properties that correspond to methods on
/// `java.lang.Character`, with the `java` prefix replaced by `is`.
public enum JavaSpecial: String, Hashable, CaseIterable {
case alphabetic = "javaAlphabetic"
case defined = "javaDefined"
case digit = "javaDigit"
case identifierIgnorable = "javaIdentifierIgnorable"
case ideographic = "javaIdeographic"
case isoControl = "javaISOControl"
case javaIdentifierPart = "javaJavaIdentifierPart" // not a typo, that's actually the name
case javaIdentifierStart = "javaJavaIdentifierStart" // not a typo, that's actually the name
case javaLetter = "javaLetter"
case javaLetterOrDigit = "javaLetterOrDigit"
case lowerCase = "javaLowerCase"
case mirrored = "javaMirrored"
case spaceChar = "javaSpaceChar"
case titleCase = "javaTitleCase"
case unicodeIdentifierPart = "javaUnicodeIdentifierPart"
case unicodeIdentifierStart = "javaUnicodeIdentifierStart"
case upperCase = "javaUpperCase"
case whitespace = "javaWhitespace"
}
}

extension AST.Atom {
/// Anchors and other built-in zero-width assertions.
@frozen
public enum AssertionKind: String {
/// \A
case startOfSubject = #"\A"#
@@ -824,7 +867,7 @@ extension AST.Node {
case .alternation, .concatenation, .group,
.conditional, .quantification, .quote,
.trivia, .customCharacterClass, .empty,
.absentFunction:
.absentFunction, .interpolation:
return nil
}
}
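
The `JavaSpecial` doc comment in this file says each property corresponds to a `java.lang.Character` method with the `java` prefix replaced by `is`. A sketch of that mapping, using a hypothetical three-case copy of the enum and an assumed `javaMethodName` helper that is not part of this change:

```swift
// Hypothetical subset of the enum, for illustration only.
enum JavaSpecial: String, CaseIterable {
    case lowerCase = "javaLowerCase"
    case javaIdentifierPart = "javaJavaIdentifierPart"
    case whitespace = "javaWhitespace"

    // Assumed helper: swap the leading "java" of the property name for "is".
    var javaMethodName: String {
        "is" + String(rawValue.dropFirst("java".count))
    }
}

// Raw-value lookup turns a spelled-out property name into a case, or nil.
let property = JavaSpecial(rawValue: "javaJavaIdentifierPart")
let unknown = JavaSpecial(rawValue: "javaNotAProperty")

print(property?.javaMethodName ?? "unknown")
```

The doubled `Java` in `javaJavaIdentifierPart` falls out of the same rule: the method is `isJavaIdentifierPart`, so the property name prefixes it with another `java`.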
15 changes: 11 additions & 4 deletions Sources/_RegexParser/Regex/AST/CustomCharClass.swift
@@ -27,7 +27,6 @@ extension AST {
self.location = sr
}

@frozen
public enum Member: Hashable {
/// A nested custom character class `[[ab][cd]]`
case custom(CustomCharacterClass)
@@ -52,20 +51,23 @@ extension AST {
public var lhs: Atom
public var dashLoc: SourceLocation
public var rhs: Atom
public var trivia: [AST.Trivia]

public init(_ lhs: Atom, _ dashLoc: SourceLocation, _ rhs: Atom) {
public init(
_ lhs: Atom, _ dashLoc: SourceLocation, _ rhs: Atom,
trivia: [AST.Trivia]
) {
self.lhs = lhs
self.dashLoc = dashLoc
self.rhs = rhs
self.trivia = trivia
}
}
@frozen
public enum SetOp: String, Hashable {
case subtraction = "--"
case intersection = "&&"
case symmetricDifference = "~~"
}
@frozen
public enum Start: String {
case normal = "["
case inverted = "[^"
@@ -98,6 +100,11 @@ extension CustomCC.Member {
return false
}

public var asTrivia: AST.Trivia? {
guard case .trivia(let t) = self else { return nil }
return t
}

public var isSemantic: Bool {
!isTrivia
}
2 changes: 0 additions & 2 deletions Sources/_RegexParser/Regex/AST/Quantification.swift
@@ -36,7 +36,6 @@ extension AST {
self.trivia = trivia
}

@frozen
public enum Amount: Hashable {
case zeroOrMore // *
case oneOrMore // +
@@ -47,7 +46,6 @@ extension AST {
case range(Located<Int>, Located<Int>) // {n,m}
}

@frozen
public enum Kind: String, Hashable {
case eager = ""
case reluctant = "?"
17 changes: 16 additions & 1 deletion Sources/_RegexParser/Regex/Parse/CaptureList.swift
@@ -42,6 +42,21 @@ extension CaptureList {
}
}

extension CaptureList {
/// Retrieve the capture index of a given named capture, or `nil` if there is
/// no such capture.
public func indexOfCapture(named name: String) -> Int? {
// Named references are guaranteed to be unique for literal ASTs by Sema.
// The DSL tree does not use named references.
captures.indices.first(where: { captures[$0].name == name })
}

/// Whether the capture list has a given named capture.
public func hasCapture(named name: String) -> Bool {
indexOfCapture(named: name) != nil
}
}

// MARK: Generating from AST

extension AST.Node {
@@ -103,7 +118,7 @@ extension AST.Node {
break
}

case .quote, .trivia, .atom, .customCharacterClass, .empty:
case .quote, .trivia, .atom, .customCharacterClass, .empty, .interpolation:
break
}
}
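
A self-contained sketch of the named-capture lookup added in `CaptureList`. The `Capture` and `CaptureList` types below are simplified, hypothetical stand-ins for the parser's own types; only the lookup logic mirrors the diff.

```swift
// Hypothetical stand-ins for the parser's capture types.
struct Capture {
    var name: String?
}

struct CaptureList {
    var captures: [Capture]

    // Index of the first capture with the given name, or nil if there is none.
    func indexOfCapture(named name: String) -> Int? {
        captures.indices.first(where: { captures[$0].name == name })
    }

    // Whether any capture carries the given name.
    func hasCapture(named name: String) -> Bool {
        indexOfCapture(named: name) != nil
    }
}

// Models the captures of a pattern such as `(?<year>\d+)-(\d+)-(?<day>\d+)`.
let list = CaptureList(captures: [
    Capture(name: "year"),
    Capture(name: nil),
    Capture(name: "day"),
])

let dayIndex = list.indexOfCapture(named: "day")
let hasMonth = list.hasCapture(named: "month")
print(dayIndex as Any, hasMonth)
```

Unnamed captures still occupy a position, which is why `day` resolves to index 2 in this sketch rather than 1.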