Updates for algorithms proposal #319

Merged
merged 2 commits into from
Apr 22, 2022
2 changes: 1 addition & 1 deletion Documentation/Evolution/ProposalOverview.md
@@ -39,7 +39,7 @@ Covers the "interior" syntax, extended syntaxes, run-time construction of a rege

Proposes a slew of Regex-powered algorithms.

Introduces `CustomMatchingRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex.
Introduces `CustomPrefixMatchRegexComponent`, which is a monadic-parser style interface for external parsers to be used as components of a regex.

## Unicode for String Processing

10 changes: 5 additions & 5 deletions Documentation/Evolution/RegexTypeOverview.md
@@ -231,7 +231,7 @@ The result builder allows for inline failable value construction, which particip

Swift regexes describe an unambiguous algorithm, where choice is ordered and effects can be reliably observed. For example, a `print()` statement inside the `TryCapture`'s transform function will run whenever the overall algorithm naturally dictates an attempt should be made. Optimizations can only elide such calls if they can prove it is behavior-preserving (e.g. "pure").

`CustomMatchingRegexComponent`, discussed in [String Processing Algorithms][pitches], allows industrial-strength parsers to be used as regex components. This allows us to drop the overly-permissive pre-parsing step:
`CustomPrefixMatchRegexComponent`, discussed in [String Processing Algorithms][pitches], allows industrial-strength parsers to be used as regex components. This allows us to drop the overly-permissive pre-parsing step:

```swift
func processEntry(_ line: String) -> Transaction? {
@@ -431,7 +431,7 @@ Regular expressions have a deservedly mixed reputation, owing to their historica

* "Regular expressions are bad because you should use a real parser"
- In other systems, you're either in or you're out, leading to a gravitational pull to stay in when... you should get out
- Our remedy is interoperability with real parsers via `CustomMatchingRegexComponent`
- Our remedy is interoperability with real parsers via `CustomPrefixMatchRegexComponent`
- Literals with refactoring actions provide an incremental off-ramp from regex syntax to result builders and real parsers
* "Regular expressions are bad because ugly unmaintainable syntax"
- We propose literals with source tools support, allowing for better syntax highlighting and analysis
@@ -516,7 +516,7 @@ Regex are compiled into an intermediary representation and fairly simple analysi

### Future work: parser combinators

What we propose here is an incremental step towards better parsing support in Swift using parser-combinator style libraries. The underlying execution engine supports recursive function calls and mechanisms for library extensibility. `CustomMatchingRegexComponent`'s protocol requirement is effectively a [monadic parser](https://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf), meaning `Regex` provides a regex-flavored combinator-like system.
What we propose here is an incremental step towards better parsing support in Swift using parser-combinator style libraries. The underlying execution engine supports recursive function calls and mechanisms for library extensibility. `CustomPrefixMatchRegexComponent`'s protocol requirement is effectively a [monadic parser](https://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf), meaning `Regex` provides a regex-flavored combinator-like system.

One issue with traditional parser combinator libraries is the compilation barriers between call site and definition, which result in excessive and overly cautious backtracking traffic. These can be eliminated through better [compilation techniques](https://core.ac.uk/download/pdf/148008325.pdf). As mentioned above, Swift's support for custom static compilation is still under development.

@@ -565,9 +565,9 @@ Regexes are often used for tokenization and tokens can be represented with Swift

### Future work: baked-in localized processing

- `CustomMatchingRegexComponent` gives an entry point for localized processors
- `CustomPrefixMatchRegexComponent` gives an entry point for localized processors
- Future work includes (sub?)protocols to communicate localization intent

-->

[pitches]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md
[pitches]: https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md
43 changes: 22 additions & 21 deletions Documentation/Evolution/StringProcessingAlgorithms.md
@@ -8,9 +8,9 @@ We propose:

1. New regex-powered algorithms over strings, bringing the standard library up to parity with scripting languages
2. Generic `Collection` equivalents of these algorithms in terms of subsequences
3. `protocol CustomMatchingRegexComponent`, which allows 3rd party libraries to provide their industrial-strength parsers as intermixable components of regexes
3. `protocol CustomPrefixMatchRegexComponent`, which allows 3rd party libraries to provide their industrial-strength parsers as intermixable components of regexes

This proposal is part of a larger [regex-powered string processing initiative](https://forums.swift.org/t/declarative-string-processing-overview/52459). Throughout the document, we will reference the still-in-progress [`RegexProtocol`, `Regex`](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/StronglyTypedCaptures.md), and result builder DSL, but these are in flux and not formally part of this proposal. Further discussion of regex specifics is out of scope of this proposal and better discussed in another thread (see [Pitch and Proposal Status](https://github.com/apple/swift-experimental-string-processing/issues/107) for links to relevant threads).
This proposal is part of a larger [regex-powered string processing initiative](https://github.com/apple/swift-evolution/blob/main/proposals/0350-regex-type-overview.md); the status of each proposal is tracked [here](https://github.com/apple/swift-experimental-string-processing/blob/main/Documentation/Evolution/ProposalOverview.md). Further discussion of regex specifics is out of scope of this proposal and is better suited to the relevant reviews.

## Motivation

@@ -91,18 +91,18 @@ Note: Only a subset of Python's string processing API are included in this table

### Complex string processing

Even with the API additions, more complex string processing quickly becomes unwieldy. Up-coming support for authoring regexes in Swift help alleviate this somewhat, but string processing in the modern world involves dealing with localization, standards-conforming validation, and other concerns for which a dedicated parser is required.
Even with the API additions, more complex string processing quickly becomes unwieldy. String processing in the modern world involves dealing with localization, standards-conforming validation, and other concerns for which a dedicated parser is required.

Consider parsing the date field `"Date: Wed, 16 Feb 2022 23:53:19 GMT"` in an HTTP header as a `Date` type. The naive approach is to search for a substring that looks like a date string (`16 Feb 2022`), and attempt to post-process it as a `Date` with a date parser:

```swift
let regex = Regex {
capture {
oneOrMore(.digit)
Capture {
OneOrMore(.digit)
" "
oneOrMore(.word)
OneOrMore(.word)
" "
oneOrMore(.digit)
OneOrMore(.digit)
}
}

@@ -128,21 +128,21 @@ DEBIT 03/24/2020 IRX tax payment ($52,249.98)
Parsing a currency string such as `$3,020.85` with regex is also tricky, as it can contain localized and currency symbols in addition to accounting conventions. This is why Foundation provides industrial-strength parsers for localized strings.
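
To make the difficulty concrete, here is a minimal sketch of a hand-written parser for the US accounting style seen in the statement line above. `parseAccountingUSD` is a hypothetical helper, not part of the proposal or of Foundation. It assumes a `$` prefix, `,` as the grouping separator, `.` as the decimal separator, and parentheses as the negative sign.

```swift
import Foundation

// Hypothetical helper for illustration only: parses US accounting-style
// amounts such as "$3,020.85" or "($52,249.98)".
func parseAccountingUSD(_ text: String) -> Decimal? {
  var body = Substring(text)
  var isNegative = false

  // Accounting convention: a parenthesized amount is negative.
  if body.hasPrefix("("), body.hasSuffix(")") {
    isNegative = true
    body = body.dropFirst().dropLast()
  }

  // Assumption: the currency symbol is always a leading "$".
  guard body.hasPrefix("$") else { return nil }

  // Drop the symbol and the grouping separators.
  let digits = body.dropFirst().filter { $0 != "," }

  // Accept only ASCII digits and at most one decimal point.
  guard !digits.isEmpty,
        digits.allSatisfy({ ($0.isASCII && $0.isNumber) || $0 == "." }),
        digits.filter({ $0 == "." }).count <= 1,
        let value = Decimal(string: String(digits))
  else { return nil }

  return isNegative ? -value : value
}
```

Every assumption baked into this helper (symbol, separators, sign convention) changes with locale and currency, which is the kind of detail a localized parser is meant to handle.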


## Proposed solution
## Proposed solution

### Complex string processing

We propose a `CustomMatchingRegexComponent` protocol which allows types from outside the standard library to participate in regex builders and `RegexComponent` algorithms. This allows types, such as `Date.ParseStrategy` and `FloatingPointFormatStyle.Currency`, to be used directly within a regex:
We propose a `CustomPrefixMatchRegexComponent` protocol which allows types from outside the standard library to participate in regex builders and `RegexComponent` algorithms. This allows types, such as `Date.ParseStrategy` and `FloatingPointFormatStyle.Currency`, to be used directly within a regex:

```swift
let dateRegex = Regex {
capture(dateParser)
Capture(dateParser)
}

let date: Date = header.firstMatch(of: dateRegex).map(\.result.1)

let currencyRegex = Regex {
capture(.localizedCurrency(code: "USD").sign(strategy: .accounting))
Capture(.localizedCurrency(code: "USD").sign(strategy: .accounting))
}

let amount: [Decimal] = statement.matches(of: currencyRegex).map(\.result.1)
@@ -167,24 +167,25 @@ We also propose the following regex-powered algorithms as well as their generic
|`matches(of:)`| Returns a collection containing all matches of the specified `RegexComponent` |


## Detailed design
## Detailed design

### `CustomMatchingRegexComponent`
### `CustomPrefixMatchRegexComponent`

`CustomMatchingRegexComponent` inherits from `RegexComponent` and satisfies its sole requirement; Conformers can be used with all of the string algorithms generic over `RegexComponent`.
`CustomPrefixMatchRegexComponent` inherits from `RegexComponent` and satisfies its sole requirement. Conformers can be used with all of the string algorithms generic over `RegexComponent`.

```swift
/// A protocol for custom match functionality.
public protocol CustomMatchingRegexComponent : RegexComponent {
/// Match the input string within the specified bounds, beginning at the given index, and return
/// the end position (upper bound) of the match and the matched instance.
/// A protocol allowing custom types to function as regex components by
/// providing the raw functionality backing `prefixMatch`.
public protocol CustomPrefixMatchRegexComponent: RegexComponent {
/// Process the input string within the specified bounds, beginning at the given index, and return
/// the end position (upper bound) of the match and the produced output.
/// - Parameters:
/// - input: The string in which the match is performed.
/// - index: An index of `input` at which to begin matching.
/// - bounds: The bounds in `input` in which the match is performed.
/// - Returns: The upper bound where the match terminates and a matched instance, or `nil` if
/// there isn't a match.
func match(
func consuming(
_ input: String,
startingAt index: String.Index,
in bounds: Range<String.Index>
@@ -198,8 +199,8 @@ public protocol CustomMatchingRegexComponent : RegexComponent {
We use Foundation `FloatingPointFormatStyle<Decimal>.Currency` as an example for protocol conformance. It would implement the `consuming` function with `RegexOutput` being a `Decimal`. It could also add a static function `.localizedCurrency(code:)` as a member of `RegexComponent`, so it can be referred to as `.localizedCurrency(code:)` in the `Regex` result builder:

```swift
extension FloatingPointFormatStyle<Decimal>.Currency : CustomMatchingRegexComponent {
public func match(
extension FloatingPointFormatStyle<Decimal>.Currency : CustomPrefixMatchRegexComponent {
public func consuming(
_ input: String,
startingAt index: String.Index,
in bounds: Range<String.Index>
39 changes: 39 additions & 0 deletions Sources/_StringProcessing/Regex/CustomComponents.swift
@@ -0,0 +1,39 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift.org open source project
//
// Copyright (c) 2021-2022 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

@available(SwiftStdlib 5.7, *)
/// A protocol allowing custom types to function as regex components by
/// providing the raw functionality backing `prefixMatch`.
public protocol CustomPrefixMatchRegexComponent: RegexComponent {
/// Process the input string within the specified bounds, beginning at the given index, and return
/// the end position (upper bound) of the match and the produced output.
/// - Parameters:
/// - input: The string in which the match is performed.
/// - index: An index of `input` at which to begin matching.
/// - bounds: The bounds in `input` in which the match is performed.
/// - Returns: The upper bound where the match terminates and a matched instance, or `nil` if
/// there isn't a match.
func consuming(
_ input: String,
startingAt index: String.Index,
in bounds: Range<String.Index>
) throws -> (upperBound: String.Index, output: RegexOutput)?
}

@available(SwiftStdlib 5.7, *)
extension CustomPrefixMatchRegexComponent {
public var regex: Regex<RegexOutput> {
let node: DSLTree.Node = .matcher(RegexOutput.self, { input, index, bounds in
try consuming(input, startingAt: index, in: bounds)
})
return Regex(node: node)
}
}
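
As a rough illustration of the shape of this requirement, the sketch below uses a stand-in protocol that copies only the `consuming(_:startingAt:in:)` signature. `PrefixConsumer` and `DigitRunConsumer` are hypothetical names invented for this example. A real conformer would adopt `CustomPrefixMatchRegexComponent` and pick up `regex` from the extension above.

```swift
// Hypothetical stand-in mirroring the shape of the protocol requirement.
// It is NOT the real `CustomPrefixMatchRegexComponent`.
protocol PrefixConsumer {
  associatedtype Output
  func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: Output)?
}

// Illustrative conformer: consumes a run of ASCII digits and produces an Int.
struct DigitRunConsumer: PrefixConsumer {
  func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: Int)? {
    var end = index
    // Advance while still inside `bounds` and looking at an ASCII digit.
    while end < bounds.upperBound, input[end].isASCII, input[end].isNumber {
      input.formIndex(after: &end)
    }
    // No digits consumed, or a value that does not fit in `Int`: no match.
    guard end > index, let value = Int(input[index..<end]) else { return nil }
    return (upperBound: end, output: value)
  }
}

// Usage: consume a prefix of "42 apples" starting at the first character.
let text = "42 apples"
let result = try? DigitRunConsumer().consuming(
  text,
  startingAt: text.startIndex,
  in: text.startIndex..<text.endIndex
)
```

Returning the upper bound alongside the output is what lets a caller resume matching exactly where the custom parser stopped. Returning `nil` signals a failed match without throwing.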
29 changes: 0 additions & 29 deletions Sources/_StringProcessing/Regex/DSLConsumers.swift

This file was deleted.

12 changes: 6 additions & 6 deletions Tests/RegexBuilderTests/CustomTests.swift
@@ -14,13 +14,13 @@ import _StringProcessing
@testable import RegexBuilder

// A nibbler processes a single character from a string
private protocol Nibbler: CustomMatchingRegexComponent {
private protocol Nibbler: CustomPrefixMatchRegexComponent {
func nibble(_: Character) -> RegexOutput?
}

extension Nibbler {
// Default implementation, just feed the character in
func match(
func consuming(
_ input: String,
startingAt index: String.Index,
in bounds: Range<String.Index>
@@ -49,10 +49,10 @@ private struct Asciibbler: Nibbler {
}
}

private struct IntParser: CustomMatchingRegexComponent {
private struct IntParser: CustomPrefixMatchRegexComponent {
struct ParseError: Error, Hashable {}
typealias RegexOutput = Int
func match(_ input: String,
func consuming(_ input: String,
startingAt index: String.Index,
in bounds: Range<String.Index>
) throws -> (upperBound: String.Index, output: Int)? {
@@ -71,7 +71,7 @@ private struct IntParser: CustomMatchingRegexComponent {
}
}

private struct CurrencyParser: CustomMatchingRegexComponent {
private struct CurrencyParser: CustomPrefixMatchRegexComponent {
enum Currency: String, Hashable {
case usd = "USD"
case ntd = "NTD"
@@ -84,7 +84,7 @@ private struct CurrencyParser: CustomMatchingRegexComponent {
}

typealias RegexOutput = Currency
func match(_ input: String,
func consuming(_ input: String,
startingAt index: String.Index,
in bounds: Range<String.Index>
) throws -> (upperBound: String.Index, output: Currency)? {
4 changes: 2 additions & 2 deletions Tests/RegexBuilderTests/RegexDSLTests.swift
@@ -855,9 +855,9 @@ class RegexDSLTests: XCTestCase {
var patch: Int
var dev: String?
}
struct SemanticVersionParser: CustomMatchingRegexComponent {
struct SemanticVersionParser: CustomPrefixMatchRegexComponent {
typealias RegexOutput = SemanticVersion
func match(
func consuming(
_ input: String,
startingAt index: String.Index,
in bounds: Range<String.Index>