Add additional Unicode API to RegexBuilder.CharacterClass #435
This adds Unicode property APIs to `CharacterClass` to bring that type more in line with what's supported via `/\p{Property=Value}/`.
It also adds notes about the corresponding regex syntax to all applicable `CharacterClass` symbols.
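Here is a minimal sketch of how the `/\p{Property=Value}/` syntax can be used from RegexBuilder code: it constructs a `Regex` at run time with `Regex(_:as:)` and embeds that regex in a builder block. It assumes Swift 5.7 or later, uses the short general-category form `\p{Lu}` (uppercase letters) rather than any API added by this PR, and the names `upperRun` and `codePattern` are illustrative.

```swift
import RegexBuilder

// Run-time regex written in Unicode property syntax:
// one or more characters in the Uppercase_Letter (Lu) general category.
let upperRun = try! Regex(#"\p{Lu}+"#, as: Substring.self)

// Embed the property-based regex in a builder, after a run of digits.
let codePattern = Regex {
  OneOrMore(.digit)
  upperRun
}

let matchesUpper = "42AB".wholeMatch(of: codePattern) != nil
let matchesLower = "42ab".wholeMatch(of: codePattern) != nil
print(matchesUpper, matchesLower)
```

Dedicated `CharacterClass` members would let the same intent be expressed without dropping into string syntax.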
@swift-ci Please test
Note: add a single API for advancing through an input's character or Unicode scalar view, depending on the semantic level.
@swift-ci Please test
Overall I'm in favor, though it's a little uncanny if we don't have an equivalent to regex literal character classes for things like `.`.
```swift
/// ``CharacterClass.anyNonNewline``.
///
/// This character class is equivalent to the regex syntax "dot"
/// metacharacter in single-line mode: `(?s:.)`.
```
But that's not what this is. This is `.`.
```swift
/// This character class is equivalent to the regex syntax "dot"
/// metacharacter with single-line mode disabled: `(?-s:.)`.
public static var anyNonNewline: CharacterClass {
  .init(DSLTree.CustomCharacterClass(members: [.atom(.any)]))
}
```
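A minimal sketch of the distinction these doc comments draw, assuming Swift 5.7+ with `RegexBuilder`: it matches a single newline against `CharacterClass.any`, `CharacterClass.anyNonNewline`, and a run-time `.` pattern with and without the `dotMatchesNewlines()` option (single-line mode). The variable names are illustrative.

```swift
import RegexBuilder

let anyElement = Regex { CharacterClass.any }
let nonNewline = Regex { CharacterClass.anyNonNewline }
// "." with default options: single-line mode is off, like (?-s:.).
let dot = try! Regex(".", as: Substring.self)
// The same "." with single-line mode turned on, like (?s:.).
let dotAll = dot.dotMatchesNewlines()

let input = "\n"
let newlineResults = [
  input.wholeMatch(of: anyElement) != nil,  // any: includes newlines
  input.wholeMatch(of: nonNewline) != nil,  // anyNonNewline: excludes newlines
  input.wholeMatch(of: dot) != nil,         // (?-s:.): excludes newlines
  input.wholeMatch(of: dotAll) != nil,      // (?s:.): includes newlines
]
print(newlineResults)
```

Read this way, `anyNonNewline` lines up with `(?-s:.)` and `any` with `(?s:.)`, and the printed array should be `[true, false, false, true]`.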
`.any`? Aren't these two things the same?
```swift
/// A character class that matches any single `Character`, or extended
/// grapheme cluster, regardless of the current semantic level.
///
/// This character class is equivalent to `\X` in regex syntax.
```
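The phrase "regardless of the current semantic level" is easiest to see with a decomposed character. This sketch, assuming Swift 5.7+ and the `matchingSemantics(_:)` option, spells "é" as two scalars and compares `.any` at the default grapheme-cluster level, `.any` at the Unicode scalar level, and `.anyGraphemeCluster` at the scalar level. The last case reflects the doc comment's claim rather than a verified result, so the sketch only prints its outcome.

```swift
import RegexBuilder

// "é" spelled as two Unicode scalars: "e" + U+0301 COMBINING ACUTE ACCENT.
let decomposed = "e\u{301}"

// Default (grapheme-cluster) semantics: `.any` consumes one Character.
let graphemeAny = Regex { CharacterClass.any }
// Unicode-scalar semantics: `.any` consumes a single scalar.
let scalarAny = Regex { CharacterClass.any }.matchingSemantics(.unicodeScalar)
// `.anyGraphemeCluster` (`\X`) is documented to match a whole
// Character even when scalar semantics are in effect.
let scalarCluster = Regex { CharacterClass.anyGraphemeCluster }
  .matchingSemantics(.unicodeScalar)

print("grapheme-level .any:", decomposed.wholeMatch(of: graphemeAny) != nil)
print("scalar-level .any:", decomposed.wholeMatch(of: scalarAny) != nil)
print("scalar-level \\X:", decomposed.wholeMatch(of: scalarCluster) != nil)
```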
Including newlines, right? Is this the real "any" above?
```swift
/// A character class that matches any digit.
///
/// This character class is equivalent to `\d` in regex syntax.
```
Does it vary based on options? How is this different from `any` in that regard?
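One way to probe whether `.digit` varies with options: the sketch below, assuming Swift 5.7+ and the `asciiOnlyDigits()` regex option, matches U+0663 ARABIC-INDIC DIGIT THREE with and without that option, plus an ASCII `3` as a control.

```swift
import RegexBuilder

// U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit outside ASCII.
let arabicIndicThree = "\u{0663}"

let unicodeDigit = Regex { CharacterClass.digit }
// asciiOnlyDigits() restricts `.digit` (like `\d`) to ASCII 0 through 9.
let asciiDigit = Regex { CharacterClass.digit }.asciiOnlyDigits()

let digitResults = [
  arabicIndicThree.wholeMatch(of: unicodeDigit) != nil,
  arabicIndicThree.wholeMatch(of: asciiDigit) != nil,
  "3".wholeMatch(of: asciiDigit) != nil,
]
print(digitResults)
```

With default options `.digit` accepts the Arabic-Indic digit, while `asciiOnlyDigits()` narrows it to ASCII, so the printed array is expected to be `[true, false, true]`.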
@@ -72,27 +98,58 @@ extension RegexComponent where Self == CharacterClass {

```swift
  ]))
}

public static var horizontalWhitespace: CharacterClass {
  .init(unconverted: .horizontalWhitespace)
/// A character class that matches any element that is a "word character".
```
Just to double-check: is there a better description than "word character"? "Word character" can be mentioned as an aside, but that's more of a historical note. @Azoy, does Unicode have another name?
The class of <word_character> includes all the Alphabetic values from the Unicode character database ...
https://unicode.org/reports/tr18/#RL1.4
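UTS #18 defines word characters in terms of Unicode properties rather than ASCII. A small sketch, assuming Swift 5.7+ and the `asciiOnlyWordCharacters()` option, contrasts `CharacterClass.word` with its ASCII-only variant on a few sample characters; the sample list is arbitrary.

```swift
import RegexBuilder

let word = Regex { CharacterClass.word }
let asciiWord = Regex { CharacterClass.word }.asciiOnlyWordCharacters()

// A non-ASCII letter (U+00E9), connector punctuation, a digit, and a hyphen.
let samples = ["\u{E9}", "_", "5", "-"]
let unicodeWord = samples.map { $0.wholeMatch(of: word) != nil }
let asciiOnlyWord = samples.map { $0.wholeMatch(of: asciiWord) != nil }
print(unicodeWord, asciiOnlyWord)
```

Under the Unicode definition "é" counts as a word character; the ASCII-only option drops it while still accepting `_` and `5`.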
```swift
  astCharacterProperty(.generalCategory(category.extendedGeneralCategory!))
}

public static func binaryProperty(
```
Can we get this code in a different file?
```swift
/// - Returns: The modified regular expression.
public func asciiOnlyWhitespace(_ useASCII: Bool = true) -> Regex<RegexOutput> {
  wrapInOption(.asciiOnlySpace, addingIf: useASCII)
public func asciiOnlyClasses(_ kinds: RegexCharacterClassKind = .all) -> Regex<RegexOutput> {
```
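The diff touches `asciiOnlyWhitespace(_:)`. As a sketch of what that option changes, assuming Swift 5.7+ where it is available on regex components, this compares `CharacterClass.whitespace` with and without `asciiOnlyWhitespace()` on U+00A0 NO-BREAK SPACE, which has the Unicode `White_Space` property.

```swift
import RegexBuilder

// U+00A0 NO-BREAK SPACE: Unicode whitespace, but not ASCII.
let noBreakSpace = "\u{00A0}"

let whitespace = Regex { CharacterClass.whitespace }
// asciiOnlyWhitespace() limits `.whitespace` (like `\s`) to ASCII whitespace.
let asciiWhitespace = Regex { CharacterClass.whitespace }.asciiOnlyWhitespace()

let spaceResults = [
  noBreakSpace.wholeMatch(of: whitespace) != nil,
  noBreakSpace.wholeMatch(of: asciiWhitespace) != nil,
  " ".wholeMatch(of: asciiWhitespace) != nil,
]
print(spaceResults)
```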
What does this do to `.any`, properties, etc.?
```swift
@available(SwiftStdlib 5.7, *)
public struct RegexCharacterClassKind: OptionSet, Hashable {
```
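`RegexCharacterClassKind` is declared as an `OptionSet`, which lets a caller pass several class kinds at once to an API such as `asciiOnlyClasses(_:)`. The sketch below is a hypothetical stand-in, not the PR's type: only `.all` appears in the diff, and the `digit`, `whitespace`, and `wordCharacter` members and their bit values are assumptions chosen for illustration.

```swift
// Hypothetical stand-in for RegexCharacterClassKind. Only `.all` comes
// from the diff; the other members and bit values are assumptions.
struct CharacterClassKindSketch: OptionSet, Hashable {
  let rawValue: Int

  static let digit = CharacterClassKindSketch(rawValue: 1 << 0)
  static let whitespace = CharacterClassKindSketch(rawValue: 1 << 1)
  static let wordCharacter = CharacterClassKindSketch(rawValue: 1 << 2)
  static let all: CharacterClassKindSketch = [.digit, .whitespace, .wordCharacter]
}

// A caller could pass any combination, e.g. only digits and whitespace.
let selected: CharacterClassKindSketch = [.digit, .whitespace]
print(selected.contains(.digit), selected.contains(.wordCharacter))
print(CharacterClassKindSketch.all.rawValue)
```

Using an option set keeps a single entry point with an `.all` default while still allowing callers to opt individual classes in or out.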
Did you consider or debate whether `RegexBuilder.CharacterClass` should be `Swift.RegexCharacterClass`? Then this would be a `Kind` under it.
Closing this stale PR.
This includes revisions to the options API and additions to the `CharacterClass` type to bring it into alignment with the functionality that we're offering through regex literals. For example, `/\p{NumericValue=1}/` can be written in RegexBuilder syntax as `CharacterClass.numericValue(1)`.
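The proposed `CharacterClass.numericValue(1)` is not exercised here. As a rough sketch of which scalars a `\p{NumericValue=1}` test selects, the standard library's `Unicode.Scalar.Properties.numericValue` (a `Double?`) can be filtered directly; the candidate scalars are an arbitrary sample.

```swift
// Sample scalars: ASCII "1", ARABIC-INDIC DIGIT ONE (U+0661),
// ROMAN NUMERAL ONE (U+2160), SUPERSCRIPT ONE (U+00B9), and "a".
let candidates: [Unicode.Scalar] = ["1", "\u{0661}", "\u{2160}", "\u{00B9}", "a"]

// Keep only scalars whose Unicode Numeric_Value property is exactly 1.
let numericValueOne = candidates.filter { $0.properties.numericValue == 1 }

for scalar in numericValueOne {
  print("U+" + String(scalar.value, radix: 16, uppercase: true), scalar)
}
```

The property spans digits, letter-like numerals, and superscripts, which is why a dedicated `numericValue` class is broader than `.digit`.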