Implement (at least part of) UTS#18 RL1.3 - Operators in character sets #341

Closed
@trishume

Description

I'm working on a syntax highlighting engine in Rust that requires an Oniguruma-compatible regex engine. I'm trying to port it from the onig crate to fancy-regex, but there are some features it doesn't support yet (see trishume/syntect#34).

One of these features is the && operator and nesting in character sets, for example [a-w&&[^c-g]z]. I was thinking this would be added to fancy-regex but @robinst pointed out this comment which suggests that you plan for them to be in the regex crate.

It would be nice if the regex crate supported UTS#18 RL1.3 in full, but the && operator and nesting are all that fancy-regex needs for Oniguruma compatibility.

I imagine this would take some changes to regex-syntax, followed by a pass that converts the fancy character sets down to basic character sets. I haven't thought enough about it to know whether there are any Unicode-related issues that might make this more complex, for example by making a tiny fancy character set compile to an enormous basic character set.
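To make the idea of such a pass concrete, here is a minimal sketch of how a nested set with && could be lowered to a plain list of ranges. Everything in it (ClassSet, its methods, lower_example) is hypothetical and written for illustration only; it is not the regex-syntax API, and it ignores case folding and the surrogate gap. It follows the Oniguruma reading of the example, where && binds loosest: [a-w&&[^c-g]z] means [a-w] AND ([^c-g] OR z).

```rust
/// Hypothetical "basic" character set: sorted, non-overlapping, inclusive
/// ranges of code points. Not part of regex-syntax; illustration only.
#[derive(Debug, Clone, PartialEq)]
struct ClassSet {
    ranges: Vec<(u32, u32)>,
}

/// Largest Unicode code point. The surrogate gap is ignored in this sketch.
const MAX: u32 = 0x10FFFF;

impl ClassSet {
    /// Build a canonical set: sort, then merge overlapping or adjacent ranges.
    fn new(mut ranges: Vec<(u32, u32)>) -> Self {
        ranges.sort();
        let mut merged: Vec<(u32, u32)> = Vec::new();
        for (lo, hi) in ranges {
            if let Some(last) = merged.last_mut() {
                if lo <= last.1.saturating_add(1) {
                    last.1 = last.1.max(hi);
                    continue;
                }
            }
            merged.push((lo, hi));
        }
        ClassSet { ranges: merged }
    }

    fn range(lo: char, hi: char) -> Self {
        Self::new(vec![(lo as u32, hi as u32)])
    }

    /// Items listed side by side in a set, e.g. `[^c-g]z`, are a union.
    fn union(&self, other: &Self) -> Self {
        let mut all = self.ranges.clone();
        all.extend_from_slice(&other.ranges);
        Self::new(all)
    }

    /// `[^...]`: emit the gaps between the existing ranges.
    fn negate(&self) -> Self {
        let mut out = Vec::new();
        let mut next = 0u32;
        for &(lo, hi) in &self.ranges {
            if lo > next {
                out.push((next, lo - 1));
            }
            next = hi + 1;
        }
        if next <= MAX {
            out.push((next, MAX));
        }
        ClassSet { ranges: out }
    }

    /// `&&`: two-pointer walk over both sorted range lists.
    fn intersect(&self, other: &Self) -> Self {
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < self.ranges.len() && j < other.ranges.len() {
            let (a_lo, a_hi) = self.ranges[i];
            let (b_lo, b_hi) = other.ranges[j];
            let (lo, hi) = (a_lo.max(b_lo), a_hi.min(b_hi));
            if lo <= hi {
                out.push((lo, hi));
            }
            // Advance whichever range ends first.
            if a_hi < b_hi { i += 1 } else { j += 1 }
        }
        ClassSet { ranges: out }
    }

    fn contains(&self, c: char) -> bool {
        let c = c as u32;
        self.ranges.iter().any(|&(lo, hi)| lo <= c && c <= hi)
    }
}

/// Lower `[a-w&&[^c-g]z]` by hand, as a parser-driven pass might.
fn lower_example() -> ClassSet {
    let lhs = ClassSet::range('a', 'w');
    let rhs = ClassSet::range('c', 'g').negate().union(&ClassSet::range('z', 'z'));
    lhs.intersect(&rhs)
}
```

Calling lower_example() yields a set equivalent to [abh-w]. In this representation, negation and intersection can only produce about as many ranges as their inputs contain, so the operators by themselves should not blow up the size of the set; whether Unicode features such as case folding change that picture is not something this sketch addresses.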

@BurntSushi do you have any insight on how difficult you think this would be to add for a contributor not familiar with the internals of regex?
