Description
I'm working on a syntax highlighting engine in Rust that requires an Oniguruma-compatible regex engine. I'm trying to port it from the onig crate to fancy-regex, but there are some features it doesn't support yet (see trishume/syntect#34).
One of these features is the && operator and nesting in character sets, for example [a-w&&[^c-g]z]. I was thinking this would be added to fancy-regex but @robinst pointed out this comment which suggests that you plan for them to be in the regex crate.
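To make the semantics concrete: as I read the Oniguruma documentation, [a-w&&[^c-g]z] means ([a-w] AND ([^c-g] OR z)), which reduces to [abh-w]. Below is a minimal hand-written predicate for that one class. The name in_class is purely illustrative and is not part of any crate's API.

```rust
/// Hand-written equivalent of the class [a-w&&[^c-g]z].
/// `in_class` is a hypothetical helper for illustration only.
fn in_class(c: char) -> bool {
    // Left operand of &&: the plain range a-w.
    let left = ('a'..='w').contains(&c);
    // Right operand of &&: the nested negated class [^c-g], unioned with z.
    let right = !('c'..='g').contains(&c) || c == 'z';
    left && right
}

fn main() {
    // Collect every lowercase ASCII letter the class accepts.
    let matched: String = ('a'..='z').filter(|&c| in_class(c)).collect();
    println!("{matched}"); // prints "abhijklmnopqrstuvw"
}
```

Note that z is in the right operand but not in a-w, so the intersection drops it.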
It would be nice if the regex crate supported UTS#18 RL1.3 in full, but the && operator and nesting are all that Oniguruma compatibility in fancy-regex requires.
I imagine this would take some changes to regex-syntax and then a pass to convert the fancy character sets down to basic character sets. I haven't thought enough about it to know whether there are any Unicode-related issues that might make this more complex, perhaps by making a tiny fancy character set compile to an enormous basic character set.
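Here is a rough sketch of what I mean by such a pass, assuming a class is represented as a sorted list of inclusive code point ranges. This is not how regex-syntax works internally as far as I know; the names Ranges, normalize, negate, intersect, and lower_example are all made up for illustration, and the sketch ignores surrogates and case folding.

```rust
/// Hypothetical representation: inclusive code point ranges.
/// After `normalize`, ranges are sorted, non-overlapping, and non-adjacent.
type Ranges = Vec<(u32, u32)>;

/// Largest Unicode code point.
const MAX: u32 = 0x10FFFF;

/// Sort the ranges and merge any that overlap or touch (this is the union step).
fn normalize(mut r: Ranges) -> Ranges {
    r.sort();
    let mut out: Ranges = Vec::new();
    for (lo, hi) in r {
        if let Some(last) = out.last_mut() {
            if lo <= last.1.saturating_add(1) {
                last.1 = last.1.max(hi);
                continue;
            }
        }
        out.push((lo, hi));
    }
    out
}

/// Complement of a normalized range list over 0..=MAX.
fn negate(r: &[(u32, u32)]) -> Ranges {
    let mut out = Vec::new();
    let mut next = 0u32; // first code point not yet accounted for
    for &(lo, hi) in r {
        if lo > next {
            out.push((next, lo - 1));
        }
        next = hi + 1;
    }
    if next <= MAX {
        out.push((next, MAX));
    }
    out
}

/// Intersection of two normalized range lists via a linear merge.
fn intersect(a: &[(u32, u32)], b: &[(u32, u32)]) -> Ranges {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let lo = a[i].0.max(b[j].0);
        let hi = a[i].1.min(b[j].1);
        if lo <= hi {
            out.push((lo, hi));
        }
        // Advance whichever range ends first.
        if a[i].1 < b[j].1 { i += 1 } else { j += 1 }
    }
    out
}

/// Lowers [a-w&&[^c-g]z] to basic ranges.
fn lower_example() -> Ranges {
    let range = |lo: char, hi: char| (lo as u32, hi as u32);
    let left = vec![range('a', 'w')];
    // Nested class: [^c-g] unioned with z.
    let mut right = negate(&[range('c', 'g')]);
    right.push(range('z', 'z'));
    let right = normalize(right);
    intersect(&left, &right)
}

fn main() {
    for (lo, hi) in lower_example() {
        println!("{:?}-{:?}", char::from_u32(lo), char::from_u32(hi));
    }
}
```

With this representation, intersecting two lists of m and n ranges yields at most m + n - 1 ranges, so the intersection step by itself should not blow up. Whether Unicode classes or case folding feeding into it change that picture is exactly the part I'm unsure about.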
@BurntSushi, do you have any insight into how difficult this would be to add for a contributor not familiar with the internals of regex?