Description
Perl and PCRE provide a way to give names to parts of a regex and then reuse those parts by name, much as one often defines functions in Rust and then calls them by name. For example:
```
(?x)
(?(DEFINE)
  (?P<quant>many|some|five)
  (?P<adj>blue|large|interesting)
  (?P<object>cars|elephants|problems)
  (?P<noun_phrase>(?&quant)\ (?&adj)\ (?&object))
  (?P<verb>borrow|solve|resemble)
)
(?&noun_phrase)\ (?&verb)\ (?&noun_phrase)
```
I’m not very familiar with the implementation details, but I believe most of the changes needed to implement this would be in regex-syntax. The parser would need to accept the new syntax, and the translator that builds the Hir would replace each call with the corresponding definition and translate that in its place. The resulting Hir would then be identical to one where the named expressions had been written out by hand as non-capturing groups.
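Here is a rough sketch of what "written out by hand" means for the example above, built today with nothing but `format!` and string constants. The constant names and the `noun_phrase`/`sentence` helpers are hypothetical and exist only for illustration. Each definition is wrapped in a non-capturing group `(?:...)` so that its alternation does not leak into the surrounding pattern.

```rust
// Hypothetical hand-expansion of the DEFINE example: each named
// definition becomes a string constant wrapped in a non-capturing group,
// and each "call" becomes a format! substitution.
const QUANT: &str = "(?:many|some|five)";
const ADJ: &str = "(?:blue|large|interesting)";
const OBJECT: &str = "(?:cars|elephants|problems)";
const VERB: &str = "(?:borrow|solve|resemble)";

/// Stands in for `(?P<noun_phrase>(?&quant)\ (?&adj)\ (?&object))`.
fn noun_phrase() -> String {
    format!(r"(?:{q}\ {a}\ {o})", q = QUANT, a = ADJ, o = OBJECT)
}

/// Stands in for `(?&noun_phrase)\ (?&verb)\ (?&noun_phrase)`.
/// The escaped spaces are still needed because (?x) ignores bare whitespace.
fn sentence() -> String {
    let np = noun_phrase();
    format!(r"(?x){np}\ {v}\ {np}", np = np, v = VERB)
}

fn main() {
    println!("{}", sentence());
}
```

The helper functions play the role of the named definitions. The limitation is that the "grammar" only exists in Rust source and is gone once the pattern string has been built.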
Given the design goals of this crate, in particular its worst-case linear-time search guarantee, I would recommend keeping the language regular by returning an error when a recursive call is detected. This can be done during compilation by keeping a stack of the calls currently being expanded and returning an error if a call is made to a group that is already on the stack.
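The stack-based check could look roughly like the sketch below. It works on plain pattern strings rather than on the AST or Hir, and `expand`, `expand_with_stack`, and `ExpandError` are hypothetical names, not regex-syntax API. It also ignores escaping, so a literal `\(?&` would be misread.

```rust
use std::collections::HashMap;

/// Errors reported by this hypothetical expander.
#[derive(Debug, PartialEq)]
enum ExpandError {
    /// `(?&name)` refers to a name with no definition.
    Undefined(String),
    /// `(?&name)` refers to a group already on the call stack; the vector
    /// is the chain of calls that closes the cycle.
    Recursive(Vec<String>),
    /// A `(?&` with no closing `)`.
    Unterminated,
}

/// Expand every call in `pattern`, starting from an empty call stack.
/// A fresh stack per top-level call means an early error return that
/// leaves names on the stack cannot affect later expansions.
fn expand(pattern: &str, defs: &HashMap<&str, &str>) -> Result<String, ExpandError> {
    expand_with_stack(pattern, defs, &mut Vec::new())
}

/// Replace each `(?&name)` with `(?:<expanded body>)`. `stack` holds the
/// names whose bodies are currently being expanded.
fn expand_with_stack(
    pattern: &str,
    defs: &HashMap<&str, &str>,
    stack: &mut Vec<String>,
) -> Result<String, ExpandError> {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(start) = rest.find("(?&") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 3..];
        let end = after.find(')').ok_or(ExpandError::Unterminated)?;
        let name = &after[..end];
        // Calling a group that is still being expanded is recursion.
        if stack.iter().any(|active| active == name) {
            let mut cycle = stack.clone();
            cycle.push(name.to_string());
            return Err(ExpandError::Recursive(cycle));
        }
        let body = defs
            .get(name)
            .ok_or_else(|| ExpandError::Undefined(name.to_string()))?;
        stack.push(name.to_string());
        let expanded = expand_with_stack(body, defs, stack)?;
        stack.pop();
        // Wrap in a non-capturing group so alternations stay local.
        out.push_str("(?:");
        out.push_str(&expanded);
        out.push(')');
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    let mut defs = HashMap::new();
    defs.insert("verb", "borrow|solve|resemble");
    defs.insert("endless", "a(?&endless)");
    println!("{:?}", expand(r"I\ (?&verb)", &defs));
    println!("{:?}", expand("(?&endless)", &defs));
}
```

A stack is used rather than a set of all names ever visited because a definition is popped once its expansion finishes. As a result, calling the same group twice in sequence, as `noun_phrase` is called in the example, is allowed, while only a call back into an active group is rejected.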
However, it would still be desirable to keep the list of definitions around even after the regex is fully compiled, so that they could be reused when building new regexes. Suppose one had a regex containing only definitions; call it a grammar:
```rust
let grammar = Regex::new(r"(?x)
    (?(DEFINE)
        (?P<quant>many|some|five)
        (?P<adj>blue|large|interesting)
        (?P<object>cars|elephants|problems)
        (?P<noun_phrase>(?&quant)\ (?&adj)\ (?&object))
        (?P<verb>borrow|solve|resemble)
    )").unwrap();
```
This grammar could be reused to create multiple new regexes by substitution:
```rust
let sentence = Regex::new(&format!(r"(?x)
    (?&noun_phrase)\ (?&verb_phrase)\ (?&noun_phrase)
    (?(DEFINE)
        (?P<adverb>quickly|thoroughly|confidently)
        (?P<verb_phrase>(?&adverb)\ (?&verb))
    )
    {grammar}")).unwrap();
```
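On its own, the sentence regex is incomplete. It calls `noun_phrase` and `verb`, which only the grammar defines, so the substitution is what makes it well-formed. Because the regex crate cannot parse this syntax today, the rough sketch below works on plain pattern strings. It uses hypothetical helpers `names_after` and `unresolved_calls`, which scan for `(?P<name>` definitions and `(?&name)` calls and ignore escaping. The sketch checks that splicing in the grammar resolves every call.

```rust
use std::collections::BTreeSet;

/// The grammar from the example, kept as a plain string (hypothetical:
/// the regex crate cannot compile this syntax today).
const GRAMMAR: &str = r"(?x)
    (?(DEFINE)
        (?P<quant>many|some|five)
        (?P<adj>blue|large|interesting)
        (?P<object>cars|elephants|problems)
        (?P<noun_phrase>(?&quant)\ (?&adj)\ (?&object))
        (?P<verb>borrow|solve|resemble)
    )";

/// Build the sentence pattern by textual substitution of `grammar`.
fn sentence_pattern(grammar: &str) -> String {
    format!(
        r"(?x)
    (?&noun_phrase)\ (?&verb_phrase)\ (?&noun_phrase)
    (?(DEFINE)
        (?P<adverb>quickly|thoroughly|confidently)
        (?P<verb_phrase>(?&adverb)\ (?&verb))
    )
    {grammar}",
        grammar = grammar
    )
}

/// Collect each name found between `open` and the next `close`.
fn names_after(pattern: &str, open: &str, close: char) -> BTreeSet<String> {
    let mut names = BTreeSet::new();
    let mut rest = pattern;
    while let Some(i) = rest.find(open) {
        rest = &rest[i + open.len()..];
        match rest.find(close) {
            Some(j) => {
                names.insert(rest[..j].to_string());
                rest = &rest[j + close.len_utf8()..];
            }
            None => break,
        }
    }
    names
}

/// Names called with `(?&name)` but never defined with `(?P<name>...)`.
fn unresolved_calls(pattern: &str) -> BTreeSet<String> {
    let defined = names_after(pattern, "(?P<", '>');
    let called = names_after(pattern, "(?&", ')');
    called.difference(&defined).cloned().collect()
}

fn main() {
    println!("alone: {:?}", unresolved_calls(&sentence_pattern("")));
    println!("with grammar: {:?}", unresolved_calls(&sentence_pattern(GRAMMAR)));
}
```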
One could of course imagine additional APIs beyond simple textual substitution, but that is a separate topic. Similarly, the exact syntax for definitions and calls could be different. PCRE supports three spellings for a named call, originating from Perl 5 (`(?&name)`), Python (`(?P>name)`), and Ruby (`\g<name>`); we could choose any of them or invent our own.
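If more than one spelling were accepted, the parsing side would be small. The following is a hypothetical sketch: `CallSyntax` and `parse_call` are illustrative names, not regex-syntax API. It recognizes a named call at the start of a string in each of the three spellings PCRE accepts and reports how many bytes it consumed.

```rust
/// The three named-call spellings PCRE accepts (hypothetical enum).
#[derive(Debug, PartialEq)]
enum CallSyntax {
    Perl,      // (?&name)
    Python,    // (?P>name)
    Oniguruma, // \g<name>, the Ruby spelling
}

/// If `s` starts with a named call, return its syntax, the group name,
/// and the number of bytes the call occupies.
fn parse_call(s: &str) -> Option<(CallSyntax, &str, usize)> {
    let forms = [
        ("(?&", ')', CallSyntax::Perl),
        ("(?P>", ')', CallSyntax::Python),
        (r"\g<", '>', CallSyntax::Oniguruma),
    ];
    for (open, close, syntax) in forms {
        if let Some(rest) = s.strip_prefix(open) {
            let end = rest.find(close)?;
            let name = &rest[..end];
            // Accept only non-empty names made of ASCII word characters.
            let valid = !name.is_empty()
                && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
            if !valid {
                return None;
            }
            return Some((syntax, name, open.len() + end + close.len_utf8()));
        }
    }
    None
}

fn main() {
    for call in ["(?&verb)", "(?P>verb)", r"\g<verb>", "(?P<verb>x)"] {
        println!("{:?}", parse_call(call));
    }
}
```

Normalizing all three spellings to a single internal call node is one option. Picking just one spelling would keep the documented syntax smaller.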