add multi-regex matching #156

Closed
@BurntSushi

Description

Lately I've been thinking a lot about providing a "multi regex" similar to RE2's "regex set" functionality. The problem they solve is, "I have multiple regexes that I want to run over some large search text once, and I want to see every match." The poor man's way of doing this is to combine them in a single regex of alternations, e.g., re1|re2|re3|.... Two problems with that though:

  1. The current search machinery reports only non-overlapping matches. That is, a match reported for one branch of the alternation can never overlap a match reported for another branch of the same regex.
  2. To find out which expressions matched, one has to wrap each branch in a capture group and then inspect the groups after every match. Requiring capture groups for this is bad because they incur a performance penalty and simply aren't needed for the question being asked.
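To make both problems concrete, here is a small std-only sketch. It assumes literal strings as stand-ins for regexes (so it runs without the regex crate), and the function names are hypothetical. `alternation_find_iter` imitates a single combined `p0|p1|...` search with leftmost-first semantics, while `per_pattern_find_iter` runs each pattern on its own and keeps overlapping occurrences. Note that the simulation gets the branch index for free; with a real combined regex you would need one capture group per branch to recover it (problem 2). The per-pattern version also makes one pass per pattern, which is exactly the cost a real multi regex would avoid by searching in a single pass.

```rust
/// Simulates one combined alternation `p0|p1|...` (literal stand-ins for
/// regexes) under leftmost-first semantics: at each position the first
/// branch that matches wins, and the search resumes after that match,
/// so reported matches never overlap. Returns (pattern index, start, end).
fn alternation_find_iter(patterns: &[&str], text: &str) -> Vec<(usize, usize, usize)> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        // Try the branches in order; the first non-empty one that matches wins.
        match patterns
            .iter()
            .enumerate()
            .find(|(_, p)| !p.is_empty() && rest.starts_with(**p))
        {
            Some((i, p)) => {
                out.push((i, pos, pos + p.len()));
                pos += p.len(); // resume after the match: no overlaps
            }
            None => {
                // No branch matches here; step forward by one character.
                pos += rest.chars().next().map_or(1, char::len_utf8);
            }
        }
    }
    out
}

/// Runs each (non-empty) pattern independently and reports every
/// occurrence, overlapping ones included, sorted by (start, end, index).
/// Naive: one scan of the text per pattern.
fn per_pattern_find_iter(patterns: &[&str], text: &str) -> Vec<(usize, usize, usize)> {
    let mut out = Vec::new();
    for (i, p) in patterns.iter().enumerate() {
        if p.is_empty() {
            continue;
        }
        // Check every char boundary so overlapping occurrences are kept.
        for (start, _) in text.char_indices() {
            if text[start..].starts_with(*p) {
                out.push((i, start, start + p.len()));
            }
        }
    }
    out.sort_by_key(|&(i, s, e)| (s, e, i));
    out
}

fn main() {
    let patterns = ["foo", "foobar", "bar"];
    // The combined search never reports pattern 1 ("foobar"), because
    // "foo" wins at position 0 and the search skips past it.
    println!("{:?}", alternation_find_iter(&patterns, "foobar"));
    // Independent searches report all three matches.
    println!("{:?}", per_pattern_find_iter(&patterns, "foobar"));
}
```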

We can start relatively simple by providing an API that answers these three questions:

  1. Do any of the given regexes match anywhere? (analogous to is_match)
  2. If so, which of those regexes match? Where do they match? (analogous to find)
  3. Can you show me all matches? (analogous to find_iter)
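One possible shape for an API answering these three questions is sketched below. The `MultiMatcher` and `MultiMatch` types and their methods are hypothetical, and literal (non-empty) strings again stand in for compiled regexes so the sketch needs only `std`. It only illustrates the interface; a real implementation would compile all patterns into one automaton instead of trying each pattern separately.

```rust
/// Hypothetical "multi regex" (literal stand-ins for compiled regexes).
struct MultiMatcher<'p> {
    patterns: Vec<&'p str>,
}

/// One match: which pattern matched (its index) and where.
#[derive(Debug, PartialEq)]
struct MultiMatch {
    pattern: usize,
    start: usize,
    end: usize,
}

impl<'p> MultiMatcher<'p> {
    fn new(patterns: &[&'p str]) -> Self {
        MultiMatcher { patterns: patterns.to_vec() }
    }

    /// Question 1: does any pattern match anywhere? (cf. `is_match`)
    fn is_match(&self, text: &str) -> bool {
        self.patterns.iter().any(|p| text.contains(*p))
    }

    /// Question 2: which patterns match, and where is each one's
    /// leftmost match? (cf. `find`)
    fn find(&self, text: &str) -> Vec<MultiMatch> {
        self.patterns
            .iter()
            .enumerate()
            .filter_map(|(i, p)| {
                text.find(*p)
                    .map(|s| MultiMatch { pattern: i, start: s, end: s + p.len() })
            })
            .collect()
    }

    /// Question 3: every match of every pattern, overlaps included,
    /// ordered by start position, then pattern index. (cf. `find_iter`,
    /// collected into a Vec here for simplicity)
    fn find_all(&self, text: &str) -> Vec<MultiMatch> {
        let mut out = Vec::new();
        for (start, _) in text.char_indices() {
            let rest = &text[start..];
            for (i, p) in self.patterns.iter().enumerate() {
                if rest.starts_with(*p) {
                    out.push(MultiMatch { pattern: i, start, end: start + p.len() });
                }
            }
        }
        out
    }
}

fn main() {
    let m = MultiMatcher::new(&["foo", "foobar", "bar", "quux"]);
    println!("{}", m.is_match("foobar"));
    println!("{:?}", m.find("foobar"));
    println!("{:?}", m.find_all("foobar"));
}
```

Returning the pattern index alongside each match is what lets callers tell the patterns apart without any capture groups.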

Adding capture groups to this API seems possible, but is tricky, so I suggest doing that after an initial implementation is done.