Skip to content

DFA misses an opportunity to short circuit with an anchored RegexSet #186

Closed
@dprien

Description

@dprien

While trying to implement a lexer, I noticed that RegexSet::matches exhibits very bad performance when used on long strings.
The following test case runs in 1 minute, 16 seconds (cargo build --release):

extern crate regex;
fn main() {
    let mut test_input = String::from("42 ");
    for _ in 0 .. 100_000 { test_input.push_str("XXXXXXXXXXXXXXXXXXXX"); } // Delete me

    let set = regex::RegexSet::new(&[r"^[0-9]+", r"^THIS_SHOULD_NEVER_MATCH"]).unwrap();
    for _ in 0 .. 10_000 { println!("{:?}", set.matches(&test_input)); }
}

Without the line marked with // Delete me, the program runs in about 50 ms.

If I understand correctly, matches should try to match only at the very beginning of the string, thanks to the ^ anchor. However, it would seem that matches traverses the whole string on each call.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions