Skip to content

Optimisation opportunity: literals in the middle of regex #510

Closed
@RReverser

Description

@RReverser

This has been discussed in private before, but decided to make it into an actual issue so that it doesn't get lost in time.

Right now, regexes like .*literal and literal.* get optimised by extracting literal prefixes + forward DFA or suffixes + reverse DFA correspondingly, however regexes like .*literal.* are not and are executed as a full DFA, even though they could be a generic case of literal optimisation where literal is detected, found and then both forward and reverse DFA are executed from its boundaries into different directions.

Example benchmark from original discussion: https://gist.github.com/RReverser/a48c9a7332df9ee7bc7abe5f1f708bb7

And numbers showing the current difference:

test with_dots            ... bench:         109 ns/iter (+/- 8)
test with_dots_and_s_mode ... bench:         109 ns/iter (+/- 9)
test without_back_dots    ... bench:         249 ns/iter (+/- 5)
test without_dots         ... bench:          38 ns/iter (+/- 0)
test without_front_dots   ... bench:          56 ns/iter (+/- 1)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions