I'm looking at replacing oniguruma with regex in some situations for the Ruby that I'm building.
I am benchmarking the following three Regexps over this several-megabyte text corpus:
bench('Email', '[\w\.+-]+@[\w\.-]+\.[\w\.-]+')
bench('URI', 'https?://(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)')
bench('IP', '\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b')
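For readers without the branch checked out, here is a minimal sketch of what a `bench` helper along these lines might look like. It is not the actual `string_scan.rb` harness: the `time_ms` and `report` helpers, the explicit `corpus` argument, and the `iterations:` keyword are all assumptions made for illustration. It times the same three phases reported in the results (compile, scan, scan with block).

```ruby
# Hypothetical harness sketch; the real string_scan.rb may be structured differently.
ITERATIONS = 50

# Run the given block `iterations` times and return total elapsed milliseconds.
def time_ms(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
end

# Format one result line in the same shape as the output in this issue.
def report(label, elapsed_ms, iterations)
  avg = (elapsed_ms / iterations).round(2)
  "#{label}: #{elapsed_ms.round(2)}ms elapsed in #{iterations} iterations (avg. #{avg}ms / iteration)"
end

# Benchmark compiling `pattern` and scanning `corpus` with it.
# `corpus` is passed explicitly here (an assumption); the original calls take two arguments.
def bench(name, pattern, corpus, iterations: ITERATIONS)
  regexp = Regexp.new(pattern)
  lines = ["#{name}: #{corpus.scan(regexp).length} matches"]
  lines << report('compile', time_ms(iterations) { Regexp.new(pattern) }, iterations)
  lines << report('scan', time_ms(iterations) { corpus.scan(regexp) }, iterations)
  lines << report('scan with block', time_ms(iterations) { corpus.scan(regexp) { |m| m } }, iterations)
  lines
end
```

Calling `puts bench('Email', '[\w\.+-]+@[\w\.-]+\.[\w\.-]+', File.read(path))` with some corpus `path` would print four lines in the format used in the results.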
For Email, regex is roughly 10x faster than oniguruma. For URI, regex is roughly 1.7x slower than oniguruma. For IP, regex is roughly 18x slower than oniguruma.
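The ratios can be checked against the average `scan` timings reported in the results in this issue. The following is a small sketch of that arithmetic; the `SCAN_AVG_MS` table and `slowdown` helper are names introduced here for illustration, and the numbers are copied from the per-iteration averages in the output.

```ruby
# Average `scan` time in ms per iteration, copied from the results in this issue.
SCAN_AVG_MS = {
  'Email' => { regex: 31.38,  onig: 326.7 },
  'URI'   => { regex: 46.72,  onig: 27.33 },
  'IP'    => { regex: 513.87, onig: 29.3 }
}.freeze

# Ratio of regex time to oniguruma time.
# Values above 1.0 mean regex is slower; values below 1.0 mean regex is faster.
def slowdown(name)
  timings = SCAN_AVG_MS.fetch(name)
  timings[:regex] / timings[:onig]
end

SCAN_AVG_MS.each_key do |name|
  puts format('%-5s regex/onig = %.2f', name, slowdown(name))
end
```

A ratio near 0.1 for Email corresponds to regex being about 10x faster, while URI and IP come out at about 1.7 and 17.5 respectively.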
regex performance
Email: 92 matches
..................................................
compile: 85.52ms elapsed in 50 iterations (avg. 1.71ms / iteration)
scan: 1569.22ms elapsed in 50 iterations (avg. 31.38ms / iteration)
scan with block: 1025.49ms elapsed in 50 iterations (avg. 20.5ms / iteration)
URI: 5388 matches
..................................................
compile: 71.84ms elapsed in 50 iterations (avg. 1.43ms / iteration)
scan: 2336.46ms elapsed in 50 iterations (avg. 46.72ms / iteration)
scan with block: 2045.79ms elapsed in 50 iterations (avg. 40.91ms / iteration)
IP: 6 matches
..................................................
compile: 10.79ms elapsed in 50 iterations (avg. 0.21ms / iteration)
scan: 25693.73ms elapsed in 50 iterations (avg. 513.87ms / iteration)
scan with block: 25642.21ms elapsed in 50 iterations (avg. 512.84ms / iteration)
oniguruma performance (via rust-onig)
Email: 92 matches
..................................................
compile: 5.89ms elapsed in 50 iterations (avg. 0.11ms / iteration)
scan: 16335.45ms elapsed in 50 iterations (avg. 326.7ms / iteration)
scan with block: 16228.96ms elapsed in 50 iterations (avg. 324.57ms / iteration)
URI: 5388 matches
..................................................
compile: 1.68ms elapsed in 50 iterations (avg. 0.03ms / iteration)
scan: 1366.95ms elapsed in 50 iterations (avg. 27.33ms / iteration)
scan with block: 1349.82ms elapsed in 50 iterations (avg. 26.99ms / iteration)
IP: 6 matches
..................................................
compile: 3.79ms elapsed in 50 iterations (avg. 0.07000000000000001ms / iteration)
scan: 1465.14ms elapsed in 50 iterations (avg. 29.3ms / iteration)
scan with block: 1431.35ms elapsed in 50 iterations (avg. 28.62ms / iteration)
If you're interested in doing so, you can invoke this benchmark in Artichoke with:
cargo run --release --bin string_scan_bench -- artichoke-frontend/ruby/benches/string_scan.rb
The benchmark on master (with oniguruma) is different from the benchmark on this branch because I've tweaked the Regexps to remove lookahead patterns, which regex does not support.