Closed
Description
Here's a reproduction:
use regex::bytes::Regex;

fn main() {
    let hay = "I have 12, he has 2!";
    let re = Regex::new(r"\b..\b").unwrap();
    for m in re.find_iter(hay.as_bytes()) {
        println!("{:?}", String::from_utf8_lossy(m.as_bytes()));
    }
}
Actual output:
"I "
"12"
Expected output:
"I "
"12"
", "
"he"
" 2"
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=55914c890dfb6a68fc72b9c6fd986298
The same bug is present even if we use ASCII word boundaries: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=eef23f309c9f608eb683aac982648301
Here's a smaller reproduction:
use regex::bytes::Regex;

fn main() {
    let hay = "az,,b";
    let re = Regex::new(r"\b..\b").unwrap();
    for m in re.find_iter(hay.as_bytes()) {
        println!("{:?}", String::from_utf8_lossy(m.as_bytes()));
    }
}
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c7507e4d095141004909f9deb1c6cdd7
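To double-check the expected output without relying on the regex crate at all, here is a rough hand-rolled sketch of what I think `\b..\b` should do on these two haystacks. It is not how the crate implements anything. It assumes ASCII-only input, ASCII word boundaries, and plain leftmost, non-overlapping iteration where each search resumes at the end of the previous match. The helper names (`is_word_byte`, `is_word_boundary`, `find_all`) are made up for this illustration.

```rust
/// Returns true for ASCII word bytes, i.e. [0-9A-Za-z_].
/// (Assumption: ASCII-only notion of a word character.)
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Returns true if offset `at` sits between a word byte and a non-word
/// byte (or the start/end of the haystack).
fn is_word_boundary(hay: &[u8], at: usize) -> bool {
    let before = at > 0 && is_word_byte(hay[at - 1]);
    let after = at < hay.len() && is_word_byte(hay[at]);
    before != after
}

/// Hypothetical hand-coded equivalent of iterating over all matches of
/// `\b..\b`: leftmost, non-overlapping, two bytes per match.
fn find_all(hay: &[u8]) -> Vec<&[u8]> {
    let mut matches = Vec::new();
    let mut start = 0;
    while start + 2 <= hay.len() {
        let end = start + 2;
        // `.` does not match a line feed by default.
        let no_newline = !hay[start..end].contains(&b'\n');
        if no_newline && is_word_boundary(hay, start) && is_word_boundary(hay, end) {
            matches.push(&hay[start..end]);
            // Resume right after the match; a boundary at `end` may
            // also begin the next match.
            start = end;
        } else {
            start += 1;
        }
    }
    matches
}

fn main() {
    for hay in ["I have 12, he has 2!", "az,,b"] {
        println!("haystack: {:?}", hay);
        for m in find_all(hay.as_bytes()) {
            println!("{:?}", String::from_utf8_lossy(m));
        }
    }
}
```

Under those assumptions the sketch yields the five matches listed under "Expected output" for the first haystack, and "az" followed by ",," for the smaller one. The important detail is that the boundary at the end of one match is allowed to serve as the boundary at the start of the next.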
Originally reported against ripgrep: BurntSushi/ripgrep#1275