incorrect case for word boundaries #579

Closed
@BurntSushi

Here's a reproduction:

use regex::bytes::Regex;

fn main() {
    let hay = "I have 12, he has 2!";
    let re = Regex::new(r"\b..\b").unwrap();
    for m in re.find_iter(hay.as_bytes()) {
        println!("{:?}", String::from_utf8_lossy(m.as_bytes()));
    }
}

Actual output:

"I "
"12"

Expected output:

"I "
"12"
", "
"he"
" 2"

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=55914c890dfb6a68fc72b9c6fd986298

The same bug is present even if we use ASCII word boundaries: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=eef23f309c9f608eb683aac982648301

Here's a smaller reproduction:

use regex::bytes::Regex;

fn main() {
    let hay = "az,,b";
    let re = Regex::new(r"\b..\b").unwrap();
    for m in re.find_iter(hay.as_bytes()) {
        println!("{:?}", String::from_utf8_lossy(m.as_bytes()));
    }
}

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c7507e4d095141004909f9deb1c6cdd7
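
To cross-check the expected matches independently of the regex engine, a hand-rolled scan can act as a reference. The sketch below is not how the regex crate is implemented. It assumes an ASCII-only haystack and the ASCII definition of a word byte (`[0-9A-Za-z_]`), and all function names in it are hypothetical. It hard-codes the pattern `\b..\b` and reports leftmost, non-overlapping matches.

```rust
/// ASCII definition of a word byte (assumption: ASCII-only haystacks).
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Position `i` (0..=hay.len()) is a word boundary when exactly one of
/// the bytes on either side of it is a word byte.
fn is_word_boundary(hay: &[u8], i: usize) -> bool {
    let before = i > 0 && is_word_byte(hay[i - 1]);
    let after = i < hay.len() && is_word_byte(hay[i]);
    before != after
}

/// Hypothetical reference scanner for `\b..\b`: returns leftmost,
/// non-overlapping two-byte matches delimited by word boundaries.
fn find_two_between_boundaries(hay: &str) -> Vec<&str> {
    // Byte offsets only line up with characters for ASCII input.
    assert!(hay.is_ascii(), "this sketch only handles ASCII haystacks");
    let bytes = hay.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i + 2 <= bytes.len() {
        // `.` does not match a line feed by default.
        let no_newline = bytes[i] != b'\n' && bytes[i + 1] != b'\n';
        if no_newline && is_word_boundary(bytes, i) && is_word_boundary(bytes, i + 2) {
            out.push(&hay[i..i + 2]);
            // Resume the search where this match ended.
            i += 2;
        } else {
            i += 1;
        }
    }
    out
}

fn main() {
    // Prints ["az", ",,"]
    println!("{:?}", find_two_between_boundaries("az,,b"));
    // Prints ["I ", "12", ", ", "he", " 2"]
    println!("{:?}", find_two_between_boundaries("I have 12, he has 2!"));
}
```

For the smaller haystack this scan yields "az" followed by ",,", and for the first haystack it reproduces the expected output listed above.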

Originally reported against ripgrep: BurntSushi/ripgrep#1275
