Skip to content

Pattern panics non-Unicode bytes::Regex #367

Closed
@l4AgXc

Description

@l4AgXc

I am using rustc version 1.14.0 and regex version 0.2.1. I found a pattern that panics a non-Unicode bytes::Regex:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:323

This program demonstrates the panic:

extern crate regex;
fn main() {
    let _ = regex::bytes::Regex::new(r"(?-u).[.]\S\w\x00\x02\x03\x05\x06\x08\x0c\x0e\x10\x12\x14\x16\x17\x19\x1a\x1c\x1e\x21\x23\x25\x27\x28\x2a\x2c\x31\x33\x35\x36\x38\x3b\x3d\x3e\x40\x42\x44\x46\x48\x4a\x4c\x4d\x4f\x50\x52\x53\x55\x56\x58\x59\x5b\x5d\x61\x63\x65\x67\x69\x6a\x6c\x6e\x6f\x70\x72\x74\x76\x77\x79\x7c\x7e\x81\x82\x84\x86\x88\x89\x8b\x8c\x8e\x91\x93\x95\x97\x99\x9b\x9d\x9e\xa1\xa3\xa5\xa6\xa8\xaa\xac\xad\xaf\xb0\xb2\xb4\xb5\xb7\xb9\xbb\xbd\xbe\xc0\xc3\xc5\xc7\xc9\xcb\xcc\xce\xcf\xd1\xd3\xd4\xd6\xd7\xd9\xdb\xdc\xde\xe2\xe3\xe5\xe6\xe8\xe9\xeb\xee\xf1\xf3\xf5\xf7\xf9\xfb\xfc\xfe");
}

The pattern consists of 136 unique literal byte values, plus the sub-patterns ., [.], \S, and \w. Here is what I have been able to find out:

  • The pattern is close to minimal. Removing any of the 136 literal bytes, or any of the four sub-patterns, avoids the panic. You can add to the pattern anywhere and it still panics.
  • Order doesn't matter. I constructed the pattern by starting with a much larger bytes::RegexSetBuilder that panicked, removing as much as I could while still having it panic, and sorting.
  • The panic isn't related to backslash escapes. I.e., you can replace \x44 with a literal D and it still panics.
  • Some modifications avoid the panic and some do not. For example, changing \x00 to \x01 still panics, but changing \xde to \xdf does not. Changing \w to \b still panics, but changing \w to \d does not.
  • Non-Unicode is necessary. If you omit the (?-u) (or unicode(false) when using a RegexBuilder or RegexSetBuilder), then it does not panic.

The same thing happens with bytes::RegexBuilder, bytes::RegexSet, and bytes::RegexSetBuilder. The 140 necessary elements can be distributed across multiple patterns when using a builder. Here is an example of a bytes::RegexSetBuilder that panics:

extern crate regex;
fn main() {
    let mut rsb = regex::bytes::RegexSetBuilder::new([
        r"\xa3\xd7\x40\x95\x59\xd4\x2a\x86\x93\xaf",
        r"\x16\xa1\x14\x19\x00\x2c\x27\xcc\x10\xcb\xee\xf5\xeb\xfb\xb5\xd9\x46\x25\x23\x38\x36\x35\x56\x31\x4a\x44\x4c\x99\xc7\x9d\x3d\w\xc0\x9b\x3b\x12\xdb\x89",
        r"\x84",
        r"\xcf\x8c",
        r"\x7c\xbd\x97\xfc\x3e\x6c\x79\x7e\xc3\x9e\x5b\x42\xf3\x17\x06\x08\xc5\xac\x05\x53\xe9",
        r"\xdc\xc9\x8b\x1a\x02\x1c\x76\x6a\xd3\xb4\x91\x0c\x1e\x03\x70\x77\x55\x52\x0e",
        r"\xde\xb2\xad.\x8e\x88\xd6\x81\xf9\xb7\xfe\xce\xf7\xb0\xe6",
        r"\xd1\x4d\x72\x6f\x74\x63\x61\x6e\x67\x65\x50\x4f\x33\xe2\xe5\xf1\xe8[.]\xe3",
        r"\xa6\xbe\xb9\xaa\xbb\x28\x69\xa5\x48\xa8",
        r"\S\x5d\x21\x58\x82",
    ].iter());
    rsb.unicode(false);
    match rsb.build() {
        Ok(_) => println!("ok"),
        Err(e) => println!("error {}", e),
    };
}

Here are all 140 elements of the pattern in order:

.
[.]
\S
\w
\x00
\x02
\x03
\x05
\x06
\x08
\x0c
\x0e
\x10
\x12
\x14
\x16
\x17
\x19
\x1a
\x1c
\x1e
\x21
\x23
\x25
\x27
\x28
\x2a
\x2c
\x31
\x33
\x35
\x36
\x38
\x3b
\x3d
\x3e
\x40
\x42
\x44
\x46
\x48
\x4a
\x4c
\x4d
\x4f
\x50
\x52
\x53
\x55
\x56
\x58
\x59
\x5b
\x5d
\x61
\x63
\x65
\x67
\x69
\x6a
\x6c
\x6e
\x6f
\x70
\x72
\x74
\x76
\x77
\x79
\x7c
\x7e
\x81
\x82
\x84
\x86
\x88
\x89
\x8b
\x8c
\x8e
\x91
\x93
\x95
\x97
\x99
\x9b
\x9d
\x9e
\xa1
\xa3
\xa5
\xa6
\xa8
\xaa
\xac
\xad
\xaf
\xb0
\xb2
\xb4
\xb5
\xb7
\xb9
\xbb
\xbd
\xbe
\xc0
\xc3
\xc5
\xc7
\xc9
\xcb
\xcc
\xce
\xcf
\xd1
\xd3
\xd4
\xd6
\xd7
\xd9
\xdb
\xdc
\xde
\xe2
\xe3
\xe5
\xe6
\xe8
\xe9
\xeb
\xee
\xf1
\xf3
\xf5
\xf7
\xf9
\xfb
\xfc
\xfe

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions