I am using rustc version 1.14.0 and regex version 0.2.1. I found a pattern that causes a non-Unicode `bytes::Regex` to panic:

```
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:323
```
This program demonstrates the panic:
```rust
extern crate regex;

fn main() {
    // Constructing the regex panics inside `new`; it returns neither Ok nor Err.
    let _ = regex::bytes::Regex::new(r"(?-u).[.]\S\w\x00\x02\x03\x05\x06\x08\x0c\x0e\x10\x12\x14\x16\x17\x19\x1a\x1c\x1e\x21\x23\x25\x27\x28\x2a\x2c\x31\x33\x35\x36\x38\x3b\x3d\x3e\x40\x42\x44\x46\x48\x4a\x4c\x4d\x4f\x50\x52\x53\x55\x56\x58\x59\x5b\x5d\x61\x63\x65\x67\x69\x6a\x6c\x6e\x6f\x70\x72\x74\x76\x77\x79\x7c\x7e\x81\x82\x84\x86\x88\x89\x8b\x8c\x8e\x91\x93\x95\x97\x99\x9b\x9d\x9e\xa1\xa3\xa5\xa6\xa8\xaa\xac\xad\xaf\xb0\xb2\xb4\xb5\xb7\xb9\xbb\xbd\xbe\xc0\xc3\xc5\xc7\xc9\xcb\xcc\xce\xcf\xd1\xd3\xd4\xd6\xd7\xd9\xdb\xdc\xde\xe2\xe3\xe5\xe6\xe8\xe9\xeb\xee\xf1\xf3\xf5\xf7\xf9\xfb\xfc\xfe");
}
```
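To probe variants of the pattern without the whole program stopping at the first panic, the constructor call can be wrapped in `std::panic::catch_unwind`. This is a minimal sketch, not part of the regex crate: `panics` is a hypothetical helper name, it assumes the default `panic = "unwind"` strategy, and it uses `Option::unwrap` on `None` as a stand-in for the failing `Regex::new` call so that it runs with the standard library alone.

```rust
use std::panic;

/// Returns true if calling `f` panics. Hypothetical helper; it relies on the
/// default `panic = "unwind"` strategy (with `panic = "abort"` the process exits).
fn panics<F: FnOnce() + panic::UnwindSafe>(f: F) -> bool {
    panic::catch_unwind(f).is_err()
}

fn main() {
    // Replace the default hook so caught panics are not printed to stderr.
    panic::set_hook(Box::new(|_| {}));

    // Stand-in for the failing constructor: the same kind of panic as reported.
    let missing: Option<u8> = None;
    assert!(panics(move || {
        missing.unwrap();
    }));

    // A closure that returns normally is reported as not panicking.
    assert!(!panics(|| {
        let _ = Some(1u8).unwrap();
    }));

    println!("ok");
}
```

With `regex` as a dependency, the closure would instead be something like `move || { let _ = regex::bytes::Regex::new(pattern); }` for a candidate `pattern: &str`.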
The pattern consists of 136 unique literal byte values plus the sub-patterns `.`, `[.]`, `\S`, and `\w`. Here is what I have been able to find out:
- The pattern is close to minimal. Removing any one of the 136 literal bytes, or any one of the four sub-patterns, avoids the panic. Adding more elements anywhere in the pattern still panics.
- The order of the elements doesn't matter. I constructed the pattern by starting with a much larger `bytes::RegexSetBuilder` that panicked, removing as much as I could while still having it panic, and then sorting.
- The panic isn't related to backslash escapes. That is, you can replace `\x44` with a literal `D` and it still panics.
- Some modifications avoid the panic and some do not. For example, changing `\x00` to `\x01` still panics, but changing `\xde` to `\xdf` does not. Changing `\w` to `\b` still panics, but changing `\w` to `\d` does not.
- Non-Unicode mode is necessary. If you omit the `(?-u)` (or `unicode(false)` when using a `RegexBuilder` or `RegexSetBuilder`), it does not panic.
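The reduction described in the second point (start from a larger failing input and drop elements while it still panics) can be automated. The following is a minimal sketch of a greedy, one-element-at-a-time reduction, a simplified form of delta debugging; it is not necessarily how the pattern above was produced. The names `reduce` and `is_one_minimal` are hypothetical, and the toy predicate stands in for "building the regex still panics". `is_one_minimal` checks the property claimed in the first point: the input fails, and removing any single element makes it pass.

```rust
/// Greedy reduction: try dropping one element at a time and keep the smaller
/// list whenever `still_fails` says the failure persists. Passes repeat until
/// nothing can be dropped, so the result is 1-minimal.
fn reduce<T: Clone, F: Fn(&[T]) -> bool>(mut items: Vec<T>, still_fails: F) -> Vec<T> {
    assert!(still_fails(&items[..]), "input must reproduce the failure");
    loop {
        let mut changed = false;
        let mut i = 0;
        while i < items.len() {
            let mut candidate = items.clone();
            candidate.remove(i);
            if still_fails(&candidate[..]) {
                items = candidate; // element i was unnecessary; retry this index
                changed = true;
            } else {
                i += 1; // element i is needed for the failure; move on
            }
        }
        if !changed {
            return items;
        }
    }
}

/// True if `items` fails but every single-element removal passes.
fn is_one_minimal<T: Clone, F: Fn(&[T]) -> bool>(items: &[T], still_fails: F) -> bool {
    still_fails(items)
        && (0..items.len()).all(|i| {
            let mut candidate = items.to_vec();
            candidate.remove(i);
            !still_fails(&candidate[..])
        })
}

fn main() {
    // Toy predicate: "fails" whenever both 3 and 7 are present.
    let still_fails = |xs: &[u32]| xs.contains(&3) && xs.contains(&7);
    let mut reduced = reduce(vec![9, 3, 5, 7, 1], still_fails);
    reduced.sort(); // the report also sorted the reduced elements
    assert_eq!(reduced, vec![3, 7]);
    assert!(is_one_minimal(reduced.as_slice(), still_fails));
    println!("{:?}", reduced);
}
```

For the real case, the predicate would join the candidate elements into a pattern and report whether building it panics.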
The same thing happens with `bytes::RegexBuilder`, `bytes::RegexSet`, and `bytes::RegexSetBuilder`. With a set, the 140 necessary elements can be distributed across multiple patterns. Here is an example of a `bytes::RegexSetBuilder` that panics:
```rust
extern crate regex;

fn main() {
    // The same 140 elements, split across ten patterns.
    let mut rsb = regex::bytes::RegexSetBuilder::new([
        r"\xa3\xd7\x40\x95\x59\xd4\x2a\x86\x93\xaf",
        r"\x16\xa1\x14\x19\x00\x2c\x27\xcc\x10\xcb\xee\xf5\xeb\xfb\xb5\xd9\x46\x25\x23\x38\x36\x35\x56\x31\x4a\x44\x4c\x99\xc7\x9d\x3d\w\xc0\x9b\x3b\x12\xdb\x89",
        r"\x84",
        r"\xcf\x8c",
        r"\x7c\xbd\x97\xfc\x3e\x6c\x79\x7e\xc3\x9e\x5b\x42\xf3\x17\x06\x08\xc5\xac\x05\x53\xe9",
        r"\xdc\xc9\x8b\x1a\x02\x1c\x76\x6a\xd3\xb4\x91\x0c\x1e\x03\x70\x77\x55\x52\x0e",
        r"\xde\xb2\xad.\x8e\x88\xd6\x81\xf9\xb7\xfe\xce\xf7\xb0\xe6",
        r"\xd1\x4d\x72\x6f\x74\x63\x61\x6e\x67\x65\x50\x4f\x33\xe2\xe5\xf1\xe8[.]\xe3",
        r"\xa6\xbe\xb9\xaa\xbb\x28\x69\xa5\x48\xa8",
        r"\S\x5d\x21\x58\x82",
    ].iter());
    // Equivalent to prefixing every pattern with (?-u).
    rsb.unicode(false);
    // Panics inside build() instead of reaching either arm.
    match rsb.build() {
        Ok(_) => println!("ok"),
        Err(e) => println!("error {}", e),
    };
}
```
Here are all 140 elements of the pattern in order:
.
[.]
\S
\w
\x00
\x02
\x03
\x05
\x06
\x08
\x0c
\x0e
\x10
\x12
\x14
\x16
\x17
\x19
\x1a
\x1c
\x1e
\x21
\x23
\x25
\x27
\x28
\x2a
\x2c
\x31
\x33
\x35
\x36
\x38
\x3b
\x3d
\x3e
\x40
\x42
\x44
\x46
\x48
\x4a
\x4c
\x4d
\x4f
\x50
\x52
\x53
\x55
\x56
\x58
\x59
\x5b
\x5d
\x61
\x63
\x65
\x67
\x69
\x6a
\x6c
\x6e
\x6f
\x70
\x72
\x74
\x76
\x77
\x79
\x7c
\x7e
\x81
\x82
\x84
\x86
\x88
\x89
\x8b
\x8c
\x8e
\x91
\x93
\x95
\x97
\x99
\x9b
\x9d
\x9e
\xa1
\xa3
\xa5
\xa6
\xa8
\xaa
\xac
\xad
\xaf
\xb0
\xb2
\xb4
\xb5
\xb7
\xb9
\xbb
\xbd
\xbe
\xc0
\xc3
\xc5
\xc7
\xc9
\xcb
\xcc
\xce
\xcf
\xd1
\xd3
\xd4
\xd6
\xd7
\xd9
\xdb
\xdc
\xde
\xe2
\xe3
\xe5
\xe6
\xe8
\xe9
\xeb
\xee
\xf1
\xf3
\xf5
\xf7
\xf9
\xfb
\xfc
\xfe
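The element list above can be derived mechanically from the pattern string. This is a minimal sketch that assumes only the syntax appearing in this report (`(?-u)`, `.`, a bracketed class such as `[.]`, two-character escapes such as `\S`, `\xNN` byte escapes, and plain literal characters). `split_elements` and `literal_byte` are hypothetical helpers, not a regex parser, and `str::strip_prefix` needs a newer Rust (1.45 or later) than the 1.14.0 toolchain mentioned above.

```rust
use std::collections::BTreeSet;

/// Splits a pattern written in this report's style into its elements.
/// Hypothetical helper: it only understands the syntax used here.
fn split_elements(pattern: &str) -> Vec<String> {
    // Drop the leading flag group, if present.
    let body = pattern.strip_prefix("(?-u)").unwrap_or(pattern);
    let chars: Vec<char> = body.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let len = match chars[i] {
            '\\' if chars.get(i + 1) == Some(&'x') => 4, // \xNN byte escape
            '\\' => 2,                                    // \S, \w, ...
            '[' => {
                // Bracketed class: take everything up to the closing ']'.
                let close = chars[i..].iter().position(|&c| c == ']').expect("unclosed class");
                close + 1
            }
            _ => 1, // '.' or a plain literal character
        };
        let end = (i + len).min(chars.len());
        out.push(chars[i..end].iter().collect());
        i = end;
    }
    out
}

/// Returns the byte an element matches literally, if it is a literal at all.
fn literal_byte(elem: &str) -> Option<u8> {
    if let Some(hex) = elem.strip_prefix(r"\x") {
        u8::from_str_radix(hex, 16).ok()
    } else if elem.len() == 1 && elem != "." {
        elem.bytes().next()
    } else {
        None
    }
}

fn main() {
    let elems = split_elements(r"(?-u).[.]\S\w\x00\x02\x44D");
    assert_eq!(elems, [".", "[.]", r"\S", r"\w", r"\x00", r"\x02", r"\x44", "D"]);

    // `\x44` and `D` denote the same byte, so they count once.
    let bytes: BTreeSet<u8> = elems.iter().filter_map(|e| literal_byte(e)).collect();
    assert_eq!(bytes.len(), 3);
    println!("{} elements, {} distinct literal bytes", elems.len(), bytes.len());
}
```

Because `literal_byte` maps `\x44` and `D` to the same byte, counting distinct bytes this way is insensitive to how each byte is written, in line with the observation that the panic does not depend on backslash escapes.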