Closed
Description
Rust version:
(15:14:21) jamie@woody $ rustc --version
rustc 1.32.0-nightly (00e03ee57 2018-11-22)
(15:15:24) jamie@woody $ cargo --version
cargo 1.32.0-nightly (b3d0b2e54 2018-11-15)
This might be a duplicate of some other already-open bugs, but it's hard to guess at root causes from symptoms.
Consider the regex /(aa$)?/ and the input "aaz", using a partial match.
The docs describe $ as "the end of text (or end-of-line with multi-line mode)".
The final ? means that the regex engine can partial-match any string: either strings that end in "aa" (capturing "aa") or any string (capturing nothing).
Since the input "aaz" does not end in "aa":
- I expect this regex to match and capture nothing.
- The regex actually matches and captures "aa".
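To make the expectation concrete, here is a minimal sketch of the result I would expect, written as a hand-rolled model of this one pattern under leftmost, backtracking-style semantics. It does not use the regex crate, and the names `Expected` and `expected_match` are made up for illustration.

```rust
/// Expected result of a partial (unanchored) match of `(aa$)?`:
/// the overall match text and group 1, if the group participated.
#[derive(Debug, PartialEq)]
struct Expected<'a> {
    matched: &'a str,
    group1: Option<&'a str>,
}

/// Hypothetical model of `(aa$)?`; not part of the regex crate.
fn expected_match<'a>(input: &'a str) -> Expected<'a> {
    // The search starts at offset 0. The optional group is tried first:
    // it needs "aa" at offset 0 AND the end of text immediately after it.
    if input.starts_with("aa") && input.len() == 2 {
        return Expected { matched: &input[..2], group1: Some(&input[..2]) };
    }
    // Otherwise the group is skipped and `?` lets the regex match the
    // empty string at offset 0, with group 1 unset.
    Expected { matched: "", group1: None }
}

fn main() {
    for input in ["aaz", "aa", ""] {
        println!("{:?} -> {:?}", input, expected_match(input));
    }
}
```

Under this model, "aaz" yields an empty overall match with group 1 unset, because the `$` inside the group fails at offset 2.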
Here's the kernel of my test program:
match Regex::new(&query.pattern) {
    Ok(re) => {
        queryResult.validPattern = true;
        for i in 0..query.inputs.len() {
            let input = query.inputs.get(i).unwrap();
            eprintln!("Input: {}", input);
            let mut matched = false;
            let mut matchedString = "".to_string();
            let mut captureGroups: Vec<String> = Vec::new();
            // Partial-match semantics
            match re.captures(&input) {
                Some(caps) => {
                    matched = true;
                    matchedString = caps.get(0).unwrap().as_str().to_string();
                    captureGroups = Vec::new();
                    for i in 1..caps.len() {
                        match caps.get(i) {
                            Some(m) => {
                                captureGroups.push(m.as_str().to_string());
                            },
                            None => {
                                captureGroups.push("".to_string()); // Interpret unused capture group as ""
                            }
                        }
                    }
                },
                None => {
                    matched = false;
                }
            }
            let mr: MatchResult = MatchResult{
                input: input.to_string(),
                matched: matched,
                matchContents: MatchContents{
                    matchedString: matchedString,
                    captureGroups: captureGroups,
                },
            };
            queryResult.results.push(mr);
        }
    },
    Err(error) => {
        // Could not build.
        queryResult.validPattern = false;
    }
};
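One caveat about the test program above: mapping an unused capture group to "" means the output cannot tell "group did not participate" apart from "group matched the empty string". That does not affect this report, since the captured text here is "aa", but a rough sketch of both conventions may help when comparing engines. The helper names `flatten_groups` and `keep_groups` are hypothetical and use only the standard library.

```rust
/// Flatten groups the way the test program does: an unused group becomes "".
fn flatten_groups(groups: &[Option<&str>]) -> Vec<String> {
    groups.iter().map(|g| g.unwrap_or("").to_string()).collect()
}

/// Alternative: keep `None` so an unused group stays distinguishable
/// from a group that matched the empty string.
fn keep_groups(groups: &[Option<&str>]) -> Vec<Option<String>> {
    groups.iter().map(|g| g.map(|s| s.to_string())).collect()
}

fn main() {
    let unused: [Option<&str>; 1] = [None];
    let empty: [Option<&str>; 1] = [Some("")];
    // Both inputs flatten to the same output...
    println!("{:?} {:?}", flatten_groups(&unused), flatten_groups(&empty));
    // ...but remain distinct when the Option is preserved.
    println!("{:?} {:?}", keep_groups(&unused), keep_groups(&empty));
}
```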
This is the behavior on the regex and input described above:
{"pattern": "(aa$)?", "inputs": ["aaz"]}
The pattern is: (aa$)?
Input: aaz
{
  "pattern": "(aa$)?",
  "inputs": [
    "aaz"
  ],
  "validPattern": true,
  "results": [
    {
      "input": "aaz",
      "matched": true,
      "matchContents": {
        "matchedString": "aa",
        "captureGroups": [
          "aa"
        ]
      }
    }
  ]
}
In this case, Rust is unique among the 8 languages I tried. Perl, PHP, Java, Ruby, Go, JavaScript (Node-V8), and Python all match with an empty/null capture.