Incorrect match behavior on $ #557

Closed
@davisjam

Rust version:

(15:14:21) jamie@woody $ rustc --version
rustc 1.32.0-nightly (00e03ee57 2018-11-22)

(15:15:24) jamie@woody $ cargo --version
cargo 1.32.0-nightly (b3d0b2e54 2018-11-15)

This might be a duplicate of another already-open bug, but it's hard to guess at root causes from symptoms.

Consider the regex /(aa$)?/ and the input "aaz", using a partial match.

The docs say that $ should match the end of text (or end-of-line with multi-line mode).

The final ? makes the group optional, so the regex engine can partial-match any string: either by matching "aa" at the end of the text (capturing "aa") or by matching the empty string (capturing nothing).

Since the input "aaz" does not end in "aa":

  • I expect this regex to match and capture nothing.
  • The regex actually matches and captures "aa".
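
To make the expected result concrete, here is a hand-rolled sketch of the semantics described above. It is not the regex crate's implementation. The function name `expected_match` is hypothetical, and the sketch assumes a leftmost-first search that starts at offset 0 and does not use multi-line mode.

```rust
/// Hypothetical model of the expected result for the pattern `(aa$)?`
/// searched against `text` with partial-match semantics.
/// Returns (overall match, capture group 1).
/// Assumptions: leftmost-first search, no multi-line mode.
fn expected_match(text: &str) -> (&str, Option<&str>) {
    // The search tries offset 0 first. The greedy `?` attempts the group:
    // "aa" followed immediately by end of text. At offset 0 that succeeds
    // only when the whole input is exactly "aa".
    if text == "aa" {
        return (text, Some(text));
    }
    // Otherwise `?` skips the group and the pattern matches the empty
    // string at offset 0, so the search stops there with no capture.
    (&text[..0], None)
}
```

Under this model, `expected_match("aaz")` yields an empty match with no capture, which is the behavior I expected from Rust.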

Here's the kernel of my test program:

  match Regex::new(&query.pattern) {
    Ok(re) => {
      queryResult.validPattern = true;

      for i in 0..query.inputs.len() {
        let input = query.inputs.get(i).unwrap();
        eprintln!("Input: {}", input);

        let mut matched = false;
        let mut matchedString = "".to_string();
        let mut captureGroups: Vec<String> = Vec::new();

        // Partial-match semantics
        match re.captures(&input) {
          Some(caps) => {
            matched = true;

            matchedString = caps.get(0).unwrap().as_str().to_string();
            captureGroups = Vec::new();
            for i in 1..caps.len() {
              match caps.get(i) {
                Some(m) => {
                  captureGroups.push(m.as_str().to_string());
                },
                None => {
                  captureGroups.push("".to_string()); // Interpret unused capture group as ""
                }
              }
            }
          },
          None => {
            matched = false;
          }
        }

        let mr: MatchResult = MatchResult{
          input: input.to_string(),
          matched: matched,
          matchContents: MatchContents{
            matchedString: matchedString,
            captureGroups: captureGroups,
          },
        };

        queryResult.results.push(mr);
      }
    },
    Err(error) => {
      // Could not build.
      queryResult.validPattern = false;
    }
  };
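
One caveat about the kernel above: pushing "" for an unused capture group makes "group did not participate" indistinguishable from "group matched the empty string" in the output. A minimal sketch of a rendering that keeps the two apart, assuming the groups have already been collected as `Option<&str>` values (the helper name `render_groups` and the `<unset>` marker are hypothetical):

```rust
/// Hypothetical helper: render capture groups so that an unset group (None)
/// stays distinct from a group that matched the empty string (Some("")).
fn render_groups(groups: &[Option<&str>]) -> Vec<String> {
    groups
        .iter()
        .map(|g| match g {
            // Quote matched text so an empty match shows up as "".
            Some(s) => format!("{:?}", s),
            // Marker for a group that did not participate in the match.
            None => "<unset>".to_string(),
        })
        .collect()
}
```

This does not affect the bug reported here, since the capture in question is "aa" rather than empty.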

This is the behavior on the regex and input described above:

{"pattern": "(aa$)?", "inputs": ["aaz"]}

The pattern is: (aa$)?
Input: aaz
{
  "pattern": "(aa$)?",
  "inputs": [
    "aaz"
  ],
  "validPattern": true,
  "results": [
    {
      "input": "aaz",
      "matched": true,
      "matchContents": {
        "matchedString": "aa",
        "captureGroups": [
          "aa"
        ]
      }
    }
  ]
}

In this case, Rust is the outlier among the 8 languages I tried. The other seven (Perl, PHP, Java, Ruby, Go, JavaScript (Node-V8), and Python) all match with an empty/null capture.
