Skip to content

Parse error recovery is obversable by macros in several cases #103534

Open
@fmease

Description

@fmease

Introduction

In multiple cases, macro fragment specifiers like expr and stmt match more token streams than they should as a consequence of the parser trying to recover from obviously invalid Rust code to provide better immediate and subsequent error messages to the user.

Why Is This a Concern?

The user should be allowed to assume that a fragment specifier only matches valid Rust code, anything else would make the fragment specifier not live up to its name and as a result render it useless (to exaggerate).

One use of macros is the ability to define embedded / internal domain-specific languages (DSLs). Part of that is defining new syntax which might not necessarily be valid Rust syntax. Declarative macros allow users to create several macro rules / matchers enabling relatively fine-grained matching on tokens. Obviously, when writing those rules, macro authors need to know what a given fragment specifier accepts in order to confidently determine which specific rule applies for a given input. If the grammar used by a fragment specifier is actually larger than promised and basically unknown (implementation-defined to be precise), this becomes an impossible task.

Not only that. If we don't do anything, the grammars matched by fragment specifiers will keep changing over time as more and more recovery code gets added. This breaks Rust's backward compatibility guarantees! Macro calls that used to compile at some fixed point in time might potentially no longer compile in a future version of the compiler. In fact, backward compatibility has already been broken multiple times in the past without notice by (some) PRs introducing more error recovery.

Examples

There might be many more cases than listed below but it takes a lot of time experimenting and looking through the parser. I'll try to extend the list over time.

Expressions

macro_rules! check {
    ($e:expr) => {};

    // or `===`, `!==`, `<>`, `<=>`, `or` instead of `and`
    ($a:literal and $b:literal) => {};
    ($a:literal AND $b:literal) => {};

    (not $a:literal) => {};
    (NOT $a:literal) => {};
}

check! { 0 AND 1 } // passes (all good)
check! { 0 and 1 } // ⚠️ FAILS but should pass! "`and` is not a logical operator"

check! { NOT 1 } // passes (all good)
check! { not 1 } // ⚠️ FAILS but should pass! "unexpected `1` after identifier"
stderr
error: `and` is not a logical operator
  --> src/lib.rs:13:12
   |
13 | check! { 0 and 1 } // ⚠️ FAILS but should pass! "invalid comparison operator"
   |            ^^^ help: use `&&` to perform logical conjunction
   |
   = note: unlike in e.g., Python and PHP, `&&` and `||` are used for logical operators

error: unexpected `1` after identifier
  --> src/lib.rs:16:14
   |
16 | check! { not 1 } // ⚠️ FAILS but should pass! "unexpected `1` after identifier"
   |          ----^
   |          |
   |          help: use `!` to perform bitwise not

Statements

macro_rules! check {
    ($s:stmt) => {};

    // or `auto` instead of `var`
    (var $i:ident) => {};
    (VAR $i:ident) => {};
}

check! { VAR x } // passes (all good)
check! { var x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)
stderr
error: invalid variable declaration
  --> src/lib.rs:10:10
   |
10 | check! { var x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)
   |          ^^^
   |
help: write `let` instead of `var` to introduce a new variable
   |
10 | check! { let x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)
   |          ~~~

Other Fragments (e.g. Items, Types)

[no known cases at the time of this writing (active search ongoing)]

Editorial Notes

editorial notes

I used to list some more cases above (which some of you might have noticed) but I've since removed them as they've turned out to be incorrect. Here they are:

macro_rules! check {
    ($e:expr) => {};

    ($a:literal++) => {};
    ($a:literal@@) => {};

    ($n:ident : $e:expr) => {};
}

check! { 1@@ } // passes (all good)
check! { 1++ } // FAILS but would fail ANYWAY! "Rust has no postfix increment operator"
// ^ without the recovery code, we would try to parse a binary expression (operator `+`) and fail at the 2nd `+`

check! { struct : loop {} } // passes (all good)
check! { main : loop {} } // FAILS but would fail ANYWAY! "malformed loop label"
// ^ without the recovery code, we would try to parse type ascription and fail at `loop {}` which isn't a type

Related issue: #90256 (concerning procedural macros).

@rustbot label A-macros A-diagnostics A-parser T-compiler

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-diagnosticsArea: Messages for errors, warnings, and lintsA-macrosArea: All kinds of macros (custom derive, macro_rules!, proc macros, ..)A-parserArea: The lexing & parsing of Rust source code to an ASTC-bugCategory: This is a bug.C-tracking-issueCategory: An issue tracking the progress of sth. like the implementation of an RFCT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.metabugIssues about issues themselves ("bugs about bugs")

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions