RFC: flexible syntax for macro invocations #2387

Closed
@paulstansifer

Currently, macro invocations piggyback on an existing syntactic form, the array literal, so a macro's arguments must be written as a bracketed, comma-separated list of expressions. We'd like more flexibility.

Macro invocation syntax

The proposed invocation syntax will extend the grammar roughly as follows. The exact syntax that marks an invocation will be decided later, but invocations must be distinguishable from function invocations at parse time:

Expr ::= ... | Identifier "{" Balanced* "}"
Balanced ::= "(" Balanced* ")" | "[" Balanced* "]" | "{" Balanced* "}" | AnyOtherToken
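
To make the grammar concrete, here is a minimal sketch of building Balanced trees from a flat token list. The `Token` and `Balanced` types and the `lex` and `parse_balanced` functions are hypothetical, not rustc's. `lex` is a toy that splits on whitespace and delimiters and treats a double-quoted string (escapes not handled) as a single token, which is why the parenthesis inside "(-:" in the my_let example does not need a partner.

```rust
/// A flat token, as a (toy) lexer might produce it.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Open(char),    // '(', '[' or '{'
    Close(char),   // ')', ']' or '}'
    Other(String), // AnyOtherToken
}

/// One Balanced from the grammar: a delimited group or a single token.
#[derive(Debug, Clone, PartialEq)]
enum Balanced {
    Group(char, Vec<Balanced>), // opening delimiter and its contents
    Tok(String),
}

fn closer(open: char) -> char {
    match open {
        '(' => ')',
        '[' => ']',
        _ => '}',
    }
}

/// Push the pending word, if any, as an `Other` token.
fn flush(word: &mut String, out: &mut Vec<Token>) {
    if !word.is_empty() {
        out.push(Token::Other(std::mem::take(word)));
    }
}

/// Toy lexer: delimiters are single tokens, whitespace separates words,
/// and a double-quoted string is one token.
fn lex(src: &str) -> Vec<Token> {
    let mut out = Vec::new();
    let mut word = String::new();
    let mut chars = src.chars();
    while let Some(c) = chars.next() {
        if c == '"' {
            flush(&mut word, &mut out);
            let mut lit = String::from('"');
            for d in chars.by_ref() {
                lit.push(d);
                if d == '"' {
                    break;
                }
            }
            out.push(Token::Other(lit));
        } else if "([{".contains(c) {
            flush(&mut word, &mut out);
            out.push(Token::Open(c));
        } else if ")]}".contains(c) {
            flush(&mut word, &mut out);
            out.push(Token::Close(c));
        } else if c.is_whitespace() {
            flush(&mut word, &mut out);
        } else {
            word.push(c);
        }
    }
    flush(&mut word, &mut out);
    out
}

/// Parse Balanced* from tokens, rejecting mismatched or unclosed delimiters.
fn parse_balanced(tokens: &[Token]) -> Result<Vec<Balanced>, String> {
    // Each frame: the open delimiter (None at top level) and trees so far.
    let mut stack: Vec<(Option<char>, Vec<Balanced>)> = vec![(None, Vec::new())];
    for tok in tokens {
        match tok {
            Token::Open(c) => stack.push((Some(*c), Vec::new())),
            Token::Close(c) => match stack.pop() {
                Some((Some(open), children)) if closer(open) == *c => {
                    // An enclosing frame exists: this one was not the top level.
                    stack.last_mut().unwrap().1.push(Balanced::Group(open, children));
                }
                _ => return Err(format!("unexpected '{}'", c)),
            },
            Token::Other(s) => stack.last_mut().unwrap().1.push(Balanced::Tok(s.clone())),
        }
    }
    if stack.len() != 1 {
        return Err("unclosed delimiter".to_string());
    }
    Ok(stack.pop().unwrap().1)
}

fn main() {
    // The "(-:" literal is one token, so its parenthesis is not counted.
    println!("{:?}", parse_balanced(&lex("y := str::len(\"(-:\") + 18;")));
}
```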

Parsing macro invocations

The tricky part is having macros consume Balanceds in a useful way. Here is an example invocation of a hypothetical macro, my_let:

my_let {
    x := 4*7;
    y := str::len("(-:") + 18;
    x + (y*x)
}

Here's how we'd like to define my_let (rep() is like Macro by Example's ...):

pat_macro {
    my_let { /*BNF-like notation here*/ 
         rep(var=Identifier ":=" val=Expr ";") body=Expr
    }
    => /* transcribe this, with interpolation of `var`, `body`, and `val`*/
    { |rep(var)| body } (rep(val))
    /* like ((lambda (var ...) body) val ...) */
}
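
For comparison, roughly the same macro can be sketched with today's `macro_rules!`. This is an analogue, not the proposed pat_macro. The body has to be wrapped in braces, because `macro_rules!` would report a local ambiguity if a `$var:ident` repetition were directly followed by a `$body:expr` that can also begin with an identifier. The expansion also swaps the lambda application for a `match` on a tuple: both evaluate every value before any variable is bound, like ((lambda (var ...) body) val ...), but the `match` form hands the values' types straight to the bindings.

```rust
// A rough macro_rules! analogue of my_let (not the proposed pat_macro).
macro_rules! my_let {
    // `name := value;` bindings, then the body in braces. Without the
    // braces, an identifier starting the body would be ambiguous between
    // another `$var:ident` and `$body:expr`.
    ($($var:ident := $val:expr;)* { $body:expr }) => {
        // Like ((lambda (var ...) body) val ...): all values are evaluated,
        // in the caller's scope, before any variable is bound.
        match ($($val,)*) {
            ($($var,)*) => $body,
        }
    };
}

fn main() {
    let n = my_let! {
        x := 4*7;
        y := "(-:".len() + 18;
        { x + (y*x) }
    };
    println!("{}", n);
}
```

Because the tuple is evaluated before anything is rebound, `my_let! { x := 10; y := x; { y } }` binds y to whatever x was outside the invocation.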

Proposed implementation

I believe that this can be implemented in a minimally invasive way. pat_macro will be a syntax extension that takes a BNF-like description of the invocation syntax inside the <macro_name>{}, and a Balanced on the right side of the =>. The right side is kept as a Balanced rather than parsed as an expression only because rustc has no data structure for incomplete ASTs. (It would be friendly to also parse it as an expression, substituting dummy values for interpolated syntax, to check early that it will parse correctly.)
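
The BNF-like pattern could be held as a small tree and interpreted directly at expansion time. The sketch below is hypothetical throughout (`Pat`, `Bindings`, and `match_pat` are illustrative names). It matches pre-split string tokens, stands in for the Rust parser by treating Expr as any non-empty run of tokens up to the next `;`, and makes rep() greedy, rolling back an iteration's bindings when it fails partway, so that the trailing `x + ...` is left for body=Expr.

```rust
use std::collections::HashMap;

/// Hypothetical AST for the BNF-like pattern notation.
enum Pat {
    Lit(&'static str),   // a literal token, e.g. ":="
    Ident(&'static str), // name=Identifier
    Expr(&'static str),  // name=Expr
    Seq(Vec<Pat>),       // patterns in sequence
    Rep(Box<Pat>),       // rep(...): zero or more repetitions
}

/// Each pattern variable maps to its matches, in order (several under rep()).
type Bindings = HashMap<&'static str, Vec<String>>;

fn is_ident(t: &str) -> bool {
    let mut cs = t.chars();
    matches!(cs.next(), Some(c) if c.is_alphabetic() || c == '_')
        && cs.all(|c| c.is_alphanumeric() || c == '_')
}

fn bind(b: &mut Bindings, name: &'static str, val: String) {
    b.entry(name).or_default().push(val);
}

/// Try to match `p` at toks[pos..]; on success, return the position after it.
fn match_pat(p: &Pat, toks: &[&str], pos: usize, b: &mut Bindings) -> Option<usize> {
    match p {
        Pat::Lit(s) => (toks.get(pos).copied() == Some(*s)).then(|| pos + 1),
        Pat::Ident(name) => {
            let t = *toks.get(pos)?;
            if !is_ident(t) {
                return None;
            }
            bind(b, *name, t.to_string());
            Some(pos + 1)
        }
        Pat::Expr(name) => {
            // Stand-in for delegating to the Rust parser: a non-empty run
            // of tokens up to (not including) the next ";".
            let end = toks[pos..]
                .iter()
                .position(|t| *t == ";")
                .map_or(toks.len(), |i| pos + i);
            if end == pos {
                return None;
            }
            bind(b, *name, toks[pos..end].join(" "));
            Some(end)
        }
        Pat::Seq(ps) => {
            let mut cur = pos;
            for q in ps {
                cur = match_pat(q, toks, cur, b)?;
            }
            Some(cur)
        }
        Pat::Rep(inner) => {
            // Greedy: repeat while the inner pattern matches and makes
            // progress; a failed attempt's partial bindings are rolled back.
            let mut cur = pos;
            loop {
                let saved = b.clone();
                match match_pat(inner, toks, cur, b) {
                    Some(next) if next > cur => cur = next,
                    _ => {
                        *b = saved;
                        return Some(cur);
                    }
                }
            }
        }
    }
}

/// rep(var=Identifier ":=" val=Expr ";") body=Expr
fn my_let_pattern() -> Pat {
    Pat::Seq(vec![
        Pat::Rep(Box::new(Pat::Seq(vec![
            Pat::Ident("var"),
            Pat::Lit(":="),
            Pat::Expr("val"),
            Pat::Lit(";"),
        ]))),
        Pat::Expr("body"),
    ])
}

fn main() {
    let toks: Vec<&str> = "x := 4 * 7 ; y := 18 ; x + ( y * x )".split_whitespace().collect();
    let mut b = Bindings::new();
    // The whole invocation must be consumed.
    if match_pat(&my_let_pattern(), &toks, 0, &mut b) == Some(toks.len()) {
        println!("{:?}", b);
    }
}
```

The greedy, non-backtracking rep() is a simplification; the proposal does not say how ambiguous patterns should be resolved.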

At macro expansion time, the Balanced will need to be parsed (well, re-parsed) according to the grammar of the macro. We can do this by building a lexer that takes a Balanced instead of a string as input. The parser will interpret the macro's BNF-like pattern, delegating to the Rust parser for nonterminals like Identifier and Expr.
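
A lexer that takes a Balanced instead of a string can walk the trees and re-emit the delimiter tokens, so the parser sees the same stream a string lexer would have produced. A minimal sketch, with hypothetical `Balanced`, `Token`, and `TreeLexer` types:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Balanced {
    Group(char, Vec<Balanced>), // opening delimiter and its contents
    Tok(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Open(char),
    Close(char),
    Other(String),
}

fn closer(open: char) -> char {
    match open {
        '(' => ')',
        '[' => ']',
        _ => '}',
    }
}

/// A lexer whose input is a sequence of Balanced trees rather than a string.
struct TreeLexer<'a> {
    // One frame per group being walked: its remaining children, plus the
    // closing delimiter to emit when they run out (None at the top level).
    stack: Vec<(std::slice::Iter<'a, Balanced>, Option<char>)>,
}

impl<'a> TreeLexer<'a> {
    fn new(trees: &'a [Balanced]) -> Self {
        TreeLexer { stack: vec![(trees.iter(), None)] }
    }
}

impl<'a> Iterator for TreeLexer<'a> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        loop {
            let (iter, close) = self.stack.last_mut()?;
            match iter.next() {
                Some(Balanced::Tok(s)) => return Some(Token::Other(s.clone())),
                Some(Balanced::Group(open, kids)) => {
                    // Descend into the group after emitting its opener.
                    self.stack.push((kids.iter(), Some(closer(*open))));
                    return Some(Token::Open(*open));
                }
                None => {
                    // Group exhausted: pop it and emit its closer, if any.
                    let close = *close;
                    self.stack.pop();
                    if let Some(c) = close {
                        return Some(Token::Close(c));
                    }
                }
            }
        }
    }
}

fn main() {
    // The trees for `f(x[])`.
    let trees = vec![
        Balanced::Tok("f".to_string()),
        Balanced::Group('(', vec![Balanced::Tok("x".to_string()), Balanced::Group('[', vec![])]),
    ];
    for tok in TreeLexer::new(&trees) {
        println!("{:?}", tok);
    }
}
```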

There are two ways for the shim lexer to deal with interpolated syntax. The bad one is to pretty-print the interpolated ASTs and re-lex them before sending them through the parser again. The better one is to use special tokens that hand the parser pre-parsed ASTs, which it returns immediately.
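
The special-token approach can be sketched with a toy parser: the shim lexer wraps a pre-parsed AST in a hypothetical `Token::Interpolated`, and the parser returns that AST as soon as it sees it where an operand is expected. The `Expr` and `Token` types and the grammar (left-associative `+` only) are illustrative assumptions.

```rust
/// Hypothetical, heavily simplified AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Tokens from the shim lexer. `Interpolated` carries an already-parsed AST,
/// e.g. the syntax bound to a pattern variable such as `val`.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Ident(String),
    Plus,
    Interpolated(Expr),
}

/// Expr ::= Atom ("+" Atom)*, left-associative.
fn parse_expr(toks: &[Token], pos: &mut usize) -> Result<Expr, String> {
    let mut lhs = parse_atom(toks, pos)?;
    while toks.get(*pos) == Some(&Token::Plus) {
        *pos += 1;
        let rhs = parse_atom(toks, pos)?;
        lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
    }
    Ok(lhs)
}

fn parse_atom(toks: &[Token], pos: &mut usize) -> Result<Expr, String> {
    let tok = toks.get(*pos).ok_or("unexpected end of input")?;
    *pos += 1;
    match tok {
        Token::Num(n) => Ok(Expr::Num(*n)),
        Token::Ident(s) => Ok(Expr::Var(s.clone())),
        // The pre-parsed AST is handed back immediately: nothing is
        // pretty-printed or re-lexed.
        Token::Interpolated(e) => Ok(e.clone()),
        Token::Plus => Err("expected an operand, found '+'".to_string()),
    }
}

fn main() {
    // Interpolating `a + 1` into `x + $val + 2`.
    let val = Expr::Add(Box::new(Expr::Var("a".to_string())), Box::new(Expr::Num(1)));
    let toks = vec![
        Token::Ident("x".to_string()),
        Token::Plus,
        Token::Interpolated(val),
        Token::Plus,
        Token::Num(2),
    ];
    let mut pos = 0;
    println!("{:?}", parse_expr(&toks, &mut pos));
}
```

In this toy grammar the interpolated `a + 1` stays a single operand of the outer `+`; pretty-printing it without parentheses and re-lexing would let the parser regroup it with its neighbours.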

Possible extensions

Syntax for lexer-skipping syntax extensions

If we remove # from ordinary macro invocation syntax, we can use it to quote un-lexed syntax. Delimiters would work in a Perl-like fashion:

#regex(\w+\s*) //parens inside must match
#regex|\\w+\\s*| //backslashes escape delimiter
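
A hypothetical `read_quoted` helper shows how a lexer might read such a quotation, under one reading of the two examples above: with a bracketing delimiter, inner brackets of the same kind must nest and backslashes pass through untouched; with any other delimiter, a backslash escapes the next character, so `\|` is a literal bar and `\\` a single backslash. Under that assumption both examples denote the same regex, `\w+\s*`.

```rust
/// Hypothetical helper: read a quotation starting at its opening delimiter
/// (the text right after `#name`). Returns the raw contents and the number
/// of bytes consumed, including both delimiters.
fn read_quoted(src: &str) -> Result<(String, usize), String> {
    let mut chars = src.char_indices();
    let (_, open) = chars.next().ok_or("missing opening delimiter")?;
    let close = match open {
        '(' => ')',
        '[' => ']',
        '{' => '}',
        c => c, // any other delimiter, such as '|', closes itself
    };
    let nests = close != open;
    let mut depth = 0usize;
    let mut out = String::new();
    while let Some((i, c)) = chars.next() {
        if nests && c == open {
            // Bracketing form: inner pairs must match.
            depth += 1;
        } else if c == close {
            if depth == 0 {
                return Ok((out, i + c.len_utf8()));
            }
            depth -= 1;
        } else if !nests && c == '\\' {
            // Self-closing form: a backslash escapes the next character.
            let (_, escaped) = chars.next().ok_or("unterminated escape")?;
            out.push(escaped);
            continue;
        }
        out.push(c);
    }
    Err(format!("missing closing '{}'", close))
}

fn main() {
    // Both spellings from the examples yield the same raw text.
    println!("{:?}", read_quoted(r"(\w+\s*)"));
    println!("{:?}", read_quoted(r"|\\w+\\s*|"));
}
```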

String-examining/lexer-skipping macros

pat_macro {
    fmt { format=StringContents(rep("%" spec=Letter | percent="%%" | literal=NegativeCharClass("%"))) "," rep(arg=Expr, ",") }
    => /* ??? */
}
fmt{"Look at this number: %u", 18u}
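
The string-examining half of fmt can be sketched as a function that splits the format string's contents into the three alternatives of the pattern, plus a count of conversion specs that a macro could compare against the number of `arg`s. `Piece`, `split_format`, and `arity` are hypothetical names; consecutive NegativeCharClass("%") matches are collapsed into one `Literal`, and the transcription side (the `/* ??? */`) is left open, as in the proposal.

```rust
#[derive(Debug, PartialEq)]
enum Piece {
    Spec(char),      // "%" spec=Letter, e.g. %u
    Percent,         // percent="%%"
    Literal(String), // a run of NegativeCharClass("%") characters
}

/// Split format-string contents into pieces, rejecting a stray '%'.
fn split_format(fmt: &str) -> Result<Vec<Piece>, String> {
    let mut pieces = Vec::new();
    let mut lit = String::new();
    let mut chars = fmt.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            lit.push(c);
            continue;
        }
        if !lit.is_empty() {
            pieces.push(Piece::Literal(std::mem::take(&mut lit)));
        }
        match chars.next() {
            Some('%') => pieces.push(Piece::Percent),
            Some(l) if l.is_ascii_alphabetic() => pieces.push(Piece::Spec(l)),
            other => return Err(format!("bad conversion after '%': {:?}", other)),
        }
    }
    if !lit.is_empty() {
        pieces.push(Piece::Literal(lit));
    }
    Ok(pieces)
}

/// How many arguments the pieces consume (one per conversion spec).
fn arity(pieces: &[Piece]) -> usize {
    pieces.iter().filter(|p| matches!(p, Piece::Spec(_))).count()
}

fn main() {
    let pieces = split_format("Look at this number: %u").unwrap();
    println!("{:?} takes {} argument(s)", pieces, arity(&pieces));
}
```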

Making macros look inside strings should be fairly simple, but most practical applications will probably require much more power from the macro system.

Invocations at non-expression position

What if we want macros to generate non-expressions (especially items and types)? It seems we would need a separate invocation form for every nonterminal we want to extend. Fortunately, expressions cover much of the interesting territory.

Labels: A-grammar (Area: The grammar of Rust), A-syntaxext (Area: Syntax extensions), C-enhancement (Category: An issue proposing an enhancement or a PR with one.)