Description
Currently, macro invocations piggyback on an existing syntactic form, the array literal. We'd like a more flexible invocation syntax.
Macro invocation syntax
The proposed invocation syntax will extend the grammar roughly as follows. (The exact syntax for marking an invocation will be decided later; whatever it is, invocations must be distinguishable from function invocations at parse time.)
Expr ::= ... | Identifier "{" Balanced* "}"
Balanced ::= "(" Balanced* ")" | "[" Balanced* "]" | "{" Balanced* "}" | AnyOtherToken
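As a rough illustration of the Balanced nonterminal, here is a minimal Rust sketch that groups a token sequence into nested trees and rejects mismatched or unclosed delimiters. The `Balanced` enum and `parse_balanced` are hypothetical names, and tokens are assumed to be whitespace-separated; a real implementation would work on rustc's own tokens.

```rust
/// Hypothetical token-tree type mirroring the `Balanced` nonterminal.
#[derive(Debug, Clone, PartialEq)]
enum Balanced {
    /// "(" Balanced* ")" | "[" Balanced* "]" | "{" Balanced* "}"
    Group(char, Vec<Balanced>),
    /// AnyOtherToken
    Token(String),
}

/// Returns the closing delimiter that matches an opening one.
fn closer(open: char) -> Option<char> {
    match open {
        '(' => Some(')'),
        '[' => Some(']'),
        '{' => Some('}'),
        _ => None,
    }
}

/// Groups whitespace-separated tokens into `Balanced*`, rejecting
/// mismatched or unclosed delimiters.
fn parse_balanced(src: &str) -> Result<Vec<Balanced>, String> {
    // Each stack entry is (opening delimiter, children seen so far);
    // the bottom entry uses ' ' as a sentinel for the top level.
    let mut stack: Vec<(char, Vec<Balanced>)> = vec![(' ', Vec::new())];
    for tok in src.split_whitespace() {
        match tok {
            "(" | "[" | "{" => stack.push((tok.chars().next().unwrap(), Vec::new())),
            ")" | "]" | "}" => {
                let close = tok.chars().next().unwrap();
                let (open, children) = stack.pop().unwrap();
                // The sentinel never matches, so a stray closer is an error too.
                if closer(open) != Some(close) {
                    return Err(format!("unexpected `{}`", close));
                }
                stack.last_mut().unwrap().1.push(Balanced::Group(open, children));
            }
            _ => stack.last_mut().unwrap().1.push(Balanced::Token(tok.to_string())),
        }
    }
    if stack.len() != 1 {
        return Err("unclosed delimiter".to_string());
    }
    Ok(stack.pop().unwrap().1)
}

fn main() {
    let trees = parse_balanced("my_let { x := ( 4 * 7 ) ; x }").unwrap();
    println!("{:?}", trees);
}
```

Because the tree already records where every group starts and ends, a macro can skip over arbitrary contents without knowing their grammar.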
Parsing macro invocations
The tricky part is having macros consume a sequence of Balanced in a useful way. Here is an example invocation of an example macro:
my_let {
x := 4*7;
y := str::len("(-:") + 18;
x + (y*x)
}
Here's how we'd like to define my_let (rep() is like Macro by Example's ...):
pat_macro {
my_let { /*BNF-like notation here*/
rep(var=Identifier ":=" val=Expr ";") body=Expr
}
=> /* transcribe this, with interpolation of `var`, `body`, and `val`*/
{ |rep(var)| body } (rep(val))
/* like ((lambda (var ...) body) val ...) */
}
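For comparison, the same transcription can be approximated with today's `macro_rules!`. This is a sketch, not the proposed pat_macro: it assumes a `=>` between the bindings and the body, because without a separator `macro_rules!` would have to choose between starting another `$var:ident` binding and starting `$body:expr` at the same token, which it reports as a local ambiguity. The invocation also uses modern `"(-:".len()` in place of `str::len("(-:")`.

```rust
// A modern `macro_rules!` approximation of the proposed `my_let`.
// Assumption: `=>` separates the bindings from the body.
macro_rules! my_let {
    ( $( $var:ident := $val:expr ; )* => $body:expr ) => {
        // { |rep(var)| body } (rep(val)): build a closure and apply it at once,
        // like ((lambda (var ...) body) val ...).
        (|$($var),*| $body)($($val),*)
    };
}

fn main() {
    let r = my_let! {
        x := 4 * 7;
        y := "(-:".len() + 18;
        => x + (y * x)
    };
    println!("{}", r);
}
```

Because every `val` is evaluated as a closure argument before any `var` is bound, a `val` cannot refer to an earlier `var`, just as in Scheme's `let` (as opposed to `let*`).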
Proposed implementation
I believe that this can be implemented in a minimally invasive way. pat_macro will be a syntax extension that takes a BNF-like notation for the invocation parser inside the <macro_name>{}, and a Balanced on the right side of the =>. The right side is kept as a Balanced rather than parsed as an Expr only because rustc has no data structure for incomplete ASTs. (It would still be friendly to also parse it as an expression, substituting dummy values for interpolated syntax, to check that it will parse correctly.)
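To make the right-hand side concrete, here is a minimal sketch of transcription: walking a Balanced template and splicing in the bindings produced by the invocation parser. `Tmpl`, `Binding`, `transcribe`, and `render` are hypothetical names, and the sketch assumes that `rep(...)` in a template expands to comma-separated items, which is what a Rust closure's parameter and argument lists need.

```rust
use std::collections::HashMap;

/// Hypothetical template pieces for the right-hand side of `=>`.
enum Tmpl {
    Tok(&'static str),            // copied through unchanged
    Var(&'static str),            // e.g. `body`: splice a single binding
    Rep(&'static str),            // e.g. `rep(val)`: splice every binding, comma-separated
    Group(char, char, Vec<Tmpl>), // a delimited sub-Balanced
}

/// What the invocation parser bound to each name.
enum Binding {
    One(String),
    Many(Vec<String>),
}

fn transcribe(t: &[Tmpl], env: &HashMap<&str, Binding>, out: &mut Vec<String>) -> Result<(), String> {
    for piece in t {
        match piece {
            Tmpl::Tok(s) => out.push(s.to_string()),
            Tmpl::Var(name) => match env.get(*name) {
                Some(Binding::One(s)) => out.push(s.clone()),
                _ => return Err(format!("`{}` is not bound exactly once", name)),
            },
            Tmpl::Rep(name) => match env.get(*name) {
                Some(Binding::Many(items)) => out.push(items.join(", ")),
                _ => return Err(format!("`{}` is not a repeated binding", name)),
            },
            Tmpl::Group(open, close, kids) => {
                out.push(open.to_string());
                transcribe(kids, env, out)?;
                out.push(close.to_string());
            }
        }
    }
    Ok(())
}

/// Transcribes a template and joins the resulting tokens with spaces.
fn render(t: &[Tmpl], env: &HashMap<&str, Binding>) -> Result<String, String> {
    let mut out = Vec::new();
    transcribe(t, env, &mut out)?;
    Ok(out.join(" "))
}

fn main() {
    // { |rep(var)| body } (rep(val))
    let template = vec![
        Tmpl::Group('{', '}', vec![Tmpl::Tok("|"), Tmpl::Rep("var"), Tmpl::Tok("|"), Tmpl::Var("body")]),
        Tmpl::Group('(', ')', vec![Tmpl::Rep("val")]),
    ];
    let mut env = HashMap::new();
    env.insert("var", Binding::Many(vec!["x".to_string(), "y".to_string()]));
    env.insert("val", Binding::Many(vec!["4*7".to_string(), "str::len(\"(-:\") + 18".to_string()]));
    env.insert("body", Binding::One("x + (y*x)".to_string()));
    println!("{}", render(&template, &env).unwrap());
}
```

The rendered text would then be handed to the ordinary expression parser, which is also where a dummy-value pre-check could catch templates that can never parse.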
At macro expansion time, the Balanced will need to be parsed (well, re-parsed) according to the grammar of the macro. We can do this by building a lexer that takes a Balanced instead of a string as input. The parser will interpret the macro's BNF-like pattern, delegating to the Rust parser for nonterminals like Identifier and Expr.
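A minimal sketch of this re-parsing step, under several assumptions: the shim lexer simply replays a token tree as a flat token stream, `:=` is a single token, and the Expr nonterminal is faked by taking tokens up to a top-level `;` instead of calling rustc's expression parser. `Tree`, `Pat`, `lex`, `interp`, and `demo` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical token tree standing in for `Balanced`.
enum Tree { Group(char, char, Vec<Tree>), Tok(&'static str) }

/// Shim lexer: replays a `Balanced` as a flat token stream, re-emitting the
/// delimiters, so a token-at-a-time parser can consume it.
fn lex(trees: &[Tree], out: &mut Vec<String>) {
    for t in trees {
        match t {
            Tree::Tok(s) => out.push(s.to_string()),
            Tree::Group(open, close, kids) => {
                out.push(open.to_string());
                lex(kids, out);
                out.push(close.to_string());
            }
        }
    }
}

/// The macro's BNF-like pattern.
enum Pat {
    Lit(&'static str),   // a literal token such as ":="
    Ident(&'static str), // name=Identifier
    Expr(&'static str),  // name=Expr (crude stand-in for rustc's expression parser)
    Rep(Vec<Pat>),       // rep(...): zero or more repetitions
}

type Bindings = HashMap<&'static str, Vec<String>>;

fn is_ident(t: &str) -> bool {
    let mut cs = t.chars();
    matches!(cs.next(), Some(c) if c.is_alphabetic() || c == '_')
        && cs.all(|c| c.is_alphanumeric() || c == '_')
}

/// Fake `Expr`: tokens up to a `;` or an unmatched closer at depth 0.
fn expr_end(toks: &[String], mut i: usize) -> usize {
    let mut depth = 0usize;
    while i < toks.len() {
        match toks[i].as_str() {
            "(" | "[" | "{" => depth += 1,
            ")" | "]" | "}" | ";" if depth == 0 => break,
            ")" | "]" | "}" => depth -= 1,
            _ => {}
        }
        i += 1;
    }
    i
}

/// Interprets `pats` against `toks` from `pos`; returns where the match ends.
fn interp(pats: &[Pat], toks: &[String], mut pos: usize, b: &mut Bindings) -> Option<usize> {
    for p in pats {
        match p {
            Pat::Lit(l) if toks.get(pos).map(String::as_str) == Some(*l) => pos += 1,
            Pat::Lit(_) => return None,
            Pat::Ident(name) => {
                let t = toks.get(pos).filter(|t| is_ident(t))?;
                b.entry(*name).or_default().push(t.clone());
                pos += 1;
            }
            Pat::Expr(name) => {
                let end = expr_end(toks, pos);
                if end == pos { return None; }
                b.entry(*name).or_default().push(toks[pos..end].join(" "));
                pos = end;
            }
            // Try each further repetition on a scratch copy; keep it only if it succeeds.
            Pat::Rep(inner) => loop {
                let mut trial = b.clone();
                match interp(inner, toks, pos, &mut trial) {
                    Some(next) if next > pos => { *b = trial; pos = next; }
                    _ => break,
                }
            },
        }
    }
    Some(pos)
}

/// Parses the example `my_let` invocation; the whole input must be consumed.
fn demo() -> Option<Bindings> {
    let t = |s: &'static str| Tree::Tok(s);
    let input = vec![
        t("x"), t(":="), t("4"), t("*"), t("7"), t(";"),
        t("y"), t(":="), t("str::len"), Tree::Group('(', ')', vec![t("\"(-:\"")]), t("+"), t("18"), t(";"),
        t("x"), t("+"), Tree::Group('(', ')', vec![t("y"), t("*"), t("x")]),
    ];
    // rep(var=Identifier ":=" val=Expr ";") body=Expr
    let pattern = vec![
        Pat::Rep(vec![Pat::Ident("var"), Pat::Lit(":="), Pat::Expr("val"), Pat::Lit(";")]),
        Pat::Expr("body"),
    ];
    let mut toks = Vec::new();
    lex(&input, &mut toks);
    let mut b = Bindings::new();
    (interp(&pattern, &toks, 0, &mut b)? == toks.len()).then(|| b)
}

fn main() { println!("{:?}", demo()); }
```

The backtracking in `Rep` is what stops the repetition in front of the body: `x` matches Identifier, but `+` is not `:=`, so that trial is discarded and `body=Expr` takes the rest.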
There are two ways for the shim lexer to deal with interpolated syntax. The bad one is to pretty-print the interpolated ASTs and re-lex them before sending them to the parser again. The better one is to use special tokens to hand the parser pre-parsed ASTs for it to return immediately.
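One concrete hazard of the pretty-print route is operator precedence. In this minimal sketch (hypothetical `Token` and `Expr` types and a toy recursive-descent parser), a template `2 * body` with `body` bound to `x + y` parses correctly when `body` arrives as an `Interpolated` token, but re-associates to `(2 * x) + y` when `x + y` is printed without parentheses and re-lexed.

```rust
/// A tiny expression AST (hypothetical, not rustc's).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Ordinary tokens, plus one special token carrying an already-parsed AST.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Ident(String),
    Plus,
    Star,
    LParen,
    RParen,
    Interpolated(Expr),
}

struct Parser {
    toks: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&Token> {
        self.toks.get(self.pos)
    }

    fn bump(&mut self) -> Option<Token> {
        let t = self.toks.get(self.pos).cloned();
        self.pos += 1;
        t
    }

    // expr := term ("+" term)*
    fn expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.term()?;
        while self.peek() == Some(&Token::Plus) {
            self.pos += 1;
            lhs = Expr::Add(Box::new(lhs), Box::new(self.term()?));
        }
        Ok(lhs)
    }

    // term := atom ("*" atom)*
    fn term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.atom()?;
        while self.peek() == Some(&Token::Star) {
            self.pos += 1;
            lhs = Expr::Mul(Box::new(lhs), Box::new(self.atom()?));
        }
        Ok(lhs)
    }

    fn atom(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Some(Token::Num(n)) => Ok(Expr::Num(n)),
            Some(Token::Ident(s)) => Ok(Expr::Var(s)),
            // Pre-parsed syntax is returned immediately: no pretty-printing,
            // no re-lexing, so its internal structure cannot be re-associated.
            Some(Token::Interpolated(e)) => Ok(e),
            Some(Token::LParen) => {
                let e = self.expr()?;
                match self.bump() {
                    Some(Token::RParen) => Ok(e),
                    t => Err(format!("expected `)`, found {:?}", t)),
                }
            }
            t => Err(format!("unexpected {:?}", t)),
        }
    }
}

fn parse(toks: Vec<Token>) -> Result<Expr, String> {
    let mut p = Parser { toks, pos: 0 };
    let e = p.expr()?;
    if p.pos != p.toks.len() {
        return Err("trailing tokens".to_string());
    }
    Ok(e)
}

fn main() {
    let body = Expr::Add(Box::new(Expr::Var("x".into())), Box::new(Expr::Var("y".into())));
    // `2 * body`, with `body` handed over as a pre-parsed AST.
    let good = parse(vec![Token::Num(2), Token::Star, Token::Interpolated(body)]);
    // `2 * body`, with `body` pretty-printed as `x + y` and re-lexed.
    let bad = parse(vec![Token::Num(2), Token::Star, Token::Ident("x".into()), Token::Plus, Token::Ident("y".into())]);
    println!("{:?}\n{:?}", good, bad);
}
```

A pretty-printer could insert parentheses defensively, but the special token avoids the question entirely.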
Possible extensions
Syntax for lexer-skipping syntax extensions
If we remove # from ordinary macro invocation syntax, we can use it to provide quotation for un-lexed syntax. Delimiters would work in a Perl-like fashion:
#regex(\w+\s*) //parens inside must match
#regex|\\w+\\s*| //backslashes escape delimiter
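A sketch of how the lexer might scan such a quotation, assuming Perl-like rules as read from the two examples: a bracketing delimiter nests and its contents are taken raw, while any other delimiter closes itself and a backslash escapes the following character. Under that reading `\\` yields `\`, which would explain why the second example doubles its backslashes. `scan_quoted` is a hypothetical name; `src` starts at the opening delimiter.

```rust
/// Scans a lexer-skipping quotation such as `(\w+\s*)` or `|\\w+\\s*|`.
/// Returns the quoted body and the remaining input, or None if unterminated.
fn scan_quoted(src: &str) -> Option<(String, &str)> {
    let mut chars = src.char_indices();
    let (_, open) = chars.next()?;
    let close = match open {
        '(' => Some(')'),
        '[' => Some(']'),
        '{' => Some('}'),
        _ => None,
    };
    let mut body = String::new();
    match close {
        // Bracketing delimiter: nested pairs must match; contents are raw.
        Some(close) => {
            let mut depth = 0usize;
            for (i, c) in chars {
                if c == close {
                    if depth == 0 {
                        return Some((body, &src[i + c.len_utf8()..]));
                    }
                    depth -= 1;
                } else if c == open {
                    depth += 1;
                }
                body.push(c);
            }
            None
        }
        // Any other delimiter closes itself; a backslash escapes the next char.
        None => {
            let mut escaped = false;
            for (i, c) in chars {
                if escaped {
                    body.push(c);
                    escaped = false;
                } else if c == '\\' {
                    escaped = true;
                } else if c == open {
                    return Some((body, &src[i + c.len_utf8()..]));
                } else {
                    body.push(c);
                }
            }
            None
        }
    }
}

fn main() {
    // Under these rules, both spellings quote the same regex text.
    println!("{:?}", scan_quoted(r"(\w+\s*)"));
    println!("{:?}", scan_quoted(r"|\\w+\\s*|"));
}
```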
String-examining/lexer-skipping macros
pat_macro {
fmt { format=StringContents("%" spec=Letter | percent="%%" | literal=NegativeCharClass("%")) "," rep(arg=Expr, ",") }
=> /* ??? */
}
fmt{"Look at this number: %u", 18u}
Making macros look inside strings should be fairly simple, but most practical applications will probably require much more power from the macro system.
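The string-examining half of the fmt pattern can be sketched as a splitter for the format string, plus one check a macro could perform at expansion time: one `arg` per conversion. `Piece`, `split_format`, and `check_arity` are hypothetical names, and merging runs of literal characters into one piece is a convenience of this sketch; the transcription side stays open, as in the pattern above.

```rust
/// Pieces of a format string, following the pattern
/// "%" spec=Letter | percent="%%" | literal=NegativeCharClass("%").
#[derive(Debug, PartialEq)]
enum Piece {
    Spec(char),
    Percent,
    Literal(String),
}

fn split_format(s: &str) -> Result<Vec<Piece>, String> {
    let mut pieces = Vec::new();
    let mut lit = String::new();
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '%' {
            // literal=NegativeCharClass("%"); runs are merged into one piece.
            lit.push(c);
            continue;
        }
        if !lit.is_empty() {
            pieces.push(Piece::Literal(std::mem::take(&mut lit)));
        }
        match chars.next() {
            Some('%') => pieces.push(Piece::Percent), // percent="%%"
            Some(l) if l.is_alphabetic() => pieces.push(Piece::Spec(l)), // "%" spec=Letter
            other => return Err(format!("bad conversion after `%`: {:?}", other)),
        }
    }
    if !lit.is_empty() {
        pieces.push(Piece::Literal(lit));
    }
    Ok(pieces)
}

/// One check a fmt macro could make at expansion time: one arg per spec.
fn check_arity(format: &str, n_args: usize) -> Result<(), String> {
    let specs = split_format(format)?
        .iter()
        .filter(|p| matches!(p, Piece::Spec(_)))
        .count();
    if specs == n_args {
        Ok(())
    } else {
        Err(format!("{} conversions but {} arguments", specs, n_args))
    }
}

fn main() {
    // fmt{"Look at this number: %u", 18u}
    println!("{:?}", split_format("Look at this number: %u"));
    println!("{:?}", check_arity("Look at this number: %u", 1));
}
```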
Invocations at non-expression position
What if we want macros to generate non-expressions (especially items, types)? It seems like we need a separate invocation form for every nonterminal we want to extend. Fortunately, expressions cover a lot of the interesting territory.