What exactly token streams are passed to procedural macros 1.2 #50038

Closed
@petrochenkov

This is an issue that needs to be resolved before stabilization of "Macros 1.2".

Procedural macros that we are going to stabilize currently have two flavors - proc_macro and proc_macro_attribute.


proc_macro macros have signature fn(TokenStream) -> TokenStream and can be invoked with "bang" forms like this:

my::proc::macro!( TOKEN_STREAM )
my::proc::macro![ TOKEN_STREAM ]
my::proc::macro! { TOKEN_STREAM }

Only the TOKEN_STREAM part is passed to the macro as TokenStream, the delimiters (brackets) are NOT passed.

Why this is bad:

  • The macro doesn't know what delimiters it was invoked with.
    It was a part of the Macros 2.0 promise to give macros control over the delimiters in their invocations, so e.g. vec-like macros could require square brackets like vec![1, 2, 3] and reject other brackets.
    We should not prevent this kind of control being implemented in the future.

Why this is good:

  • Brackets are mostly not part of the "useful payload" for the macro; they are there so that macro invocations can be parsed unambiguously in the many contexts in which they can appear: expressions, types, blocks, modules, etc.
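
The "delimiters are not passed" behavior can be observed today with macro_rules! macros, which likewise never see their invocation delimiters. The sketch below uses a declarative macro only as a stand-in for a procedural one; the tokens! helper and the three wrapper functions are hypothetical names introduced for illustration.

```rust
// Hypothetical helper: turns whatever tokens it receives into a string.
// A declarative macro is used here only as a stand-in for a procedural one;
// in both cases the macro receives the tokens between the delimiters,
// not the delimiters themselves.
macro_rules! tokens {
    ($($t:tt)*) => { stringify!($($t)*) };
}

fn paren() -> &'static str {
    let s = tokens!(1, 2, 3);
    s
}

fn bracket() -> &'static str {
    let s = tokens![1, 2, 3];
    s
}

fn brace() -> &'static str {
    let s = tokens! { 1, 2, 3 };
    s
}

fn main() {
    // All three invocation forms hand the same tokens to the macro,
    // so the macro body cannot tell them apart.
    assert_eq!(paren(), bracket());
    assert_eq!(bracket(), brace());
    println!("{}", paren());
}
```

Because the three results are identical, a vec-like macro written this way has no means of rejecting tokens!(...) in favor of tokens![...].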

proc_macro_attribute macros have signature fn(TokenStream, TokenStream) -> TokenStream and can be invoked with "attribute" forms like this:

#[my::proc::macro TOKEN_STREAM] TARGET
#![my::proc::macro TOKEN_STREAM] TARGET

TARGET is a trait/impl/foreign item or a statement, and it is passed to the macro as the second TokenStream argument, but we are not interested in it right now.

The TOKEN_STREAM part is passed to the macro as the first TokenStream argument, nothing is ignored.

Why this is bad:

  • It's not clear where the path ends and where the token stream starts.
    Something like #[a::b :: + -] seems to match the grammar, but is rejected right now because paths are always parsed greedily, so :: is interpreted as a path separator rather than a part of the token stream.
    Annoying questions arise with generic arguments in paths like #[a<>::b::c<u8>]. Technically this is a syntactically valid path: c having type arguments is a semantic error rather than a syntactic one, and the empty <> after the module a is not an error at all. But right now this attribute is interpreted as #[a /* <- PATH | TOKEN_STREAM -> */ <>::b::c<u8>].
    Ideally we'd like to avoid these questions completely and have an unambiguous delimiter.
  • It's not clear where the token stream ends.
    With plain #[attr TOKEN_STREAM] it's pretty clear - the stream ends before the ] (in this sense the situation is simpler than with bang macros), but things start breaking when other macros appear.
    macro m($meta1: meta, $meta2: meta) { ... }
    
    // No way to determine where the first attribute ends and the second attribute starts
    m!( a::b::c x , y , z , d::e::f u , v , w )
    So with this attribute syntax we can't support meta anymore!
  • It's not consistent with proc_macro macros. m!(a, b, c) does not include the parentheses in the token stream, but #[m(a, b, c)] does.
  • I'm not actually sure people intend to stabilize, right now, an attribute syntax that has suddenly expanded from the traditional forms (#[attr], #[attr(list)], #[attr = literal]) to being nearly unlimited (i.e. something like #[a::b::c e f + c ,,, ;_:] being legal).
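
The path/token-stream boundary problem from the first bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not rustc's parser, the input is assumed to be already split into string tokens, and is_ident and split_path_greedy are hypothetical names.

```rust
/// Toy notion of an identifier: any token starting with a letter or `_`.
fn is_ident(tok: &str) -> bool {
    tok.chars().next().map_or(false, |c| c.is_alphabetic() || c == '_')
}

/// Toy model of greedy attribute path parsing (not rustc's actual parser).
/// Splits `tokens` into (path segments, remaining token stream).
/// After every `::` the parser commits to reading another path segment,
/// so a `::` that was meant to start the token stream becomes an error.
fn split_path_greedy<'a>(
    tokens: &[&'a str],
) -> Result<(Vec<&'a str>, Vec<&'a str>), String> {
    let mut i = 0;
    let mut path = Vec::new();
    match tokens.first() {
        Some(t) if is_ident(t) => {
            path.push(*t);
            i += 1;
        }
        _ => return Err("expected a path".to_string()),
    }
    while i < tokens.len() && tokens[i] == "::" {
        match tokens.get(i + 1) {
            Some(t) if is_ident(t) => {
                path.push(*t);
                i += 2;
            }
            other => {
                return Err(format!(
                    "expected path segment after `::`, found {:?}",
                    other
                ))
            }
        }
    }
    Ok((path, tokens[i..].to_vec()))
}

fn main() {
    // #[a::b :: + -]: the second `::` is taken as a path separator.
    let rejected = split_path_greedy(&["a", "::", "b", "::", "+", "-"]);
    assert!(rejected.is_err());

    // #[a::b(x)]: the opening delimiter ends the path unambiguously.
    let (path, rest) =
        split_path_greedy(&["a", "::", "b", "(", "x", ")"]).unwrap();
    assert_eq!(path, ["a", "b"]);
    assert_eq!(rest, ["(", "x", ")"]);
}
```

In this model an explicit delimiter after the path is what makes the split point unambiguous, which is the property the undelimited syntax lacks.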

Proposed solution:

  • Stabilize proc_macro as is for "Macros 1.2".

  • In the future extend the set of proc_macro plugin interfaces with one more signature fn(TokenStream, Delimiter) -> TokenStream that allows controlling delimiters used in macro invocations.

  • In the future possibly support bang macro invocations without delimiters for symmetry with attributes and because they may be legitimately useful (let x = MACRO_CONST!;, see https://internals.rust-lang.org/t/idea-elide-parens-brackets-on-unparametrized-macros/6527) (the Delimiter argument is Delimiter::None in this case).

  • Restrict attribute syntax accepted by proc_macro_attribute for "Macros 1.2" to

    // Symmetric with bang macro invocations
    #[my::proc::macro(TOKEN_STREAM)]
    #[my::proc::macro[TOKEN_STREAM]]
    #[my::proc::macro { TOKEN_STREAM }]
    // Additionally
    #[my::proc::macro]
    #[my::proc::macro = TOKEN_TREE]

    Or, more radically, do not stabilize the = syntax for procedural macros 1.2.
    This is not a fundamental restriction - arbitrary token streams can still be placed inside the brackets (#[a::b::c(e f + c ,,, ;_:)]).

  • The token stream passed to the macro DOES NOT include the delimiters.

  • In the future extend the set of proc_macro_attribute plugin interfaces with one more signature fn(TokenStream, TokenStream, Delimiter) -> TokenStream that allows controlling delimiters used in macro invocations (the delimiter is Delimiter::None for both the #[attr] and #[attr = tt] forms, but they are still discernible by whether the token stream is empty).
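
The restricted attribute forms and the proposed Delimiter argument can be sketched as a toy classifier. Everything here is an assumption made for illustration: the local Delimiter enum only mirrors the variant names of proc_macro::Delimiter so the sketch runs outside a proc-macro crate, the input is pre-split into string tokens, classify is a hypothetical name, and passing only the token tree (without the =) for the #[attr = tt] form is a guess, since the proposal does not spell that out.

```rust
/// Mirrors the variant names of `proc_macro::Delimiter`, redefined locally
/// so that the sketch can run outside a proc-macro crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Delimiter {
    Parenthesis,
    Bracket,
    Brace,
    None,
}

/// Toy model: `args` is the pre-tokenized text following the attribute path.
/// Returns the delimiter plus the tokens the macro would receive, or an
/// error for forms outside the restricted syntax.
/// Simplifications: nesting is not checked, and `tt` is a single token.
fn classify<'a>(args: &[&'a str]) -> Result<(Delimiter, Vec<&'a str>), String> {
    let (first, last) = match (args.first(), args.last()) {
        (Some(f), Some(l)) => (*f, *l),
        // #[attr]: no delimiter and an empty token stream.
        _ => return Ok((Delimiter::None, Vec::new())),
    };
    let delim = match (first, last) {
        ("(", ")") => Delimiter::Parenthesis,
        ("[", "]") => Delimiter::Bracket,
        ("{", "}") => Delimiter::Brace,
        // #[attr = tt]: no delimiter, but a non-empty token stream.
        // Assumption: only the token tree is passed, not the `=`.
        ("=", _) if args.len() == 2 => return Ok((Delimiter::None, vec![last])),
        _ => return Err(format!("unsupported attribute form: {:?}", args)),
    };
    // The delimiters themselves are not part of the token stream.
    Ok((delim, args[1..args.len() - 1].to_vec()))
}

fn main() {
    // #[attr(a, b)]
    assert_eq!(
        classify(&["(", "a", ",", "b", ")"]),
        Ok((Delimiter::Parenthesis, vec!["a", ",", "b"]))
    );
    // #[attr] and #[attr = "lit"] share Delimiter::None but differ in
    // whether the token stream is empty.
    assert_eq!(classify(&[]), Ok((Delimiter::None, vec![])));
    assert_eq!(
        classify(&["=", "\"lit\""]),
        Ok((Delimiter::None, vec!["\"lit\""]))
    );
    // #[attr e f + c] falls outside the restricted syntax.
    assert!(classify(&["e", "f", "+", "c"]).is_err());
}
```

A macro given the Delimiter value could then, for example, accept only Delimiter::Bracket and report an error for the other forms.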

Labels: A-decl-macros-2-0 (Area: Declarative macros 2.0 (#39412)); T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
