
Stabilize reserved prefixes #88140

Closed

Description

@nikomatsakis

Reserved prefixes stabilization report

Summary

  • any_identifier#, any_identifier"...", and any_identifier'...' are now reserved
    syntax, and no longer tokenize.
  • This is mostly relevant to macros. E.g. quote!{ #a#b } is no longer accepted.
  • It doesn't treat keywords specially, so e.g. match"..." {} is no longer accepted.
  • Insert whitespace between the identifier and the subsequent #, ", or '
    to avoid errors.
  • Edition migrations will help you insert whitespace in such cases.

Details

To make space for new syntax in the future, we've decided to reserve syntax for prefixed identifiers and literals: prefix#identifier, prefix"string", prefix'c', and prefix#123, where prefix can be any identifier. (Except prefixes that already have a meaning, such as b"..." (byte strings) and r"..." (raw strings).)

This provides syntax we can expand into in the future without requiring an edition boundary. We may use this for temporary syntax until the next edition, or for permanent syntax if appropriate.

Without an edition, this would be a breaking change, since macros can currently accept syntax such as hello"world", which they will see as two separate tokens: hello and "world". The (automatic) fix is simple though: just insert a space: hello "world". Likewise, prefix#ident should become prefix #ident. Edition migrations will help with this fix.
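The two-token behavior that macros rely on can be sketched with a hypothetical token-counting macro (the macro name and structure are illustrative, not from the report):

```rust
// Counts the token trees passed to it.
macro_rules! count_tokens {
    () => { 0 };
    ($head:tt $($rest:tt)*) => { 1 + count_tokens!($($rest)*) };
}

fn main() {
    // OK on all editions: whitespace separates the identifier
    // from the string literal, so these are two tokens.
    let n = count_tokens!(hello "world");
    assert_eq!(n, 2);

    // In edition 2021 the unspaced form no longer tokenizes at all:
    // let _ = count_tokens!(hello"world"); // error: prefix `hello` is unknown
    println!("{}", n);
}
```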

Other than turning these into a tokenization error, the RFC does not attach a meaning to any prefix yet. Assigning meaning to specific prefixes is left to future proposals, which will now—thanks to reserving these prefixes—not be breaking changes.

Some new prefixes you might potentially see in the future (though we haven't
committed to any of them yet):

  • k#keyword to allow writing keywords that don't exist yet in the current edition. For example, while async is not a keyword in edition 2015, this prefix would've allowed us to accept k#async in edition 2015 without having to wait for edition 2018 to reserve async as a keyword.
  • f"" as a short-hand for a format string. For example, f"hello {name}" as a short-hand for the equivalent format!() invocation.
  • s"" for String literals.
  • c"" or z"" for null-terminated C strings.
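For context, the hypothetical f"" prefix above would be shorthand for what format! already does on current stable Rust (this snippet uses today's syntax, not the proposed prefix):

```rust
fn main() {
    let name = "world";
    // A future f"hello {name}" would presumably desugar to this
    // format! invocation (captured identifiers work since Rust 1.58).
    let s = format!("hello {name}");
    assert_eq!(s, "hello world");
    println!("{}", s);
}
```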

How unresolved questions were resolved and other interesting developments

Where and how to enforce prefixes

The biggest question was where to enforce the prefixes and emit errors. We ultimately opted to emit errors in the lexer, which meant that the lexer had to become aware of the current edition. There was an alternative of using "jointness" and enforcing the conditions in the parser. The idea was to leverage the fact that Rust tokens (at least some subset of them) record not only their content but whether they are separated by whitespace from the next token. This was intended to enable compound operators like << to be parsed as two < tokens in some parts of the parser (types) and as a single token elsewhere (expressions), without the lexer having to know what state the parser was in. This same approach could conceptually be used so that the lexer doesn't have to know the edition.
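The jointness idea described above is visible in how Rust already treats >> and << today: the same character pair is split or kept joined depending on parser context (a sketch in current stable Rust, unrelated to the prefixes themselves):

```rust
fn main() {
    // In type position the parser splits `>>` into two `>` tokens,
    // so nested generics parse without any spaces:
    let v: Vec<Vec<i32>> = vec![vec![1, 2]];

    // In expression position the same characters are a single
    // shift operator token:
    let x = 1i32 << 2;

    assert_eq!(x, 4);
    assert_eq!(v[0][1], 2);
}
```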

As described in detail in this writeup, however, the jointness approach had several downsides. For example, it meant that lexing of literals was independent of the prefix: we might like f"{foo("bar")}" to be lexed as a single string, but that is not possible unless the lexer knows that an f string can contain embedded expressions. Similarly, which escape codes the lexer accepts depends on the prefix (e.g. \x for b""). (This is especially relevant for raw strings: whether fr"\" is accepted or not depends on what meaning we assign to fr.) Jointness also had forwards compatibility hazards with macro arm ordering. Finally, the lexer-based approach can still be converted to a jointness-based approach later, and in the meantime it reports errors much earlier in the process.
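To illustrate why escape handling depends on the prefix, compare two prefixes that exist on stable today (the fr prefix itself has no meaning yet):

```rust
fn main() {
    // Byte strings accept \x escapes: this is the single byte 0x41 ('A').
    let b: &[u8] = b"\x41";
    assert_eq!(b, b"A");

    // Raw strings accept no escapes at all: the backslash is literal,
    // and r"\" on its own would be an unterminated string. Whether fr"\"
    // lexes therefore depends on what meaning `fr` is eventually given.
    let r = r"\x41";
    assert_eq!(r, "\\x41");
    println!("ok");
}
```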

There were also advantages to jointness: it would allow more procedural macro prototyping, and it means that the lexer would remain independent of edition.

Edition used for procedural macro APIs

There are some procedural macro APIs that lex tokens from strings. Those APIs have not traditionally taken a span or other information from which an edition can be derived. Those APIs will be documented with the Edition that they use to do lexing. In the future we may wish to add new APIs that take a Span or other parameter and use that to derive the Edition.

Metadata

Labels

  • T-lang: Relevant to the language team, which will review and decide on the PR/issue.
  • disposition-merge: This issue / PR is in PFCP or FCP with a disposition to merge it.
  • finished-final-comment-period: The final comment period is finished for this PR / Issue.
