The lexical specification needs some cleanup and organization. Some things I can think of:
- There should be an overall introduction and overview of the lexical structure.
- Paths are in the Lexical chapter, but I don't think they should be. See #937 (Start documenting name resolution).
- The UTF8BOM/SHEBANG definition is floating in a chapter outside of the Lexical chapter. I think it is relevant to lexing, so it should be somehow incorporated in the Lexical chapter. (Not sure how; things probably need to be rearranged a little.) See #1459 (Input format).
- I think there should be an appendix consolidating all the Lexer rules blocks. This should be generated automatically. DONE: see #1787 (Add a new grammar renderer), https://doc.rust-lang.org/nightly/reference/grammar.html#lexer-summary
- The "input format" subchapter is almost completely useless, and could be moved somewhere else. See #1459 (Input format).
- There should be a note about token ambiguity (this can be relatively brief, but should be mentioned). How the ambiguity is handled depends on the lexer/parser implementation. rustc works by splitting tokens into smaller parts. The `proc_macro` parser works by only issuing the smaller tokens, and using the `Spacing` to determine whether they should be combined later on. The tokens that I'm aware of that cause this issue are:

| Token | Possibly Split Into |
|---|---|
| `+=` | `+` `=` |
| `&&` | `&` `&` |
| `\|\|` | `\|` `\|` |
| `<<` | `<` `<` |
| `<-` | `<` `-` |
| `>>` | `>` `>` |
| `>>=` | `>` `>=` |
| `>=` | `>` `=` |

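To make the `Spacing` approach concrete, here is a minimal sketch of a lexer that only issues single-character punctuation tokens and a separate step that glues them back together. It is not rustc's or `proc_macro`'s actual code: the `Spacing` enum only mirrors the shape of `proc_macro::Spacing`, and `Punct`, `is_punct_char`, `lex_puncts`, `MULTI`, and `glue` are hypothetical names invented for illustration. The operator list is limited to the tokens in the table above.

```rust
// Hypothetical stand-in for `proc_macro::Spacing`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Spacing {
    Alone, // followed by something other than punctuation
    Joint, // immediately followed by another punctuation character
}

// Hypothetical single-character punctuation token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Punct {
    ch: char,
    spacing: Spacing,
}

// Assumed punctuation set for this sketch only.
fn is_punct_char(c: char) -> bool {
    "+-=&|<>".contains(c)
}

// Emit one `Punct` per punctuation character; everything else is skipped.
fn lex_puncts(src: &str) -> Vec<Punct> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    for (i, &c) in chars.iter().enumerate() {
        if !is_punct_char(c) {
            continue;
        }
        let joint = chars.get(i + 1).map_or(false, |&n| is_punct_char(n));
        let spacing = if joint { Spacing::Joint } else { Spacing::Alone };
        out.push(Punct { ch: c, spacing });
    }
    out
}

// Multi-character tokens from the table, longest first.
const MULTI: &[&str] = &[">>=", "+=", "&&", "||", "<<", "<-", ">>", ">="];

// Glue adjacent puncts into the longest operator whose inner puncts are all `Joint`.
fn glue(puncts: &[Punct]) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < puncts.len() {
        let matched = MULTI.iter().find(|op| {
            let n = op.chars().count();
            i + n <= puncts.len()
                && op.chars().zip(&puncts[i..i + n]).all(|(c, p)| c == p.ch)
                && puncts[i..i + n - 1].iter().all(|p| p.spacing == Spacing::Joint)
        });
        match matched {
            Some(op) => {
                out.push(op.to_string());
                i += op.chars().count();
            }
            None => {
                out.push(puncts[i].ch.to_string());
                i += 1;
            }
        }
    }
    out
}
```

With this design a parser that needs a lone `>` (for example when closing nested generics) can consume one `Punct` directly and never has to split a combined token. Under the sketch's assumptions, `glue(&lex_puncts("a >>= b"))` produces the single token `>>=`, while `a >> = b` produces `>>` followed by `=`.
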
See also:
- rust-lang/wg-grammar#3
- https://internals.rust-lang.org/t/pre-pre-rfc-canonical-lexer-specification/4099