Summary:
If `a`, `b`, ..., `z` are multiple consecutive span nodes that need to be combined into a single new span node, then the combined span should be `span(a).to(span(b)).to(span(c))....to(span(z))`.
A "span node" is either an individual token (every token has a span), or some larger AST node having a `Span` field.
Examples:
-10 // an expression
a + 11 // one more expression
a::<u8>::b // a path
The combined span for these pieces of code will be
// expressions are span nodes because they have their own spans in AST, `-` doesn't have a larger AST node so it's treated as a primitive token
span(expr(-10)) = span(token(-)).to(span(expr(10)));
span(expr(a + 11)) = span(expr(a)).to(span(token(+))).to(span(expr(11)));
// the whole generic arg is a span node because it has its own span in AST
span(path(a::<u8>::b)) = span(ident(a)).to(span(token(::))).to(span(generic_arg(<u8>))).to(span(token(::))).to(span(ident(b)));
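
The rule above can be sketched with a toy model. This is only an illustration: `ToySpan`, `combine` and `binary_expr_example` are hypothetical names that do not exist in rustc, and the toy span tracks byte positions only, with no macro context. In this position-only model the fold gives the same range as `first.to(last)`; the two only diverge once macro contexts are involved.

```rust
/// Toy stand-in for a source span: a byte range `lo..hi`.
/// This is NOT rustc's `Span`; it only models positions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToySpan {
    pub lo: u32,
    pub hi: u32,
}

impl ToySpan {
    pub fn new(lo: u32, hi: u32) -> Self {
        assert!(lo <= hi);
        ToySpan { lo, hi }
    }

    /// Smallest span covering both `self` and `end`.
    pub fn to(self, end: ToySpan) -> ToySpan {
        ToySpan {
            lo: self.lo.min(end.lo),
            hi: self.hi.max(end.hi),
        }
    }
}

/// Hypothetical helper: fold `to` over every child span node, left to right.
/// Returns `None` when there are no children.
pub fn combine(children: &[ToySpan]) -> Option<ToySpan> {
    children.iter().copied().reduce(ToySpan::to)
}

/// Pieces of `a + 11`: `a` at 0..1, `+` at 2..3, `11` at 4..6.
pub fn binary_expr_example() -> Option<ToySpan> {
    combine(&[ToySpan::new(0, 1), ToySpan::new(2, 3), ToySpan::new(4, 6)])
}
```

Here `binary_expr_example()` plays the role of `span(expr(a)).to(span(token(+))).to(span(expr(11)))`.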
Status quo:
Currently the resulting span is typically built like `span_first_token.to(span_last_token)`.
So anything in the middle and the internal structure (i.e. nodes, as opposed to tokens) are ignored.
Why we need to change it:
The `to` operation will automatically take macro variables into account, and will try to put the resulting span into the most suitable macro context (this was implemented in #119673).
E.g. in `$tt + 5` the combined expression span will be put into the context of the macro using `$tt` as a macro parameter.
Note that the same thing often happens in the current parser as well, but not consistently, e.g. `$a::$b` will produce an incorrect resulting path span because the `::` in the middle is not considered.
This will also give us a single, reasonably predictable rule for combining AST spans.
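
To see why the token in the middle matters, here is a toy sketch of the `$a::$b` case. Everything in it is an assumption made for illustration: `ToyCtxt`, `ToySpan`, `first_to_last`, `fold_all` and `path_pieces` are hypothetical, and the context rule inside `to` (prefer the macro context when the two sides differ) is a deliberately simplified stand-in for the behavior described above, not the actual logic from #119673.

```rust
/// Toy syntax context: where a token was written.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToyCtxt {
    /// Written by the user at the call site, e.g. a macro argument.
    Root,
    /// Written inside the body of the macro with this (made-up) id.
    Macro(u32),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToySpan {
    pub lo: u32,
    pub hi: u32,
    pub ctxt: ToyCtxt,
}

impl ToySpan {
    pub fn to(self, end: ToySpan) -> ToySpan {
        // ASSUMPTION: simplified rule. If one side is a root-context piece
        // (treated as a macro argument) and the other comes from a macro
        // body, the result lands in the macro's context. Two different
        // macro contexts arbitrarily keep the left one.
        let ctxt = match (self.ctxt, end.ctxt) {
            (ToyCtxt::Root, other) => other,
            (this, _) => this,
        };
        // Positions are merged naively; the toy does not model source maps.
        ToySpan {
            lo: self.lo.min(end.lo),
            hi: self.hi.max(end.hi),
            ctxt,
        }
    }
}

/// Status quo: only the first and last pieces are consulted.
pub fn first_to_last(children: &[ToySpan]) -> Option<ToySpan> {
    Some(children.first()?.to(*children.last()?))
}

/// Proposed rule: every piece takes part in the combination.
pub fn fold_all(children: &[ToySpan]) -> Option<ToySpan> {
    children.iter().copied().reduce(ToySpan::to)
}

/// Pieces of `$a::$b`: both idents are arguments, `::` is in the macro body.
pub fn path_pieces() -> [ToySpan; 3] {
    [
        ToySpan { lo: 100, hi: 101, ctxt: ToyCtxt::Root },
        ToySpan { lo: 20, hi: 22, ctxt: ToyCtxt::Macro(0) },
        ToySpan { lo: 104, hi: 105, ctxt: ToyCtxt::Root },
    ]
}
```

In this model `first_to_last(&path_pieces())` never looks at the `::` and stays in the root context, while `fold_all(&path_pieces())` ends up in the context of the macro.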
Implementation:
This work should parallelize relatively well (but may require a one-time initial setup).
I'll review PRs doing this, they can be assigned to me.
It may be convenient to have a rolling value in the `Parser` structure for the span of the current (or previous?) span node.
The parser has a lot of bespoke diagnostic logic (including snapshotting) that stands in the way of any systematic improvements like this.
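
One possible shape for the rolling value mentioned above, as a minimal sketch only. `ToyParser`, `absorb`, `bump`, `parse_node` and `demo` are hypothetical and are not taken from rustc's `Parser`; the toy ignores macro contexts, diagnostics and snapshotting entirely and models nothing but the span bookkeeping.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToySpan {
    pub lo: u32,
    pub hi: u32,
}

impl ToySpan {
    pub fn to(self, end: ToySpan) -> ToySpan {
        ToySpan {
            lo: self.lo.min(end.lo),
            hi: self.hi.max(end.hi),
        }
    }
}

/// Hypothetical parser state; tokens are represented by their spans only.
pub struct ToyParser {
    tokens: Vec<ToySpan>,
    pos: usize,
    /// Rolling span of the span node currently being built.
    rolling: Option<ToySpan>,
}

impl ToyParser {
    pub fn new(tokens: Vec<ToySpan>) -> Self {
        ToyParser { tokens, pos: 0, rolling: None }
    }

    /// Fold a finished child (token or nested node) into the rolling span.
    fn absorb(&mut self, child: ToySpan) {
        self.rolling = Some(match self.rolling {
            Some(acc) => acc.to(child),
            None => child,
        });
    }

    /// Consume one token and absorb its span into the current node.
    pub fn bump(&mut self) -> Option<ToySpan> {
        let tok = *self.tokens.get(self.pos)?;
        self.pos += 1;
        self.absorb(tok);
        Some(tok)
    }

    /// Parse a nested node with `f` and return the node's own span.
    /// The parent's rolling span is set aside while the child is built,
    /// then extended by the finished child as a single span node.
    pub fn parse_node(&mut self, f: impl FnOnce(&mut Self)) -> Option<ToySpan> {
        let parent = self.rolling.take();
        f(self);
        let node = self.rolling.take();
        self.rolling = parent;
        if let Some(span) = node {
            self.absorb(span);
        }
        node
    }
}

/// `a + -10`: `a` at 0..1, `+` at 2..3, `-` at 4..5, `10` at 5..7.
/// Returns (span of the inner `-10`, span of the whole expression).
pub fn demo() -> (Option<ToySpan>, Option<ToySpan>) {
    let mut p = ToyParser::new(vec![
        ToySpan { lo: 0, hi: 1 },
        ToySpan { lo: 2, hi: 3 },
        ToySpan { lo: 4, hi: 5 },
        ToySpan { lo: 5, hi: 7 },
    ]);
    let mut inner = None;
    let outer = p.parse_node(|p| {
        p.bump(); // `a`
        p.bump(); // `+`
        inner = p.parse_node(|p| {
            p.bump(); // `-`
            p.bump(); // `10`
        });
    });
    (inner, outer)
}
```

The design choice in this sketch is that a node never computes its span from its first and last tokens; it only ever sees the spans of its direct children, in order, which is what the combination rule asks for.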
How this can be tested:
Make a macro that emits complex nodes using tokens from different contexts, e.g.
macro m($l:tt $op:tt $r:tt) {
2 + 3;
$l + 3;
2 $op 3;
2 + $r;
$l $op 3;
$l + $r;
2 $op $r;
$l $op $r;
}
m!(2 + 3);
and emit some diagnostic using those nodes' spans (maybe we can add a special internal diagnostic for this testing).