
Trivia interface #2

Open
@c42f


I was thinking a bit about the "right" interface to trivia.

The rust-analyzer people are discussing it over at rust-lang/rust-analyzer#6584 so they've got some good background reading there. It seems generally awkward with no obviously right answer.

IIUC there are two common interfaces:

  • Roslyn, Swift libsyntax — attach multiple trivia tokens to each side of nontrivia nodes (or just nontrivia leaf nodes?). The nontrivia nodes themselves become "fatter", and the depth of the tree is increased by 1.
  • rust-analyzer, IntelliJ(??) — attach trivia tokens as arbitrary children of any interior nodes. So they're generally siblings of nontrivia nodes.

The rust-analyzer model is appealing because it leads to simpler data structures with less internal structure. Also it's more general because the trivia might be naturally interspersed with nontrivia children but without a natural attachment to any of the children. But we could go for either approach, or something else entirely.
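To make the two models concrete, here is a minimal Python sketch of the same source `a + b  # sum` stored both ways. All class names and token kinds are hypothetical, not JuliaSyntax, Roslyn or rust-analyzer APIs. In both models, concatenating the text of every leaf reproduces the source exactly; the only difference is where the trivia lives.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Model A (Roslyn / Swift libsyntax style): trivia hangs off nontrivia tokens.
@dataclass
class FatToken:
    text: str
    leading: List[str] = field(default_factory=list)   # trivia before the token
    trailing: List[str] = field(default_factory=list)  # trivia after the token

    def full_text(self) -> str:
        return "".join(self.leading) + self.text + "".join(self.trailing)

# Model B (rust-analyzer style): trivia tokens are ordinary children of nodes.
@dataclass
class Token:
    kind: str  # hypothetical kinds: "Identifier", "Operator", "Whitespace", "Comment"
    text: str

    @property
    def is_trivia(self) -> bool:
        return self.kind in ("Whitespace", "Comment")

    def full_text(self) -> str:
        return self.text

@dataclass
class Node:
    kind: str
    children: List[Union["Node", Token, FatToken]]

    def full_text(self) -> str:
        return "".join(c.full_text() for c in self.children)

# Model A: whitespace and the comment are absorbed into the tokens they follow.
model_a = Node("call", [
    FatToken("a", trailing=[" "]),
    FatToken("+", trailing=[" "]),
    FatToken("b", trailing=["  ", "# sum"]),
])

# Model B: the same trivia appears as siblings of the nontrivia tokens.
model_b = Node("call", [
    Token("Identifier", "a"), Token("Whitespace", " "),
    Token("Operator", "+"), Token("Whitespace", " "),
    Token("Identifier", "b"), Token("Whitespace", "  "),
    Token("Comment", "# sum"),
])
```

In model B a pass that ignores trivia just filters `children` by `is_trivia`; in model A it reads `text` and ignores `leading`/`trailing`.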

Whitespace trivia

A useful observation: we can't attach whitespace in a way that satisfies both of the following:

  • Every node represents a contiguous span of bytes in the source file - a fundamental property of green trees
  • We respect the visual tree structure — in the sense that nested nodes should only contain whitespace relevant to their own internal tree structure (rather than where they are placed in the larger tree.)

In general, I guess whitespace will become inconsistent during a refactoring pass and will need to be regenerated. This is obviously true for moving blocks, but it's even true for refactorings as simple as renaming identifiers. For example, renaming elements of expressions which span multiple lines:

func(arg1, arg2, ...
     argN, argN1)
^^^^
# problematic whitespace if length of func symbol changes

So I'm kind of convinced that there's no natural representation of whitespace within the green tree, so we may as well do whatever is efficient and simple to implement.

Symbols

Consider a simple thing like (b + c) + (b + c)^2 and a pass which identifies common subexpressions to get

x = b + c
(x) + (x)^2

Here we can and should remove the parentheses (which are trivia after parsing, due to being used for grouping only). What do we even do here? Like whitespace, it seems refactorings will regularly break this kind of trivia and require that it's regenerated from a model of the precedence rules.
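One way to regenerate grouping parentheses is to print from the expression tree and insert parentheses only where the precedence rules demand them. The sketch below is a hypothetical mini model with binary operators only (`+` and `*` left-associative, `^` right-associative); it is not Julia's actual precedence table, and it ignores that Julia parses chained `a + b + c` as a single n-ary call.

```python
from dataclasses import dataclass
from typing import Union

PREC = {"+": 1, "*": 2, "^": 3}   # higher binds tighter
RIGHT_ASSOC = {"^"}

@dataclass
class BinOp:
    op: str
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[str, BinOp]  # a bare string is a symbol or literal such as "b" or "2"

def to_source(e: Expr) -> str:
    """Print an expression, adding parentheses only where omitting them
    would make the parser regroup the tree differently."""
    if isinstance(e, str):
        return e
    p = PREC[e.op]
    lhs, rhs = to_source(e.lhs), to_source(e.rhs)
    if isinstance(e.lhs, BinOp):
        lp = PREC[e.lhs.op]
        # A looser-binding left child, or an equal one under a right-assoc op, needs parens.
        if lp < p or (lp == p and e.op in RIGHT_ASSOC):
            lhs = f"({lhs})"
    if isinstance(e.rhs, BinOp):
        rp = PREC[e.rhs.op]
        # A looser-binding right child, or an equal one under a left-assoc op, needs parens.
        if rp < p or (rp == p and e.op not in RIGHT_ASSOC):
            rhs = f"({rhs})"
    sep = "" if e.op == "^" else " "
    return f"{lhs}{sep}{e.op}{sep}{rhs}"

bc = BinOp("+", "b", "c")
before = to_source(BinOp("+", bc, BinOp("^", bc, "2")))
after = to_source(BinOp("+", "x", BinOp("^", "x", "2")))
```

Here `before` prints as `b + c + (b + c)^2` (the redundant parentheses from the original source are not reproduced) and `after` prints as `x + x^2`, so the parentheses are derived from the tree rather than carried along as trivia.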

Comments

What about comments? These matter much more, and I think we should aim for "comments are likely to survive symbolic refactoring and remain attached in the right places".

It seems likely there are cases where one model or the other wins, depending on the situation. Some prototyping with simple example refactoring passes might be necessary to get a feel for the pros and cons.
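As a starting point for such prototyping, here is a minimal sketch of the attachment heuristic used in the Roslyn-style model, roughly as Roslyn documents it: a token owns the trivia after it up to the end of its line, and everything after that newline becomes leading trivia of the next token. The function name is hypothetical; the effect is that an end-of-line comment stays with the statement it annotates, while a comment on its own line travels with the statement below it.

```python
from typing import Tuple

def split_trivia(trivia: str) -> Tuple[str, str]:
    """Split the trivia between two tokens into (trailing, leading).

    Trailing trivia of the previous token runs up to and including the first
    newline; the remainder is leading trivia of the next token.
    """
    nl = trivia.find("\n")
    if nl == -1:
        # No newline: everything stays on the previous token's line.
        return trivia, ""
    return trivia[: nl + 1], trivia[nl + 1 :]

# Trivia between `x = 1` and `y = 2` in:
#   x = 1 # set x
#   # about y
#   y = 2
trailing, leading = split_trivia(" # set x\n# about y\n")
```

If a pass then swaps the two statements, `# set x` moves with `x = 1` and `# about y` moves with `y = 2`, which is usually what a reader expects.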

Impact on the parser

One big benefit we have in the ParseStream interface is that trivia is mostly invisible to the parser. So in theory we can adjust trivia attachment heuristics (within whichever model is chosen) independently of the parser code. Julia is sensitive to whitespace and newlines in selected situations, but after parsing is done this information is no longer needed, so it should be valid to split and recombine trivia however we like by floating the boundaries of nodes across the trivia tokens.
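A minimal sketch of "floating" a node boundary, assuming a hypothetical flat token stream of `(text, is_trivia)` pairs where each node covers a half-open token range (this is an illustration, not the actual ParseStream data layout). Moving a boundary across adjacent trivia changes which node owns that trivia, but never the text of the enclosing parent, so contiguity is preserved either way.

```python
from typing import List, Tuple

Tok = Tuple[str, bool]  # (text, is_trivia)

def trim_span(tokens: List[Tok], start: int, stop: int) -> Tuple[int, int]:
    """Shrink [start, stop) so it neither begins nor ends with trivia;
    the excluded trivia then belongs to the parent node."""
    while start < stop and tokens[start][1]:
        start += 1
    while stop > start and tokens[stop - 1][1]:
        stop -= 1
    return start, stop

def extend_span(tokens: List[Tok], start: int, stop: int, lo: int, hi: int) -> Tuple[int, int]:
    """Grow [start, stop) over adjacent trivia without leaving the parent's range [lo, hi)."""
    while start > lo and tokens[start - 1][1]:
        start -= 1
    while stop < hi and tokens[stop][1]:
        stop += 1
    return start, stop

def text(tokens: List[Tok], start: int, stop: int) -> str:
    return "".join(t for t, _ in tokens[start:stop])

# x = a + b  # note
tokens: List[Tok] = [
    ("x", False), (" ", True), ("=", False), (" ", True), ("a", False),
    (" ", True), ("+", False), (" ", True), ("b", False), ("  ", True), ("# note", True),
]
root = (0, len(tokens))
rhs_tight = trim_span(tokens, 3, 11)             # node for `a + b` without surrounding trivia
rhs_fat = extend_span(tokens, *rhs_tight, *root)  # same node absorbing neighbouring trivia
```

Both choices are consistent green trees over the same bytes; which one is better is exactly the attachment-heuristic question, and it can be decided after parsing.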
