Skip to content

[5.7] Recover from parser errors #519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 30, 2022

Conversation

hamishknight
Copy link
Contributor

@hamishknight hamishknight commented Jun 28, 2022

5.7 cherry-pick of #481

Currently we use Swift error handling for parser errors. While this is convenient, it has a number of drawbacks:

  • Any AST parsed gets thrown away as soon as we encounter an error. This prevents clients from being able to get any useful information from invalid AST (rdar://93677069).

  • Multiple diagnostics cannot be issued, meaning that e.g a basic syntactic error could obscure a more useful semantic error.

  • It doesn't extend nicely to e.g warning diagnostics, meaning that we'd eventually end up with 2 ways of emitting diagnostics.

  • The thrown errors relied on recordLoc blocks to annotate them with source location info, which could lead to errors without location info if we forgot to add the appropriate recordLoc calls. Additionally, in some cases we want a more fine grained location info than the block would give us.

Therefore this PR removes the use of Swift error handling throughout the parser. The parser is now a total function that always returns an AST. If errors are encountered while parsing, they are recorded, and are attached to the resulting AST by the parser. The parser attempts to recover as much of the AST it can when encountering an error. As such, there is now are now .invalid atom and character property kinds. Sema then runs and can attach more diagnostics onto the AST.

For now, the compiler interface remains the same, and we pick a single error to throw, but this will be changed in a later PR to allow multiple errors and warnings, as well as AST recovery. This also means we can better preserve the capture type in the presence of parser errors.

Fortunately, in most cases, this is quite a mechanical transformation. It entails:

  • Moving the lexical analysis methods onto the Parser. We were already passing ParsingContext parameters for most of them, so it's not clear they were benefitting from the isolation that Source offered. Effectively this means that all parsing has access to the context and diagnostics.

  • Converting error throwing statements into calls to the parser's error method (or unreachable method for unreachables).

This PR also updates the parser tests to be able to be able to match against multiple diagnostics.

Part of the fix for rdar://93677069
Resolves #449

This stores both a source location, and has the
ability to be `nil`, which is necessary to enable
parser recovery in cases where we expect a number
but parse something that e.g overflows.
Currently we use Swift error handling for parser
errors. While this is convenient, it has a number
of drawbacks:

- Any AST parsed gets thrown away as soon as we
encounter an error. This prevents clients from
being able to get any useful information from
invalid AST (rdar://93677069).

- Multiple diagnostics cannot be issued, meaning
that e.g a basic syntactic error could obscure a
more useful semantic error.

- It doesn't extend nicely to e.g warning
diagnostics, meaning that we'd eventually end up
with 2 ways of emitting diagnostics.

- The thrown errors relied on `recordLoc` blocks
to annotate them with source location info, which
could lead to errors without location info if we
forgot to add the appropriate `recordLoc` calls.
Additionally, in some cases we want a more fine
grained location info than the block would give us.

Therefore this commit removes the use of Swift
error handling throughout the parser. The parser
is now a total function that _always_ returns an
AST. If errors are encountered while parsing, they
are recorded, and are attached to the resulting
AST by the parser. The parser attempts to recover
as much of the AST it can when encountering an
error. As such, there is now are now `.invalid`
atom and character property kinds. Sema then runs
and can attach more diagnostics onto the AST.

For now, the compiler interface remains the same,
and we pick a single error to `throw`, but this
will be changed in a later PR to allow multiple
errors and warnings, as well as AST recovery. This
also means we can better preserve the capture type
in the presence of parser errors.

Fortunately, in most cases, this is quite a
mechanical transformation. It entails:

- Moving the lexical analysis methods onto the
`Parser`. We were already passing `ParsingContext`
parameters for most of them, so it's not clear they
were benefitting from the isolation that `Source`
offered. Effectively this means that all parsing
has access to the context and diagnostics.

- Converting error throwing statements into calls
to the parser's `error` method (or `unreachable`
method for unreachables).

This commit also updates the parser tests to be
able to be able to match against multiple
diagnostics.
Scan to the closing delimiter of an invalid
identifier, and better diagnose an invalid text
segment option.
We now always run validation, which is fine
because the resulting AST can still be returned.
@hamishknight
Copy link
Contributor Author

@swift-ci please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
r5.7 5.7 Release Cherry Picks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants