[5.7] Recover from parser errors #519

hamishknight · 2022-06-28T17:27:18Z

5.7 cherry-pick of #481

Currently we use Swift error handling for parser errors. While this is convenient, it has a number of drawbacks:

- Any AST parsed gets thrown away as soon as we encounter an error. This prevents clients from getting any useful information from an invalid AST (rdar://93677069).
- Multiple diagnostics cannot be issued, so, e.g., a basic syntactic error could obscure a more useful semantic error.
- It doesn't extend nicely to other kinds of diagnostics, e.g. warnings, meaning that we'd eventually end up with two ways of emitting diagnostics.
- The thrown errors relied on `recordLoc` blocks to annotate them with source location info, which could lead to errors without location info if we forgot to add the appropriate `recordLoc` calls. Additionally, in some cases we want more fine-grained location info than the block would give us.

Therefore this PR removes the use of Swift error handling throughout the parser. The parser is now a total function that _always_ returns an AST. If errors are encountered while parsing, they are recorded and attached to the resulting AST by the parser. The parser attempts to recover as much of the AST as it can when encountering an error. As such, there are now `.invalid` atom and character property kinds. Sema then runs and can attach more diagnostics onto the AST.
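A minimal sketch of what a total parser looks like, using a toy escape syntax rather than the real regex grammar. `MiniParser`, `Atom`, and `Diagnostic` are hypothetical names, not the library's API. Unknown escapes are recorded as diagnostics and replaced by an `.invalid` atom, so parsing continues and the caller always gets atoms back alongside every diagnostic.

```swift
// Hypothetical, simplified types; not the actual regex parser API.
struct Diagnostic: Equatable {
  var message: String
  var offset: Int  // index into the input where the problem starts
}

enum Atom: Equatable {
  case char(Character)
  case escaped(Character)
  case invalid  // placeholder emitted when the parser recovers from an error
}

struct ParseResult {
  var atoms: [Atom]
  var diagnostics: [Diagnostic]
}

struct MiniParser {
  private let input: [Character]
  private var pos = 0
  private var diagnostics: [Diagnostic] = []

  init(_ source: String) { input = Array(source) }

  // Records a diagnostic instead of throwing, so parsing can continue.
  private mutating func error(_ message: String, at offset: Int) {
    diagnostics.append(Diagnostic(message: message, offset: offset))
  }

  // Total: always returns atoms, plus whatever diagnostics were recorded.
  mutating func parse() -> ParseResult {
    var atoms: [Atom] = []
    while pos < input.count {
      let c = input[pos]
      pos += 1
      guard c == "\\" else {
        atoms.append(.char(c))
        continue
      }
      guard pos < input.count else {
        error("expected escape sequence", at: pos - 1)
        atoms.append(.invalid)
        break
      }
      let e = input[pos]
      pos += 1
      if "dswDSW\\".contains(e) {
        atoms.append(.escaped(e))
      } else {
        error("invalid escape '\\\(e)'", at: pos - 2)
        atoms.append(.invalid)
      }
    }
    return ParseResult(atoms: atoms, diagnostics: diagnostics)
  }
}

var parser = MiniParser(#"a\qb\d\z"#)
let result = parser.parse()
// result.atoms == [.char("a"), .invalid, .char("b"), .escaped("d"), .invalid]
// result.diagnostics.map(\.offset) == [1, 6]
```

Both bad escapes are reported in one pass, and the valid `\d` between them is still represented in the output.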

For now, the compiler interface remains the same, and we pick a single error to throw, but this will be changed in a later PR to allow multiple errors and warnings, as well as AST recovery. This also means we can better preserve the capture type in the presence of parser errors.
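Until the compiler interface changes, something has to turn the recorded diagnostics back into a single thrown error. A sketch of such an adapter, assuming hypothetical `Diagnostic`, `ParsedRegex`, and `CompilerError` types: it throws the first error-severity diagnostic and otherwise returns the AST. The PR does not say which error is picked, so "first" is an assumption here.

```swift
// Hypothetical sketch; these are not the real compiler-interface types.
enum Severity { case error, warning }

struct Diagnostic {
  var severity: Severity
  var message: String
  var offset: Int
}

struct CompilerError: Error, Equatable {
  var message: String
  var offset: Int
}

struct ParsedRegex {
  var ast: [String]  // stand-in for the recovered AST
  var diagnostics: [Diagnostic]
}

// Keeps the old throwing contract: surface one error (here, the first one),
// ignore warnings, and only hand back the AST when there were no errors.
func compile(_ parsed: ParsedRegex) throws -> [String] {
  if let first = parsed.diagnostics.first(where: { $0.severity == .error }) {
    throw CompilerError(message: first.message, offset: first.offset)
  }
  return parsed.ast
}
```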

Fortunately, in most cases, this is quite a mechanical transformation. It entails:

- Moving the lexical analysis methods onto the `Parser`. We were already passing `ParsingContext` parameters to most of them, so it's not clear they were benefiting from the isolation that `Source` offered. Effectively, this means that all parsing has access to the context and diagnostics.
- Converting error-throwing statements into calls to the parser's `error` method (or its `unreachable` method for unreachable cases).
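A simplified before-and-after of that mechanical rewrite, for a check that expects a closing `}`. `ParseError`, `expectCloseBraceThrowing`, and this `Parser` are hypothetical, and source locations are omitted. In the "after" version, recovery simply means returning normally without consuming anything.

```swift
// Hypothetical illustration of the rewrite; not the actual parser code.
struct ParseError: Error {
  var message: String
}

// Before: a missing '}' throws, discarding everything parsed so far.
func expectCloseBraceThrowing(_ chars: [Character], _ pos: inout Int) throws {
  guard pos < chars.count, chars[pos] == "}" else {
    throw ParseError(message: "expected '}'")
  }
  pos += 1
}

// After: the same check records a diagnostic and returns normally.
struct Parser {
  var chars: [Character]
  var pos = 0
  var errors: [String] = []

  mutating func error(_ message: String) {
    errors.append(message)
  }

  mutating func expectCloseBrace() {
    guard pos < chars.count, chars[pos] == "}" else {
      error("expected '}'")
      return  // recover by carrying on as if the brace were present
    }
    pos += 1
  }
}
```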

This PR also updates the parser tests so that they can match against multiple diagnostics.
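A test that checks several diagnostics at once might use a helper along these lines. `diagnosticMismatches` is hypothetical, not the helper used in the actual test suite. It takes the parse step as a closure returning diagnostic messages, matches each actual diagnostic against at most one expectation regardless of order, and reports what is missing and what is unexpected.

```swift
// Hypothetical test helper; the real test suite's helpers may differ.
// `parse` returns the diagnostic messages produced for `input`.
func diagnosticMismatches(
  _ input: String,
  expected: [String],
  parse: (String) -> [String]
) -> [String] {
  var remaining = parse(input)
  var problems: [String] = []
  for message in expected {
    if let i = remaining.firstIndex(of: message) {
      remaining.remove(at: i)  // each actual diagnostic satisfies one expectation
    } else {
      problems.append("missing: \(message)")
    }
  }
  // Anything left over was emitted but not expected.
  problems += remaining.map { "unexpected: \($0)" }
  return problems
}
```

An empty result means the test passes; otherwise every discrepancy is reported at once instead of stopping at the first.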

Part of the fix for rdar://93677069
Resolves #449


hamishknight · 2022-06-30T19:12:15Z

@swift-ci please test

hamishknight added the r5.7 5.7 Release Cherry Picks label Jun 28, 2022

hamishknight mentioned this pull request Jun 28, 2022

[5.7] [DNM] Null PR swiftlang/swift#42532

Closed

hamishknight requested a review from stephentyrone June 28, 2022 17:28

hamishknight force-pushed the totally-5.7 branch 2 times, most recently from c36e539 to c6ad784 June 29, 2022 10:55

hamishknight mentioned this pull request Jun 29, 2022

[5.7] [test] Update a couple of regex diagnostic locations swiftlang/swift#59777

Merged

hamishknight force-pushed the totally-5.7 branch from c6ad784 to dfe3d67 June 30, 2022 10:29

stephentyrone approved these changes Jun 30, 2022


hamishknight added 6 commits June 30, 2022 20:07

Improve a diagnostic message

7e41a44

Introduce AST.Atom.Number

baa9438

This stores both a source location and a value that can be `nil`, which is necessary to enable parser recovery in cases where we expect a number but parse something that e.g. overflows.
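To illustrate why a number needs both a location and an optional value, here is a hypothetical lexer for a run of decimal digits. `Number` and `lexNumber` are illustrative stand-ins, not the real `AST.Atom.Number`. On overflow (for example the count in `x{99999999999999999999}`) it still consumes every digit and returns a `Number` whose value is `nil`, so the caller keeps a source range to attach the diagnostic to and can continue parsing after the digits.

```swift
// Hypothetical stand-in for AST.Atom.Number; field names are assumptions.
struct Number: Equatable {
  var value: Int?           // nil when the digits could not be represented
  var location: Range<Int>  // offsets of the digits in the source
}

// Lexes a run of ASCII digits starting at `start`. On overflow it still
// consumes every digit, records an error, and returns a Number with a nil value.
func lexNumber(_ chars: [Character], from start: Int, errors: inout [String]) -> Number? {
  var end = start
  var value: Int? = 0
  while end < chars.count, chars[end].isASCII, let digit = chars[end].wholeNumberValue {
    if let v = value {
      let (product, o1) = v.multipliedReportingOverflow(by: 10)
      let (sum, o2) = product.addingReportingOverflow(digit)
      value = (o1 || o2) ? nil : sum
    }
    end += 1
  }
  guard end > start else { return nil }  // no digits: not a number at all
  if value == nil { errors.append("number too large") }
  return Number(value: value, location: start..<end)
}
```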

Introduce Diagnostics

bcd52c1

Improve recovery for identifiers and text segment options

b6cedf5

Scan to the closing delimiter of an invalid identifier, and better diagnose an invalid text segment option.
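Scanning to the closing delimiter can be sketched for a named group such as `(?<a-b>x)`. `lexIdentifier` and its validity rule (ASCII letters and `_`, plus digits after the first character) are assumptions for illustration. After the invalid `-` it records one error but keeps consuming up to `>`, so parsing resumes at `x` rather than in the middle of the bad name.

```swift
// Hypothetical sketch of identifier recovery; not the real lexer.
// Lexes a name terminated by `closer`. After an invalid character it records
// one error and keeps scanning to `closer`, so parsing resumes after the delimiter.
func lexIdentifier(
  _ chars: [Character], at pos: inout Int, closer: Character, errors: inout [String]
) -> String? {
  let start = pos
  var valid = true
  while pos < chars.count, chars[pos] != closer {
    let c = chars[pos]
    // Assumed rule: ASCII letters and '_', plus digits after the first character.
    let ok = c == "_" || (c.isASCII && (c.isLetter || (c.isNumber && pos > start)))
    if !ok && valid {
      errors.append("invalid character '\(c)' in identifier")
      valid = false
    }
    pos += 1
  }
  let end = pos
  if pos < chars.count {
    pos += 1  // consume the closing delimiter
  } else {
    errors.append("expected '\(closer)'")
  }
  if end == start {
    errors.append("expected identifier")
    return nil
  }
  return valid ? String(chars[start..<end]) : nil
}
```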

Drop the ASTStage parameter

c1c4e61

We now always run validation, which is fine because the resulting AST can still be returned.
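With the stage parameter gone, the entry point always chains parsing and validation. A hypothetical two-phase sketch (`parseSyntax`, `validate`, and the toy checks are invented for illustration): both phases append to the same diagnostics list, and the AST is returned whether or not either phase found problems.

```swift
// Hypothetical two-phase pipeline; names and checks are illustrative only.
struct AST {
  var atoms: [Character]
  var diagnostics: [String] = []
}

// Syntactic phase: always produces an AST, recording problems it finds.
func parseSyntax(_ input: String) -> AST {
  var ast = AST(atoms: Array(input))
  let open = ast.atoms.filter { $0 == "(" }.count
  let close = ast.atoms.filter { $0 == ")" }.count
  if open > close { ast.diagnostics.append("expected ')'") }
  return ast
}

// Semantic phase: runs even if parsing reported errors, adding its own findings.
func validate(_ ast: inout AST) {
  if ast.atoms.contains("~") { ast.diagnostics.append("'~' is not supported") }
}

// No stage parameter: validation always runs, and the AST is always returned.
func parse(_ input: String) -> AST {
  var ast = parseSyntax(input)
  validate(&ast)
  return ast
}
```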

hamishknight force-pushed the totally-5.7 branch from dfe3d67 to c1c4e61 June 30, 2022 19:12

hamishknight merged commit 0a88a36 into swiftlang:swift/release/5.7 Jun 30, 2022

hamishknight deleted the totally-5.7 branch June 30, 2022 19:23

This was referenced Jun 30, 2022

[5.7] Fix anchor bugs, de-genericize processor, add ranges collection #531

Merged

[5.7] Merge benchmarker improvements and character class bitset optimization #532

Merged
