implement `catch` expressions

This is a sub-issue of #31436 corresponding to the addition of `catch` expressions. What follows is a rough implementation plan. It consists of four PRs. I'll happily expand and add details upon request.

The high-level summary is that it would be nice to implement catch by having the HIR support the ability of a `break` to target arbitrary enclosing blocks. This would not be something exposed in the main language, which still permits only breaking out of loops (but we might choose to do so in the future, that'd require an RFC).

Here are some high-level notes on how to go about this. It's a "medium-sized" job, I think. It may be best to begin with some refactorings, as you will see. 

### Refactoring 1: **Normalize how labeled and unlabeled breaks work.**

  - Currently, the `ExprBreak` and `ExprAgain` variants of `hir::Expr_` reference a `Option<Label>` indicating where we should break to
  - During HIR lowering, if the user gave an explicit label (e.g., `'a: loop { ... break 'a }`), it will get translated into a specific node-id (the `loop_id` field of the `Label` struct)
  - This is done by consulting the name resolution tables in the `lower_label()` method of HIR lowering
    - Since we don't plan to change the surface language, we don't really have to change anything in name resolution, thankfully.
  - Annoyingly, if you have an **anonymous** break (e.g., `loop { break; }`), then we don't store any label or indication
    of where you break to. Instead, each pass must carry a stack of loops along with it. This is annoying and it would be nice to change it, so that every HIR `break` (and `continue`) already knows precisely where it is going.
  - In other words, we want to change the type of the `break` variant to `ExprBreak(Label, Option<P<Expr>>)` -- the "label", in particular, is no longer optional.
    - Probably we want to change the `Label` struct so that the "name" field is optional, for pretty-printing if nothing else
  - On the other hand, this refactoring may not be *that* important since most passes will still have to maintain a stack, but it seems good.
  - To do the refactoring:
    - lowering needs to grow a stack of "in-scope" loops, so that it can check which is on the top
    - the `librustc_passes/loops.rs` code currently does some sanity check and reports errors for malformed `break` (e.g., `break` outside of a loop)
    - that logic basically moves into lowering
    - we will then want to update the passes that consume `ExprBreak` and `ExprAgain`:
        - MIR construction, liveness, CFG construction
        - the changes are small, since you basically just want to take the `Some`
          path that exists (i.e., the code is there to handle the `None` that we are removing
        - in the case of `liveness`, the `loop_scope` stack can just be removed entirely, we won't be using it anymore
          
**Why do this?** This refactoring is an improvement because it consolidates the knowledge of where a `break` goes into one pass: HIR lowering. Right now, that knowledge is repeated in several passes, each of which must maintain a stack of loops. It will still be necessary to keep that stack of loops, but we'll be using it differently. In today's code, we sometimes search the stack for a loop with a given id (labeled break) and sometimes just take the top of the stack (unlabeled break). After this refactoring, we are always searching for the entry with a given id. This is important, because in the next section we are going to add blocks into these stacks, and so if we wanted to keep unlabeled breaks as they are, we'd have to ensure that when we check the top of the stack, we don't find some random block, but only the top-most loop.

### Next step: add `catch` into AST

The goal here is just to do the plumbing of parsing `catch` and getting it into the AST.

- Extend the parser to support `catch`
- Add a feature-gate 
- Produce corresponding AST nodes
- In HIR lowering, you can keep a stack of in-scope `catch` nodes 
    - for the first PR, if `?` is used inside of a `catch` node, I would just abort with `bug!("unimplemented: ? in catch")`
    - but we'll use the stack for more stuff later :P
    
### Next step: allow HIR `break` to target blocks

The goal here is to make it possible for breaks to target blocks in HIR. We won't need to change the data structures very much for this to work. Probably it's a good idea to rename the field `loop_id` in `hir::Label` to something like `node_id` or `target_id`, since it won't always be naming a loop anymore.

In principle, this change can be landed as an initial PR. Annoyingly, though, since we are not changing the surface syntax, there isn't a good way to **test** this code without also adding support for `catch`, so we may want to do the two changes together in one PR. Though I'd also be ok to land them separately, or perhaps add some sort of internal testing attributes that enable anonymous `break` to break out of the innermost block (`#[rustc_break_from_block]`).

Anyway, we basically don't need to change HIR lowering at all. All the existing breaks are still valid breaks. The only change is that (in the next PR...) HIR lowering will produce new breaks that target blocks and not just loops. So we need to prepare the passes that consume breaks to be prepared for that. These are the same set of passes we saw in the first refactoring: MIR construction, liveness, and CFG construction. As we saw in that first refactoring, each of these passes maintains an internal stack of loops which they already use to search and find the appropriate place that a break should branch to. So the gist of the change is to extend these stacks to include breaks too. Here are some details on each pass:

- **liveness:**
    - the `with_loop_nodes` helper is the one that sets up the targets for breaks and so forth. At present it pushes onto a stack (`self.loop_scope`), but that should have been removed in the first refactoring. It also stores into a map (`self.break_ln`) with the "place that a break with a given target should go to" (`ln` stands for "liveness node"). 
    - in the `propagate_through_block(blk, succ_ln)` method, we want to therefore record `self.break_ln.insert(node_id, succ_ln)`, as `succ_ln` is the liveness node for the successor of the block
- **CFG construction:**
    - CFG construction keeps a vec of `LoopScope` entries rather than a map like liveness. We'll need to push onto this for blocks. The relevant function is `fn block(&mut self, blk: &hir::Block, pred: CFGIndex) -> CFGIndex`; annoyingly, the way that the CFG construction works is that you are given the predecessor node (`pred`) and you return the successor. This means that whenever we process a block, we'll have to create a dummy node (`self.add_dummy_node(&[])`) that we can add into the `LoopScope` vector to use as the target of blocks. Once we're finished processing the contents of the block, it can branch to this dummy node and then we will return the dummy node.
    - (See the efficiency note below.)
- **MIR building:**
    - MIR building works similarly; the helper that pushes on the stack is `in_loop_scope()`
    - one difference here is that we have to have the `ExprBreak` that targets a block generate an appopriate value; this should work the same way that it does for `let x = loop { break 22 };`.
 
**Efficiency note.** We may not want to create a bunch of dummy basic blocks for **every block**. Unfortunately, because MIR and CFG construction works forward, this is sort of hard to avoid. To fix this, I think we will want to extend the HIR for a block to include a boolean indicating whether it is targeted by any breaks, so we can only add the dummy node for those cases where it is needed. This will also mean you have to edit the HAIR, which is a kind of simple, HIR-like representation that exists briefly as we lower to MIR (`src/librustc_mir/hair/`). This change should be fairly straightforward -- in particular, until the next step, NO blocks can be targeted by breaks, so we can just set this value to always be false.

### The final change: adding `catch` to the HIR

With those pieces in place, we can finally put all the various bits together. Basically when building the HIR for `catch { ... }`, we would do the following:

- create a block that is eligible to be the target of a break (the boolean is true)
- push the id for this block onto the internal stack of catch nodes
- when we see `?`, we would construct an `ExprBreak` targeting this block (with a value), instead of a `ExprReturn`

That's "it"! Then we can write some tests!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement `catch` expressions #39849

Refactoring 1: Normalize how labeled and unlabeled breaks work.

Next step: add `catch` into AST

Next step: allow HIR `break` to target blocks

The final change: adding `catch` to the HIR

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

implement catch expressions #39849

Description

Refactoring 1: Normalize how labeled and unlabeled breaks work.

Next step: add catch into AST

Next step: allow HIR break to target blocks

The final change: adding catch to the HIR

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

implement `catch` expressions #39849

Next step: add `catch` into AST

Next step: allow HIR `break` to target blocks

The final change: adding `catch` to the HIR