Description
This is a sub-issue of #31436 corresponding to the addition of catch
expressions. What follows is a rough implementation plan. It consists of four PRs. I'll happily expand and add details upon request.
The high-level summary is that it would be nice to implement catch by having the HIR support the ability of a break
to target arbitrary enclosing blocks. This would not be something exposed in the main language, which still permits only breaking out of loops (but we might choose to do so in the future, that'd require an RFC).
Here are some high-level notes on how to go about this. It's a "medium-sized" job, I think. It may be best to begin with some refactorings, as you will see.
Refactoring 1: Normalize how labeled and unlabeled breaks work.
- Currently, the
ExprBreak
andExprAgain
variants ofhir::Expr_
reference aOption<Label>
indicating where we should break to - During HIR lowering, if the user gave an explicit label (e.g.,
'a: loop { ... break 'a }
), it will get translated into a specific node-id (theloop_id
field of theLabel
struct) - This is done by consulting the name resolution tables in the
lower_label()
method of HIR lowering- Since we don't plan to change the surface language, we don't really have to change anything in name resolution, thankfully.
- Annoyingly, if you have an anonymous break (e.g.,
loop { break; }
), then we don't store any label or indication
of where you break to. Instead, each pass must carry a stack of loops along with it. This is annoying and it would be nice to change it, so that every HIRbreak
(andcontinue
) already knows precisely where it is going. - In other words, we want to change the type of the
break
variant toExprBreak(Label, Option<P<Expr>>)
-- the "label", in particular, is no longer optional.- Probably we want to change the
Label
struct so that the "name" field is optional, for pretty-printing if nothing else
- Probably we want to change the
- On the other hand, this refactoring may not be that important since most passes will still have to maintain a stack, but it seems good.
- To do the refactoring:
- lowering needs to grow a stack of "in-scope" loops, so that it can check which is on the top
- the
librustc_passes/loops.rs
code currently does some sanity check and reports errors for malformedbreak
(e.g.,break
outside of a loop) - that logic basically moves into lowering
- we will then want to update the passes that consume
ExprBreak
andExprAgain
:- MIR construction, liveness, CFG construction
- the changes are small, since you basically just want to take the
Some
path that exists (i.e., the code is there to handle theNone
that we are removing - in the case of
liveness
, theloop_scope
stack can just be removed entirely, we won't be using it anymore
Why do this? This refactoring is an improvement because it consolidates the knowledge of where a break
goes into one pass: HIR lowering. Right now, that knowledge is repeated in several passes, each of which must maintain a stack of loops. It will still be necessary to keep that stack of loops, but we'll be using it differently. In today's code, we sometimes search the stack for a loop with a given id (labeled break) and sometimes just take the top of the stack (unlabeled break). After this refactoring, we are always searching for the entry with a given id. This is important, because in the next section we are going to add blocks into these stacks, and so if we wanted to keep unlabeled breaks as they are, we'd have to ensure that when we check the top of the stack, we don't find some random block, but only the top-most loop.
Next step: add catch
into AST
The goal here is just to do the plumbing of parsing catch
and getting it into the AST.
- Extend the parser to support
catch
- Add a feature-gate
- Produce corresponding AST nodes
- In HIR lowering, you can keep a stack of in-scope
catch
nodes- for the first PR, if
?
is used inside of acatch
node, I would just abort withbug!("unimplemented: ? in catch")
- but we'll use the stack for more stuff later :P
- for the first PR, if
Next step: allow HIR break
to target blocks
The goal here is to make it possible for breaks to target blocks in HIR. We won't need to change the data structures very much for this to work. Probably it's a good idea to rename the field loop_id
in hir::Label
to something like node_id
or target_id
, since it won't always be naming a loop anymore.
In principle, this change can be landed as an initial PR. Annoyingly, though, since we are not changing the surface syntax, there isn't a good way to test this code without also adding support for catch
, so we may want to do the two changes together in one PR. Though I'd also be ok to land them separately, or perhaps add some sort of internal testing attributes that enable anonymous break
to break out of the innermost block (#[rustc_break_from_block]
).
Anyway, we basically don't need to change HIR lowering at all. All the existing breaks are still valid breaks. The only change is that (in the next PR...) HIR lowering will produce new breaks that target blocks and not just loops. So we need to prepare the passes that consume breaks to be prepared for that. These are the same set of passes we saw in the first refactoring: MIR construction, liveness, and CFG construction. As we saw in that first refactoring, each of these passes maintains an internal stack of loops which they already use to search and find the appropriate place that a break should branch to. So the gist of the change is to extend these stacks to include breaks too. Here are some details on each pass:
- liveness:
- the
with_loop_nodes
helper is the one that sets up the targets for breaks and so forth. At present it pushes onto a stack (self.loop_scope
), but that should have been removed in the first refactoring. It also stores into a map (self.break_ln
) with the "place that a break with a given target should go to" (ln
stands for "liveness node"). - in the
propagate_through_block(blk, succ_ln)
method, we want to therefore recordself.break_ln.insert(node_id, succ_ln)
, assucc_ln
is the liveness node for the successor of the block
- the
- CFG construction:
- CFG construction keeps a vec of
LoopScope
entries rather than a map like liveness. We'll need to push onto this for blocks. The relevant function isfn block(&mut self, blk: &hir::Block, pred: CFGIndex) -> CFGIndex
; annoyingly, the way that the CFG construction works is that you are given the predecessor node (pred
) and you return the successor. This means that whenever we process a block, we'll have to create a dummy node (self.add_dummy_node(&[])
) that we can add into theLoopScope
vector to use as the target of blocks. Once we're finished processing the contents of the block, it can branch to this dummy node and then we will return the dummy node. - (See the efficiency note below.)
- CFG construction keeps a vec of
- MIR building:
- MIR building works similarly; the helper that pushes on the stack is
in_loop_scope()
- one difference here is that we have to have the
ExprBreak
that targets a block generate an appopriate value; this should work the same way that it does forlet x = loop { break 22 };
.
- MIR building works similarly; the helper that pushes on the stack is
Efficiency note. We may not want to create a bunch of dummy basic blocks for every block. Unfortunately, because MIR and CFG construction works forward, this is sort of hard to avoid. To fix this, I think we will want to extend the HIR for a block to include a boolean indicating whether it is targeted by any breaks, so we can only add the dummy node for those cases where it is needed. This will also mean you have to edit the HAIR, which is a kind of simple, HIR-like representation that exists briefly as we lower to MIR (src/librustc_mir/hair/
). This change should be fairly straightforward -- in particular, until the next step, NO blocks can be targeted by breaks, so we can just set this value to always be false.
The final change: adding catch
to the HIR
With those pieces in place, we can finally put all the various bits together. Basically when building the HIR for catch { ... }
, we would do the following:
- create a block that is eligible to be the target of a break (the boolean is true)
- push the id for this block onto the internal stack of catch nodes
- when we see
?
, we would construct anExprBreak
targeting this block (with a value), instead of aExprReturn
That's "it"! Then we can write some tests!