Description
The current implementation of two-phase borrows (PRs #46537, #47489, #48197) should be thrown out and replaced with something more tailored to specific goals of the original RFC 2025.
The current implementation was written as if we could apply the concept of two-phase borrowing to any &mut-borrow that we find in the MIR, and then we tacked on a restriction to certain autorefs. The latter restriction was originally motivated by not wanting the borrowck analysis to be too tightly tied to the particular details of the strategy used to construct MIR (the general form as implemented would accept or reject code based on whether one, e.g., introduced extra moves of temps).
However, in our attempts to get two-phase borrows working as we expected, we have run into some issues (e.g. #48070, #48418). These seem to mostly be arising because the existing code was trying to be general purpose, but there is no obvious, straightforward way to fix the aforementioned bugs (which tend to be either regressions that reject code accepted without two-phase borrows, or diagnostics regressions), at least not without injecting soundness bugs.
After some discussion with @nikomatsakis, we decided the best way forward would be to redo the two-phase borrow support.
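For readers who want a concrete case in mind: the kind of code RFC 2025 is meant to accept is a method call whose autoref'd receiver is also read while the arguments are evaluated. The sketch below is only an illustration; the function name `push_own_len` is made up for this example, and the pseudo-MIR in the comments is a simplification rather than actual compiler output.

```rust
/// Pushes the vector's current length onto itself.
/// (`push_own_len` is a hypothetical name used only for this illustration.)
fn push_own_len(mut v: Vec<usize>) -> Vec<usize> {
    // Conceptually (simplified pseudo-MIR), the call below becomes:
    //   tmp0 = &mut v;          // autoref: the two-phase borrow is reserved
    //   tmp1 = &v;              // shared borrow while tmp0 is only reserved
    //   tmp2 = Vec::len(tmp1);
    //   Vec::push(tmp0, tmp2);  // first use of tmp0: the borrow is activated
    v.push(v.len());
    v
}
```

Without two-phase borrows, the shared borrow `tmp1` would conflict with the already-live `&mut` borrow `tmp0`.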
For context, here is an outline of how two-phase borrow support is currently implemented.
(Note that the plan may be to throw away much of this code, at least the code in the rustc_mir crate. Don't worry too much about trying to preserve the overall algorithm outlined here.)
- The types representing &mut borrows track whether they support two-phase borrows (i.e. they track whether they arose from method-call autoref adjustments); see the field allow_two_phase_borrow in AutoBorrowMutability.
- That type information is fed into the MIR construction; see allow_two_phase_borrow in mir::BorrowKind.
- When we do dataflow analysis, we have one analysis to compute the reservations of borrows, and a second analysis to compute the activation points. This second analysis is really just computing all of the uses that follow some reservation; it is up to the code using the analysis to know whether it needs the first such use in the control flow, or if it wants any such use.
  - (This sort of weirdness is in part why this code base is making it hard to resolve issues like #48418, "two-phase borrows creates extra error messages".)
  - Also, the dataflow does not consult the allow_two_phase_borrow information carried in the mir::BorrowKind. The dataflow results are the same regardless of that setting; the only difference is in how the dataflow is interpreted. (pnkfelix's original motivation for this was that he thought this would make it easier to debug code, by ensuring that everyone is staring at the same dataflow results...)
- In the MIR-borrowck code, we consult both of the above two analysis results (reservations and activations).
  - For each reservation, we see if it allows two-phase borrows; if so, we check it for conflicts using one special path in the compiler. If not, then we use the old path that is more stringent.
  - Every activation is also checked (for whether it conflicts with any shared borrows, namely ones that were started during the reservation of that borrow).
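The shape of the current scheme can be sketched roughly as follows. This is a simplified model under stated assumptions, not the rustc code: `BorrowKind` here is a cut-down stand-in for mir::BorrowKind, while `CheckAs`, `BorrowInScope`, `check_reservation_as`, and `activation_conflicts` are hypothetical names invented for the illustration. It also assumes that all borrows in the slice refer to the same place, and that the "special path" treats a reservation like a shared borrow.

```rust
// Cut-down stand-in for mir::BorrowKind; the real type carries more variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BorrowKind {
    Shared,
    Mut { allow_two_phase_borrow: bool },
}

// Hypothetical: which conflict-checking path a reservation takes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CheckAs {
    SharedBorrow, // the lenient, two-phase path
    MutBorrow,    // the old, more stringent path
}

/// At the reservation point, the flag alone selects the checking path.
fn check_reservation_as(kind: BorrowKind) -> CheckAs {
    match kind {
        BorrowKind::Shared | BorrowKind::Mut { allow_two_phase_borrow: true } => {
            CheckAs::SharedBorrow
        }
        BorrowKind::Mut { allow_two_phase_borrow: false } => CheckAs::MutBorrow,
    }
}

// Hypothetical record of a borrow that is in scope at some program point.
#[derive(Clone, Copy, Debug)]
struct BorrowInScope {
    id: usize,
    kind: BorrowKind,
}

/// At the activation point, report the shared borrows (other than the
/// activating borrow's own reservation) that are still in scope.
fn activation_conflicts(in_scope: &[BorrowInScope], activating_id: usize) -> Vec<usize> {
    in_scope
        .iter()
        .filter(|b| b.id != activating_id && b.kind == BorrowKind::Shared)
        .map(|b| b.id)
        .collect()
}
```

Note that in this model the dataflow itself never looks at the flag; only the checking code does, which mirrors the split described in the list above.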
With those details out of the way, here is the basic idea of the two-phase borrow rewrite:
- One of the motivations of the original code was to express two-phase borrows in a manner that worked with both lexical lifetimes (yet still using mir-borrowck) and non-lexical lifetimes (NLL). This mattered when NLL was under very active development, but now it seems safe to introduce a tight coupling between NLL and two-phase borrows.
  - In other words, it's okay, e.g., to require some notion of a non-lexical reservation region that ends at the point of the activation, even though such a thing cannot be expressed under the lexical regions model.
- Use the existing information on the types and the MIR to track which &mut-borrows support two-phase borrows. Note that this flag only says that such a borrow may allow two phases; further conditions need to hold in the constructed MIR for the phases to be observable.
- Revise mir::dataflow::impls::borrows to track activations more precisely, both in the sense of using the allow_two_phase_borrow flag, and also in terms of encoding the following activation semantics:
  - For any borrow tmp = &mut place that says it allows two-phase borrows, determine if there is a unique use of tmp that post-dominates the borrow. Also determine if the borrow dominates that use.
    - (In other words, does the use of tmp have solely that borrow as its definition, and does the definition have that use as its only use.)
  - If this condition does not hold (i.e., if the use has more than one definition, or the borrow can flow to more than one "first use"), then the borrow statement immediately activates the borrow; we don't have phasing here. To use dataflow terminology, the borrow statement tmp = &mut place, when tmp does not have a uniquely determined "first use", causes the gen-bits for both the reservation and the activation to be set to 1.
  - If the condition does hold, then the borrow tmp = &mut place just reserves (i.e. the borrow statement sets the reservation gen-bit to 1), and the (uniquely determined) associated use activates (i.e. the use statement sets the activation gen-bit to 1).
  - If tmp does have a uniquely determined first use, but there are also control-flow splits before that use is reached (e.g. due to unwind paths from function calls), then it may suffice to again say that the borrow immediately activates. But another option may be to say that the borrow solely reserves, and any control-flow branch that is not post-dominated by the unique first use causes an immediate activation.
    - @nikomatsakis noted that in such scenarios, the tmp in question should be considered dead, and thus the NLL region won't cover the flow of such branches anyway. So this may be an irrelevant detail. But @pnkfelix just wanted to point it out in case it arises...
- It may be simplest, in terms of reusing the existing borrow-check code, to continue to allow the reservation and activation bits to be set to 1 at the same time. But it's possible this intuition is wrong; @nikomatsakis had outlined a desire to actually represent distinct regions, one for the reservation and one for the activation. That might be implemented by actually representing such regions, or it might be implemented by having the dataflow bits reflect that the reservation and activation bits are never both set to 1 for a given place.
- We believe that with the new conditions above, the resulting dataflow should ensure that you never have an activation setting a bit to 1 that was already set to 1, at least for two-phase borrows. This should resolve some problems we were wrestling with, in terms of duplicate errors. It may also open up potential for simplifying the supporting code in rustc_mir::borrow_check.
- The current rustc_mir::borrow_check support for reservations and activations may be salvageable, depending on the details of how the dataflow changes are done. You would probably throw away the special-case reading of allow_two_phase_borrow in rustc_mir::borrow_check, since this information should now be influencing the dataflow results and there should be no need for the mir-borrowck to also incorporate it into its own analysis.
  - The current mir-borrowck has checks both at the point of reservation of a borrow and at its (separate) activation.
  - You would want to continue some sort of checking at the point of reservation, since we need to continue ensuring that reservations act just as shared borrows do.
  - How to handle the activation point is more of an open question. Either
    - you can do checking at the activation point (which requires finding out whether it has a conflict with any shared borrows in scope, taking care that an activation does not interfere with its own reservation), or
    - you can do the checking of any borrows that occur during the reservation and before the activation, somehow ensuring that they do not extend beyond the activation point. @nikomatsakis outlined this option to @pnkfelix, attributing the mental model to @RalfJung, but @pnkfelix honestly is not clear on how to actually implement this check without doing something that ends up looking a lot like a check at the activation point anyway...
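To make the proposed activation semantics a bit more tangible, here is a minimal sketch of the dominance test on a toy control-flow graph. Everything in it is an assumption made for illustration: the graph is a plain successor list with one statement per node and a single exit node (real MIR has basic blocks and unwind exits), the candidate first use is handed in rather than discovered, and the names `dominators`, `reversed`, `GenBits`, and `borrow_gen_bits` are hypothetical, not existing rustc APIs. Post-dominators are computed as dominators of the reversed graph.

```rust
use std::collections::BTreeSet;

/// Reverses the edges of a CFG given as successor lists.
fn reversed(succ: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let mut rev: Vec<Vec<usize>> = vec![Vec::new(); succ.len()];
    for (from, tos) in succ.iter().enumerate() {
        for &to in tos {
            rev[to].push(from);
        }
    }
    rev
}

/// Classic iterative dominator computation: `dom[n]` is the set of nodes
/// that dominate `n` (including `n` itself).
fn dominators(succ: &[Vec<usize>], entry: usize) -> Vec<BTreeSet<usize>> {
    let n = succ.len();
    let preds = reversed(succ);
    let all: BTreeSet<usize> = (0..n).collect();
    let mut dom = vec![all; n];
    dom[entry] = BTreeSet::from([entry]);
    let mut changed = true;
    while changed {
        changed = false;
        for node in (0..n).filter(|&x| x != entry) {
            // Intersect the dominator sets of all predecessors, then add `node`.
            let mut new: Option<BTreeSet<usize>> = None;
            for &p in &preds[node] {
                new = Some(match new {
                    None => dom[p].clone(),
                    Some(acc) => acc.intersection(&dom[p]).copied().collect(),
                });
            }
            let mut new = new.unwrap_or_default();
            new.insert(node);
            if new != dom[node] {
                dom[node] = new;
                changed = true;
            }
        }
    }
    dom
}

/// Gen-bits set by the borrow statement itself.
#[derive(Debug, PartialEq)]
struct GenBits {
    reservation: bool,
    activation: bool,
}

/// Decides the gen-bits for `tmp = &mut place` at node `borrow`, given the
/// candidate first use of `tmp` at node `first_use`.
fn borrow_gen_bits(
    succ: &[Vec<usize>],
    entry: usize,
    exit: usize,
    allow_two_phase_borrow: bool,
    borrow: usize,
    first_use: usize,
) -> GenBits {
    let dom = dominators(succ, entry);
    let pdom = dominators(&reversed(succ), exit);
    let phased = allow_two_phase_borrow
        && dom[first_use].contains(&borrow) // the borrow dominates the use
        && pdom[borrow].contains(&first_use); // the use post-dominates the borrow
    // Without phasing, the borrow statement reserves and activates at once.
    GenBits { reservation: true, activation: !phased }
}
```

On a straight-line graph `0 -> 1 -> 2 -> 3` with the borrow at node 1 and its use at node 2, the borrow only reserves. On a diamond where the use sits on just one arm, the use does not post-dominate the borrow, so the borrow activates immediately.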