Always process `ChannelMonitorUpdate`s asynchronously #1897
Conversation
All that to remove 100 LoC in our messaging pipelines :)
Codecov Report: Base 87.41% // Head 87.59% // +0.17% project coverage.
@@ Coverage Diff @@
## main #1897 +/- ##
==========================================
+ Coverage 87.41% 87.59% +0.17%
==========================================
Files 101 101
Lines 45633 47683 +2050
Branches 45633 47683 +2050
==========================================
+ Hits 39891 41768 +1877
- Misses 5742 5915 +173
Force-pushed ab58f1c to 4b348e6
Should revisit #1886 (comment) as a part of this PR.
Force-pushed 4b348e6 to ace0660
Force-pushed e0bb80d to 809dabc
Rebased. Note that the "Always process...async" commits will all need to be squashed down at the end, as there are test failures introduced in between them which are fixed by completing the series. They're left separate for now for reviewability.
Force-pushed 809dabc to 4217ec3
Force-pushed 4217ec3 to d27d3bd
Force-pushed ac1a550 to 78606e7
self.pending_monitor_updates.push(monitor_update);
return Ok(self.pending_monitor_updates.last().unwrap());
I see we don't call `monitor_updating_paused` here, unlike most other places. What determines if it should be called?
This branch is in the `MonitorUpdateInProgress`-already-set case. `monitor_updating_paused` primarily just sets some flags, including `MonitorUpdateInProgress`. This if block just manually sets those flags instead. We could use the util but there's not much reason to.
Ah, gotcha. I'm not a big fan of the util, to be honest. Seems the code would be much more readable without it, given the amount of unnecessary parameters in some cases.
Yea, definitely agree, guess I just kept using it cause it was already there and does "the thing". I'll remove it in a followup if that's okay.
We haven't had a `MonitorUpdateInProgress` check in `Channel::is_live` for some time.
In the coming commits we'll move to async `ChannelMonitorUpdate` application, which means we'll want to generate a `ChannelMonitorUpdate` (including a new counterparty commitment transaction) before we actually send it to our counterparty. To do that today we'd have to actually sign the commitment transaction by calling the signer, then drop it, apply the `ChannelMonitorUpdate`, then re-sign the commitment transaction to send it to our peer. In this commit we instead split `send_commitment_no_status_check` and `send_commitment_no_state_update` into `build_` and `send_` variants, allowing us to generate new counterparty commitment transactions without actually signing, then build them for sending, with signatures, later.
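The `build_`/`send_` split described above can be sketched with toy types. This is a hypothetical illustration only, not LDK's actual `Channel` API: the structs, fields, and the "signature" string are invented for the example.

```rust
// Sketch: `build_` advances channel state (new counterparty commitment, bumped
// monitor update id) without touching the signer; `send_` signs the already-built
// commitment later, once the ChannelMonitorUpdate has been applied.
struct CommitmentTx { counterparty_balance_msat: u64 }

struct Channel {
    latest_monitor_update_id: u64,
    built_commitment: Option<CommitmentTx>,
}

impl Channel {
    // Build the new counterparty commitment and bump the monitor update id,
    // without signing anything.
    fn build_commitment_no_status_check(&mut self, counterparty_balance_msat: u64) -> u64 {
        self.latest_monitor_update_id += 1;
        self.built_commitment = Some(CommitmentTx { counterparty_balance_msat });
        self.latest_monitor_update_id
    }

    // Once the monitor update completes, sign the built commitment for sending.
    fn send_commitment_no_state_update(&self) -> Option<String> {
        self.built_commitment
            .as_ref()
            .map(|c| format!("sig-over-{}", c.counterparty_balance_msat))
    }
}

fn main() {
    let mut chan = Channel { latest_monitor_update_id: 0, built_commitment: None };
    let update_id = chan.build_commitment_no_status_check(1000);
    assert_eq!(update_id, 1);
    // Signing happens strictly after the state update was generated.
    assert_eq!(chan.send_commitment_no_state_update(), Some("sig-over-1000".to_string()));
}
```

The point of the split is that the `ChannelMonitorUpdate` can be generated and applied between the two calls, so the signer is only invoked once.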
When a `ChannelMonitor` update completes, we may need to take some further action, such as exposing an `Event` to the user initiating another `ChannelMonitor` update, etc. This commit adds the basic structure to track such actions and serialize them as required. Note that while this does introduce a new map which is written as an even value which users cannot opt out of, the map is only filled in when users use the asynchronous `ChannelMonitor` updates. As these are still considered beta, breaking downgrades for such users is considered acceptable here.
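The deferred-action structure can be sketched as follows. The variant names and fields here are invented for illustration; LDK's real `MonitorUpdateCompletionAction` differs.

```rust
// Sketch: actions queued during a ChannelMonitor update are held until the
// persistence completes, then drained and executed exactly once.
#[derive(Debug, PartialEq)]
enum MonitorUpdateCompletionAction {
    EmitEvent(String),       // e.g. surface an Event to the user
    StartFollowUpdate(u64),  // e.g. kick off another monitor update
}

struct Manager {
    // Actions deferred until the in-flight persistence completes.
    pending_actions: Vec<MonitorUpdateCompletionAction>,
}

impl Manager {
    fn defer(&mut self, action: MonitorUpdateCompletionAction) {
        self.pending_actions.push(action);
    }

    // Called when persistence completes: drain the queue so each action runs once.
    fn monitor_update_completed(&mut self) -> Vec<MonitorUpdateCompletionAction> {
        std::mem::take(&mut self.pending_actions)
    }
}

fn main() {
    let mut mgr = Manager { pending_actions: Vec::new() };
    mgr.defer(MonitorUpdateCompletionAction::EmitEvent("PaymentClaimed".into()));
    mgr.defer(MonitorUpdateCompletionAction::StartFollowUpdate(7));
    assert_eq!(mgr.monitor_update_completed().len(), 2);
    assert!(mgr.pending_actions.is_empty());
}
```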
In order to support fully async `ChannelMonitor` updating, we need to ensure that we can replay `ChannelMonitorUpdate`s if we shut down after persisting a `ChannelManager` but without completing a `ChannelMonitorUpdate` persistence. In order to support that we (obviously) have to store the `ChannelMonitorUpdate`s in the `ChannelManager`, which we do here inside the `Channel`. We do so now because in the coming commits we will start using the async persistence flow for all updates, and while we won't yet support fully async monitor updating it's nice to get some of the foundational structures in place now.
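The replay logic this enables can be sketched like so (illustrative names, not the real API): on startup, any queued update with an id beyond what the `ChannelMonitor` last recorded must be re-applied.

```rust
// Sketch: the serialized ChannelManager carries each channel's update queue, so
// after a restart we can find the updates the monitor never saw.
struct ChannelMonitorUpdate { update_id: u64 }

fn updates_to_replay(monitor_latest_id: u64, queued: &[ChannelMonitorUpdate]) -> Vec<u64> {
    queued.iter()
        .filter(|u| u.update_id > monitor_latest_id)
        .map(|u| u.update_id)
        .collect()
}

fn main() {
    let queued: Vec<ChannelMonitorUpdate> =
        (1..=4).map(|update_id| ChannelMonitorUpdate { update_id }).collect();
    // The monitor only persisted through update 2 before shutdown,
    // so updates 3 and 4 must be replayed.
    assert_eq!(updates_to_replay(2, &queued), vec![3, 4]);
}
```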
Over the next few commits, this macro will replace the `handle_monitor_update_res` macro. It takes a different approach - instead of receiving the message(s) that need to be re-sent after the monitor update completes and pushing them back into the channel, we'll not get the messages from the channel at all until we're ready for them. This will unify our message sending into only actually fetching + sending messages in the common monitor-update-completed code, rather than both there *and* in the functions that call `Channel` when new messages are originated.
Force-pushed 8ba64e2 to 46d0711
Rebased and added one fixup.
/// Only fails in case of signer rejection. Used for channel_reestablish commitment_signed
/// generation when we shouldn't change HTLC/channel state.
fn send_commitment_no_state_update<L: Deref>(&self, logger: &L) -> Result<(msgs::CommitmentSigned, (Txid, Vec<(HTLCOutputInCommitment, Option<&HTLCSource>)>)), ChannelError> where L::Target: Logger {
Nit: doesn't seem like we use the returned HTLCs anywhere so we could remove them from the return type
Let's do it in a followup? That's an existing thing, no? (plus you're about to rewrite all that code :p)
let (_, mut additional_update) = self.send_commitment_no_status_check(logger).map_err(|e| (None, e))?;
// send_commitment_no_status_check may bump latest_monitor_id but we want them to be
let mut additional_update = self.build_commitment_no_status_check(logger);
// build_commitment_no_status_check may bump latest_monitor_id but we want them to be
// strictly increasing by one, so decrement it here.
self.latest_monitor_update_id = monitor_update.update_id;
Shouldn't this track the pending queued updates as well, such that we can ensure we only resume the channel after all pending updates complete, rather than having to flap between `monitor_paused` and `monitor_restored` after each successful update?
Not quite sure why this comment is on this line, but in general we currently don't, per the `MonitorEvent::Completed::monitor_update_id` docs (which `ChainMonitor` does implement properly). That's mostly because we previously didn't do any pending monitor update tracking in `ChannelManager`, but we're going to once we persist the new monitor update queue, so we should probably change this, but it doesn't have to be now.
if let Some(monitor_update) = monitor_update_opt.take() {
    let update_id = monitor_update.update_id;
    let update_res = self.chain_monitor.update_channel(funding_txo_opt.unwrap(), monitor_update);
    break handle_new_monitor_update!(self, update_res, update_id, peer_state_lock, peer_state, chan_entry);
Shouldn't we only break if the monitor update didn't complete?
What's even the purpose of the `is_shutdown` check below? We can't guarantee the closing negotiation will be done by the time we get there, and we'll only check once as we break out of the loop immediately after. We also already remove the channel once we finish negotiation.
`Channel::get_shutdown`'s docs reference it: "May jump to the channel being fully shutdown (see [`Self::is_shutdown`]) in which case no [`ChannelMonitorUpdate`] will be returned." Specifically, if the channel hasn't yet been funded then we'll jump to being fully shutdown, in which case we should just outright remove the channel. However, it looks like those docs are wrong! The check for whether to generate a monitor update is separate, and in fact we definitely should do what the docs say! I've added another commit.
Ah, nice. Only thing now is to ensure we don't queue a `shutdown` for a channel that has yet to be funded, which is a requirement in BOLT 2, but feel free to leave that for a follow-up.
Force-pushed cc9a9ef to 0c85c30
Rebased and addressed feedback.
LGTM
LGTM. Feel free to squash.
In a previous PR, we added a `MonitorUpdateCompletionAction` enum which described actions to take after a `ChannelMonitorUpdate` persistence completes. At the time, it was only used to execute actions in-line, however in the next commit we'll start (correctly) leaving the existing actions until after monitor updates complete.
The TODO mentioned in `handle_monitor_update_res` about how we might forget about HTLCs in case of permanent monitor update failure still applies in spite of all our changes. If a channel is drop'd in general, monitor-pending updates may be lost if the monitor update failed to persist. This was always the case, and is ultimately the general form of the specific TODO, so we simply leave comments there.
Force-pushed 4e34ff1 to d8619ba
Squashed the named fixups and fixed the missing doclink.
Why do those commits need to be squashed?
We currently have two codepaths on most channel update functions - most methods return a set of messages to send a peer iff the `ChannelMonitorUpdate` succeeds, but if it does not we push the messages back into the `Channel` and then pull them back out when the `ChannelMonitorUpdate` completes and send them then. This adds a substantial amount of complexity in very critical codepaths. Instead, here we swap all our channel update codepaths to immediately set the channel-update-required flag and only return a `ChannelMonitorUpdate` to the `ChannelManager`. Internally in the `Channel` we store a queue of `ChannelMonitorUpdate`s, which will become critical in future work to surface pending `ChannelMonitorUpdate`s to users at startup so they can complete. This leaves some redundant work in `Channel` to be cleaned up later. Specifically, we still generate the messages which we will now ignore and regenerate later. This commit updates the `ChannelMonitorUpdate` pipeline across all the places we generate them.
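The queued-update flow this commit describes can be sketched with toy types (illustrative only; the real `Channel` differs): each update gets a strictly increasing id, and the channel only resumes once the last queued update has been persisted, which also addresses the pause/resume flapping concern raised above.

```rust
// Sketch: the Channel owns a queue of in-flight ChannelMonitorUpdates with
// strictly increasing ids; completion drains the queue up to the completed id.
struct ChannelMonitorUpdate { update_id: u64 }

struct Channel {
    latest_monitor_update_id: u64,
    pending_monitor_updates: Vec<ChannelMonitorUpdate>,
}

impl Channel {
    fn queue_monitor_update(&mut self) -> u64 {
        self.latest_monitor_update_id += 1;
        self.pending_monitor_updates
            .push(ChannelMonitorUpdate { update_id: self.latest_monitor_update_id });
        self.latest_monitor_update_id
    }

    // Returns true once no updates remain in flight and the channel may resume.
    fn monitor_update_completed(&mut self, completed_id: u64) -> bool {
        self.pending_monitor_updates.retain(|u| u.update_id > completed_id);
        self.pending_monitor_updates.is_empty()
    }
}

fn main() {
    let mut chan = Channel { latest_monitor_update_id: 0, pending_monitor_updates: Vec::new() };
    assert_eq!(chan.queue_monitor_update(), 1);
    assert_eq!(chan.queue_monitor_update(), 2);
    assert!(!chan.monitor_update_completed(1)); // still one update in flight
    assert!(chan.monitor_update_completed(2));  // all done, channel resumes
}
```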
In the previous commit, we moved all our `ChannelMonitorUpdate` pipelines to use a new async path via the `handle_new_monitor_update` macro. This avoids having two message sending pathways and simply sends messages in the "monitor update completed" flow, which is shared between sync and async monitor updates. Here we reuse the new macro for handling `funding_signed` messages when doing an initial `ChannelMonitor` persistence. This provides a similar benefit, simplifying the code a trivial amount, but importantly allows us to fully remove the original `handle_monitor_update_res` macro.
Building on the previous commits, this finishes our transition to doing all message-sending in the monitor update completion pipeline, unifying our immediate- and async- `ChannelMonitor` update and persistence flows.
The `Channel::get_shutdown` docs are very clear - if the channel jumps to `Shutdown` as a result of not being funded when we go to initiate shutdown we should not generate a `ChannelMonitorUpdate` as there's no need to bother with the shutdown script - we're force-closing anyway. However, this wasn't actually implemented, potentially causing a spurious monitor update for no reason.
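The documented behavior reduces to a simple branch, sketched here with invented types (not LDK's real signature): an unfunded channel jumps straight to fully-shutdown with no monitor update, since there is nothing on chain to watch.

```rust
// Sketch: get_shutdown on an unfunded channel takes the force-close path and
// produces no ChannelMonitorUpdate; a funded channel begins cooperative
// shutdown and persists the shutdown script via a monitor update.
struct ChannelMonitorUpdate;

fn get_shutdown(is_funded: bool) -> (bool /* fully shut down */, Option<ChannelMonitorUpdate>) {
    if !is_funded {
        // Never funded: nothing to negotiate or watch, so no update is needed.
        (true, None)
    } else {
        // Funded: start shutdown and record the shutdown script in the monitor.
        (false, Some(ChannelMonitorUpdate))
    }
}

fn main() {
    let (shut, update) = get_shutdown(false);
    assert!(shut && update.is_none());
    let (shut, update) = get_shutdown(true);
    assert!(!shut && update.is_some());
}
```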
Force-pushed d8619ba to 2adb8ee
Sadly some tests pass in the intermediary commits because the new code assumes that all …
Based on #1887, #1863, and #1886. Will require #1507, I think, as well.

We currently have two codepaths on most channel update functions - most methods return a set of messages to send a peer iff the `ChannelMonitorUpdate` succeeds, but if it does not we push the messages back into the `Channel` and then pull them back out when the `ChannelMonitorUpdate` completes and send them then. This adds a substantial amount of complexity in very critical codepaths.

Instead, here we swap all our channel update codepaths to immediately set the channel-update-required flag and only return a `ChannelMonitorUpdate` to the `ChannelManager`. Internally in the `Channel` we store a queue of `ChannelMonitorUpdate`s, which will become critical in future work to surface pending `ChannelMonitorUpdate`s to users at startup so they can complete.

This also lays the groundwork for supporting fully async `ChannelMonitor` updates by tracking in-flight updates so that we can re-run them on startup if the `ChannelManager` was persisted but the monitor update did not finish.