Skip to content

Delay RAA-after-next processing until PaymentSent is are handled #2112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

In 0ad1f4c943bdc9037d0c43d1b74c745befa065f0 we fixed a nasty bug
where a failure to persist a `ChannelManager` faster than a
`ChannelMonitor` could result in the loss of a `PaymentSent` event,
eventually resulting in a `PaymentFailed` instead!

As noted in that commit, there's still some risk, though its been
substantially reduced - if we receive an `update_fulfill_htlc`
message for an outbound payment, and persist the initial removal
`ChannelMonitorUpdate`, then respond with our own
`commitment_signed` + `revoke_and_ack`, followed by receiving our
peer's final `revoke_and_ack`, and then persist the
`ChannelMonitorUpdate` generated from that, all prior to completing
a `ChannelManager` persistence, we'll still forget the HTLC and
eventually trigger a `PaymentFailed` rather than the correct
`PaymentSent`.

Here we fully fix the issue by delaying the final
`ChannelMonitorUpdate` persistence until the `PaymentSent` event
has been processed and document the fact that a spurious
`PaymentFailed` event can still be generated for a sent payment.

The original fix in 0ad1f4c943bdc9037d0c43d1b74c745befa065f0 is
still incredibly useful here, allowing us to avoid blocking the
first `ChannelMonitorUpdate` until the event processing completes,
as this would cause us to add event-processing delay in our general
commitment update latency. Instead, we ultimately race the user
handling the `PaymentSent` event with how long it takes our
`revoke_and_ack` + `commitment_signed` to make it to our
counterparty and receive the response `revoke_and_ack`. This should
give the user plenty of time to handle the event before we need to
make progress.

Sadly, because we change our `ChannelMonitorUpdate` semantics, this
change requires a number of test changes, avoiding checking for a
post-RAA `ChannelMonitorUpdate` until after we process a
`PaymentSent` event. Note that this does not apply to payments we
learned the preimage for on-chain - ensuring `PaymentSent` events
from such resolutions will be addressed in a future PR. Thus, tests
which resolve payments on-chain switch to a direct call to the
`expect_payment_sent` function with the claim-expected flag unset.

Depends on #2111, is just the last commit on it. See discussion there for merge-ordering.

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch 5 times, most recently from 6789818 to 6cb3b5e Compare March 17, 2023 07:04
@codecov-commenter
Copy link

codecov-commenter commented Mar 17, 2023

Codecov Report

Patch coverage: 99.53% and project coverage change: +0.39% 🎉

Comparison is base (d4ad826) 90.40% compared to head (097fc94) 90.79%.

❗ Current head 097fc94 differs from pull request most recent head 31049ed. Consider uploading reports for the commit 31049ed to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
+ Coverage   90.40%   90.79%   +0.39%     
==========================================
  Files         106      106              
  Lines       56268    59332    +3064     
  Branches    56268    59332    +3064     
==========================================
+ Hits        50868    53871    +3003     
- Misses       5400     5461      +61     
Files Changed Coverage Δ
lightning/src/events/mod.rs 44.44% <ø> (+1.92%) ⬆️
lightning/src/ln/channelmanager.rs 88.37% <97.22%> (+2.84%) ⬆️
lightning-invoice/src/utils.rs 97.67% <100.00%> (-0.03%) ⬇️
lightning/src/chain/chainmonitor.rs 95.01% <100.00%> (-0.03%) ⬇️
lightning/src/ln/chanmon_update_fail_tests.rs 98.71% <100.00%> (+0.01%) ⬆️
lightning/src/ln/channel.rs 91.42% <100.00%> (+1.95%) ⬆️
lightning/src/ln/functional_test_utils.rs 89.30% <100.00%> (+0.48%) ⬆️
lightning/src/ln/functional_tests.rs 98.26% <100.00%> (+0.11%) ⬆️
lightning/src/ln/monitor_tests.rs 98.52% <100.00%> (-0.01%) ⬇️
lightning/src/ln/outbound_payment.rs 91.93% <100.00%> (+1.68%) ⬆️
... and 3 more

... and 27 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch from 6cb3b5e to 26e3e00 Compare March 17, 2023 21:09
@wpaulino wpaulino self-requested a review March 21, 2023 20:02
@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch 3 times, most recently from c7e8f27 to f94de18 Compare March 28, 2023 22:01
@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch 3 times, most recently from e34dab6 to 031bd1b Compare April 6, 2023 18:09
@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch 7 times, most recently from 578ca9b to 3ff2911 Compare April 17, 2023 20:55
@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch 3 times, most recently from 422f630 to efde36c Compare May 4, 2023 01:43
@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch from efde36c to 9a36e4a Compare May 4, 2023 21:28
@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch from ba1207a to 097fc94 Compare July 28, 2023 05:51
Copy link
Contributor

@valentinewallace valentinewallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me pending a second reviewer!

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch from 097fc94 to 3df2604 Compare July 28, 2023 22:42
@TheBlueMatt
Copy link
Collaborator Author

Squashed without further changes.

@@ -1152,7 +1152,7 @@ impl OutboundPayments {
payment_id,
payment_hash,
path,
}, None));
}, Some(ev_completion_action)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does PaymentPathSuccessful also need the completion action? There's no coverage here, setting it to None and all tests pass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it there means we won't lose PaymentPathSuccessful events on a rare restart edge case, but currently isn't externally visible and will only have an impact if we change event handling (eg where event handling can return an err) - because users always have to handle all events in one big batch before we remove them, you either fully handle events or you don't handle any events. If we move to event handling being able to return an err, a user could handle one event and not the other and lose PaymentPathSuccessful events, maybe. It doesn't matter much, really, but I'd prefer to leave it.

wpaulino
wpaulino previously approved these changes Aug 14, 2023
@@ -5928,10 +5938,16 @@ where
let peer_state = &mut *peer_state_lock;
match peer_state.channel_by_id.entry(msg.channel_id) {
hash_map::Entry::Occupied(mut chan) => {
let funding_txo = chan.get().context.get_funding_txo();
let (htlcs_to_fail, monitor_update_opt) = try_chan_entry!(self, chan.get_mut().revoke_and_ack(&msg, &self.fee_estimator, &self.logger), chan);
let funding_txo_opt = chan.get().context.get_funding_txo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use expect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should have been added to this line, no? No big deal though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, unwraping on this line is unsafe and can panic, as we could receive an RAA message for a channel that isnt yet funded, and wouldnt fail until we call into channel...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, or would have, in the version of LDK when this code was written - before the channel split :)

wpaulino
wpaulino previously approved these changes Aug 16, 2023
@wpaulino
Copy link
Contributor

Ah, needs a rebase to fix some test compile errors.

@TheBlueMatt TheBlueMatt dismissed stale reviews from wpaulino and valentinewallace via dcfd5d7 August 16, 2023 22:45
@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch from 233b8ec to dcfd5d7 Compare August 16, 2023 22:45
@TheBlueMatt
Copy link
Collaborator Author

TheBlueMatt commented Aug 16, 2023

Rebased with the following additional diff:

$ git diff
diff --git a/lightning/src/ln/functional_tests.rs b/lightning/src/ln/functional_tests.rs
index 2aecf72d4..440d9df3e 100644
--- a/lightning/src/ln/functional_tests.rs
+++ b/lightning/src/ln/functional_tests.rs
@@ -10096,8 +10096,8 @@ fn do_test_multi_post_event_actions(do_reload: bool) {
                nodes[1].node.peer_disconnected(&nodes[0].node.get_our_node_id());
                nodes[2].node.peer_disconnected(&nodes[0].node.get_our_node_id());

-               reconnect_nodes(&nodes[0], &nodes[1], (false, false), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (false, false));
-               reconnect_nodes(&nodes[0], &nodes[2], (false, false), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (false, false));
+               reconnect_nodes(ReconnectArgs::new(&nodes[0], &nodes[1]));
+               reconnect_nodes(ReconnectArgs::new(&nodes[0], &nodes[2]));
        }

        let events = nodes[0].node.get_and_clear_pending_events();
diff --git a/lightning/src/ln/payment_tests.rs b/lightning/src/ln/payment_tests.rs
index 44d7566b0..fe5bd37df 100644
--- a/lightning/src/ln/payment_tests.rs
+++ b/lightning/src/ln/payment_tests.rs
@@ -3746,7 +3746,7 @@ fn do_test_custom_tlvs_consistency(first_tlvs: Vec<(u64, Vec<u8>)>, second_tlvs:
                }

                do_claim_payment_along_route(&nodes[0], &[&[&nodes[1], &nodes[3]], &[&nodes[2], &nodes[3]]], false, our_payment_preimage);
-               expect_payment_sent(&nodes[0], our_payment_preimage, Some(Some(2000)), true);
+               expect_payment_sent(&nodes[0], our_payment_preimage, Some(Some(2000)), true, true);
        } else {
                // Expect fail back
                let expected_destinations = vec![HTLCDestination::FailedPayment { payment_hash: our_payment_hash }];

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch from dcfd5d7 to 4e93462 Compare August 16, 2023 22:47
wpaulino
wpaulino previously approved these changes Aug 16, 2023
@valentinewallace
Copy link
Contributor

Needs rebase :(

This is a trivial refactor which will be used in the next commit.
In 0ad1f4c we fixed a nasty bug
where a failure to persist a `ChannelManager` faster than a
`ChannelMonitor` could result in the loss of a `PaymentSent` event,
eventually resulting in a `PaymentFailed` instead!

As noted in that commit, there's still some risk, though its been
substantially reduced - if we receive an `update_fulfill_htlc`
message for an outbound payment, and persist the initial removal
`ChannelMonitorUpdate`, then respond with our own
`commitment_signed` + `revoke_and_ack`, followed by receiving our
peer's final `revoke_and_ack`, and then persist the
`ChannelMonitorUpdate` generated from that, all prior to completing
a `ChannelManager` persistence, we'll still forget the HTLC and
eventually trigger a `PaymentFailed` rather than the correct
`PaymentSent`.

Here we fully fix the issue by delaying the final
`ChannelMonitorUpdate` persistence until the `PaymentSent` event
has been processed and document the fact that a spurious
`PaymentFailed` event can still be generated for a sent payment.

The original fix in 0ad1f4c is
still incredibly useful here, allowing us to avoid blocking the
first `ChannelMonitorUpdate` until the event processing completes,
as this would cause us to add event-processing delay in our general
commitment update latency. Instead, we ultimately race the user
handling the `PaymentSent` event with how long it takes our
`revoke_and_ack` + `commitment_signed` to make it to our
counterparty and receive the response `revoke_and_ack`. This should
give the user plenty of time to handle the event before we need to
make progress.

Sadly, because we change our `ChannelMonitorUpdate` semantics, this
change requires a number of test changes, avoiding checking for a
post-RAA `ChannelMonitorUpdate` until after we process a
`PaymentSent` event. Note that this does not apply to payments we
learned the preimage for on-chain - ensuring `PaymentSent` events
from such resolutions will be addressed in a future PR. Thus, tests
which resolve payments on-chain switch to a direct call to the
`expect_payment_sent` function with the claim-expected flag unset.
@TheBlueMatt
Copy link
Collaborator Author

Rebased.

@TheBlueMatt TheBlueMatt merged commit cb923c6 into lightningdevkit:main Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants