
Tell ChannelMonitors about HTLCs fulfilled after channel close #611


Merged

Conversation

valentinewallace
Contributor

@valentinewallace valentinewallace commented Apr 28, 2020

If a channel is about to hit the chain right as we receive a preimage,
prior to this commit the relevant ChannelMonitor would never learn of this
preimage.

TODO: tests + sanity check logic
TODO: I believe Matt said we needed a special update sequence number for this case. Look into why we need such a thing and how to do it 👀

Closes #595

@codecov

codecov bot commented Apr 28, 2020

Codecov Report

Merging #611 (f85172d) into master (23a1d7a) will increase coverage by 0.02%.
The diff coverage is 93.93%.


@@            Coverage Diff             @@
##           master     #611      +/-   ##
==========================================
+ Coverage   91.42%   91.45%   +0.02%     
==========================================
  Files          37       37              
  Lines       22065    22249     +184     
==========================================
+ Hits        20174    20348     +174     
- Misses       1891     1901      +10     
Impacted Files Coverage Δ
lightning/src/ln/channelmanager.rs 85.36% <83.87%> (+0.08%) ⬆️
lightning/src/chain/channelmonitor.rs 95.44% <93.58%> (-0.23%) ⬇️
lightning/src/ln/functional_tests.rs 97.19% <96.39%> (-0.02%) ⬇️
lightning/src/chain/chainmonitor.rs 92.00% <100.00%> (ø)
lightning/src/ln/onchaintx.rs 93.78% <100.00%> (+0.07%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23a1d7a...6f1a0bf. Read the comment docs.

@ariard

ariard commented Apr 28, 2020

@valentinewallace I don't know if it's about a special sequence number, but extending ChannelMonitorUpdateStep::PaymentPreimage is needed to pass the htlc_id and avoid hash collisions.

Also, you need to extend ChannelMonitor::provide_payment_preimage to generate an HTLC preimage transaction if an HTLC with the given hash is already onchain. Check broadcast_by_local_state: you need to send a ClaimRequest for either a RemoteHTLC or a LocalHTLC to the OnchainTxHandler.
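
For illustration, the extended update step might look roughly like this (a sketch only, not the final API; the htlc_id field is the addition suggested above):

#[allow(dead_code)]
enum ChannelMonitorUpdateStep {
	PaymentPreimage {
		payment_preimage: [u8; 32],
		// Hypothetical addition so the monitor can tell which HTLC the preimage resolves,
		// rather than relying on the payment hash alone:
		htlc_id: u64,
	},
	// ... other update steps elided
}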

// learn the preimage as the channel already hit the chain and that's
// why it's missing.
let channel_state = &mut *channel_state_lock;
let chan_id = channel_state.short_to_id.get(&short_channel_id).unwrap().clone();
Collaborator


Hmm, I don't think this would work - the channel in question likely no longer exists, as we drop the Channel object once it hits the chain (and rely on the ChannelMonitor). More likely we need to be able to generate a ChannelMonitorUpdate with no channel knowledge, using only the short_channel_id.

@TheBlueMatt
Collaborator

Any update on this? Would be nice to land it soon-ish.

@TheBlueMatt TheBlueMatt added this to the 0.0.12 milestone Oct 4, 2020
@valentinewallace
Contributor Author

valentinewallace commented Oct 5, 2020

Any update on this? Would be nice to land it soon-ish.

No update. Are you just checking to avoid dup work?

Otherwise, lmk if you have a preference on prioritization order of updating 681, reviewing 716 and this :)

@TheBlueMatt
Collaborator

No update. Are you just checking to avoid dup work?

I wasn't particularly planning on trying to take it over or anything, was just double-checking that you still planned to come back to this.

Otherwise, lmk if you have a preference on prioritization order of updating 681, reviewing 716 and this :)

Nope, up to you.

@ariard

ariard commented Oct 5, 2020

Yes, it would be great to address this one, as AFAIK that's our last obvious money-loss case. I can help write test coverage if you want.

@valentinewallace valentinewallace force-pushed the fix-missing-htlc-claim branch 2 times, most recently from df942d3 to e01b7b3 Compare October 15, 2020 18:21
@valentinewallace
Contributor Author

One TODO left, but advice welcome on the general approach.

@valentinewallace
Contributor Author

I wasn't 100% sure how to handle the case where the monitor update errors when we try to give it the preimage, so I modeled it after the handle_monitor_error! macro.


@ariard ariard left a comment


Okay, see test coverage here: a23d50c.

Testing your branch again yields the following error:

thread 'ln::functional_tests::test_unorderded_onchain_htlc_settlement' panicked at 'Attempted to apply ChannelMonitorUpdates out of order, check the update_id before passing an update to update_monitor!', lightning/src/chain/channelmonitor.rs:1177:13

This seems to corroborate my comment on update_fulfill_htlc; sadly, I think this needs more work than expected to be sure we handle this case well. Namely, we need to coordinate more of the offchain-to-onchain logic so that we keep a reference to an offchain channel even after a commitment transaction hits the chain, until all its HTLCs are resolved, whatever the source of resolution on the forward channel.

Ok(res) => {
	let mon_info = MonitorUpdateInfo {
		funding_outpoint: self.funding_txo.unwrap(),
		latest_monitor_update_id: self.latest_monitor_update_id,

This doesn't work: you're passing the monitor_update_id of a downstream channel to an upstream one, and there is no guarantee they match, as downstream/upstream channels are updated asynchronously.

Here downstream/upstream are to be interpreted as the order of HTLC relay, Alice -> Bob -> Carol. Upstream is Alice-Bob, downstream is Bob-Carol.

Collaborator


Right, I think we might as well just pass a sentinel here - we'd otherwise have to store a mapping of closed channels -> last monitor update id in ChannelManager, and it doesn't seem worth it. We should just pass u64::MAX and note in the documentation that "after the channel is closed more updates may occur until all HTLCs have been resolved, possibly using u64::MAX as the update_id".
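
A minimal sketch of the sentinel approach (the constant name matches the one that appears later in this thread; the doc wording here is illustrative):

/// Sentinel update_id for ChannelMonitorUpdates applied after the channel has been
/// force-closed. After closure, further updates (e.g. providing a payment preimage) may
/// still arrive carrying this id until all HTLCs have been resolved.
pub const CLOSED_CHANNEL_UPDATE_ID: u64 = u64::MAX;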


Still, looking more at the logs, I think there is an underlying issue masked by the monitor update id ordering one.

  • in process_pending_monitor_events, if we receive a CommitmentTxBroadcast for a given channel, we remove it from channel_state.short_to_id, thus stopping any off-chain operation on it
  • in claim_funds_from_hop, using HTLCPreviousHopData we fetch the chan_id from channel_state.short_to_id; if it's missing we return an error

It sounds like we need to keep a reference to a channel in channel_state until all the forward HTLCs corresponding to the incoming HTLCs of this channel have been settled, either onchain or offchain.

Instead of removing channels in process_pending_monitor_events, maybe we should mark them as irrevocably disabled.
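
One hypothetical shape for that "irrevocably disabled" marking (not LDK's actual data structures, just a sketch of the idea): keep the short_channel_id entry after force-close so backward claims can still find the channel, but flag it so no further off-chain updates are attempted.

#[allow(dead_code)]
enum ShortToIdEntry {
	/// Channel is live; normal off-chain operations are allowed.
	Active { channel_id: [u8; 32] },
	/// A commitment transaction hit the chain; the entry is kept only so claims for this
	/// channel's incoming HTLCs can still be routed to the right monitor.
	IrrevocablyDisabled { channel_id: [u8; 32] },
}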

/// with knowledge of the preimage. Thus, monitor information needs to be passed
/// to the `ChannelManager` on receipt of a preimage so it's capable of updating
/// the relevant `ChannelMonitor` after the channel's data has been removed from
/// the `ChannelManager`'s `ChannelHolder`.

See my point above, but I fear we need to change our channel force-close model to keep a channel reference until all its HTLCs are closed. When we learn a preimage from the downstream offchain channel, we need access to both the funding outpoint and the monitor update id.

An alternative would be to add the upstream channel's funding outpoint/monitor_update_id to PreviousHopData, but then you would still have to update the monitor_update_id of all forward HTLCs, which sounds gross.

@valentinewallace
Contributor Author

@ariard @TheBlueMatt just to confirm my understanding of the new target solution --
add a map in ChannelManager that stores a new ClosedChannel struct that only has a few fields (channel outpoint, monitor update ID, counterparty node ID), then use the map in claim_funds_internal to construct the PaymentPreimage monitor update?

@TheBlueMatt
Collaborator

@devrandom noted on slack that we can't rely on anything in ChannelManager here - if the user persists two ChannelMonitors for a relayed HTLC without persisting the updated ChannelManager before crashing/restarting, we'll force close both channels and may need to claim the payment onchain on the inbound edge without any new data in ChannelManager. I think the original solution of just passing a sentinel for update_id may make the most sense.

@ariard

ariard commented Oct 21, 2020

I think the original solution of just passing a sentinel for update_id may make the most sense.

Do you mean passing a sentinel value in MonitorUpdateInfo in fn update_fulfill_htlc? You need this, but even then the code still seems wrong: it's using the funding outpoint of the outbound edge, not the inbound one. The fix is either using a map or extending HTLCPreviousHopData with funding_outpoint.

Learning an offchain preimage on the outbound edge and passing it backward to the inbound edge's onchain monitor should be done atomically; if it fails in between, you should replay update_fulfill_htlc. With respect to this concern, channelmanager::update_fulfill_htlc seems correct.
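
To illustrate the second option (all field and type names here are assumptions made for the sketch, not LDK's exact definitions):

#[allow(dead_code)]
struct OutPoint {
	txid: [u8; 32],
	index: u16,
}

#[allow(dead_code)]
struct HtlcPreviousHopData {
	short_channel_id: u64,
	htlc_id: u64,
	// Hypothetical addition: lets the backward claim target the inbound edge's
	// ChannelMonitor directly, even after the Channel object has been dropped.
	funding_outpoint: OutPoint,
}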

@TheBlueMatt
Collaborator

even then the code still seems wrong: it's using the funding outpoint of the outbound edge, not the inbound one.

Right, that would need to be fixed, but note that, as devrandom observed, we cannot rely on any data in ChannelManager for this, i.e. the information has to be passed into the ChannelMonitor and stored there.

@valentinewallace
Contributor Author

Update: sorry, the last day or so random things kept coming up, but I'm now working on:
"once we give the ChannelMonitor the preimage, it needs to generate a ClaimRequest that its OnchainTxHandler will use to broadcast the claim transaction"
(I somehow was under the impression this was already handled, so input welcome if I'm going the wrong direction here.)
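
Roughly, the shape of that handoff, with stub types (not LDK's real signatures; only a sketch of the flow described above):

struct ClaimRequest {
	outpoint_txid: [u8; 32],
	outpoint_index: u32,
	payment_preimage: [u8; 32],
}

#[derive(Default)]
struct OnchainTxHandler {
	pending_claims: Vec<ClaimRequest>,
}

impl OnchainTxHandler {
	fn push_claim_request(&mut self, req: ClaimRequest) {
		// The real handler would build and broadcast the claiming transaction here;
		// this stub only records the request.
		self.pending_claims.push(req);
	}
}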

@valentinewallace valentinewallace force-pushed the fix-missing-htlc-claim branch 4 times, most recently from ba328d9 to e46173b Compare October 26, 2020 19:33
@valentinewallace valentinewallace marked this pull request as ready for review October 26, 2020 19:46
@valentinewallace
Contributor Author

Probably still some cleanup needed on the last commit, but review on the general approach welcome.

@jkczyz jkczyz self-requested a review October 27, 2020 22:16

@ariard ariard left a comment


Thanks for the commit history, it's really helpful, at least for the offchain part.

See the comment above about the missed case where the holder commitment transaction is the one closing the channel; I don't think the current code handles it. Overall, it's pretty good given the code complexity involved :)

self.payment_preimages.insert(payment_hash.clone(), payment_preimage.clone());
// If the channel is force closed, try to claim the output from this preimage
if self.lockdown_from_offchain || self.holder_tx_signed {

I think this code doesn't cover the TODO left in test_unordered_onchain_htlc_settlement: "reverse the test with Bob's commitment reaching the chain instead of Alice's one".

Instead of relying on lockdown_from_offchain/holder_tx_signed to decide to broadcast a preimage tx, we should try the following (roughly sketched in code below):

  • a) in the presence of a counterparty commitment onchain, broadcast an HTLC-Preimage transaction. Note you need to check both current_counterparty_commitment_txid and prev_counterparty_commitment_txid against counterparty_commitment_txn_on_chain.
  • b) in the absence of a counterparty commitment onchain, just try to generate an HTLC-Success for both holder_commitment_txid and prev_holder_commitment_txid. We will broadcast them, and it doesn't matter if we have not previously broadcast a commitment transaction: being orphans, they will neither propagate nor confirm.

I guess b) is okay for now, even if in the future we should replace lockdown_from_offchain/holder_tx_signed with a better flag that also signals a commitment has been seen onchain.

I think claim_counterparty_htlc is okay for now, but I would lean towards reusing the same codepath as block_connected to avoid code duplication. FYI, avoiding duplication was one of the motivations for introducing OnchainTxHandler. Also, get_counterparty_htlc_output_claims_reqs should be cleaned up after #642 (don't worry about the conflict, I'll handle it).
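
A minimal code sketch of the a)/b) branching (the types are made up for illustration; only the txid and map field names mirror those mentioned above):

use std::collections::HashMap;

type Txid = [u8; 32];

struct MonitorState {
	current_counterparty_commitment_txid: Option<Txid>,
	prev_counterparty_commitment_txid: Option<Txid>,
	counterparty_commitment_txn_on_chain: HashMap<Txid, u64>,
}

enum PreimageClaim {
	// (a) A counterparty commitment is confirmed onchain: spend its HTLC output with the preimage.
	FromCounterpartyCommitment(Txid),
	// (b) Otherwise: build HTLC-Success txs for our own commitment(s); harmless even if our
	// commitment never hit the chain, since orphans won't propagate or confirm.
	HolderHtlcSuccess,
}

impl MonitorState {
	fn claim_for_preimage(&self) -> PreimageClaim {
		for txid in [self.current_counterparty_commitment_txid, self.prev_counterparty_commitment_txid] {
			if let Some(txid) = txid {
				if self.counterparty_commitment_txn_on_chain.contains_key(&txid) {
					return PreimageClaim::FromCounterpartyCommitment(txid);
				}
			}
		}
		PreimageClaim::HolderHtlcSuccess
	}
}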

Contributor Author

@valentinewallace valentinewallace Oct 28, 2020


Agree with most of your comment. Only thing:

We will broadcast them, and it doesn't matter if we have not previously broadcast a commitment transaction: being orphans, they will neither propagate nor confirm.

Hmm, it feels weird to always broadcast two HTLC-success transactions on receipt of a preimage. I think we should only do this if holder_tx_signed is true, since if it hasn't been signed there's no way it's been seen on-chain. (Alternatively, we could replace holder_tx_signed with a map, holder_commitment_tx_broadcasted<Txid, u64>, that tracks broadcasted holder commitment txs, similar to counterparty_commitment_txn_on_chain.)

Contributor Author


Lmk what you think of the updated solution.


I think we should only do this if holder_tx_signed is true, since if it hasn't been signed there's no way it's been seen on-chain.

There is a caveat where this might not be true, in the case of distributed watchtowers:

  • Waldo broadcasts the holder commitment with an incoming HTLC output
  • The holder commitment confirms in block X
  • Wally sees block X, does nothing, no preimage yet
  • Wally receives a preimage from offchain; its holder_tx_signed might still be false

It's weird, but it makes it easier to reason about this kind of scenario :)

Contributor Author


Ah OK I see what you mean. I switched it so now we broadcast the 2 preimage claiming transactions if self.broadcasted_holder_revokable_script.is_some(). I think this should work because it'll be non-None if we see any of our commitment txs onchain. Lmk what you think.


Can you document the usage with the above scenario, or at least mention it's done due to distributed watchtower setups? The current comment says "Then if a holder commitment transaction has been signed", which is a bit ambiguous, as the laid-out scenario is a case where a commitment has been signed, but not by this local instance.


@ariard ariard left a comment


@valentinewallace see my replies to the 2 unresolved comments from the previous review. Both require action, I think. Afterwards, it sounds ready to me!

Thanks for the new good-quality comments and for fixing the missing tx-generation case.

@valentinewallace valentinewallace force-pushed the fix-missing-htlc-claim branch 3 times, most recently from 73e013c to 0272c2d Compare November 6, 2020 13:40
@ariard

ariard commented Nov 8, 2020

See this comment and this comment needing resolution. Otherwise, new changes are okay.

@ariard

ariard commented Nov 10, 2020

Thanks for the added comment, ACK f29238d

@valentinewallace valentinewallace force-pushed the fix-missing-htlc-claim branch 2 times, most recently from 36f1b29 to 9cca0e4 Compare November 10, 2020 21:04
Collaborator

@TheBlueMatt TheBlueMatt left a comment


One more request for tests, but everything else looks good, just a few nits that are almost not worth fixing.

else if let Some(point) = revocation_points.2.as_ref() {
if revocation_points.0 == commitment_number + 1 { Some(point) } else { None }
} else { None };
else if let Some(point) = revocation_points.2.as_ref() {
Collaborator


Is it really "right" to make the "else" at a different indentation from the "if"?


IMO it would be worthwhile to document this code:

diff --git a/lightning/src/chain/channelmonitor.rs b/lightning/src/chain/channelmonitor.rs
index 79c91d9b..1878362e 100644
--- a/lightning/src/chain/channelmonitor.rs
+++ b/lightning/src/chain/channelmonitor.rs
@@ -1516,10 +1516,16 @@ impl<ChanSigner: ChannelKeys> ChannelMonitor<ChanSigner> {
                if let Some(htlc_outputs) = self.counterparty_claimable_outpoints.get(&commitment_txid) {
                        if let Some(revocation_points) = self.their_cur_revocation_points {
                                let revocation_point_option =
-                                       if revocation_points.0 == commitment_number { Some(&revocation_points.1) }
-                               else if let Some(point) = revocation_points.2.as_ref() {
-                                       if revocation_points.0 == commitment_number + 1 { Some(point) } else { None }
-                               } else { None };
+                                       // If counterparty commitment tx is the latest valid state, use their latest per-commitment point
+                                       if revocation_points.0 == commitment_number { 
+                                               Some(&revocation_points.1) 
+                                       } else if let Some(point) = revocation_points.2.as_ref() {
+                                               // If counterparty commitment tx is the previous latest valid state, use their previous latest per-commitment point
+                                               // (non-atomicity of revocation means it's lawful for them to have temporarily two valid commitment txn from our
+                                               // viewpoint)
+                                               if revocation_points.0 == commitment_number + 1 { Some(point) } else { None }
+                                       } else { None };
+
                                if let Some(revocation_point) = revocation_point_option {
                                        for (_, &(ref htlc, _)) in htlc_outputs.iter().enumerate() {
                                                if let Some(transaction_output_index) = htlc.transaction_output_index {

Can we really hit the None case, or wouldn't this be a hint of a more serious issue like lost monitor state? Tweaking it into a panic shows we don't have coverage for this, and I'm not even sure you can hit it even assuming state loss. Reaching it means you have at least some per_commitment_option.

Collaborator


Docs are better, sure. I was just noting that the only change here was indentation - and I think the old indentation was much clearer than the new. Best to just not have any diff instead of changing unrelated code.

Collaborator


As for hitting the None case, it should be reachable, no? The code reads to me like the None case should be hit if our counterparty broadcasts a non-revoked state, but maybe I need to read deeper?

Contributor Author


Added the docs @ariard and fixed the ugly indentation

// *we* sign a holder commitment transaction, not when e.g. a watchtower broadcasts one of our
// holder commitment transactions.
if self.broadcasted_holder_revokable_script.is_some() {
let (claim_reqs, _, _) = self.broadcast_by_holder_state(None, &self.current_holder_commitment_tx);
Collaborator


Codecov doesn't think you hit this branch at all in testing. It would really be nice to.


But comment it out and you break test_onchain_htlc_settlement_after_close1? Are we really sure about how Codecov works :) ?

Contributor Author


Oops, I had mistakenly removed a test that covered this branch; I put it back. @ariard apparently one of the tests from this discussion: #611 (comment) wasn't fully redundant, though I had been convinced it was as well.

This will be used in upcoming commits to allow us to update a channel
monitor with a preimage after its channel has closed.

@ariard ariard left a comment


Just a few small comments; for the missing test, feel free to add an issue.
Otherwise SGTM.

if updates.update_id == CLOSED_CHANNEL_UPDATE_ID {
	match updates.updates[0] {
		ChannelMonitorUpdateStep::PaymentPreimage { .. } => {},
		_ => panic!("Attempted to apply post-force-close ChannelMonitorUpdate that wasn't providing a payment preimage"),

No coverage for this? All tests pass.

diff --git a/lightning/src/chain/channelmonitor.rs b/lightning/src/chain/channelmonitor.rs
index 5faa7f96..28832c03 100644
--- a/lightning/src/chain/channelmonitor.rs
+++ b/lightning/src/chain/channelmonitor.rs
@@ -1231,7 +1231,7 @@ impl<ChanSigner: ChannelKeys> ChannelMonitor<ChanSigner> {
                if updates.update_id == CLOSED_CHANNEL_UPDATE_ID {
                        match updates.updates[0] {
                                ChannelMonitorUpdateStep::PaymentPreimage { .. } => {},
-                               _ => panic!("Attempted to apply post-force-close ChannelMonitorUpdate that wasn't providing a payment preimage"),
+                               _ => {} //panic!("Attempted to apply post-force-close ChannelMonitorUpdate that wasn't providing a payment preimage"),
                        }
                        assert_eq!(updates.updates.len(), 1);
                }

Contributor Author


Thanks, opened #751

valentinewallace and others added 4 commits November 16, 2020 15:41
If the channel is hitting the chain right as we receive a preimage,
previous to this commit the relevant ChannelMonitor would never
learn of this preimage.
Helpful for debugging. I also included the change in the provide_preimage method
signature which will be used in an upcoming commit, because commit-wise it was
easier to combine the changes.
Now callers will separately retrieve the claim requests/
holder revokable script and the new watched holder outputs.
This will be used in the next commit for times when we
need to get holder claim requests, but don't have access to
the holder commitment transaction.
If we receive a preimage for an outgoing HTLC that solves an output on a
backwards force-closed channel, we need to claim the output on-chain.

Note that this commit also gets rid of the channel monitor redundantly setting
`self.counterparty_payment_script` in `check_spend_counterparty_transaction`.

Co-authored-by: Antoine Riard <[email protected]>
Co-authored-by: Valentine Wallace <[email protected]>
@ariard

ariard commented Nov 17, 2020

ACK 6f1a0bf

@TheBlueMatt TheBlueMatt merged commit 4e82003 into lightningdevkit:master Nov 17, 2020

Successfully merging this pull request may close these issues.

Implement off-chain downstream to on-chain upstream incoming HTLC claiming
4 participants