
Fill out failure codes for onion packet #157


Closed

Conversation

@yuntai (Contributor) commented Sep 6, 2018

Partial patch for #146.
Adds returning of some errors and fixes the failure message (additional parameters for UPDATE).
Moving on to work on handling returned failure codes (and looping them back to the router).

@TheBlueMatt (Collaborator) left a comment:

You can ignore the full_stack_target test failure; I'll fix it pre-merge, it should be easy to fix.

break Some(("CLTV expiry is too close", 0x1000 | 14, self.get_channel_update(chan).unwrap()));
}
/*
if chan.is_disabled() {
@TheBlueMatt (Collaborator):

Oops, I was thinking more of !is_live() in this case. Maybe just return 0x1000|20 for that?
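For concreteness, a minimal sketch of what that might look like, assuming a Channel::is_live() accessor and reusing the break Some((...)) pattern from the surrounding code; the error string is just a placeholder, not the wording in the tree:

    if !chan.is_live() {
        // channel_disabled (UPDATE|20 = 0x1000 | 20), with the current channel_update attached
        break Some(("Forwarding channel is not in a usable/live state", 0x1000 | 20, self.get_channel_update(chan).unwrap()));
    }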

break Some(("Forwarding node has tampered with the intended HTLC values or origin node has an obsolete cltv_expiry_delta", 0x1000 | 13, self.get_channel_update(chan).unwrap()));
}
let cur_height = self.latest_block_height.load(Ordering::Acquire) as u32 + 1;
if msg.cltv_expiry <= cur_height + CLTV_EXPIRY_DELTA as u32 { // expiry_too_soon
@TheBlueMatt (Collaborator):

Hmm, I don't think we need the full CLTV_EXPIRY_DELTA here. The spec doesn't actually say what to use here (as far as I can see) and appears to assume that meeting the min_final_cltv_expiry will suffice for every node on the path. I think all we really care about here is that we don't end up in a situation where we early-force-close a channel (ChannelMonitor::would_broadcast_at_height) because of a much-too-soon CLTV (implying, obviously, that no expiration is already past). This means:

  • for the inbound HTLC (aka msg.cltv_expiry), that we don't end up learning the payment preimage only within channelmonitor::CLTV_CLAIM_BUFFER of its expiry. This implies that the outbound HTLC CLTV must be at least CLTV_CLAIM_BUFFER before the inbound HTLC's (trivially true since CLTV_EXPIRY_DELTA is greater, though maybe we could add a comment/static assert indicating that this is a requirement), and that the peer is able to do the update_fulfill_htlc/commitment/revoke dance within CLTV_EXPIRY_DELTA - CLTV_CLAIM_BUFFER blocks (which should never be an issue, though, again, we should probably add a comment noting that CLTV_EXPIRY_DELTA should always be a block or three greater than CLTV_CLAIM_BUFFER).
  • for the outbound HTLC (aka outgoing_cltv_value), that we receive an update_fail_htlc and do the commitment/revoke dance prior to the HTLC's expiration minus CLTV_CLAIM_BUFFER. Now, that isn't actually possible to ensure right now, but I think we can change the check in ChannelMonitor to only force-close outbound if we're a block or three after the actual expiration, as long as CLTV_EXPIRY_DELTA is sufficiently larger than CLTV_CLAIM_BUFFER (or, I think, at least 2x CLTV_CLAIM_BUFFER, because that will give us CLTV_CLAIM_BUFFER blocks to get it confirmed, which is the meaning of CLTV_CLAIM_BUFFER).

All that to say: if we tweak ChannelMonitor::would_broadcast_at_height and ChannelMonitor::block_connected, we can probably just check that outgoing_cltv_value is at least the current height plus a few blocks and not worry about anything else, assuming the next hop will either immediately fail it (due to it being too close to their desired final_cltv_expiry, or them not being able to meet their own CLTV_EXPIRY_DELTA) or fail it later because it expires soon.
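A rough sketch of that simplified forwarding check, under the assumptions above; outgoing_cltv_value stands for the value parsed from the onion payload, and the buffer constant and error string are placeholders rather than code in the tree:

    // Only reject if the outgoing CLTV is already at/near the current height; everything
    // else is left to the next hop (or to the relaxed ChannelMonitor checks described above).
    const FEW_BLOCKS_BUFFER: u32 = 3; // placeholder for "a few blocks"
    let cur_height = self.latest_block_height.load(Ordering::Acquire) as u32 + 1;
    if outgoing_cltv_value <= cur_height + FEW_BLOCKS_BUFFER { // expiry_too_soon (UPDATE|14)
        break Some(("Outgoing CLTV value is too soon", 0x1000 | 14, self.get_channel_update(chan).unwrap()));
    }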

@yuntai (Contributor, Author), Oct 7, 2018:

> for the outbound HTLC (aka outgoing_cltv_value), that we receive an update_fail_htlc and do the commitment/revoke dance prior to the HTLC's expiration minus CLTV_CLAIM_BUFFER.

I don't clearly understand this sentence.

Did you mean the outbound HTLC needs to be force-closed if the update_fail_htlc/commitment dance has not happened prior to the corresponding incoming HTLC's expiration minus CLTV_CLAIM_BUFFER?

Ah, or did you mean the outbound HTLC needs to be force-closed if the update_fail_htlc/commitment dance has not happened prior to the HTLC's own expiration minus CLTV_CLAIM_BUFFER, because we know the downstream node will force-close the channel once they are within their CLTV_CLAIM_BUFFER prior to the HTLC's expiration, so we need to force-close and try to fetch the preimage from the chain?
(But with CLTV_EXPIRY_DELTA >> CLTV_CLAIM_BUFFER, is it sufficient to just time out at the actual expiry (+1), since we have ample time before our deadline?)

@TheBlueMatt (Collaborator):

The second, kinda. We need to force-close the channel if the upstream node hasn't done the update_fail_htlc/commitment_signed dance in enough time for us to claim the corresponding incoming HTLC. Right now we force-close if they haven't done the update_fail dance by a few blocks before the HTLC expires, but that seems like overkill; instead, I think we should change it to only force-close if they haven't done the update_fail dance by a few blocks after the HTLC expires, as long as we do it at least CLTV_CLAIM_BUFFER blocks prior to the incoming HTLC expiring, which we can guarantee by ensuring that CLTV_EXPIRY_DELTA is at least 2x as large as CLTV_CLAIM_BUFFER.
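A sketch of that relaxed rule, just to pin down the arithmetic; the function name and the grace-period constant are hypothetical, only CLTV_CLAIM_BUFFER and CLTV_EXPIRY_DELTA are existing names:

    // "A few blocks" of grace after the outbound HTLC expires before we go on chain.
    const OUTBOUND_GRACE_BLOCKS: u32 = 3; // hypothetical value

    // Outbound HTLC: only force-close once the peer still hasn't done the update_fail
    // dance a few blocks *after* the HTLC's expiry.
    fn outbound_htlc_needs_broadcast(height: u32, outbound_cltv_expiry: u32) -> bool {
        height >= outbound_cltv_expiry + OUTBOUND_GRACE_BLOCKS
    }

    // This still happens at least CLTV_CLAIM_BUFFER blocks before the corresponding
    // incoming HTLC expires as long as CLTV_EXPIRY_DELTA >= 2 * CLTV_CLAIM_BUFFER
    // (and the grace period stays below CLTV_CLAIM_BUFFER), which could be enforced
    // with a static/debug assertion:
    // debug_assert!(CLTV_EXPIRY_DELTA as u32 >= 2 * CLTV_CLAIM_BUFFER);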

else if code == 0x1000 | 13 {
res.extend_from_slice(&byte_utils::be32_to_array(msg.cltv_expiry));
}
res.extend_from_slice(&chan_update.encode_with_len()[..]);
@TheBlueMatt (Collaborator):

I don't think we should include the channel_update if we are returning the amt_to_forward or cltv_expiry.

@yuntai (Contributor, Author):

UPDATE|11, 12 and 13 all seem to include a channel_update in the RFC.
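For reference, as I read BOLT 4, each of those three failure payloads carries the offending value followed by the channel_update. A sketch in the style of the snippet quoted above; msg.amount_msat and byte_utils::be64_to_array are assumed to be available alongside be32_to_array:

    if code == 0x1000 | 11 || code == 0x1000 | 12 {
        // amount_below_minimum / fee_insufficient: [u64: htlc_msat] ...
        res.extend_from_slice(&byte_utils::be64_to_array(msg.amount_msat));
    } else if code == 0x1000 | 13 {
        // incorrect_cltv_expiry: [u32: cltv_expiry] ...
        res.extend_from_slice(&byte_utils::be32_to_array(msg.cltv_expiry));
    }
    // ...followed in all three cases by [u16: len] [channel_update]
    res.extend_from_slice(&chan_update.encode_with_len()[..]);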

@yuntai (Contributor, Author) commented Oct 9, 2018

WIP. Started implementing handling of onion failure codes in the origin node.

@yuntai force-pushed the 201809-issue146-failurenonion branch 2 times, most recently from 924ebde to 37d1dac on October 14, 2018 at 14:02
@yuntai (Contributor, Author) commented Oct 15, 2018

Still WIP, but the current patch contains some breaking changes.

  • Add channel_update_timestamp to RouteHop, storing the timestamp of the chan_update used for that hop's outbound channel when the route is calculated, so it can be compared against a new chan_update coming back with an onion failure message.
    (Maybe it'd make more sense in Route as a separate vec.)
  • Currently, the onion failure message is processed in internal_update_fail_htlc() (understandably, as we want to update the network view as soon as we know about node/channel failures), and that is where it's decided whether the payment can be retried. But, as I understand it, we don't want to fire a PaymentFailed event until the HTLC is revoked. I wanted to keep this boolean (payment_retryable) around until the RAA, but I'm split-minded on where to keep it. (I feel better about keeping HTLCSource immutable, but don't feel like adding a new hashmap.)
  • Add a PaymentData field to HTLCSource to keep the htlc_msat forwarded to the first hop and the block_height when the route was calculated, so incoming_htlc_msat and CLTV can be reconstructed forward hop-by-hop when handling onion failure messages (see the sketch after this list).
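Roughly, those two additions would have a shape like the following sketch; first_hop_htlc_msat and route_block_height are working names used here for illustration only, not the final code:

    pub struct RouteHop {
        // ...existing fields...
        /// Timestamp of the channel_update this hop's outbound channel was routed on,
        /// compared against any channel_update returned in an onion failure.
        pub channel_update_timestamp: u32,
    }

    /// Carried in HTLCSource so incoming_htlc_msat and cltv_expiry can be recomputed
    /// hop-by-hop when an onion failure message comes back.
    pub struct PaymentData {
        /// The htlc_msat we forwarded to the first hop.
        pub first_hop_htlc_msat: u64,
        /// The block height at the time the route was calculated.
        pub route_block_height: u32,
    }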

@TheBlueMatt (Collaborator) left a comment:

This is awesome work, thanks. I'll review in a bit more detail later, but in the meantime can you take the change at TheBlueMatt@2018-10-157-changes and then make the changes I left comments on?

@@ -291,6 +301,8 @@ pub struct ChannelManager {
}

const CLTV_EXPIRY_DELTA: u16 = 6 * 24 * 2; //TODO?
const CLTV_FAR_FAR_AWAY: u16 = 6 * 24 * 7; //TODO?
const FINAL_NODE_TIMEOUT: u16 = 3; //TODO?
@TheBlueMatt (Collaborator):

I don't think this needs to be a new constant; channelmonitor::CLTV_CLAIM_BUFFER + HTLC_FAIL_TIMEOUT_BLOCKS should be right, I think.

break Some(("Forwarding node has tampered with the intended HTLC values or origin node has an obsolete cltv_expiry_delta", 0x1000 | 13, self.get_channel_update(chan).unwrap()));
}
let cur_height = self.latest_block_height.load(Ordering::Acquire) as u32 + 1;
if msg.cltv_expiry <= cur_height + 3 as u32 { // expiry_too_soon
@TheBlueMatt (Collaborator):

I think what we want here is CLTV_CLAIM_BUFFER + HTLC_FAIL_TIMEOUT_BLOCKS (aka "we want to have at least HTLC_FAIL_TIMEOUT_BLOCKS to fail prior to going on chain CLTV_CLAIM_BUFFER blocks before expiration"). Probably add that text in a comment here as well :p.
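Concretely, a sketch of that check with the suggested comment, assuming both constants are in scope here (casts adjusted as needed if they are not already u32):

    // We want at least HTLC_FAIL_TIMEOUT_BLOCKS to fail the HTLC backwards before we would
    // go on chain, which ChannelMonitor does CLTV_CLAIM_BUFFER blocks before expiration.
    if msg.cltv_expiry <= cur_height + CLTV_CLAIM_BUFFER + HTLC_FAIL_TIMEOUT_BLOCKS { // expiry_too_soon
        break Some(("CLTV expiry is too close", 0x1000 | 14, self.get_channel_update(chan).unwrap()));
    }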

@yuntai force-pushed the 201809-issue146-failurenonion branch from 37d1dac to 7ec3073 on October 17, 2018 at 08:03
@yuntai (Contributor, Author) commented Oct 17, 2018

Cherry-picked the commit and fixed the cltv_expiry checks. Working on writing the test, so this will be ready for review soon.

@yuntai (Contributor, Author) commented Oct 18, 2018

Still WIP. Pushed just in case there's any feedback on changing the structures around a bit. Also moved PaymentFailed from fail_htlc_backwards_internal() to update_fail_htlc(), since it's okay for now per #209.

@TheBlueMatt (Collaborator):

Oops, no, we definitely can't move PaymentFailed to update_fail_htlc; if anything, PaymentSent needs to move the other way, out of update_fulfill_htlc. If a node sends an update_fail_htlc and then disconnects, they can still reconnect and claim the payment. If they do the same thing for update_fulfill then we just get free money, so it's not as big a deal (aside from the duplicative event).

@TheBlueMatt (Collaborator):

Sorry about the conflict due to #211; it's a trivial rebase, though.

@yuntai (Contributor, Author) commented Oct 18, 2018

Hmm, right. Then I guess I can move onion failure message processing to after the commitment_signed_dance. Actually, there's a good reason for that which I just thought of: to prevent a DoS attack on our routing table from an attacker sending bogus update_fails!

yuntai and others added 8 commits October 22, 2018 01:36
  • …to compare it against the timestamp of the channel_update coming with an onion failure message
  • Add PaymentData to HTLCSource to reconstruct incoming_htlc_msat and cltv for each node when an onion failure message is returned. These values are used to check whether an updated chan_update is valid.
  • Refactor onion failure code and handle more error cases. See new comments in code for more details.
@TheBlueMatt (Collaborator):

@yuntai, what's the status of this? This will probably conflict up and down with #217. I'm happy to wait a few days on this if you want to do some quick back-and-forth and get this one in first, but otherwise it will probably need some rebasing.

  • Add retryable and error_code fields to PaymentFailed event
  • Add RouteUpdate event a la lightningdevkit#217 containing msgs::HTLCFailChannelUpdate
  • Move msgs::HTLCFailChannelUpdate from update_fail_htlc's return value to the payload of RouteUpdate fired inside fail_htlc_backwards_internal
  • Some fixes in decode_update_add_htlc_onion
  • Move processing of onion failures to fail_htlc_backwards_internal
  • Test for process_onion_failure
@yuntai force-pushed the 201809-issue146-failurenonion branch from 77662bd to a69e5bb on October 22, 2018 at 09:41
@yuntai (Contributor, Author) commented Oct 22, 2018

I updated https://github.com/yuntai/rust-lightning/tree/201809-issue146-failurenonion, but the PR was not automatically updated.
Oh, GitHub is back, yay!

Still WIP; I need to fill out the rest of the test cases. The current version of the patch is mergeable, and I expect the remaining changes will only be inside the tests. I think I could open another PR to continue working on this once it's rebased and merged with that gigantic patch you have.

@TheBlueMatt (Collaborator) commented Oct 22, 2018

Oops, I don't think the HTLCFailChannelUpdate needs to be out-of-band; there's no requirement that it be handled in any specific order, and doing it that way is a bit less performant. I'll rebase/review/revert that bit now unless you're around and want to do it yourself.

Err, I take that back. I guess it's nice that we don't handle an update_fail_htlc onion error and take action based on it if it might get replaced with a new one on disconnect/reconnect. In any case, I'll rebase/review as-is.

@TheBlueMatt (Collaborator):

Will take as #219.
