Skip to content

Rewrite Channel resend tracking to make it much more reliable #320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

Resending revoke_and_ack and commitment_signed (+update) messages
after monitor-update-failure or disconnection has been a highly
unreliable part of our codebase for some time (as evidenced by the
number of bugs caught in the chanmon_fail_consistency fuzz target).
This is due to its rather ad-hoc nature and tracking/behavior which
consists of checking a number of different flags to try to deduce
which messages were/were not delivered and go from there. Instead,
this commit rewrites it to simply keep track of the order messages
were generated originally, as we always resend in the
originally-generated order.

I'm anticipating this will be way more robust than the old code, in
addition to its simplicity.

Based on #314, #316, and #319. I have some tests which failed prior to this change that I need to backport from chanmon_fail_consistency, as well as #319 needs some tests.

@TheBlueMatt TheBlueMatt added this to the 0.0.9 milestone Mar 7, 2019
@TheBlueMatt TheBlueMatt force-pushed the 2019-03-chan-send-rewrite branch from 674fdca to 2cf9e39 Compare March 7, 2019 21:56
Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me, but also seems easy to mess up there, so glad to have chanmon_fail_consistency (does this last one test overlapping of commitment signed, i.e issued and received by both sides before any RAA, no sure with a quick skim on it ?)

/// When resending CS/RAA messages on channel monitor restoration or on reconnect, we always
/// need to ensure we resend them in the order we originally generated them. Thus, we track
/// that order here by always setting this to the opposite of the message we are generating (ie
/// when we generate a CS, we set this to RAAFirst as we should send any pending RAAs first and
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think got the idea, but maybe could be a little be clearer, something like: when we get CS_1 from B and generate CS_2 to B, if we monitor failed after CS_2, we have to ensure to finish CS_1 channel update by sending RAA_1 first so at CS_2 generation we set to opposite

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote.

/// need to ensure we resend them in the order we originally generated them. Thus, we track
/// that order here by always setting this to the opposite of the message we are generating (ie
/// when we generate a CS, we set this to RAAFirst as we should send any pending RAAs first and
/// visa-versa).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: vice-versa. Visa, that's the concurrence

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, good point, the rewrite removed it, though.

@@ -2142,7 +2129,7 @@ impl Channel {
// When the monitor updating is restored we'll call get_last_commitment_update(),
// which does not update state, but we're definitely now awaiting a remote revoke
// before we can step forward any more, so set it here.
self.channel_state |= ChannelState::AwaitingRemoteRevoke as u32;
self.send_commitment_no_status_check()?;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm reason to call send_commitment_no_status_check there is to upgrade htlc status ? (instead of self.resend_order = RAACommitmentOrder::RevokeAndACKFirst ; self.channel_state |= ChannelState::AwaitingRemoteRevoke )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, otherwise the pending HTLCs wont match when we resend the CS.

@TheBlueMatt TheBlueMatt force-pushed the 2019-03-chan-send-rewrite branch from 2cf9e39 to e6e70ab Compare March 25, 2019 20:54
Resending revoke_and_ack and commitment_signed (+update) messages
after monitor-update-failure or disconnection has been a highly
unreliable part of our codebase for some time (as evidenced by the
number of bugs caught in the chanmon_fail_consistency fuzz target).
This is due to its rather ad-hoc nature and tracking/behavior which
consists of checking a number of different flags to try to deduce
which messages were/were not delivered and go from there. Instead,
this commit rewrites it to simply keep track of the order messages
were generated originally, as we always resend in the
originally-generated order.

I'm anticipating this will be way more robust than the old code, in
addition to its simplicity.
@TheBlueMatt TheBlueMatt force-pushed the 2019-03-chan-send-rewrite branch from e6e70ab to c2a3fc7 Compare March 25, 2019 21:04
@TheBlueMatt
Copy link
Collaborator Author

Note that since restarting the fuzzers on this branch I haven't found another chanmon_fail_consistency crash, with probably 2 billion iterations in AFL etc, whereas the previous code was falling over quite quickly.

@ariard
Copy link

ariard commented Mar 26, 2019

Read again updated comment, fine, take it to close 0.0.9

@TheBlueMatt TheBlueMatt merged commit 06eddc3 into lightningdevkit:master Apr 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants