-
Notifications
You must be signed in to change notification settings - Fork 408
Rewrite Channel resend tracking to make it much more reliable #320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite Channel resend tracking to make it much more reliable #320
Conversation
674fdca
to
2cf9e39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good to me, but also seems easy to mess up there, so glad to have chanmon_fail_consistency (does this last one test overlapping of commitment signed, i.e issued and received by both sides before any RAA, no sure with a quick skim on it ?)
src/ln/channel.rs
Outdated
/// When resending CS/RAA messages on channel monitor restoration or on reconnect, we always | ||
/// need to ensure we resend them in the order we originally generated them. Thus, we track | ||
/// that order here by always setting this to the opposite of the message we are generating (ie | ||
/// when we generate a CS, we set this to RAAFirst as we should send any pending RAAs first and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think got the idea, but maybe could be a little be clearer, something like: when we get CS_1 from B and generate CS_2 to B, if we monitor failed after CS_2, we have to ensure to finish CS_1 channel update by sending RAA_1 first so at CS_2 generation we set to opposite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote.
src/ln/channel.rs
Outdated
/// need to ensure we resend them in the order we originally generated them. Thus, we track | ||
/// that order here by always setting this to the opposite of the message we are generating (ie | ||
/// when we generate a CS, we set this to RAAFirst as we should send any pending RAAs first and | ||
/// visa-versa). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: vice-versa. Visa, that's the concurrence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol, good point, the rewrite removed it, though.
@@ -2142,7 +2129,7 @@ impl Channel { | |||
// When the monitor updating is restored we'll call get_last_commitment_update(), | |||
// which does not update state, but we're definitely now awaiting a remote revoke | |||
// before we can step forward any more, so set it here. | |||
self.channel_state |= ChannelState::AwaitingRemoteRevoke as u32; | |||
self.send_commitment_no_status_check()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm reason to call send_commitment_no_status_check there is to upgrade htlc status ? (instead of self.resend_order = RAACommitmentOrder::RevokeAndACKFirst ; self.channel_state |= ChannelState::AwaitingRemoteRevoke )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, otherwise the pending HTLCs wont match when we resend the CS.
2cf9e39
to
e6e70ab
Compare
Resending revoke_and_ack and commitment_signed (+update) messages after monitor-update-failure or disconnection has been a highly unreliable part of our codebase for some time (as evidenced by the number of bugs caught in the chanmon_fail_consistency fuzz target). This is due to its rather ad-hoc nature and tracking/behavior which consists of checking a number of different flags to try to deduce which messages were/were not delivered and go from there. Instead, this commit rewrites it to simply keep track of the order messages were generated originally, as we always resend in the originally-generated order. I'm anticipating this will be way more robust than the old code, in addition to its simplicity.
e6e70ab
to
c2a3fc7
Compare
Note that since restarting the fuzzers on this branch I haven't found another chanmon_fail_consistency crash, with probably 2 billion iterations in AFL etc, whereas the previous code was falling over quite quickly. |
Read again updated comment, fine, take it to close 0.0.9 |
Resending revoke_and_ack and commitment_signed (+update) messages
after monitor-update-failure or disconnection has been a highly
unreliable part of our codebase for some time (as evidenced by the
number of bugs caught in the chanmon_fail_consistency fuzz target).
This is due to its rather ad-hoc nature and tracking/behavior which
consists of checking a number of different flags to try to deduce
which messages were/were not delivered and go from there. Instead,
this commit rewrites it to simply keep track of the order messages
were generated originally, as we always resend in the
originally-generated order.
I'm anticipating this will be way more robust than the old code, in
addition to its simplicity.
Based on #314, #316, and #319. I have some tests which failed prior to this change that I need to backport from chanmon_fail_consistency, as well as #319 needs some tests.