Fold sign_holder_commitment_htlc_transactions into sign_holder_commitment #770
Conversation
I notice that …
Codecov Report
@@            Coverage Diff             @@
##             main     #770      +/-   ##
==========================================
+ Coverage   91.23%   91.28%   +0.04%
==========================================
  Files          37       37
  Lines       22866    22846      -20
==========================================
- Hits        20862    20855       -7
+ Misses       2004     1991      -13

Continue to review full report at Codecov.
Made all signing failures panic in …
CC @ariard.
lightning/src/ln/onchaintx.rs (outdated)
bumped_tx.input[i].witness.push(witness_script.clone().into_bytes());
} else { return None; }
//TODO: panic ?
let sig = self.key_storage.sign_justice_transaction(&bumped_tx, i, *amount, &per_commitment_key, htlc, &self.secp_ctx).expect("sign justice tx");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update the docs to note that we don't handle Errs and instead panic, then.
I assume you meant docs on generate_claim_tx - this is done.
Oops, sorry I missed this. No, I meant on sign_holder_commitment_and_htlcs - generate_claim_tx isn't pub (so the docs here aren't visible), and the panic impacts all the way up to users (as it can be called in a number of contexts related to ChannelMonitor). Alternatively, we could note it in ChannelMonitor in all the pub functions that call through to here, but the call graph isn't exactly trivial.
Added a comment on sign_holder_commitment_and_htlcs.
Given this comment, I'm thinking that there's not enough motivation for this change and it's OK for the commitment Tx signing to continue to be decoupled from the HTLC signing. Should I close this and open a different PR for the second commit?
I was/am OK with it given the API simplification, which in the vast-majority case folds two back-to-back calls into one, but I certainly don't feel strongly about it - up to you if you want to pursue it.
Sure, the fact that a user-returned Err from ChanSigner implies you can never claim your funds needs to be documented, at least, and panic makes sense.
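For reference, here is a minimal sketch of the fold being discussed, using simplified stand-in types; the real ChannelKeys trait in rust-lightning takes more parameters (e.g. a secp context) and uses the library's own types rather than the placeholders shown here:

// Stand-ins for the real rust-lightning types; simplified for illustration only.
pub struct HolderCommitmentTransaction;
pub struct Signature;

pub trait ChannelKeys {
    // Previously two separate, almost always back-to-back calls:
    //   fn sign_holder_commitment(...) -> Result<Signature, ()>;
    //   fn sign_holder_commitment_htlc_transactions(...) -> Result<Vec<Signature>, ()>;

    /// Folded method: one call returns the commitment signature together with
    /// the signatures for the attached HTLC transactions.
    fn sign_holder_commitment_and_htlcs(
        &self,
        commitment_tx: &HolderCommitmentTransaction,
    ) -> Result<(Signature, Vec<Signature>), ()>;
}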
Force-pushed fc33132 to 3fe6255.
OK, I'm going to give this a try and we can see if we like the result. I pushed code changes that address that comment, but I still need to address a couple of other things.
Force-pushed 8e826e9 to b36617d.
I'm Concept ACK on this; it's clearly a simplification of those critical on-chain code paths.
lightning/src/chain/keysinterface.rs (outdated)
/// This will only ever be called with a non-revoked commitment_tx. This may be the latest
/// tx or it may be the second latest if not revoked and some watchtower/secondary
/// ChannelMonitor decided to broadcast before it had been updated to the latest.
/// This may be called multiple times.
nit: maybe mention that it can be called multiple times for the same commitment_tx - it's obviously called regularly for different ones.
> called multiple times for the same commitment_tx - it's obviously called regularly for different ones
I agree with the first part and will adjust the doc, but I don't understand what you mean by the second part. It is not called regularly - it's only called at force-close, which happens rarely, right?
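If adopted, the adjusted doc comment might read roughly as follows (a sketch of the suggested wording, not the PR's exact text):

pub trait ChannelKeys {
    /// This will only ever be called with a non-revoked commitment_tx. This may be
    /// the latest tx or it may be the second latest if not revoked and some
    /// watchtower/secondary ChannelMonitor decided to broadcast before it had been
    /// updated to the latest.
    /// This may be called multiple times for the same commitment_tx, and
    /// implementations must tolerate repeated invocations; it is only reached
    /// around force-close, which is a rare event.
    fn sign_holder_commitment_and_htlcs(&self); // parameters elided in this sketch
}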
Force-pushed 6bcd9ad to aa63818.
I'm good with the code changes but see the comment about "Panic if signing fails in OnChainTx"
bumped_tx.input[i].witness.push(witness_script.clone().into_bytes());
} else { return None; }
//TODO: panic ?
let sig = self.key_storage.sign_justice_transaction(&bumped_tx, i, *amount, &per_commitment_key, htlc, &self.secp_ctx).expect("sign justice tx");
I do have a doubt on this change: I'm not sure an external signer failure should bring down your Lightning node. You might have a partial disk failure, e.g. corrupted key data for channel A, but your signer might still succeed in producing signatures for channel B.
Or is this the kind of scenario which should be managed behind the key interface, with a dummy sig returned?
Here are my thoughts on this.
I believe that a highly available installation should have replication at these layers:
- signer - there should be a quorum of signers that have to agree for channel state advancement
- chanmon - there should be at least one chanmon replica reacting to on-chain events
- network handler (everything other than signer/chanmon) - probably want this in an active+hot-standby configuration, but maybe you can even do active+active or split by channel
I think in such a configuration, the signer layer should have a "TEMPFAIL" result, which should defer the operation and retry it later. For example, there may not be a quorum for one channel, but another channel can go ahead. I think this requires some additional code to implement the deferral, so it should be its own effort. However, the existing errors this function produced are not in that category; they indicated a bug or corruption.
I think a faulty disk should probably result in the replica being brought down and operator intervention, since it's hard to automatically figure out whether the replica is still sane (maybe we are seeing old data, maybe our RAM is bad). In a replicated environment, there should be enough other replicas to continue.
To summarize, I think the existing errors should be fatal, but in the future we should create a new kind of temporary error in all signer API calls. This kind of error should be generated by the signer "client-side" if it fails to arrange for a quorum and should cause the operation to be deferred.
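A rough sketch of what such a temporary-error result could look like; SignError and its variants are hypothetical, not part of the current keys interface:

/// Hypothetical error type for signer API calls (not part of the current
/// ChannelKeys interface).
#[derive(Debug)]
pub enum SignError {
    /// No signer quorum is currently reachable for this channel; the caller
    /// should defer the operation and retry it later.
    Temporary,
    /// A bug or data corruption was detected; the caller should treat this as
    /// fatal so an operator gets involved as soon as possible.
    Permanent,
}

/// Sketch of caller-side handling: defer on Temporary, panic on Permanent.
fn handle_sign_result<T>(res: Result<T, SignError>) -> Option<T> {
    match res {
        Ok(sigs) => Some(sigs),
        // None tells the caller to requeue the claim and retry later, e.g. on a
        // subsequent block, analogous to the existing bump/rebroadcast logic.
        Err(SignError::Temporary) => None,
        Err(SignError::Permanent) => panic!("signer reported an unrecoverable failure"),
    }
}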
> You might have a partial disk failure, e.g. corrupted key data for channel A, but your signer might still succeed in producing signatures for channel B.
I'd strongly suggest that this is a great reason to bring down a lightning node - your hardware is corrupting data; for all we know, the data may be corrupted in a way that results in us revealing a private key.
I like the idea of temporary errors generated by the "client-side" signer, I agree that's a good direction for the future.
> your hardware is corrupting data; for all we know, the data may be corrupted in a way that results in us revealing a private key
It's hard to say. On one hand, if you have only one replica and you're crashing your only monitoring instance, you might lose all your channels instead of burning only one. On the other hand, your hardware failure might be so gross that you reveal the private key of L1 funds if your signer is cross-layer.
I don't think there is a good answer; it's likely a function of your configuration, funds at stake, confidence in your hardware, ... One more trade-off we will have to document somewhere, I guess.
Sure, there isn't a clear answer, but we do know it's critical that the user get involved ASAP and figure out what is happening. Continuing on is usually a great way to not have the user get involved, and I'd argue that the only real way we have of doing that is to panic.
Force-pushed 3958599 to d657353.
Code Review ACK d657353, thanks for updating with the latest suggestions.
Aside from additional docs at #770 (comment), LGTM.
Force-pushed d657353 to 4a2a7dc.
Fold sign_holder_commitment_htlc_transactions into sign_holder_commitment: Signing the commitment transaction is almost always followed by signing the attached HTLC transactions, so fold the signing operations into a single method.
Signatures in OnChainTx must not fail, or we stand to lose funds
It is no longer optional since it is available at construction time.
Force-pushed 4a2a7dc to 0b20cf6.
Signing the commitment transaction is almost always followed by signing the attached HTLC transactions, so fold the signing operations into a single method.
Followup to #742.