Fold sign_holder_commitment_htlc_transactions into sign_holder_commitment #770

Merged
3 commits from holder-htlc-sigs merged into lightningdevkit:main on Jan 18, 2021

Conversation

devrandom (Member)

Signing the commitment transaction is almost always followed by signing the attached HTLC transactions, so fold the signing operations into a single method.

Follow-up to #742.
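
For context, a rough sketch of the shape the consolidated signer call could take. The trait name, argument types, import paths, and return type below are assumptions based on the PR title, not copied from the diff:

```rust
// Sketch only: names, paths, and types are assumptions, not the exact API in this PR.
use bitcoin::secp256k1::{All, Secp256k1, Signature};

use lightning::ln::chan_utils::HolderCommitmentTransaction;

pub trait HolderCommitmentSigner {
    /// Sign the holder commitment transaction and all attached HTLC transactions
    /// in one call, returning the commitment signature plus one signature per HTLC.
    fn sign_holder_commitment_and_htlcs(
        &self,
        commitment_tx: &HolderCommitmentTransaction,
        secp_ctx: &Secp256k1<All>,
    ) -> Result<(Signature, Vec<Signature>), ()>;
}
```

The point of the fold is that callers who previously made two back-to-back signing calls can get both the commitment and HTLC signatures from a single method.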

@devrandom (Member Author)

I notice that get_fully_signed_htlc_tx and get_fully_signed_holder_tx do nothing if there's a failure to sign. It seems like these should panic.
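
To make the concern concrete, here is a minimal illustration of the difference between silently giving up and panicking. `Tx` and `sign` are stand-ins, not the real bodies of those functions:

```rust
// Illustration only: `Tx` and `sign` are stand-ins, not the real rust-lightning code.
struct Tx;

fn sign(tx: Tx) -> Result<Tx, ()> { Ok(tx) }

// Behaviour under discussion: a signing failure quietly yields None, so the
// caller simply never broadcasts and nothing alerts the operator.
fn get_fully_signed_silent(tx: Tx) -> Option<Tx> {
    sign(tx).ok()
}

// Suggested behaviour: failing to sign here can mean being unable to claim
// funds, so fail loudly and force operator intervention.
fn get_fully_signed_panicking(tx: Tx) -> Tx {
    sign(tx).expect("holder/HTLC signing must not fail")
}
```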

codecov bot commented Jan 5, 2021

Codecov Report

Merging #770 (0b20cf6) into main (2ea3765) will increase coverage by 0.04%.
The diff coverage is 97.67%.

@@            Coverage Diff             @@
##             main     #770      +/-   ##
==========================================
+ Coverage   91.23%   91.28%   +0.04%     
==========================================
  Files          37       37              
  Lines       22866    22846      -20     
==========================================
- Hits        20862    20855       -7     
+ Misses       2004     1991      -13     
Impacted Files Coverage Δ
lightning/src/ln/channel.rs 87.49% <ø> (ø)
lightning/src/ln/onchaintx.rs 94.02% <95.45%> (+0.09%) ⬆️
lightning/src/chain/channelmonitor.rs 95.81% <100.00%> (+0.13%) ⬆️
lightning/src/chain/keysinterface.rs 93.29% <100.00%> (-0.05%) ⬇️
lightning/src/util/enforcing_trait_impls.rs 100.00% <100.00%> (ø)
lightning/src/ln/functional_tests.rs 97.12% <0.00%> (+0.17%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ea3765...0b20cf6.

@devrandom (Member Author)

Made all signing failures panic in OnChainTx. Such failures can result in losing funds, so they should not happen.

@TheBlueMatt (Collaborator) left a comment

CC @ariard.

bumped_tx.input[i].witness.push(witness_script.clone().into_bytes());
} else { return None; }
//TODO: panic ?
let sig = self.key_storage.sign_justice_transaction(&bumped_tx, i, *amount, &per_commitment_key, htlc, &self.secp_ctx).expect("sign justice tx");
Collaborator

We should update the docs to note that we don't handle Errs and instead panic, then.

Member Author

I assume you meant docs on generate_claim_tx - this is done.

Collaborator

Oops, sorry, I missed this. No, I meant on sign_holder_commitment_and_htlcs - generate_claim_tx isn't pub (so the docs here aren't visible), and the panic propagates all the way up to users (as it can be called in a number of contexts related to ChannelMonitor). Alternatively, we could note it in ChannelMonitor in all the pub functions that call through to here, but the call graph isn't exactly trivial.

Member Author

Added a comment on sign_holder_commitment_and_htlcs.
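
Roughly the kind of note in question; the wording below is a sketch, not the exact doc comment that was added:

```rust
/// Signs the holder commitment transaction and all attached HTLC transactions.
///
/// Sketch of the warning being discussed: an Err returned by the signer is not
/// handled gracefully on the ChannelMonitor / OnchainTxHandler paths that call
/// through to this; it will cause a panic, since failing to sign here can mean
/// being unable to claim funds on-chain.
fn sign_holder_commitment_and_htlcs() { /* ... */ }
```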

@devrandom (Member Author)

Given this comment, I'm thinking that there's not enough motivation for this change and it's OK for the commitment Tx signing to continue to be decoupled from the HTLC signing.

Should I close this and open a different PR for the second commit?

@TheBlueMatt (Collaborator)

> Given this comment, I'm thinking that there's not enough motivation for this change and it's OK for the commitment Tx signing to continue to be decoupled from the HTLC signing.

I was/am OK with it given the API simplification, which in the vast majority of cases folds two back-to-back calls into one, but I certainly don't feel strongly about it - up to you if you want to pursue it.

> Should I close this and open a different PR for the second commit?

Sure, the fact that a user-returned Err from ChanSigner implies you can never claim your funds needs to be documented, at least, and panicking makes sense.

@devrandom (Member Author)

OK, I'm going to give this a try and we can see if we like the result. I pushed code changes that address that comment, but I still need to address a couple of other things.

devrandom force-pushed the holder-htlc-sigs branch 2 times, most recently from 8e826e9 to b36617d on January 7, 2021 at 21:57
@ariard left a comment

I'm Concept ACK on this, it's clearly a simplification of those critical on-chain code paths.

/// This will only ever be called with a non-revoked commitment_tx. This may be the latest
/// tx or it may be the second latest if not revoked and some watchtower/secondary
/// ChannelMonitor decided to broadcast before it had been updated to the latest.
/// This may be called multiple times.
Collaborator

nit: maybe mention that it can be called multiple times for the same commitment_tx - it's obviously called regularly for different ones.

Member Author

> called multiple times for the same commitment_tx - it's obviously called regularly for different ones.

I agree with the first part and will adjust the doc, but I don't understand what you mean by the second part. It is not called regularly - it's only called at force-close which happens rarely, right?

devrandom force-pushed the holder-htlc-sigs branch 2 times, most recently from 6bcd9ad to aa63818 on January 14, 2021 at 22:05
@ariard left a comment

I'm good with the code changes but see the comment about "Panic if signing fails in OnChainTx"

bumped_tx.input[i].witness.push(witness_script.clone().into_bytes());
} else { return None; }
//TODO: panic ?
let sig = self.key_storage.sign_justice_transaction(&bumped_tx, i, *amount, &per_commitment_key, htlc, &self.secp_ctx).expect("sign justice tx");

I do have a doubt about this change; I'm not sure an external signer failure should bring down your Lightning node. You might have a partial disk failure, e.g. corrupted key data for channel A, but your signer might still succeed in producing signatures for channel B.

Or is this the kind of scenario which should be managed behind the key interface, with a dummy sig being returned?

Member Author

Here are my thoughts on this.

I believe that a highly available installation should have replication at these layers:

  • signer - there should be a quorum of signers that have to agree for channel state advancement
  • chanmon - there should be at least one chanmon replica reacting to on-chain events
  • network handler (everything other than signer/chanmon) - probably want this in an active+hot-standby configuration, but maybe you can even do active+active or split by channel

I think in such a configuration, the signer layer should have a "TEMPFAIL" result, which should defer the operation and retry later. For example, there may not be a quorum for one channel, but another channel can go ahead. I think this requires some additional code to implement the deferral, so it should be its own effort. However, the existing errors this function produced are not in that category; they indicated a bug or corruption.

I think a faulty disk should probably result in the replica being brought down and operator intervention, since it's hard to automatically figure out if the replica is still sane (maybe we are seeing old data, maybe our RAM is bad). In a replicated environment, there should be enough other replicas to continue.

To summarize, I think the existing errors should be fatal, but in the future we should create a new kind of temporary error in all signer API calls. This kind of error should be generated by the signer "client-side" if it fails to arrange for a quorum and should cause the operation to be deferred.
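
As a rough illustration of that split between fatal and deferrable failures (entirely hypothetical types, not part of this PR or the current signer API):

```rust
// Hypothetical sketch of a future signer error model; none of these names
// exist in the current API.
enum SignerError {
    /// Permanent failure (bug, corruption): treat as fatal.
    Fatal,
    /// Temporary failure (e.g. no signer quorum yet): defer and retry later.
    Temporary,
}

fn handle_signature<T>(res: Result<T, SignerError>) -> Option<T> {
    match res {
        Ok(sig) => Some(sig),
        // Defer the claim and retry on a later block or timer tick.
        Err(SignerError::Temporary) => None,
        // Today's error cases indicate a bug or corruption: fail loudly.
        Err(SignerError::Fatal) => panic!("permanent signer failure"),
    }
}
```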

Collaborator

> You might have a partial disk failure, e.g. corrupted key data for channel A, but your signer might still succeed in producing signatures for channel B.

I'd strongly suggest that this is a great reason to bring down a lightning node - your hardware is corrupting data, and for all we know the data may be corrupted in a way that results in us revealing a private key.


I like the idea of temporary errors generated by the "client-side" signer; I agree that's a good direction for the future.

> your hardware is corrupting data, and for all we know the data may be corrupted in a way that results in us revealing a private key.

It's hard to say. On one hand, if you have only one replica and you crash your only monitoring instance, you might lose all your channels instead of burning only one. On the other hand, your hardware failure might be so gross that you reveal a private key for L1 funds if your signer is cross-layer.

I don't think there is a good answer; it's likely a function of your configuration, funds at stake, confidence in your hardware, and so on. One more trade-off we will have to document somewhere, I guess.

Collaborator

Sure, there isn't a clear answer, but we do know it's critical that the user get involved ASAP and figure out what issue is happening. Continuing on is usually a great way to not have the user get involved, and I'd argue that the only real way we have of doing that is to panic.

devrandom force-pushed the holder-htlc-sigs branch 2 times, most recently from 3958599 to d657353 on January 15, 2021 at 18:41
@ariard commented Jan 18, 2021

Code Review ACK d657353, thanks for updating with the latest suggestions.

@TheBlueMatt (Collaborator) left a comment

Aside from additional docs at #770 (comment), LGTM.

…ment

Signing the commitment transaction is almost always followed by signing the attached HTLC transactions, so fold the signing operations into a single method.
Signatures in OnChainTx must not fail, or we stand to lose funds
It is no longer optional since it is available at construction time.
@TheBlueMatt merged commit 21a44da into lightningdevkit:main on Jan 18, 2021