
2019 04 in flight txn tracking clean #336


Merged

Conversation


@ariard ariard commented Apr 10, 2019

#333 without WIP Bump Tx stuff

@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch from d3967e7 to d74361c Compare April 10, 2019 22:33
@TheBlueMatt TheBlueMatt added this to the 0.0.9 milestone Apr 22, 2019
@TheBlueMatt TheBlueMatt (Collaborator) left a comment

Hmm, I had these pending; I presume they're worth publishing until I get a chance to re-review, hopefully tomorrow.

@@ -390,6 +393,9 @@ pub struct ChannelMonitor {

destination_script: Script,


onchain_events_waiting_threshold_conf: HashMap<u32, Vec<OnchainEvent>>,
TheBlueMatt (Collaborator)

Can you add a comment describing exactly how this is used?

ariard (Author)

Better?

TheBlueMatt (Collaborator)

The meaning of the u32 still isn't explained.
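For illustration, here is roughly what such a doc comment could look like, with the u32 key documented as the maturation height; the wording and the stand-in types below are assumptions, not the text that landed:

use std::collections::HashMap;

// Stand-in for the PR's OnchainEvent enum; variants elided, illustrative only.
enum OnchainEvent {}

struct ChannelMonitorSketch {
    /// On-chain events (e.g. an HTLC failed by a spend of the commitment tx) on
    /// which we must act only once they reach enough confirmations
    /// (HTLC_FAIL_ANTI_REORG_DELAY). The u32 key is the block height at which
    /// the events in the Vec mature, i.e. the height whose connection makes it
    /// safe to act on them.
    onchain_events_waiting_threshold_conf: HashMap<u32, Vec<OnchainEvent>>,
}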

/// once they mature to enough confirmations (HTLC_FAIL_ANTI_REORG_DELAY)
#[derive(Clone, PartialEq)]
enum OnchainEvent {
/// Outpoint under claim process by our own tx, at maturation we remove it from our_claim_txn_waiting_first_conf
TheBlueMatt (Collaborator)

Hmm, there is no variable called our_claim_txn_waiting_first_conf.

ariard (Author)

Yes, it's introduced in a later commit in this series. What's the best practice here: referring to it even though it's not present in this commit, or making a generic reference?

@TheBlueMatt

TheBlueMatt commented May 22, 2019

From IRC:

<BlueMatt> ariard: wait, no, yea, I'm still confused, it looks like this acts purely based on height
<BlueMatt> ariard: what I thought it was going to do was something like: step (1) create transactions (ie what we do now), step (2) add some mapping from txid to things-this-tx-resolves, step (3) monitor for confirmation of any tracked txids, step (4) when you see one, kick off a block-count-timer and then once you get enough confirmations, handle it
<BlueMatt> it looks like #336 skips (2) and (3), though I don't think adding them would be too hard?
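A rough sketch of the flow described in steps (2)-(4); all names and types below are illustrative stand-ins, not the PR's actual API:

use std::collections::HashMap;

type Txid = [u8; 32];  // stand-in for the real txid type
struct Resolution;     // stand-in for "things-this-tx-resolves" (HTLCs, outpoints, ...)

struct Tracker {
    // step (2): map each expected txid to what its confirmation resolves
    resolves: HashMap<Txid, Vec<Resolution>>,
    // step (4): once a tracked txid confirms at `height`, its resolutions move here,
    // keyed by the height at which they will have enough confirmations
    maturing: HashMap<u32, Vec<Resolution>>,
}

impl Tracker {
    // step (3): called for each transaction seen in a connected block
    fn transaction_confirmed(&mut self, txid: &Txid, height: u32, anti_reorg_delay: u32) {
        if let Some(res) = self.resolves.remove(txid) {
            self.maturing.entry(height + anti_reorg_delay - 1).or_insert_with(Vec::new).extend(res);
        }
    }
    // step (4): called when the block at `height` connects; returns matured resolutions
    fn block_connected(&mut self, height: u32) -> Vec<Resolution> {
        self.maturing.remove(&height).unwrap_or_default()
    }
}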

@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch from d74361c to 3bb62d1 Compare May 22, 2019 19:04
@TheBlueMatt TheBlueMatt (Collaborator) left a comment

Mostly done reviewing, but here's some initial feedback.

@@ -2046,7 +2046,17 @@ impl ChannelMonitor {
payment_preimage.0.copy_from_slice(&input.witness[1]);
htlc_updated.push((source, Some(payment_preimage), payment_hash));
} else {
htlc_updated.push((source, None, payment_hash));
log_trace!(self, "Failing HTLC with payment_hash {} timeout by a spend tx, waiting confirmation until {} height", log_bytes!(payment_hash.0), height + HTLC_FAIL_ANTI_REORG_DELAY - 1);
TheBlueMatt (Collaborator)

Hmmm...we need some kind of mechanism to prevent us from waiting until after an HTLC expires to fail it backwards. In fact, currently, we only close-to-chain when we're CLTV_CLAIM_BUFFER (6 blocks) away from expiry, and we only fail back after HTLC_FAIL_ANTI_REORG_DELAY (6 blocks), so we're kinda screwed. We should significantly increase CLTV_CLAIM_BUFFER, add some static asserts, and then make sure we always fail back maybe one or two blocks before expiry if we have an unconfirmed resolving tx pending.

ariard (Author)

Hmmm, isn't it more CLTV_EXPIRY_DELTA and HTLC_FAIL_ANTI_REORG_DELAY which are intertwined? Let's say a remote commitment transaction hits the chain, and we resolve a received HTLC with a timeout tx; after HTLC_FAIL_ANTI_REORG_DELAY we pass the failure backward. As that's well below CLTV_EXPIRY_DELTA, it should be safe, assuming would_broadcast_at_height reacts in time.

TheBlueMatt (Collaborator)

Ah, yea, sorry, you're right, we should still try to check how far we are from where we need to have failed back by, but, indeed, it's a question of CLTV_EXPIRY_DELTA.

@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch from 3bb62d1 to fc06077 Compare May 31, 2019 01:44
@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch from fc06077 to 47db5be Compare July 1, 2019 18:23
@TheBlueMatt TheBlueMatt (Collaborator) left a comment

Ugh, I'll finish this review this week, but this is what I came up with when looking a week ago or so.

if let &Some(ref local_tx) = &self.current_local_signed_commitment_tx {
for &(ref htlc, _, ref source) in &local_tx.htlc_outputs {
TheBlueMatt (Collaborator)

Shouldn't this block only run if commitment_txid == the current local txid or the previous local txid? (i.e. doesn't this hunk, as written, always fail dust HTLCs backwards?)

ariard (Author)

Oh yes, modified to only fail backward if the commitment tx is the last or previous local one.

But we still need to iterate over both HTLC sets, as you may receive an update_fail_htlc + commitment_signed from your counterparty. In case of the last local tx hitting the chain, our in-memory last local tx data won't contain the failed dust HTLC output. That's an edge case, but test_failure_delay_dust_htlc_local_commitment explicitly tests it.
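For illustration, a self-contained sketch of the gating and double iteration described above; the types, names, and queueing destination are assumptions, not the exact diff:

use std::collections::HashMap;

type PaymentHash = [u8; 32];

// Minimal stand-ins for the PR's per-commitment HTLC data.
struct HtlcOutput { payment_hash: PaymentHash }
struct LocalCommitment { txid: [u8; 32], htlc_outputs: Vec<HtlcOutput> }

// Only act if the confirmed tx is one of our local commitments, but then walk
// BOTH the latest and the previous HTLC sets: an update_fail_htlc +
// commitment_signed may already have removed a dust HTLC from the in-memory
// copy of whichever commitment actually hit the chain.
fn queue_dust_htlc_failures(
    current: Option<&LocalCommitment>,
    prev: Option<&LocalCommitment>,
    commitment_txid: [u8; 32],
    height: u32,
    anti_reorg_delay: u32,
    onchain_events: &mut HashMap<u32, Vec<PaymentHash>>,
) {
    let is_local_tx = current.map_or(false, |tx| tx.txid == commitment_txid)
        || prev.map_or(false, |tx| tx.txid == commitment_txid);
    if !is_local_tx { return; }
    for commitment in current.iter().chain(prev.iter()) {
        for htlc in &commitment.htlc_outputs {
            // queue the dust-HTLC failure, delayed until the maturation height
            onchain_events.entry(height + anti_reorg_delay - 1)
                .or_insert_with(Vec::new)
                .push(htlc.payment_hash);
        }
    }
}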

@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch 6 times, most recently from b33d1df to e3d42ed Compare July 3, 2019 22:37
@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch from e3d42ed to ca03df9 Compare July 8, 2019 23:20
@TheBlueMatt TheBlueMatt (Collaborator) left a comment

Didn't review individual commits, just the total diff. Didn't get materially to tests yet either, but it does look like the tests could use some love; definitely a few more new tests would be much appreciated (e.g. to fix the issues I point out).

@@ -332,6 +332,8 @@ pub struct ChannelManager {
channel_state: Mutex<ChannelHolder>,
our_network_key: SecretKey,

channel_closing_waiting_threshold_conf: Mutex<HashMap<u32, Vec<[u8; 32]>>>,
TheBlueMatt (Collaborator)

docs please. Also, it looks like right now this is only ever taken while channel_state is taken. It could either be moved into the channel_state mutex, or the places it's taken broken up and scoped so that the locks don't always overlap. Don't sweat making fine-grained locks yet anyway, until we go around and split channels into their own locks.

ariard (Author)

Updated the doc and moved it into ChannelHolder? IIRC this buffer is just there to avoid inconsistencies between channel_manager and channel_monitor in case of a reorg where an HTLC-timeout is swept by a preimage tx; in that case we need to pass the success backward and we need the channel to still be open. Do you think it's OK as-is for now, before we put more thought into channel-lock refactoring?

@@ -390,6 +393,9 @@ pub struct ChannelMonitor {

destination_script: Script,


onchain_events_waiting_threshold_conf: HashMap<u32, Vec<OnchainEvent>>,
TheBlueMatt (Collaborator)

The meaning of the u32 still isn't explained.

assert!(predicted_weight >= single_htlc_tx.get_weight());
assert_eq!(single_htlc_tx.input.len(), 1);
self.our_claim_txn_waiting_first_conf.insert(single_htlc_tx.input[0].previous_output.clone(), (height + 3, TxMaterial::Revoked { script: redeemscript, pubkey: Some(revocation_pubkey), key: revocation_key, is_htlc: true, amount: htlc.amount_msat / 1000 }, fee_estimator.get_est_sat_per_1000_weight(ConfirmationTarget::HighPriority)));
TheBlueMatt (Collaborator)

Hmm, probably want something smarter than just height + 3, given the HTLC may be expiring in < 3 blocks.

TheBlueMatt (Collaborator)

Also, need to make sure we add to the map with the feerate actually used, not a new one fetched from the fee_estimator. Further, and this is somewhat of a strange edge-case, but in case we got called twice for the same block, but the feerate information changed between calls, we don't want to overwrite the entry in the map with new info.

TheBlueMatt (Collaborator)

Note that these comments apply equally to several other similar lines below.

ariard (Author)

On feerate, updated to cache only the actual value used. In case we get the block twice, I kept only the first version seen; it doesn't seem efficient, when a bump is needed, to send two bumped transactions at the same height just because their respective previous states had different feerate bases.

ariard (Author)

And you may have the tx_broadcaster discard duplicate claim txs based on the spent outpoint.
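A minimal sketch of the insert-only-once caching described here, assuming a map keyed by the claimed outpoint and a value carrying the bump timer, the signing material, and the feerate actually used (mirroring the 3-tuple in the snippet above):

use std::collections::{hash_map, HashMap};
use std::hash::Hash;

// If we are called twice for the same block, the first entry wins, so we never
// overwrite the cached feerate with a fresh value from the fee_estimator.
fn track_claim<K: Eq + Hash, M>(
    tracker: &mut HashMap<K, (u32, M, u64)>,
    outpoint: K,
    height_timer: u32,
    material: M,
    used_feerate: u64,
) {
    match tracker.entry(outpoint) {
        hash_map::Entry::Occupied(_) => {} // already tracked; keep the original info
        hash_map::Entry::Vacant(e) => { e.insert((height_timer, material, used_feerate)); }
    }
}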

msg: update
});
log_trace!(self, "Detected channel-closing tx {} spending {}:{}, waiting until {} to close channel {}", tx.txid(), inp.previous_output.txid, inp.previous_output.vout, height + HTLC_FAIL_ANTI_REORG_DELAY - 1, log_bytes!(channel_id[..]));
match channel_closing_lock.entry(height + HTLC_FAIL_ANTI_REORG_DELAY - 1) {
TheBlueMatt (Collaborator)

(this and a bunch of other places): Is it possible that we need to fail this backwards before we reach HTLC_FAIL_ANTI_REORG_DELAY? ie because the inbound HTLC is close to timing out (and, thus, will result in a needless channel closure upstream). I think we need to check if the expiry + CLTV_EXPIRY_DELTA is at least HTLC_FAIL_ANTI_REORG_DELAY + CLTV_CLAIM_BUFFER (maybe + some headroom) away?
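A sketch of the check being proposed; the exact comparison and the headroom term follow the wording above and are assumptions, not code from the PR:

// Is the point by which we must have failed the inbound HTLC backward (roughly
// the outbound expiry + CLTV_EXPIRY_DELTA) far enough away that we can afford to
// wait HTLC_FAIL_ANTI_REORG_DELAY plus CLTV_CLAIM_BUFFER before acting?
fn can_wait_for_anti_reorg_delay(
    outbound_expiry: u32,    // CLTV expiry of the HTLC on this (closed) channel
    cltv_expiry_delta: u32,  // CLTV_EXPIRY_DELTA
    current_height: u32,
    anti_reorg_delay: u32,   // HTLC_FAIL_ANTI_REORG_DELAY
    cltv_claim_buffer: u32,  // CLTV_CLAIM_BUFFER
    headroom: u32,           // "maybe + some headroom"
) -> bool {
    outbound_expiry + cltv_expiry_delta
        >= current_height + anti_reorg_delay + cltv_claim_buffer + headroom
}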

ariard (Author)

Not something to do now, but I'm more and more thinking that channel_manager shouldn't be a ChainListener and should instead get all blockchain events through ChannelMonitor, registering outpoints to watch as needed. It would avoid inconsistencies between the two due to different handling of block-height timers.

TheBlueMatt (Collaborator)

Hmmm, see comment at https://github.com/rust-bitcoin/rust-lightning/pull/336/files#r304483221 I'd agree if block_connected in ChannelManager grows, but as long as we can keep it small, I don't see why it's necessary.


add_dynamic_output!(htlc_timeout_tx, 0);
//TODO: we should delay broadcast of HTLC-timeout tx until locktime - 1 to avoid being bounce off by mempool
TheBlueMatt (Collaborator)

Not actually sure what you're saying here?

ariard (Author)

Going through Core mempool rules the other day, I read that locktimed transactions are rejected from the mempool until the next block height (tip + 1) is past their locktime, to avoid DoS vectors. That means we should broadcast timeout txn only at that given height, to avoid having to rebuild them a few blocks later, and for that track them with an OnchainEvent. Is that how the mempool works?

TheBlueMatt (Collaborator)

Ohh, I see your point. Still, broadcast_transaction should imply re-announcement (potentially at the appropriate time), if only because mempool transactions may time out. The docs on broadcast_transaction should make sure this is clear, but we shouldn't bother dealing with that.
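For reference, a minimal sketch of the mempool constraint under discussion: Bitcoin Core treats a height-locktimed transaction as final for the next block only once its nLockTime is no greater than the current tip height, so broadcasting earlier just gets it rejected. The helper below is illustrative, not part of the PR:

// Decide whether an HTLC-timeout tx can usefully be broadcast yet, or whether we
// should keep tracking it and (re)broadcast once its locktime becomes reachable.
fn ready_to_broadcast(tx_lock_time: u32, current_height: u32) -> bool {
    // Final for block `current_height + 1` requires nLockTime < current_height + 1.
    tx_lock_time <= current_height
}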

}

// HTLCs set may differ between last and previous local commitment txn, in case of one hitting chain, ensure we cancel all of them backward
let is_local_tx = if self.current_local_signed_commitment_tx.is_some() || self.prev_local_signed_commitment_tx.is_some() { true } else { false };
TheBlueMatt (Collaborator)

This isn't enough to verify that the given transaction is a local transaction, is it? I think you also need to check that commitment_txid == local_tx.txid for either prev or current?

TheBlueMatt (Collaborator)

Probably worth trying to add a test case for this.

ariard (Author)

Yeah, it was broken but not spotted by any other test; it seems it's really an edge case, because if the tx is spending the funding output, dust HTLCs are going to be cancelled on behalf of the remote commitment. Never mind, switched the logic to be sure the tx matches previous_commitment or latest_commitment. Added test_no_failure_dust_htlc_local_commitment, and is_local_tx is now set to true in check_spend_local_transaction.

@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch 4 times, most recently from df16818 to 0d3575a Compare July 12, 2019 18:03
@ariard

ariard commented Jul 12, 2019

Updated at 0d3575a with your last review. I think we need to think more about how we handle timelocks, given #336 (comment), #336 (comment) and #336 (comment).

/// before to close channel. Doing it to early would block us to pass backward
/// preimage in case of successful claim on onward channel. Key is block
/// height timer, value is array of channels id, waiting their closing.
pub(super) channel_closing_waiting_threshold_conf: HashMap<u32, Vec<[u8; 32]>>,
TheBlueMatt (Collaborator)

Wait, why? If our counterparty broadcasts a commitment transaction and it gets reorged out, we shouldn't try to relay payments over the channel; that commitment transaction is probably gonna be mined again in a block soon, so we should just move forward happily with a closed channel.

ariard (Author)

Oh, I'm dumb, that's exactly what my test was trying to do, i.e. testing an edge case which is doable but doesn't make sense at a high level, and from there adding this nonsense struct in channel_manager. I've dropped this commit and rewrote test_sweep_outbound_htlc_failure_update.

Antoine Riard added 7 commits July 17, 2019 15:26
Broadcasting a commitment tx means that we have to fail
inbound HTLCs on the backward channel. Doing it prematurely would
put us at risk in case of reorg, so we delay passing the failure
update upstream until the resolving tx matures to
HTLC_FAIL_ANTI_REORG_DELAY.
Requirements differ depending on whether the HTLC is a revoked,
non-revoked dust, or non-revoked non-dust one.

Add connect_blocks in test_utils to fix broken tests due to
anti-reorg delay enforcement

Remove anti-duplicate htlc update stuff in ManySimpleChannelMonitor
Modify ChainListener API by adding a height field to block_disconnected
Add test_failure_delay_htlc_local_commitment and
test_no_failure_dust_htlc_local_commitment

Move some bits of check_spend_remote as we need to fail dust HTLCs
which can be spread across both the previous and latest local commitment txs
Add test_sweep_outbound_htlc_failure_update
We also need to track claim txs until their maturation to know
when we may safely remove them from the could-be-bumped-txn buffer
@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch 3 times, most recently from ce5135f to ba511d5 Compare July 18, 2019 23:13
@TheBlueMatt TheBlueMatt (Collaborator) left a comment

I think this is pretty much ready to go.

//- htlc update there as failure-trigger tx (revoked commitment tx, non-revoked commitment tx, HTLC-timeout tx) has been disconnected
//- our claim tx on a commitment tx output
}
self.our_claim_txn_waiting_first_conf.retain(|_, ref mut v| if v.0 == height + 3 { false } else { true });
TheBlueMatt (Collaborator)

Isn't this now wrong given get_height_timer? Maybe also track what block we saw the thing being spent in, explicitly, to avoid any confusion.
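A sketch of the explicit tracking suggested here, assuming each tracked claim stores the height at which it was generated so block_disconnected can prune by that height instead of re-deriving a fixed height + 3 offset; the field layout is an assumption:

use std::collections::HashMap;

struct TrackedClaim<M> {
    generated_at: u32,  // block height at which we saw the spent output and built the claim
    height_timer: u32,  // next height at which we may bump (from get_height_timer)
    material: M,        // data needed to re-sign a bumped claim
    feerate: u64,       // feerate actually used for the last broadcast
}

// On disconnection of the block at `height`, drop claims generated at that height;
// their triggering output is no longer confirmed, so the claims must be rebuilt.
fn prune_on_disconnect<K: Eq + std::hash::Hash, M>(
    claims: &mut HashMap<K, TrackedClaim<M>>,
    height: u32,
) {
    claims.retain(|_, claim| claim.generated_at != height);
}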

@@ -28,7 +28,7 @@ use secp256k1;
use chain::chaininterface::{BroadcasterInterface,ChainListener,ChainWatchInterface,FeeEstimator};
use chain::transaction::OutPoint;
use ln::channel::{Channel, ChannelError};
use ln::channelmonitor::{ChannelMonitor, ChannelMonitorUpdateErr, ManyChannelMonitor, CLTV_CLAIM_BUFFER, HTLC_FAIL_TIMEOUT_BLOCKS, HTLC_FAIL_ANTI_REORG_DELAY};
use ln::channelmonitor::{ChannelMonitor, ChannelMonitorUpdateErr, ManyChannelMonitor, CLTV_CLAIM_BUFFER, GRACE_PERIOD_BLOCKS, ANTI_REORG_DELAY};
TheBlueMatt (Collaborator)

New names are nice, but GRACE_PERIOD_BLOCKS isn't sufficiently clear, IMO; maybe NETWORK_SYNC_GRACE_PERIOD_BLOCKS?

@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch from ba511d5 to 9d8bfa2 Compare July 19, 2019 20:07
Antoine Riard added 2 commits July 19, 2019 17:19
Rename HTLC_FAIL_ANTI_REORG_DELAY to ANTI_REORG_DELAY because
we are also going to rely on it to remove bump-candidate outpoints
from the tracker after a claim gets enough depth.

Rename HTLC_FAIL_TIMEOUT_BLOCKS to LATENCY_GRACE_PERIOD_BLOCKS because
it conveys more meaningfully that we are doing a favor to our
peer instead of ruthlessly enforcing the contract.

CLTV_EXPIRY_DELTA should be > LATENCY_GRACE_PERIOD_BLOCKS +
CLTV_CLAIM_BUFFER + ANTI_REORG_DELAY + LATENCY_GRACE_PERIOD_BLOCKS
When we reach height + LATENCY_GRACE_PERIOD_BLOCKS and we still have a
pending unresolved outbound HTLC, we fail it onchain with
our local commitment tx. At this point we expect to get into the chain within a
worst-case delay of CLTV_CLAIM_BUFFER. Once our HTLC-timeout is
confirmed to ANTI_REORG_DELAY depth, we may safely fail backward the
corresponding inbound output.
When we generate a justice tx, an HTLC tx on a remote commitment, or
an HTLC tx on a local commitment, we track them until their first confirmation.
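One way to enforce the inequality above is a constant whose const evaluation underflows when it is violated; a sketch, with the constant values below being assumptions rather than necessarily the crate's real ones:

pub const CLTV_EXPIRY_DELTA: u16 = 6 * 12;
pub const CLTV_CLAIM_BUFFER: u32 = 6;
pub const LATENCY_GRACE_PERIOD_BLOCKS: u32 = 3;
pub const ANTI_REORG_DELAY: u32 = 6;

// Underflows during const evaluation if CLTV_EXPIRY_DELTA is smaller than the sum
// of the delays we may consume before we can safely fail an HTLC backward.
#[allow(dead_code)]
const CHECK_CLTV_EXPIRY_SANITY: u32 = CLTV_EXPIRY_DELTA as u32
    - LATENCY_GRACE_PERIOD_BLOCKS - CLTV_CLAIM_BUFFER - ANTI_REORG_DELAY - LATENCY_GRACE_PERIOD_BLOCKS;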
for (input, info) in spend_tx.input.iter_mut().zip(inputs_info.iter()) {
let (redeemscript, revocation_key) = sign_input!(sighash_parts, input, info.0, info.1);
let height_timer = Self::get_height_timer(height, info.2);
match self.our_claim_txn_waiting_first_conf.entry(input.previous_output.clone()) {
TheBlueMatt (Collaborator)

Doesn't need to happen in this PR, but in #347 this is gonna bite us - we don't track that all of these inputs are spent in the same transaction, with different expiries, so we may try to bump only one input, and then end up getting that confirmed, leaving nothing in the mempool trying to punish other inputs.

We must adapt our delay between two bumps of a claim txn with respect
to the timelock encumbering the targeted outpoint. If the HTLC or
revoked output is close to expiring, we should try to get our claim
in every block. If it's reasonably far in the future, we may allow
ourselves more latency to bump.
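A sketch of the kind of schedule this describes; the cutoffs below are illustrative, not necessarily the values the PR uses:

// Pick the next height at which we should (re)broadcast or bump a claim, based on
// how close the encumbering timelock is to expiring.
fn get_height_timer(current_height: u32, timelock_expiration: u32) -> u32 {
    if timelock_expiration <= current_height + 3 {
        // About to expire: try to get the claim confirmed in every block.
        current_height + 1
    } else if timelock_expiration - current_height <= 15 {
        // Getting close: check and possibly bump every few blocks.
        current_height + 3
    } else {
        // Comfortably far away: allow ourselves more latency between bumps.
        current_height + 15
    }
}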
@ariard ariard force-pushed the 2019-04-in-flight-txn-tracking-clean branch from 9d8bfa2 to 757bcc2 Compare July 19, 2019 21:31
@TheBlueMatt TheBlueMatt merged commit 8470e60 into lightningdevkit:master Jul 19, 2019