2019 04 in flight txn tracking clean #336
Conversation
Force-pushed from d3967e7 to d74361c
Hmm, I had these comments pending; I figure they're worth publishing until I get a chance to re-review, hopefully tomorrow.
@@ -390,6 +393,9 @@ pub struct ChannelMonitor {
	destination_script: Script,

	onchain_events_waiting_threshold_conf: HashMap<u32, Vec<OnchainEvent>>,
Can you add a comment describing exactly how this is used?
Better?
The meaning of the u32 still isn't explained.
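For reference, here is a documented sketch of how such a field might read, assuming (as suggested later in the thread) that the u32 key is the block height at which the buffered events mature; the types are simplified stand-ins, not the crate's actual definitions:

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real OnchainEvent in channelmonitor.rs.
#[derive(Clone, PartialEq)]
enum OnchainEvent {
    /// An HTLC to fail backward once the triggering on-chain tx has enough
    /// confirmations.
    HTLCUpdate { payment_hash: [u8; 32] },
}

struct MonitorSketch {
    /// Events buffered until a confirmation threshold is reached.
    /// Key: the block height at which the events mature, i.e. the height of
    /// the block confirming the triggering transaction plus the anti-reorg
    /// delay minus one. Value: the events to act on once that height connects.
    onchain_events_waiting_threshold_conf: HashMap<u32, Vec<OnchainEvent>>,
}

impl MonitorSketch {
    /// On each connected block, drain and return any events that matured.
    fn block_connected(&mut self, height: u32) -> Vec<OnchainEvent> {
        self.onchain_events_waiting_threshold_conf
            .remove(&height)
            .unwrap_or_default()
    }
}
```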
src/ln/channelmonitor.rs
Outdated
/// once they mature to enough confirmations (HTLC_FAIL_ANTI_REORG_DELAY)
#[derive(Clone, PartialEq)]
enum OnchainEvent {
	/// Outpoint under claim process by our own tx, at maturation we remove it from our_claim_txn_waiting_first_conf
Hmm, there is no variable called our_claim_txn_waiting_first_conf.
Yes, it's introduced in a later commit. What's best practice here: refer to it even though it isn't present in this commit, or use a generic reference?
From IRC:
Force-pushed from d74361c to 3bb62d1
Mostly done reviewing, but here's some initial feedback.
src/ln/channelmonitor.rs
Outdated
@@ -2046,7 +2046,17 @@ impl ChannelMonitor {
				payment_preimage.0.copy_from_slice(&input.witness[1]);
				htlc_updated.push((source, Some(payment_preimage), payment_hash));
			} else {
				htlc_updated.push((source, None, payment_hash));
				log_trace!(self, "Failing HTLC with payment_hash {} timeout by a spend tx, waiting confirmation until {} height", log_bytes!(payment_hash.0), height + HTLC_FAIL_ANTI_REORG_DELAY - 1);
Hmmm...we need some kind of mechanism to prevent us from waiting until after an HTLC expires to fail it backwards. In fact, currently, we only close-to-chain when we're CLTV_CLAIM_BUFFER (6 blocks) away from expiry, and we only fail back after HTLC_FAIL_ANTI_REORG_DELAY (6 blocks), so we're kinda screwed. We should significantly increase CLTV_CLAIM_BUFFER, add some static asserts, and then make sure we always fail back maybe one or two blocks before expiry if we have an unconfirmed resolving tx pending.
Hmmm, isn't it more CLTV_EXPIRY_DELTA and HTLC_FAIL_ANTI_REORG_DELAY which are intertwined? Say a remote commitment transaction hits the chain and we resolve a received HTLC with a timeout tx; after HTLC_FAIL_ANTI_REORG_DELAY we pass the failure backward. As that's well below CLTV_EXPIRY_DELTA, it should be safe, assuming would_broadcast_at_height reacts in time.
Ah, yea, sorry, you're right. We should still try to check how far we are from the point by which we need to have failed back, but, indeed, it's a question of CLTV_EXPIRY_DELTA.
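As a rough illustration of the check being discussed (the constant values and the helper name below are invented, not the crate's): before scheduling a delayed fail-back, verify that waiting out the anti-reorg delay still leaves a claim buffer before the inbound HTLC expires.

```rust
// Hypothetical values for illustration only.
const CLTV_CLAIM_BUFFER: u32 = 6;
const HTLC_FAIL_ANTI_REORG_DELAY: u32 = 6;

/// Returns true if we can afford to wait HTLC_FAIL_ANTI_REORG_DELAY blocks
/// before failing backward, given the inbound HTLC's absolute expiry height.
fn can_delay_failure(current_height: u32, inbound_htlc_expiry: u32) -> bool {
    // We must have failed backward before the inbound HTLC expires, keeping
    // CLTV_CLAIM_BUFFER blocks of margin to get a claim confirmed upstream.
    current_height + HTLC_FAIL_ANTI_REORG_DELAY + CLTV_CLAIM_BUFFER <= inbound_htlc_expiry
}
```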
Force-pushed from 3bb62d1 to fc06077
Force-pushed from fc06077 to 47db5be
Ugh, I'll finish this review this week, but this is what I came up with when looking a week ago or so.
src/ln/channelmonitor.rs
Outdated
		if let &Some(ref local_tx) = &self.current_local_signed_commitment_tx {
			for &(ref htlc, _, ref source) in &local_tx.htlc_outputs {
Shouldn't this block only run in case either commitment_txid == current local txid or previous local txid? (ie doesn't this hunk, as written, always fail dust HTLCs backwards?)
Oh yes, modified to only fail backward if the commitment tx is the latest or previous local one.
But we still need to iterate over both HTLC sets, as you may receive an update_fail_htlc + commitment_signed from your counterparty. If the latest local tx hits the chain, our in-memory latest local tx data won't contain the failed dust HTLC output. That's an edge case, but test_failure_delay_dust_htlc_local_commitment explicitly tests it.
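A simplified sketch of that iteration, with invented stand-in types, once we already know the confirmed transaction is one of our local commitments:

```rust
/// Simplified stand-in for a local commitment; the real structure carries the
/// full HTLC and source data.
struct LocalCommitmentTx {
    dust_htlc_payment_hashes: Vec<[u8; 32]>,
}

/// Scan the dust HTLCs of both the latest and the previous local commitment:
/// after an update_fail_htlc + commitment_signed exchange their HTLC sets can
/// differ, and the confirmed tx alone may not cover the HTLC we must fail.
fn dust_htlcs_to_fail(
    current: &Option<LocalCommitmentTx>,
    previous: &Option<LocalCommitmentTx>,
) -> Vec<[u8; 32]> {
    let mut failed = Vec::new();
    for local_tx in current.iter().chain(previous.iter()) {
        failed.extend(local_tx.dust_htlc_payment_hashes.iter().cloned());
    }
    failed
}
```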
Force-pushed from b33d1df to e3d42ed
Force-pushed from e3d42ed to ca03df9
Didn't review individual commits, just the total diff. Didn't get materially to tests yet either, but it does look like the tests could use some love, definitely a few more new tests would be much appreciated (eg to fix the issues I point out).
src/ln/channelmanager.rs
Outdated
@@ -332,6 +332,8 @@ pub struct ChannelManager {
	channel_state: Mutex<ChannelHolder>,
	our_network_key: SecretKey,

	channel_closing_waiting_threshold_conf: Mutex<HashMap<u32, Vec<[u8; 32]>>>,
docs please. Also, looks like right now this is always only ever taken while channel_state is taken. It could either be moved into the channel_state mutex, or the places it's taken could be broken up and scoped so that the locks don't always overlap. Don't sweat making fine-grained locks yet anyway, until we go around and split channels into their own locks.
Updated the doc and moved it into ChannelHolder; does that work? IIRC this buffer is just there to avoid inconsistencies between channel_manager and channel_monitor in case of a reorg where an HTLC-timeout gets swept by a preimage tx; in that case we need to pass the success backward, and the channel needs to still be open. Do you think it's okay as-is until we put more thought into channel lock refactoring?
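For illustration, a minimal sketch (with invented stand-in types) of the layout being suggested: the pending-close buffer sits inside the structure that the channel_state mutex already guards, so no second lock is needed.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Invented minimal stand-ins for the real ChannelHolder / ChannelManager.
struct ChannelHolderSketch {
    // ... other per-channel state lives here ...
    /// Channels whose closing we delay until the keyed block height, so a
    /// reorged-out commitment tx doesn't leave us unable to pass a preimage
    /// backward. Key: maturation height, value: channel ids.
    channel_closing_waiting_threshold_conf: HashMap<u32, Vec<[u8; 32]>>,
}

struct ChannelManagerSketch {
    /// One mutex covers both the channel map and the pending-close buffer,
    /// instead of a second lock that would always be taken alongside it.
    channel_state: Mutex<ChannelHolderSketch>,
}
```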
src/ln/channelmonitor.rs
Outdated
				assert!(predicted_weight >= single_htlc_tx.get_weight());
				assert_eq!(single_htlc_tx.input.len(), 1);
				self.our_claim_txn_waiting_first_conf.insert(single_htlc_tx.input[0].previous_output.clone(), (height + 3, TxMaterial::Revoked { script: redeemscript, pubkey: Some(revocation_pubkey), key: revocation_key, is_htlc: true, amount: htlc.amount_msat / 1000 }, fee_estimator.get_est_sat_per_1000_weight(ConfirmationTarget::HighPriority)));
Hmm, probably want something smarter than just height + 3, given the HTLC may be expiring in < 3 blocks.
Also, need to make sure we add to the map with the feerate actually used, not a new one fetched from the fee_estimator. Further, and this is somewhat of a strange edge-case, but in case we got called twice for the same block, but the feerate information changed between calls, we don't want to overwrite the entry in the map with new info.
Note that these comments apply equally to several other similar lines below.
On feerate, updated to cache only the actual value used. In case we get the same block twice, I kept only the first version seen; it doesn't seem efficient, when a bump is needed, to send two bumped transactions at the same height even if their respective previous states had different feerate bases.
And you may have the tx_broadcaster discard duplicate claim txs based on the spent outpoint.
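A sketch of the agreed behaviour using the std entry API, with hypothetical key and value types: store the feerate the claim was actually built with, and keep the first entry if the same block is connected twice.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical key/value types: (txid, vout) of the claimed outpoint, and
/// (height timer for the next bump, feerate actually used, sat/1000-weight).
type OutPointKey = ([u8; 32], u32);
type ClaimInfo = (u32, u64);

fn track_claim(
    pending: &mut HashMap<OutPointKey, ClaimInfo>,
    outpoint: OutPointKey,
    height_timer: u32,
    used_feerate: u64,
) {
    // Record the feerate the claim tx was actually built with, and only if we
    // aren't already tracking this outpoint: if the same block is connected
    // twice we keep the first entry rather than overwriting it.
    if let Entry::Vacant(e) = pending.entry(outpoint) {
        e.insert((height_timer, used_feerate));
    }
}
```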
src/ln/channelmanager.rs
Outdated
				msg: update
			});
			log_trace!(self, "Detected channel-closing tx {} spending {}:{}, waiting until {} to close channel {}", tx.txid(), inp.previous_output.txid, inp.previous_output.vout, height + HTLC_FAIL_ANTI_REORG_DELAY - 1, log_bytes!(channel_id[..]));
			match channel_closing_lock.entry(height + HTLC_FAIL_ANTI_REORG_DELAY - 1) {
(this and a bunch of other places): Is it possible that we need to fail this backwards before we reach HTLC_FAIL_ANTI_REORG_DELAY? ie because the inbound HTLC is close to timing out (and, thus, will result in a needless channel closure upstream). I think we need to check if the expiry + CLTV_EXPIRY_DELTA is at least HTLC_FAIL_ANTI_REORG_DELAY + CLTV_CLAIM_BUFFER (maybe + some headroom) away?
Not something to do now, but I'm more and more thinking that channel_manager shouldn't be a ChainListener; it should get all blockchain events through ChannelMonitor and register outpoints to watch as needed. That would avoid inconsistencies between the two due to different handling of the block height timer.
Hmmm, see comment at https://github.com/rust-bitcoin/rust-lightning/pull/336/files#r304483221. I'd agree if block_connected in ChannelManager grows, but as long as we can keep it small, I don't see why it's necessary.
src/ln/channelmonitor.rs
Outdated
			add_dynamic_output!(htlc_timeout_tx, 0);
			//TODO: we should delay broadcast of HTLC-timeout tx until locktime - 1 to avoid being bounced off by the mempool
Not actually sure what you're saying here?
Going through Core's mempool rules the other day, I read that a transaction with a locktime is rejected from the mempool unless its locktime is no greater than tip + 1 (to avoid DoS vectors), which means we should broadcast timeout txn only at that height to avoid having to rebuild them a few blocks later, and therefore track them with an OnchainEvent. Is that how the mempool works?
Ohh, I see your point. Still, broadcast_transaction should imply re-announcement (potentially at the appropriate time), if only because mempool transactions may time out. The docs on broadcast_transaction should make sure this is clear, but we shouldn't bother dealing with that.
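A sketch of the deferral under discussion, assuming the policy that a height-locktimed transaction is only accepted into the mempool once its nLockTime is at most tip + 1; types and names here are invented:

```rust
use std::collections::BTreeMap;

/// Pending HTLC-timeout transactions, keyed by their absolute nLockTime, held
/// back until the mempool will accept them.
struct DelayedBroadcaster {
    pending: BTreeMap<u32, Vec<Vec<u8>>>, // locktime -> serialized txs
}

impl DelayedBroadcaster {
    /// Queue a transaction whose nLockTime (absolute block height) is `locktime`.
    fn hold(&mut self, locktime: u32, raw_tx: Vec<u8>) {
        self.pending.entry(locktime).or_default().push(raw_tx);
    }

    /// Drain every transaction that can be included in the block following
    /// `tip_height`, i.e. whose locktime is <= tip_height + 1.
    fn block_connected(&mut self, tip_height: u32) -> Vec<Vec<u8>> {
        let mature: Vec<u32> = self
            .pending
            .range(..=tip_height + 1)
            .map(|(locktime, _)| *locktime)
            .collect();
        let mut ready = Vec::new();
        for locktime in mature {
            if let Some(txs) = self.pending.remove(&locktime) {
                ready.extend(txs);
            }
        }
        ready
    }
}
```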
src/ln/channelmonitor.rs
Outdated
		}

		// HTLCs set may differ between last and previous local commitment txn, in case of one hitting chain, ensure we cancel all of them backward
		let is_local_tx = if self.current_local_signed_commitment_tx.is_some() || self.prev_local_signed_commitment_tx.is_some() { true } else { false };
This isn't enough to verify that the given transaction is a local transaction, is it? I think you also need to check that commitment_txid == local_tx.txid for either prev or current?
Probably worth trying to add a test case for this.
Yeah, it was broken but not spotted by any other tests; it seems it's really an edge case, because if the tx is spending the funding output, dust HTLCs are going to be cancelled on behalf of the remote commitment anyway. Never mind, I switched the logic to make sure the confirmed tx matches the previous or latest local commitment, added test_no_failure_dust_htlc_local_commitment, and set is_local_tx to true in check_spend_local_transaction.
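In other words, something along these lines (a simplified signature, not the crate's actual code): treat the confirmed tx as local only when its txid matches the current or previous local commitment txid.

```rust
/// Returns true only if `commitment_txid` actually matches the txid of the
/// current or previous local commitment, rather than merely checking that
/// those fields are populated.
fn is_local_commitment(
    commitment_txid: &[u8; 32],
    current_local_txid: Option<&[u8; 32]>,
    prev_local_txid: Option<&[u8; 32]>,
) -> bool {
    current_local_txid.map_or(false, |txid| txid == commitment_txid)
        || prev_local_txid.map_or(false, |txid| txid == commitment_txid)
}
```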
Force-pushed from df16818 to 0d3575a
Updated at 0d3575a per your last review. I think we need to think more about how we handle timelocks given #336 (comment), #336 (comment) and #336 (comment).
src/ln/channelmanager.rs
Outdated
	/// before to close channel. Doing it to early would block us to pass backward
	/// preimage in case of successful claim on onward channel. Key is block
	/// height timer, value is array of channels id, waiting their closing.
	pub(super) channel_closing_waiting_threshold_conf: HashMap<u32, Vec<[u8; 32]>>,
Wait, why? If our counterparty broadcasts a commitment transaction and it gets reorged out, we shouldn't try to relay payments over the channel; that commitment transaction is probably gonna be mined again in a block soon, so we should just move forward happily with a closed channel.
Oh, I'm dumb, that's exactly what my test was trying to do, i.e. testing some edge cases which is doable but doesn't make sense at a high level. And since that's where this nonsense struct in channel_manager came from, I've dropped the commit and rewrote test_sweep_outbound_htlc_failure_update.
Broadcasting a commitment tx means we have to fail the inbound HTLCs on the backward channel. Doing it prematurely would put us at risk in case of a reorg, so we delay passing the failure update upstream until the resolving tx matures to HTLC_FAIL_ANTI_REORG_DELAY. Requirements differ depending on whether the HTLC is revoked, non-revoked dust, or non-revoked non-dust. Add connect_blocks in test_utils to fix tests broken by the anti-reorg delay enforcement. Remove the anti-duplicate HTLC update stuff in ManySimpleChannelMonitor.
Modify ChainListener API by adding height field to block_disconnect
Fix tests broken by introduced change
Add test_failure_delay_htlc_local_commitment and test_no_failure_dust_htlc_local_commitment. Move some bits of check_spend_remote, as we need to fail dust HTLCs which can be spread across both the previous and latest local commitment txs.
Add test_sweep_outbound_htlc_failure_update
We also need to track claim txs until their maturation, to know when we may safely remove them from the could-be-bumped-txn buffer.
Force-pushed from ce5135f to ba511d5
I think this is pretty much ready to go.
src/ln/channelmonitor.rs
Outdated
			//- htlc update there as failure-trigger tx (revoked commitment tx, non-revoked commitment tx, HTLC-timeout tx) has been disconnected
			//- our claim tx on a commitment tx output
		}
		self.our_claim_txn_waiting_first_conf.retain(|_, ref mut v| if v.0 == height + 3 { false } else { true });
Isn't this now wrong given get_height_timer? Maybe also explicitly track the block where we saw the thing being spent, to avoid any confusion.
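A sketch of that suggestion with simplified types: record the height at which each claim was generated and drop exactly those entries when that block is disconnected, instead of reconstructing the height from a fixed offset.

```rust
use std::collections::HashMap;

/// Simplified tracking entry for a pending claim tx: the height at which we
/// generated it and the height at which we next intend to bump it.
struct ClaimEntry {
    created_at_height: u32,
    height_timer: u32,
}

/// On a block disconnection, forget the claims generated at the disconnected
/// height itself; the bump timer (`height_timer`) is left out of the decision.
fn block_disconnected(
    pending: &mut HashMap<([u8; 32], u32), ClaimEntry>, // keyed by (txid, vout)
    disconnected_height: u32,
) {
    pending.retain(|_, entry| entry.created_at_height != disconnected_height);
}
```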
src/ln/channelmanager.rs
Outdated
@@ -28,7 +28,7 @@ use secp256k1;
use chain::chaininterface::{BroadcasterInterface,ChainListener,ChainWatchInterface,FeeEstimator};
use chain::transaction::OutPoint;
use ln::channel::{Channel, ChannelError};
use ln::channelmonitor::{ChannelMonitor, ChannelMonitorUpdateErr, ManyChannelMonitor, CLTV_CLAIM_BUFFER, HTLC_FAIL_TIMEOUT_BLOCKS, HTLC_FAIL_ANTI_REORG_DELAY};
use ln::channelmonitor::{ChannelMonitor, ChannelMonitorUpdateErr, ManyChannelMonitor, CLTV_CLAIM_BUFFER, GRACE_PERIOD_BLOCKS, ANTI_REORG_DELAY};
New names are nice, but GRACE_PERIOD_BLOCKS isn't sufficiently clear, IMO; maybe NETWORK_SYNC_GRACE_PERIOD_BLOCKS?
Force-pushed from ba511d5 to 9d8bfa2
Rename HTLC_FAIL_ANTI_REORG_DELAY to ANTI_REORG_DELAY because we are also going to rely on it to remove bump-candidate outpoints from the tracker once a claim gets enough depth. Rename HTLC_FAIL_TIMEOUT_BLOCKS to LATENCY_GRACE_PERIOD_BLOCKS because it conveys more meaningfully that we are doing our peer a favor instead of ruthlessly enforcing the contract. CLTV_EXPIRY_DELTA should be > LATENCY_GRACE_PERIOD_BLOCKS + CLTV_CLAIM_BUFFER + ANTI_REORG_DELAY + LATENCY_GRACE_PERIOD_BLOCKS. When we reach height + LATENCY_GRACE_PERIOD_BLOCKS and we have a pending unresolved outbound HTLC, we fail on-chain with our local commitment tx. At this point we expect to get into the chain within a worst-case delay of CLTV_CLAIM_BUFFER. Once our HTLC-timeout is confirmed to ANTI_REORG_DELAY we may safely fail backward the corresponding inbound output.
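The stated relation between these constants can be written down as a simple check; the values below are placeholders for illustration, not the crate's real ones.

```rust
// Placeholder values, for illustration only.
const LATENCY_GRACE_PERIOD_BLOCKS: u32 = 3;
const CLTV_CLAIM_BUFFER: u32 = 6;
const ANTI_REORG_DELAY: u32 = 6;
const CLTV_EXPIRY_DELTA: u32 = 72;

/// The budget described above: one grace period to notice the unresolved HTLC,
/// a claim buffer to get our HTLC-timeout confirmed, the anti-reorg delay
/// before failing backward, and the upstream peer's own grace period.
fn cltv_expiry_delta_is_sufficient() -> bool {
    CLTV_EXPIRY_DELTA
        > LATENCY_GRACE_PERIOD_BLOCKS + CLTV_CLAIM_BUFFER + ANTI_REORG_DELAY + LATENCY_GRACE_PERIOD_BLOCKS
}
```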
When we generate a justice tx, an HTLC tx on a remote commitment, or an HTLC tx on a local commitment, we track it until its first confirmation.
		for (input, info) in spend_tx.input.iter_mut().zip(inputs_info.iter()) {
			let (redeemscript, revocation_key) = sign_input!(sighash_parts, input, info.0, info.1);
			let height_timer = Self::get_height_timer(height, info.2);
			match self.our_claim_txn_waiting_first_conf.entry(input.previous_output.clone()) {
Doesn't need to happen in this PR, but in #347 this is gonna bite us - we don't track that all of these inputs are spent in the same transaction, with different expiries, so we may try to bump only one input, and then end up getting that confirmed, leaving nothing in the mempool trying to punish other inputs.
We must adapt the delay between two bumps of a claim tx with respect to the timelock encumbering the targeted outpoint. If the HTLC or revoked output is near expiry, we should try to get our claim in every block. If it's reasonably far in the future, we can give ourselves more latency before bumping.
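A sketch of that scheduling idea with invented thresholds (the real get_height_timer may choose different cut-offs): the closer the encumbering timelock is to expiring, the sooner the next bump attempt is scheduled.

```rust
/// Pick the height at which to next (re)broadcast or fee-bump a claim tx,
/// based on how soon the claimed output's timelock expires.
fn get_height_timer(current_height: u32, timelock_expiration: u32) -> u32 {
    let remaining = timelock_expiration.saturating_sub(current_height);
    if remaining <= 3 {
        // Expiring imminently: retry at the very next block.
        current_height + 1
    } else if remaining <= 15 {
        // Moderately close: bump every few blocks.
        current_height + 3
    } else {
        // Comfortably far away: a lazier bump cadence is fine.
        current_height + 15
    }
}
```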
Force-pushed from 9d8bfa2 to 757bcc2
#333 without WIP Bump Tx stuff