Bump Engine: implement RBF for some timely channel transactions #347


Closed
wants to merge 11 commits into from

Conversation


@ariard ariard commented Jul 8, 2019

Based on #336, it adds a bump_claim_tx function called in block_connected.

Every claim tx we broadcast is stored in a buffer as a pair tracking the outpoint and keeping the tx material (key, script, amount...). When we reach the given height, it means our claim txn are still in flight (likely in mempools) and their feerate isn't enough. To avoid them getting stuck beyond the expiration of CSV/CLTV timelocks, we rebuild a claim tx and bump it using RBF.

The first heuristic uses the new HighPriority estimation of the fee estimator.
The second one increases the feerate by 25% compared to the last claim tx broadcast.

Currently, without further protocol modifications, we can't RBF the local commitment transaction, nor our HTLC-Success/HTLC-Timeout transactions.

@ariard ariard force-pushed the 2019-07-bump-claims branch from 36b78c4 to 84baef6 Compare July 9, 2019 15:12
@TheBlueMatt TheBlueMatt added this to the 0.0.10 milestone Jul 19, 2019
@ariard ariard force-pushed the 2019-07-bump-claims branch from 84baef6 to a5b7ed2 Compare July 22, 2019 20:46
@ariard ariard force-pushed the 2019-07-bump-claims branch from a5b7ed2 to 0832312 Compare July 31, 2019 17:49
@TheBlueMatt (Collaborator) commented:

Concept ACK. Looks pretty clean implementation-wise. Obviously needs rebase and tests, but, well done.

@ariard ariard force-pushed the 2019-07-bump-claims branch 2 times, most recently from 25dcbcb to f48d1c3 Compare August 1, 2019 15:14

codecov bot commented Aug 1, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@01ae452).
The diff coverage is 70.72%.


@@            Coverage Diff            @@
##             master     #347   +/-   ##
=========================================
  Coverage          ?   87.42%           
=========================================
  Files             ?       29           
  Lines             ?    16003           
  Branches          ?        0           
=========================================
  Hits              ?    13990           
  Misses            ?     2013           
  Partials          ?        0
Impacted Files Coverage Δ
src/chain/chaininterface.rs 79.8% <ø> (ø)
src/ln/functional_test_utils.rs 94.37% <100%> (ø)
src/util/test_utils.rs 54.24% <100%> (ø)
src/ln/channel.rs 84.47% <25%> (ø)
src/ln/channelmonitor.rs 85.86% <59.8%> (ø)
src/ln/functional_tests.rs 96.06% <97.7%> (ø)


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01ae452...0499526.

@ariard ariard force-pushed the 2019-07-bump-claims branch from f48d1c3 to 3fec428 Compare October 26, 2019 00:21

ariard commented Oct 26, 2019

Rebased 3fec428.

Cleaned and added test, still need to extend them to cover all cases.

@ariard ariard force-pushed the 2019-07-bump-claims branch from 3fec428 to 639ba5e Compare October 26, 2019 01:11

@TheBlueMatt TheBlueMatt left a comment


One initial comment that should make it easier to review.

@@ -109,6 +109,11 @@ pub trait FeeEstimator: Sync + Send {
/// * satoshis-per-byte * 250
/// * ceil(satoshis-per-kbyte / 4)
fn get_est_sat_per_1000_weight(&self, confirmation_target: ConfirmationTarget) -> u64;
/// Gets satoshis of minimum relay fee required per 1000 Weight-Units.
Collaborator:

Do we really need an API for this? It's relatively in-sync across the entire network (and the network would perform terribly if that weren't the case).

Author:

Hardcoding 4000 sat for 1000 weight somewhere?

Collaborator:

Yea. Why not? It's not gonna change any time soon, I don't think, or if it does it'll go down.

Collaborator:

Also, hardcoding it (instead of reading it from the fuzz input) should remove the full_stack_target changes, which is nice since it saves a bunch of effort regenerating them.

@ariard ariard force-pushed the 2019-07-bump-claims branch from 639ba5e to c20f644 Compare November 4, 2019 03:38
@TheBlueMatt TheBlueMatt left a comment


Really nice work here. Just a few notes. Also, for my own sanity, can you rebase on master so that we don't get all the rust warning spam garbage?

first_seen_height: u32,
feerate_previous: u64,
soonest_timelock: u32,
per_input_material: HashMap<u32, InputMaterial>,
Collaborator:

Does this need to be a HashMap or can it be a Vec?

Author:

IIRC the reason is to avoid duplicate entries, as we may call check_spend_remote multiple times due to block rescanning.

Collaborator:

I think I misunderstood it (at least document what the index in this map is, because it's hella confusing): #347 (comment)

match per_outp_material {
&InputMaterial::Revoked { ref script, ref pubkey, ref key, ref is_htlc, ref amount } => {
let sighash_parts = bip143::SighashComponents::new(&bumped_tx);
let sighash = hash_to_message!(&sighash_parts.sighash_all(&bumped_tx.input[i], &script, *amount)[..]);
Collaborator:

In the long term it would be super super nice if we could DRY up the transaction signing logic so that we use the same codepath to sign txn the first time and after an RBF. OK for now, but really sucks to have duplicate code that could fail differently :(

Author:

Tracked in #400


ariard commented Nov 13, 2019

Yeah, will address comments tomorrow and harden testing. As said today, having some kind of automatic/fuzzing testing for channel_monitor would be great, as #327 shows.

@ariard ariard force-pushed the 2019-07-bump-claims branch 2 times, most recently from 0499526 to ad05ddc Compare November 26, 2019 18:00
@TheBlueMatt (Collaborator) commented:

Looks like the no-full_stack_target-change detection triggered on travis. Will review it anyway over the coming weekend, but would be nice to get a fix for that.

if *is_htlc {
for (ref outp, claim_tx_data) in self.our_claim_txn_waiting_first_conf.iter() {
outp.write(writer)?;
writer.write_all(&byte_utils::be32_to_array(claim_tx_data.height_timer))?;
Collaborator:

nit: Probably easier to use the serialization macros and just implement Writable for ClaimTxBumpMaterial, no?

// timelock expiration scale in one claiming tx to save on fees. If this tx doesn't confirm before height timer
// we need to bump it (RBF or CPFP). If an input has been part of an aggregate tx at first claim try, we need to
// keep it within another bumped aggregate tx to comply with RBF rules.
our_claim_txn_waiting_first_conf: HashMap<BitcoinOutPoint, ClaimTxBumpMaterial>,
Collaborator:

Hmm? I seem to be misunderstanding this - this map is from a spent outpoint to the info on the tx that spent it, but then inside it is a map to the list of other inputs being spent, but those should also go in this map? This seems really likely to end up inconsistent. Can we instead index this by the txid we're waiting on confirmation of (or original txid and a map from bumped txids to the original one), plus a map from outpoints to the original txid? This would remove the need for per_input_material being indexed by the input vout so that it could just be by the input in the transaction, and also would let us create single txn that spend multiple txes (eg one claim tx spending htlc txn plus commitment txn).

@TheBlueMatt (Collaborator) commented:

The map indexing here seems to be more of a "this is the way it was" instead of "this is a sensible forward-looking indexing" IMO.

@ariard ariard force-pushed the 2019-07-bump-claims branch from ad05ddc to 234dc72 Compare December 4, 2019 06:52
Antoine Riard added 2 commits December 4, 2019 17:21
Hardcode min relay fee as its value is fixed on the bitcoin network
and updating it would be done really conservatively.
@ariard ariard force-pushed the 2019-07-bump-claims branch from 234dc72 to 17e8e94 Compare December 5, 2019 01:13

@TheBlueMatt TheBlueMatt left a comment


Looks pretty good. One or two more parameters could use some docs, and the test needs fixing.

pending_claim_requests: HashMap<Sha256dHash, ClaimTxBumpMaterial>,

// Used to link outpoints claimed in a connected block to a pending claim request.
claimable_outpoints: HashMap<BitcoinOutPoint, (Sha256dHash, u32)>,
Collaborator:

Please document the fields!

// equality between spending transaction and claim request. If true, it means the transaction was one of our claiming
// ones; after a security delay of 6 blocks we remove the pending claim request. If false, it means it wasn't, and
// we need to regenerate a new claim request with the reduced set of still-claimable outpoints.
pending_claim_requests: HashMap<Sha256dHash, ClaimTxBumpMaterial>,
Collaborator:

Please document the fields (what is the Sha256dHash? the original, the latest, the wtxid, the txid, something else?)

return (txn_to_broadcast, (commitment_txid, watch_outputs), spendable_outputs);
}

let sighash_parts = bip143::SighashComponents::new(&spend_tx);

let mut per_input_material = HashMap::with_capacity(spend_tx.input.len());
let height_timer = Self::get_height_timer(height, height + inputs_info[0].2);
Collaborator:

Why does the second argument suddenly gain a height+ here but didn't before (and other calls don't have it, though luckily tests fail if you remove it)?

Also, why is it OK to not check per-input and only check the first non-HTLC input here?

Mostly, tbh, I don't remember what height_timer is, and it's not documented (probably my fault, but should be easy to add a comment).

return (txn_to_broadcast, (commitment_txid, watch_outputs), spendable_outputs);
}

let sighash_parts = bip143::SighashComponents::new(&spend_tx);

let mut per_input_material = HashMap::with_capacity(spend_tx.input.len());
let mut soonest_timelock = 0xFFFFFFFF;
Collaborator:

nit: std::u32::MAX

Antoine Riard added 2 commits December 6, 2019 18:29
Add claimable_outpoints maps.

Both structures are tied and should ensure their mutual consistency.

Pending_claim_requests is cached by original claim txid. Metadata
and per input material should be constant between bumped transactions;
the only changes should be partial claiming of the outpoint set and
block reorgs.

Due to RBF rules, if an input has been part of an aggregated tx
at its first claim try, and we want the bumped tx to land nicely
in the mempool, the inputs shouldn't be split across multiple
bumped txn but kept aggregated in a new bumped tx.
As local onchain txn are already monitored in block_connected by
check_spend_local_transaction, it's useless to generate pending
claims twice for HTLC outputs on the local commitment tx.

We could do the alternative.
Antoine Riard added 7 commits December 6, 2019 18:30
Add RBF-bumping of justice txn; given they are only signed by us we
can RBF at will.

Aggregation of bump candidates and more aggressive bumping heuristics
are left open.

Fix tests broken by the introduction of more txn broadcast.
Some tests may have a relaxed check (claim_htlc_ouputs_single_tx)
as broadcast bumped txn are now intertwining with previously broadcast
ones, breaking simple expectations.

Use the bumping engine to rebuild a claiming transaction in case of
partial claim of its outpoint set.
Given they are only signed by us we can RBF at will.

Fix tests broken by the introduction of more txn broadcast
(channel_monitor_network_test).

Add locktime in RemoteHTLC as it's needed to generate
timeout txn.
Test multiple rounds of 25% heuristic in bump_claim_tx on remote revoked commitment
txn with htlcs pending in both directions.
A pending claim request may contain a set of multiple outpoints.
If one or more of them get claimed by the remote party, our in-flight
claiming transactions aren't valid anymore, so we need to react
quickly and regenerate a claiming transaction with the accurate set.

However, a claimed outpoint may be disconnected, and we then need to
resurrect the outpoint back into the original pending claim request's set.

To guarantee consistency of a contentious claimed outpoint, we cache it
as OnchainEvent::ContentionsOutpoint and only delete it after
ANTI_REORG_DELAY.

Fix test broken by the change: partial claiming on revoked txn
forces us to regenerate txn.
@ariard ariard force-pushed the 2019-07-bump-claims branch from d08a0d5 to 9284a2b Compare December 7, 2019 02:16
@TheBlueMatt (Collaborator) commented:

Will take as #414

@TheBlueMatt TheBlueMatt closed this Dec 9, 2019