Bump Engine: implement RBF for some timely channel transactions #347
Conversation
Force-pushed 36b78c4 to 84baef6
Force-pushed 84baef6 to a5b7ed2
Force-pushed a5b7ed2 to 0832312
Concept ACK. Looks pretty clean implementation-wise. Obviously needs rebase and tests, but, well done.
Force-pushed 25dcbcb to f48d1c3
Codecov Report
@@            Coverage Diff             @@
##             master      #347   +/-   ##
==========================================
  Coverage          ?    87.42%
==========================================
  Files             ?        29
  Lines             ?     16003
  Branches          ?         0
==========================================
  Hits              ?     13990
  Misses            ?      2013
  Partials          ?         0
==========================================
Continue to review full report at Codecov.
Force-pushed f48d1c3 to 3fec428
Rebased on 3fec428. Cleaned up and added a test; still need to extend tests to cover all cases.
Force-pushed 3fec428 to 639ba5e
One initial comment that should make it easier to review.
src/chain/chaininterface.rs
Outdated
@@ -109,6 +109,11 @@ pub trait FeeEstimator: Sync + Send {
    /// * satoshis-per-byte * 250
    /// * ceil(satoshis-per-kbyte / 4)
    fn get_est_sat_per_1000_weight(&self, confirmation_target: ConfirmationTarget) -> u64;
    /// Gets satoshis of minimum relay fee required per 1000 Weight-Units.
Do we really need an API for this? It's relatively in-sync across the entire network (and the network would perform terribly if that weren't the case).
Hardcoding 4000 sat per 1000 weight somewhere?
Yea. Why not? It's not gonna change any time soon, I don't think, or if it does it'll go down.
Also, hardcoding it (instead of reading it from the fuzz input) should remove the full_stack_target changes, which is nice since it saves a bunch of effort regenerating them.
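For illustration, a minimal sketch of the hardcoding approach discussed here (the constant name and helper are hypothetical, not necessarily the PR's actual code):

```rust
// Hypothetical sketch: hardcode the minimum relay fee rather than
// exposing it through the FeeEstimator trait. 4000 sat per 1000
// weight-units is the value floated above.
const MIN_RELAY_FEE_SAT_PER_1000_WEIGHT: u64 = 4000;

/// Floors a feerate at the hardcoded minimum relay feerate so that
/// (bumped) claim transactions stay relayable under default policy.
fn enforce_min_relay_fee(sat_per_1000_weight: u64) -> u64 {
    sat_per_1000_weight.max(MIN_RELAY_FEE_SAT_PER_1000_WEIGHT)
}
```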
Force-pushed 639ba5e to c20f644
Really nice work here. Just a few notes. Also, for my own sanity, can you rebase on master so that we don't get all the Rust warning spam garbage?
src/ln/channelmonitor.rs
Outdated
first_seen_height: u32,
feerate_previous: u64,
soonest_timelock: u32,
per_input_material: HashMap<u32, InputMaterial>,
Does this need to be a HashMap or can it be a Vec?
IIRC the reason is to avoid duplicate entries, as we may call check_spend_remote multiple times due to block rescanning.
I think I misunderstood it (at least document what the index in this map is, because it's hella confusing): #347 (comment)
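As a toy illustration of the rescan argument above (stand-in value type, not the real InputMaterial), keying by input index makes a repeated check_spend_remote call overwrite rather than duplicate:

```rust
use std::collections::HashMap;

fn main() {
    // Stand-in for per_input_material: HashMap<u32, InputMaterial>.
    let mut per_input_material: HashMap<u32, &str> = HashMap::new();

    // First pass over the block stores material for input index 0.
    per_input_material.insert(0, "revoked-output material");
    // A block rescan processes the same input again: the insert
    // overwrites the existing entry instead of appending a duplicate.
    per_input_material.insert(0, "revoked-output material");

    assert_eq!(per_input_material.len(), 1);
}
```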
src/ln/channelmonitor.rs
Outdated
match per_outp_material {
    &InputMaterial::Revoked { ref script, ref pubkey, ref key, ref is_htlc, ref amount } => {
        let sighash_parts = bip143::SighashComponents::new(&bumped_tx);
        let sighash = hash_to_message!(&sighash_parts.sighash_all(&bumped_tx.input[i], &script, *amount)[..]);
In the long term it would be super super nice if we could DRY up the transaction signing logic so that we use the same codepath to sign txn the first time and after an RBF. OK for now, but really sucks to have duplicate code that could fail differently :(
Tracked in #400
Yeah, will address comments tomorrow and harden testing. As said today, having some kind of automated/fuzz testing for channel_monitor would be great, as #327 shows.
Force-pushed 0499526 to ad05ddc
Looks like the no-full_stack_target-change detection triggered on Travis. Will review it anyway over the coming weekend, but it would be nice to get a fix for that.
if *is_htlc {
for (ref outp, claim_tx_data) in self.our_claim_txn_waiting_first_conf.iter() {
    outp.write(writer)?;
    writer.write_all(&byte_utils::be32_to_array(claim_tx_data.height_timer))?;
nit: Probably easier to use the serialization macros and just implement Writable for ClaimTxBumpMaterial, no?
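A rough sketch of that suggestion, assuming the crate's Writeable/Writer serialization traits and the ClaimTxBumpMaterial fields quoted earlier (that InputMaterial itself implements Writeable is an assumption):

```rust
// Sketch only: fields as quoted above; u32/u64 and InputMaterial are
// assumed to already implement Writeable.
impl Writeable for ClaimTxBumpMaterial {
    fn write<W: Writer>(&self, writer: &mut W) -> Result<(), ::std::io::Error> {
        self.first_seen_height.write(writer)?;
        self.feerate_previous.write(writer)?;
        self.soonest_timelock.write(writer)?;
        // Length-prefix the map, then serialize each entry.
        (self.per_input_material.len() as u64).write(writer)?;
        for (idx, material) in self.per_input_material.iter() {
            idx.write(writer)?;
            material.write(writer)?;
        }
        Ok(())
    }
}
```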
lightning/src/ln/channelmonitor.rs
Outdated
// timelock expiration scale in one claiming tx to save on fees. If this tx doesn't confirm before the height timer
// we need to bump it (RBF or CPFP). If an input has been part of an aggregate tx at first claim try, we need to
// keep it within another bumped aggregate tx to comply with RBF rules.
our_claim_txn_waiting_first_conf: HashMap<BitcoinOutPoint, ClaimTxBumpMaterial>,
Hmm? I seem to be misunderstanding this - this map is from a spent outpoint to the info on the tx that spent it, but then inside it is a map to the list of other inputs being spent, but those should also go in this map? This seems really likely to end up inconsistent. Can we instead index this by the txid we're waiting on confirmation of (or original txid and a map from bumped txids to the original one), plus a map from outpoints to the original txid? This would remove the need for per_input_material being indexed by the input vout so that it could just be by the input in the transaction, and also would let us create single txn that spend multiple txes (eg one claim tx spending htlc txn plus commitment txn).
The map indexing here seems to be more of a "this is the way it was" instead of "this is a sensible forward-looking indexing" IMO.
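A sketch of the proposed indexing, with types as they appear elsewhere in this thread (later revisions of the PR adopt essentially this shape, adding a u32 alongside the txid):

```rust
// Claim material keyed by the txid we're waiting on confirmation of,
// plus a reverse map from each spent outpoint back to that txid.
pending_claim_requests: HashMap<Sha256dHash, ClaimTxBumpMaterial>,
claimable_outpoints: HashMap<BitcoinOutPoint, Sha256dHash>,
```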
Force-pushed ad05ddc to 234dc72
Hardcode the min relay fee, as its value is fixed on the Bitcoin network and updating it would be done really conservatively.
Force-pushed 234dc72 to 17e8e94
Looks pretty good. One or two more parameters could use some docs, and the test needs fixing.
pending_claim_requests: HashMap<Sha256dHash, ClaimTxBumpMaterial>,

// Used to link outpoints claimed in a connected block to a pending claim request.
claimable_outpoints: HashMap<BitcoinOutPoint, (Sha256dHash, u32)>,
Please document the fields!
// equality between spending transaction and claim request. If true, it means the transaction was one of our claiming
// ones; after a security delay of 6 blocks we remove the pending claim request. If false, it means the transaction
// wasn't ours, and we need to regenerate a new claim request with the reduced set of still-claimable outpoints.
pending_claim_requests: HashMap<Sha256dHash, ClaimTxBumpMaterial>,
Please document the fields (what is the Sha256dHash? the original, the latest, the wtxid, the txid, something else?)
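A possible documentation sketch answering this, based on the commit message later in the thread ("pending_claim_requests is cached by original claim txid"); the meaning of the u32 is an assumption here:

```rust
/// Map from the txid of the *original* claim transaction (not of any
/// RBF-bumped descendant) to the material needed to regenerate it.
pending_claim_requests: HashMap<Sha256dHash, ClaimTxBumpMaterial>,

/// Links each outpoint claimed by one of our in-flight claim txn back
/// to the original claim txid, plus a u32 (assumed: the height at
/// which the claim request was generated or last updated).
claimable_outpoints: HashMap<BitcoinOutPoint, (Sha256dHash, u32)>,
```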
lightning/src/ln/channelmonitor.rs
Outdated
    return (txn_to_broadcast, (commitment_txid, watch_outputs), spendable_outputs);
}

let sighash_parts = bip143::SighashComponents::new(&spend_tx);

let mut per_input_material = HashMap::with_capacity(spend_tx.input.len());
let height_timer = Self::get_height_timer(height, height + inputs_info[0].2);
Why does the second argument suddenly gain a height + here but didn't before (and other calls don't have it, though luckily tests fail if you remove it)?
Also, why is it OK to not check per-input and only check the first non-HTLC input here?
Mostly, tbh, I don't remember what height_timer is, and it's not documented (probably my fault, but it should be easy to add a comment).
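Since height_timer is undocumented at this point, an illustrative sketch of the concept as the PR description presents it (the actual schedule in the PR may differ): it is the absolute height at which a still-unconfirmed claim tx gets rebuilt and RBF-bumped, chosen safely before the soonest timelock expires.

```rust
// Illustrative only: schedule the bump partway toward the soonest
// timelock expiration, and always at least one block in the future.
fn get_height_timer(current_height: u32, timelock_expiration: u32) -> u32 {
    let remaining = timelock_expiration.saturating_sub(current_height);
    current_height + std::cmp::max(1, remaining / 2)
}
```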
lightning/src/ln/channelmonitor.rs
Outdated
    return (txn_to_broadcast, (commitment_txid, watch_outputs), spendable_outputs);
}

let sighash_parts = bip143::SighashComponents::new(&spend_tx);

let mut per_input_material = HashMap::with_capacity(spend_tx.input.len());
let mut soonest_timelock = 0xFFFFFFFF;
nit: std::u32::MAX
Force-pushed d7555b1 to d08a0d5
Add claimable_outpoints map. Both structures are tied and should ensure their mutual consistency. pending_claim_requests is cached by original claim txid. Metadata and per-input material should stay constant between bumped transactions; the only changes should come from partial claiming of the outpoint set and block reorgs. Due to RBF rules, if an input has been part of an aggregate tx at first claim try, and we want the bumped tx to land nicely in the mempool, inputs shouldn't be split across multiple bumped txs but kept aggregated in a new bumped tx.
As local onchain txn are already monitored in block_connected by check_spend_local_transaction, it's useless to generate pending claims twice for HTLC outputs on the local commitment tx. We could do the alternative.
Add RBF-bumping of justice txn; given they are only signed by us, we can RBF at will. Aggregation of bump candidates and more aggressive bumping heuristics are left open. Fix tests broken by the introduction of more txn broadcast. Some tests may have a relaxed check (claim_htlc_outputs_single_tx) as broadcast bumped txn are now intertwining with previously broadcast ones and breaking simple expectations. Use the bumping engine to rebuild a claiming transaction in case of a partial claim of its outpoint set.
Given they are only signed by us, we can RBF at will. Fix tests broken by the introduction of more txn broadcast (channel_monitor_network_test). Add locktime in RemoteHTLC as it's needed to generate timeout txn.
Test multiple rounds of the 25% heuristic in bump_claim_tx on a remote revoked commitment tx with HTLCs pending in both directions.
A pending claim request may contain a set of multiple outpoints. If one or more of them get claimed by the remote party, our in-flight claiming transactions aren't valid anymore, so we need to react quickly and regenerate a claiming transaction with the accurate set. However, a claimed outpoint may be disconnected, and we then need to resurrect that outpoint back among the set of the original pending claim request. To guarantee consistency of a contentious claimed outpoint, we cache it as OnchainEvent::ContentiousOutpoint and only delete it after ANTI_REORG_DELAY. Fix test broken by this change; partial claiming on revoked txn forces us to regenerate txn.
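A shape sketch of the mechanism this commit describes, with stand-in types so it compiles on its own (the variant payload is assumed; the 6-block delay matches the field comment quoted earlier):

```rust
/// Security delay before a claimed outpoint is considered final.
const ANTI_REORG_DELAY: u32 = 6;

struct BitcoinOutPoint; // stand-in for bitcoin's OutPoint
struct InputMaterial;   // stand-in for the PR's InputMaterial

enum OnchainEvent {
    /// An outpoint from one of our pending claim requests that a
    /// conflicting transaction spent; cached so that a reorg within
    /// ANTI_REORG_DELAY blocks can resurrect it into the original
    /// pending claim request.
    ContentiousOutpoint {
        outpoint: BitcoinOutPoint,
        input_material: InputMaterial,
    },
}
```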
Force-pushed d08a0d5 to 9284a2b
Will take as #414.
Based on #336, this adds a bump_claim_tx function called in block_connected.
Every claim tx we broadcast is stored in a buffer as a pair tracking the outpoint and keeping the tx material (like key, script, amount...). When we reach the given height, it means our claim txn is still in flight (likely in mempools) and its feerate isn't enough. To avoid it getting stuck beyond expiration of the CSV/CLTV timelocks, we rebuild a claim tx and bump it using RBF.
The first heuristic uses the new HighPriority estimation of the fee estimator.
The second one increases the feerate by 25% compared to the last broadcast claim tx.
Currently, without further protocol modifications, we can't RBF the local commitment transaction, nor our HTLC-Success/HTLC-Timeout transactions.
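Putting the two heuristics together, a minimal sketch (function name hypothetical) of how a bumped feerate could be chosen:

```rust
/// Sketch of the bump policy described above: use the fee estimator's
/// HighPriority rate if it beats a 25% increase over the previously
/// broadcast feerate, otherwise fall back to the 25% bump.
fn bumped_feerate(previous_sat_per_1000_weight: u64, high_priority_sat_per_1000_weight: u64) -> u64 {
    let bumped_25_percent = previous_sat_per_1000_weight + previous_sat_per_1000_weight / 4;
    std::cmp::max(high_priority_sat_per_1000_weight, bumped_25_percent)
}
```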