Skip to content

Sweep inputs even the budget cannot be covered #9627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 27, 2025

Conversation

yyforyongyu
Copy link
Member

This PR fixes the issue of the sweep being delayed due to the current wallet UTXOs not being able to cover the full budget. An example case is this sweeping tx, in which an HTLC output was swept with overshooting fees. This happened because the sweeper requires the budget to be fully covered before attempting the sweep, otherwise the sweeper will wait until more wallet UTXOs are available - by that time, the deadline may already be very close, or even passed, causing the full budget to be used up at once.

This PR changes the all or nothing behavior by starting the sweeping process asap if the budget can be covered partially. Later on, when there are more wallet UTXOs, the sweeper will add them to make up the rest of the budget.

@yyforyongyu yyforyongyu added utxo sweeping size/kilo medium, proper context needed, less than 1000 lines labels Mar 21, 2025
@yyforyongyu yyforyongyu added this to the v0.19.0 milestone Mar 21, 2025
@yyforyongyu yyforyongyu self-assigned this Mar 21, 2025
Copy link
Contributor

coderabbitai bot commented Mar 21, 2025

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@ziggie1984 ziggie1984 self-requested a review March 21, 2025 18:48
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass done

Will need another look to evaluate the sweeper changes tho.

So when an input set cannot cover its fees, the inputs will be marked as failed, but the input set will still remain in memory and we are going to add another wallet input when another block arrives ?

rpcserver.go Outdated
for _, htlc := range dbChannel.RemoteCommitment.Htlcs {
remoteHTLCs.Add(htlc.RHash)
remoteHTLCs.Add(htlc.HtlcIndex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that won't work either because incoming and outgoing htlcs start a new counter so the counter are not unique, you either need a comibnation of the hash+index, or incoming_bool + index to fix it properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah right - fixed!

Comment on lines 1859 to 1860
// fee rate for the next sweep attempt if the inputs are to be retried. An error
// is returned when the fee func is nil and created without error, otherwise an
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error is returned when the fee func is nil and created without error :=> can you rephrase this I am unsure I understand what you are trying to say.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

feeFunc = f
}

// Since the sweeping tx has been replaced by another party's tx, we
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Is this always the case that a sweep is replaced by another party, I mean we can also just run out of budget or ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, the change makes intuitive sense: we should at least attempt a sweep even if we don't have all the wallet inputs we need to publish at our ideal fee rate.

@@ -9085,5 +9091,10 @@ func (r *rpcServer) getChainSyncInfo() (*chainSyncInfo, error) {
// Overwrite isSynced and return.
info.isSynced = height == bestHeight

if !info.isSynced {
rpcsLog.Debugf("Blockbeat is not synced to height %v yet",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So simply logging this slows things down enough for a flake to not occur?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope - it's a rare flake that caused the syncing to be blocked, but there was little logging in that build. I added more debug logs such that if it happens again, we can have more info here.

@yyforyongyu
Copy link
Member Author

So when an input set cannot cover its fees, the inputs will be marked as failed, but the input set will still remain in memory and we are going to add another wallet input when another block arrives ?

Yeah except it's not the input set, but the inputs will remain and we will attempt again in another block. These inputs will be put in a new input set.

Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, still two open question from my side then it's gtg:

  1. Can we also fire the sweep if we have no walletInputs at all but wallet inputs are needed to cover the full budget

  2. Limit the budget depending on the MaxFeeRate so that we do not use WalletInputs which are not needed in the first place because we will never use up the full wallet balance.

@@ -1893,3 +1854,59 @@ func (t *TxPublisher) handleReplacementTxError(r *monitorRecord,

return result
}

// calculateRetryFeeRate calculates a new fee rate to be used as the starting
// fee rate for the next sweep attempt if the inputs are to be retried. When the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for this function to be used, do the inputs all need to be "retried inputs" or can even new inputs be swept using this function when regrouped with "retried inputs" ? This might sweep new inputs if regrouped with new inputs at a higher fee than necessary ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's a tradeoff here - either we retry the sweeping with a starting fee rate of 0, which leaves the new inputs on the fee func line, or we keep the retried inputs on the line by using a starting fee rate derived from its last attempt, and keep the new inputs using the larger starting fee rate. This will be fixed if we have the ability to sweep by groups, and before that's fixed, I choose the latter approach since it's unlikely new inputs will be grouped given their deadlines usually vary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understand thank you for the explanation.

// - budget: 1000 sats.
//
// NOTE: The starting fee rate is 0 instead of 1 because the budget has
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused here, how can the starting feerate be 0, you mean not set at all and it will be set later in the fee_function ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is explained below - the sweeper will update the starting fee rate to the last attempted fee rate, which in this case is 0, because we don't set it when error out in ErrMaxPosistion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a design decision, I wonder why you not just capped the value at the max feerate rather than not setting it at all, for me the former feels more intuative.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok decided to change the behavior. So previously two things were happening,

  • we update the starting fee rate when there's TxFailed event to prepare for the next fee bump
  • we set the next fee rate in the fee bump it when the budget is not enough or the inputs are not enough
    This was why the itest was updated -this particular fee bump will fail with max position error, which means a) it's a failed event, so the fee rate will be updated and, b) it's a max position error, so the next fee rate was not set in the bump result, hence when we update the starting fee rate, it will be set to 0 (empty value).

This is now changed that, if there's a max position error, we always set the next fee rate before returning the fee bump result to the sweeper to handle.

resp.UnconfirmedBalance + resp.ConfirmedBalance,
)

fee := walletBalance*2 - balance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why the *2 here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok updated the desc to make them more clear


// Assert the above sweeping tx is still in the mempool.
ht.AssertTxInMempool(sweepTx2.TxHash())

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: What would happen in case of a restart and a new utxo sweep being added to the group ?

Copy link
Member Author

@yyforyongyu yyforyongyu Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case I think they will be batched together to create a new sweeping tx.

// Fund Alice 200k sats, which will be used to cover the budget.
//
// TODO(yy): We are funding Alice more than enough - at this stage Alice
// has a confirmed UTXO of `walletBalance`` amount in her wallet, so
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: `walletBalance`` => `walletBalance`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 2497 to 2499
// since the confirmed wallet UTXO has already been used in sweepTx2,
// there's no easy way to tell her wallet to reuse that UTXO in the
// upcoming sweeping tx.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: I wonder what would happen in a restart, would the walletUTXO still be considered 0-conf and not be used ?

@yyforyongyu
Copy link
Member Author

Limit the budget depending on the MaxFeeRate so that we do not use WalletInputs which are not needed in the first place because we will never use up the full wallet balance.

There's already a TODO for the coin selection strategy, which is out of scope here.

@yyforyongyu yyforyongyu requested a review from Roasbeef March 25, 2025 02:40
@yyforyongyu yyforyongyu force-pushed the sweep-under-budget branch 2 times, most recently from e8630f7 to b0a9c14 Compare March 25, 2025 05:36
@yyforyongyu yyforyongyu requested a review from ziggie1984 March 25, 2025 07:03
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🫡 - had only some final non-blocking comments.

@@ -296,7 +296,9 @@ func (b *BudgetInputSet) NeedWalletInput() bool {
}

// Get the amount left after covering the input's own budget.
// This amount can then be lent to the above input.
// This amount can then be lent to the above input. For a wallet
// input, its `Budget`` is set to zero, which means the whole
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Budget`` => Budget`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

_, err := feeFunc.Increment()
if err != nil {
// The fee function has reached its max position - nothing we
// can do here other than letting the user increase the budget.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised lately that according to my taste we work way to much with comments rather than using specific errors which we can filter for. Would be great to have like a max_position error rather than having a detailed comment that it can only be the max_position error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah totally agree - I actually thought about whether it's possible to refactor the method so it doesn't even return an error, then add a new method like MaxPositionReached and use it in other places when we want to check whether we've reached the end of the fee function. I think it works, tho it now means we now need to care about the state of the fee function, but I guess we already do implicitly.

As for here, I think we want to skip any error returned here, plus the error log, should give us enough info about what happened.

Moving forward, I think I will try to avoid error equality checks as much as possible, it's like try catch, think there's always a way to refactor the error out of existence in the first place.

// The wallet doesn't have enough utxos to cover the budget. Revert the
// input set to its original state.
b.inputs = originalInputs
log.Warn("Not enough wallet UTXOs to cover the budget, sweeping " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add the amount we are short for the budget in the logs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's actually a bit involved, but think it can be very helpful so added the log,

2025-03-26 16:27:58.316 [WRN] SWPR: Not enough wallet UTXOs: need budget=0.00020000 BTC, has spendable=0.00010000 BTC, total=0.00020000 BTC, missing at least 0.00010000 BTC, sweeping anyway...

@@ -1893,3 +1854,59 @@ func (t *TxPublisher) handleReplacementTxError(r *monitorRecord,

return result
}

// calculateRetryFeeRate calculates a new fee rate to be used as the starting
// fee rate for the next sweep attempt if the inputs are to be retried. When the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understand thank you for the explanation.

// - budget: 1000 sats.
//
// NOTE: The starting fee rate is 0 instead of 1 because the budget has
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a design decision, I wonder why you not just capped the value at the max feerate rather than not setting it at all, for me the former feels more intuative.


// Fund Alice 200k sats, which will be used to cover the budget.
//
// TODO(yy): We are funding Alice more than enough - at this stage Alice
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would propose creating an issue for this TODO, otherwise it might become forgotten in the itest files. Wallet inputs are scarce and we should use them efficiently.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a nit, we need an issue for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage I don't know who else is gonna fix that...but yeah it's the 3rd item listed here #8680.

Think I will create an issue too in case this gets to be the bitcoin summer internship project.

// send a TxFailed event so these inputs can be retried when the wallet
// has more UTXOs.
case errors.Is(err, ErrNotEnoughInputs),
errors.Is(err, ErrNotEnoughBudget):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ErrNotEnoughBudget really belong here? Previously this error was considered TxFatal so the user can increase the budget if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good q - I think it actually needs to be TxFailed since the inputs will be removed when it's TxFatal. I guess this in theory should never happen because we derive the fees based on the budget first, so the fees should never be greater than the budget, unless there's a bug in the tx weight calculation.

// - budget: 1000 sats.
//
// NOTE: The starting fee rate is 0 instead of 1 because the budget has
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand this change.

@@ -247,7 +247,15 @@ func (h *HarnessTest) GetBestBlock() (*chainhash.Hash, int32) {

// MineBlockWithTx mines a single block to include the specifies tx only.
func (h *HarnessTest) MineBlockWithTx(tx *wire.MsgTx) *wire.MsgBlock {
return h.miner.MineBlockWithTx(tx)
// Update the harness's current height.
defer h.updateCurrentHeight()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need a defer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a pattern we use in all MineBlockXXX methods


// Fund Alice 200k sats, which will be used to cover the budget.
//
// TODO(yy): We are funding Alice more than enough - at this stage Alice
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a nit, we need an issue for this.

A minor refactor to prepare for upcoming changes.
We now always create the sweeping tx even though the budget cannot be
covered so we don't miss the deadline. Note that the fee bump will fail
once the provided wallet input cannot cover the increase fees, which is
fine as these inputs will be marked as failed and be retried again in
the next block. When that happens, if there are new wallet UTXOs, a new
batch will be created to perform the fee bump.
A minor refactor to prepare the upcoming changes.
We now return the next retry fee rate in `TxFailed` event in
`TxPublisher`. When handling the event, `UtxoSweeper` will update the
inputs to make sure the starting fee rate is set before attempting the
next sweep.
Make sure we assertPendingSweepResp in a wait call to wait for the
updated resp.
This is used is a following test.
Make sure we update the harness's current height and assert nodes have
been synced. Also fixes some typo found.
This is added to fix a flake found in starting the node.
@yyforyongyu yyforyongyu merged commit 15dbc43 into lightningnetwork:master Mar 27, 2025
34 of 36 checks passed
@yyforyongyu yyforyongyu deleted the sweep-under-budget branch March 27, 2025 05:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/kilo medium, proper context needed, less than 1000 lines utxo sweeping
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants