Skip to content

htlcswitch+peer [1/2]: thread context through in preparation for passing to graph DB calls #9691

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: elle-graph
Choose a base branch
from

Conversation

ellemouton
Copy link
Collaborator

@ellemouton ellemouton commented Apr 9, 2025

PR context

This is part of the last step required to complete #9494. We want all calls to the graph DB (and hence, ChannelGraph) to take a context since later, this context will be passed through to any remote graph RPC calls as well as any DB calls to a SQL graph backend.

This PR specifically: This is part 1 of a 2 part PR series that addresses the htlcswitch

In this PR, we tackle the htlcswitch & peer packages. This is split over 2 PRs since it is quite a lot. This PR is part 1. See part 2 for the full picture. This package makes calls to the graph's FetchLastChannelUpdate call. So this PR is all aimed at threading contexts through to the call-sites of this call-back. It leaves 1 context.TODO() in the main LND server which will be addressed in a follow up.

Branch strategy:

This series of PRs is basically a big code refactor. So once 19 is shipped, we can merge any work that has been merged into elle-graph and from then on we can just merge into master.

Copy link
Contributor

coderabbitai bot commented Apr 9, 2025

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@ellemouton ellemouton self-assigned this Apr 9, 2025
@ellemouton ellemouton changed the title htlcswitch: thread context through in preparation for passing to graph DB calls htlcswitch+peer: thread context through in preparation for passing to graph DB calls Apr 9, 2025
@ellemouton ellemouton force-pushed the ctx4 branch 4 times, most recently from ed0f260 to 00150d6 Compare April 10, 2025 11:10
@ellemouton ellemouton changed the title htlcswitch+peer: thread context through in preparation for passing to graph DB calls htlcswitch+peer [1/2]: thread context through in preparation for passing to graph DB calls Apr 10, 2025
@saubyk saubyk added this to lnd v0.20 Apr 10, 2025
@saubyk saubyk moved this to In progress in lnd v0.20 Apr 10, 2025
@ellemouton ellemouton force-pushed the ctx4 branch 2 times, most recently from a6321cc to ea67b33 Compare April 14, 2025 12:29
Copy link
Collaborator

@bhandras bhandras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅

Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up reviewing #9699 to understand the full pic. Left some questions.

if update == nil {
// Fallback to the non-alias behavior.
var err error
update, err = l.cfg.FetchLastChannelUpdate(l.ShortChanID())
update, err = l.cfg.FetchLastChannelUpdate(ctx, l.ShortChanID())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the main purpose is to create ctx to be used in graph related calls, I don't think we need to have this ctx popped up alone the callsite, ending up all the methods in the chain taking a never checked ctx, given we can just create a ctx here via l.cg.Create(context.Background()). And if the link is stopping, that ctx will be canceled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is dangerous to do that because what if the source of this call was not from internal (ie, from channelLink) but was actually triggered from an external system. Then if that system exists, this DB call wont be cancelled since the ctx is coming from the incorrect source

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nding up all the methods in the chain taking a never checked ctx

contexts only need to be "checked" if we are doing something that hangs/retries etc. so anywhere we currently have a select that can hang, then there we should always check it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, for example: createFailureWithUpdate is called by CheckHtlcForward which is called by the switch. so not all call-sites are from within the link. It's safer to just have a ctx as a first param to methods so that we dont need to think about the origins of the call

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s look at the specifics? Basically,

  • If the link.ctx is done, does it mean the switch.ctx is also done? Nope because other systems can call RemoveLink to shut down the link.
  • If the switch.ctx is done, does it mean the link.ctx is done? Yes, as when switch shuts down, it cleans up all the links in htlcForwarder.

This is because the switch can be viewed as the link manager, and it’s responsible for starting and stopping the links.

Then if that system exists, this DB call wont be cancelled since the ctx is coming from the incorrect source

This is def correct. It was discussed before that maybe we should decouple switch and link. But as for now, we want to maintain the relationship, which means when switch goes down, link goes down.

Like, for example: createFailureWithUpdate is called by CheckHtlcForward which is called by the switch. so not all call-sites are from within the link.

Also correct. Tho given the relationship between switch and link, it does really matter since when switch.ctx is done, link.ctx is also done. I’m also exploring the possibility to make link and private struct managed by switch only.

It’s worth mentioning that things may be different if we took the previous approach, that each Start takes a ctx, which means each system’s internal context can be a child of an external one. Not saying we should go with the previous approach, just wanna mention a simple change there can make a different design choice.

It's safer to just have a ctx as a first param to methods so that we dont need to think about the origins of the call

I’ve been thinking about it these days. It may seem nice to be oblivious about the origin, but I don’t think it’s safe to do so. It gives the external caller the ability to abort actions, while some actions cannot be aborted, it’s case-specific. Good thing is we only have an internal context here, and it’s only used to abort graph-related calls, so one less thing to worry about.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is valuable to have a standard pattern for things like ctx. The standard pattern is that the caller's context is passed through for sync calls.

If the link and switch go down together anyways, then i dont see the harm in just keeping this standard pattern rather than now threading through the incorrect context.

while some actions cannot be aborted, it’s case-specific.

ok but yes it is case specific, for anything really critical like that, we should explicitly not pass in the caller's context.

@@ -3207,7 +3210,7 @@ func (l *channelLink) getAliases() []lnwire.ShortChannelID {
// attachFailAliasUpdate sets the link's FailAliasUpdate function.
//
// Part of the scidAliasHandler interface.
func (l *channelLink) attachFailAliasUpdate(closure func(
func (l *channelLink) attachFailAliasUpdate(closure func(ctx context.Context,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think we can create a ctx via l.cg.Create(context.Background()) and pass it to the closure below so it still satisfies the signature failAliasUpdate, think will bring way less change, and create fewer conflicts in other people's PRs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see other comment

@@ -1521,7 +1523,7 @@ func (p *Brontide) maybeSendChannelUpdates() {
// to fetch the update to send to the remote peer. If the
// channel is pending, and not a zero conf channel, we'll get
// an error here which we'll ignore.
chanUpd, err := p.cfg.FetchLastChanUpdate(scid)
chanUpd, err := p.cfg.FetchLastChanUpdate(ctx, scid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same re ctx, we can derive it from l.cg

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comment

Both the `htlcswitch` and `peer` systems take a `FetchLastChanUpdate`
call back which underneath will call the `graphBuilder`'s
`GetChannelByID` method which calls the graph DB's
`FetchChannelEdgeByID` method which will eventually take a context.

A couple of `context.TODO()`s are added which will be removed in later
commits.
which calls the `FetchLastChannelUpdate` call back.
In this commit we also derive fresh contexts and then call the `cancel`
functions of those in the `Stop` methods of:
- `InterceptableSwitch`
- `Switch`

This removes two previously added `context.TODO`s and introduces a few
more.
and introduce one more.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

3 participants