[arch]: Fix the division of responsibility between ChannelLink and Switch #8634

Open
@ProofOfKeags

This discussion came up in the process of trying to properly implement #8270. One of the tricky issues with quiescence is that while the channel is quiescent it is undefined behavior (UB) for us to send any update_* messages to our channel peer. Ordinarily this is handled by having the link's EligibleToUpdate function return false, which the switch queries before adding packets to the link's mailbox. However, the switch is not the only thing that can cause the link to try to send an update message. The update message can also originate from inside the link itself when we are the exit hop.
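To make the asymmetry concrete, here is a minimal sketch in Go. It is not lnd's actual code: the `link` type, `forwardFromSwitch`, and `settleAsExitHop` are hypothetical names invented for illustration, and only `EligibleToUpdate` is borrowed from the description above. It shows that a gate checked by the switch does nothing for an update that originates inside the link.

```go
package main

import "fmt"

// link is a hypothetical, heavily simplified stand-in for a channel link.
// Apart from the EligibleToUpdate name, nothing here is lnd's real API.
type link struct {
	quiescent bool
	sent      []string // update_* messages that would reach the peer
}

// EligibleToUpdate reports whether the link may send update_* messages.
// In this sketch a quiescent link is never eligible.
func (l *link) EligibleToUpdate() bool { return !l.quiescent }

// forwardFromSwitch models the switch path: eligibility is queried before
// the packet is handed to the link.
func forwardFromSwitch(l *link, msg string) bool {
	if !l.EligibleToUpdate() {
		return false
	}
	l.sent = append(l.sent, msg)
	return true
}

// settleAsExitHop models the link-internal path: the update originates
// inside the link, so the switch's eligibility check never runs.
func (l *link) settleAsExitHop(msg string) {
	l.sent = append(l.sent, msg)
}

func main() {
	l := &link{quiescent: true}

	// The switch path is correctly gated while quiescent.
	ok := forwardFromSwitch(l, "update_add_htlc")

	// The exit-hop path bypasses the gate entirely.
	l.settleAsExitHop("update_fulfill_htlc")

	fmt.Println(ok, l.sent)
}
```

In this toy model the quiescent link still emits `update_fulfill_htlc`, which is exactly the case the link would otherwise need separate handling for.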

This reveals what I believe to be an architectural issue: we have chosen an asymmetry that increases the number of edge cases we have to solve. If we instead moved the exit hop processing to the switch, we would clean up this asymmetry. Then, when the switch needs to process an exit hop resolution and the link on which it needs to issue a settle/fail is currently ineligible for updates, it can reuse the retry logic it must already implement for the case where the peer connection is down.
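A rough sketch of what that could look like follows, again in Go and again with entirely hypothetical names (`exitSwitch`, `resolution`, `resolve`, `retryPending`). It assumes the switch keeps a simple queue of undelivered resolutions. How lnd's real retry logic is structured is not something this sketch claims to reproduce.

```go
package main

import "fmt"

// resolution is a hypothetical settle/fail decision for an exit-hop HTLC.
type resolution struct {
	chanID string
	msg    string
}

// link is a simplified stand-in for a channel link. eligible is false
// both when the peer is offline and when the channel is quiescent.
type link struct {
	eligible bool
	sent     []string
}

// exitSwitch is a hypothetical switch that owns exit-hop resolutions.
type exitSwitch struct {
	links   map[string]*link
	pending []resolution // resolutions waiting for an eligible link
}

// resolve delivers the resolution if the link can take updates, and
// queues it for a later retry otherwise.
func (s *exitSwitch) resolve(r resolution) {
	l, ok := s.links[r.chanID]
	if !ok || !l.eligible {
		s.pending = append(s.pending, r)
		return
	}
	l.sent = append(l.sent, r.msg)
}

// retryPending re-attempts every queued resolution. Anything that still
// cannot be delivered is queued again by resolve.
func (s *exitSwitch) retryPending() {
	queued := s.pending
	s.pending = nil
	for _, r := range queued {
		s.resolve(r)
	}
}

func main() {
	l := &link{eligible: false} // e.g. the channel is quiescent
	s := &exitSwitch{links: map[string]*link{"chan-a": l}}

	s.resolve(resolution{chanID: "chan-a", msg: "update_fulfill_htlc"})
	fmt.Println(len(s.pending), len(l.sent)) // queued, nothing sent

	l.eligible = true // quiescence ends or the peer reconnects
	s.retryPending()
	fmt.Println(len(s.pending), len(l.sent)) // delivered on retry
}
```

The point of the design is that "peer offline" and "channel quiescent" collapse into the same condition from the switch's perspective, so the link no longer needs its own mechanism for deferring exit-hop updates.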


Here is my concise list of arguments for why I think this change would be beneficial. I will add to it as I discover more consequences of this choice:

  1. It removes the exit hop edge case from the link.
  2. The switch becomes both the entry point for circuit construction along a route and the exit point for circuit redemption.
  3. It removes the dependency between the invoice registry and the links.
  4. It lends itself very naturally to solving [bug]: reverse the order inside invoice settlement flow #7463.
  5. We no longer have to track deferred actions in the quiescence state, as the switch's existing retry logic would handle them.
  6. It would simplify the solution to [bug]: Potential Deadlock in HodlInvoice logic #8803, because hodl invoice logic would be performed in the switch.
  7. It would allow us to remove cross-link references from the channel state machine, which are a reference leak.

I'm opening this for further discussion. I know that @Roasbeef has some reservations about doing this, but I strongly believe it is the right move for the long term health of the switch/link architecture. I'm also paging @yyforyongyu for opinions here since I have also discussed this out of band with him. Feel free to page anyone else who could productively contribute to this conversation.

The main thing I'd like to sort out is whether or not this is fundamentally a good idea. We can figure out how and when we want to schedule it (maybe never, who knows!) in a different thread. However, if we were designing the system again from scratch, would we design it the way it is, or would we do something like this?

Labels: enhancement (Improvements to existing features / behaviour), epic (Issues created to track large feature development)