Remove peers_needing_send set from peer_handling #456

jkczyz · 2020-01-22T00:23:43Z

Examining the peer_handling module reveals that the peers_needing_send set either can be replaced by a needing_send field on Peer (to reduce complexity) or removed entirely. The set's documentation is unclear in that it claims that peers_needing_send is used by do_read_event() when a message is pushed onto the send buffer but do_attempt_write_data() is not called to avoid reentrancy.

However, do_attempt_write_data() is called unconditionally following the read loop within do_read_event(). It's possible that do_attempt_write_data() is not called in the case of an early return (e.g., when use of the try_potential_handleerror! macro sends an error message and returns). But if this is the only case, then the members of peers_needing_send would be overly broad.

Furthermore, removing the places where peers_needing_send is populated does not cause any tests to fail. Therefore, this PR is an attempt to understand the intended usage of peers_needing_send, determine if it can be simplified or removed, clarify the documentation, and add any missing tests for the intended behavior.

jkczyz · 2020-01-22T00:24:36Z

lightning/src/ln/peer_handler.rs

-													peers.peers_needing_send.insert(peer_descriptor.clone());
+													peer.needing_send = true;


@TheBlueMatt Could you explain why this needs to be set here?

If we need to stream out our routing db, we (obviously) dont copy the whole thing into our outbound send buff (otherwise we'd just OOM ourselves...), so the logic in do_attempt_write_data has to be hit to add messages to the send buffer if the buffer is empty.

TheBlueMatt

Right, the comments definitely hint at a old world where we were trying to be really hard to avoid reentrancy in the peer_handler, and gave up cause it was only barely practical. AFAICT, its not needed at all anymore.

As you note, peer_handler is currently woefully under-tested, likely the least tested area of the code.

TheBlueMatt · 2020-01-22T03:08:06Z

lightning/src/ln/peer_handler.rs

-													peers.peers_needing_send.insert(peer_descriptor.clone());
+													peer.needing_send = true;


If we need to stream out our routing db, we (obviously) dont copy the whole thing into our outbound send buff (otherwise we'd just OOM ourselves...), so the logic in do_attempt_write_data has to be hit to add messages to the send buffer if the buffer is empty.

jkczyz · 2020-01-22T20:02:36Z

AFAICT, its not needed at all anymore.

By "it" do you mean peers_needing_send or the need to avoid reentrancy (i.e., "it" from the previous sentence)? I'm assuming the former but want to make sure I understand.

TheBlueMatt · 2020-01-22T20:21:05Z

Right, yea, the whole thing. AFAICT (without digging into git history) I gave up on avoiding reentrancy and just did the Easy Thing, so the whole set seems to be completely useless?

jkczyz · 2020-01-22T22:23:50Z

Right, yea, the whole thing. AFAICT (without digging into git history) I gave up on avoiding reentrancy and just did the Easy Thing, so the whole set seems to be completely useless?

At very least removing the code doesn't cause any test failures.

Could you help me understand what sort of reentrancy you were attempting to avoid? I see that #217 removed the call to process_events at the end of do_read_event in order to avoid sending data while the do_read_event call is outstanding. More specifically: How are you defining reentrancy in this context? Could you give me a scenario that you were attempting to avoid but ultimately were unable to avoid?

TheBlueMatt · 2020-01-22T23:41:02Z

Specifically, both read_event and write_event may call back into user code (via the Descriptor objects), as indicated in their docs. In general, reentrancy scares me cause, as a user using a library, it can be very hard to get right, and very easy to add hard-to-identify bugs, but there also isn't tooo much that can be obviously done here.

jkczyz · 2020-01-23T02:55:13Z

Thanks for the clarification. My understanding then is that the warnings about reentrancy are essentially telling users to avoid calling PeerHandler methods from your SocketDescriptor methods.

And, if I'm reading 294ad32 correctly, the thought was that calls to read_event (through do_read_event) would not be reentrant since the call to process_events (which calls send_data via do_attempt_write_data) was removed. However, this did not have the desired effect since do_read_event still called do_attempt_write_data directly.

What I'm having trouble understanding is how we planned on avoiding reentrancy. Was the thought to remove any transitive calls of send_data from read_event and write_event and only call send_data via process_events?

TheBlueMatt · 2020-01-23T03:32:27Z

What I'm having trouble understanding is how we planned on avoiding reentrancy. Was the thought to remove any transitive calls of send_data from read_event and write_event and only call send_data via process_events?

Yea, I think that was the goal, at least for read_event, the whole point of write_event is to call back into send_data (as write_event is a "backpressure complete, buffer space available, please write" event, really), so that would be a bit confusing. I guess we could go both ways with this - either finish the work and remove the reentrancy, which sounds nice when you point out that maybe we're pretty close(?), or move just drop the whole hassle and live with it.

From a client perspective, I could totally see confusion in that you could put a lock in your SocketDescriptor and end up with a deadlock when reading messages trying to lock it just to write something?

Using the peers_needing_send HashSet to determine which peers have any outstanding writes adds complexity without much benefit. Instead, add a needing_send field to the Peer struct to accomplish the same thing.

Variables named using the pronoun 'us' are confusing as they are overloaded for Connection and SocketDescriptor, often inconsistently. Use 'connection' and 'descriptor' instead.

jkczyz · 2020-02-04T00:18:47Z

I've been reading through lightning-net-tokio to try to get a better sense of how a user would use PeerManager. It looks pretty complex, though I know you've alluded to the fact that this could be simplified. However, I'd like to explore whether we can provide a simpler interface for the user to implement. I have some ideas but don't have any concrete suggestions just yet.

First, I'd like to get a better sense behind some of the design decisions that went into PeerManager. I see that most of PeerManager's public methods take a SocketDescriptor to look up a Peer to act upon. Additionally, all peers use the same ChannelMessageHandler. Thus, in process_events the node_id_to_descriptor map is needed to look up which peer the event is associated with.

Rather than using PeerManager in this manner, could the user be given a Peer struct (for each peer) with similar methods and where each peer has their own ChannelMessageHandler? The peer itself could also have a reference to its SocketDescriptor. Thus, much of the PeerManager functionality could be pushed to Peer, removing some of the complexity around locking and lookup.

Feel free to discuss with me offline and I can circle back to this PR.

julianknutsen · 2020-09-10T01:12:28Z

@jkczyz @TheBlueMatt

Took me a while to find this PR, but I wanted to let you know that I've readded the reentrancy requirement as part of eae0904. I found the bug when doing the unit tests and seeing that the docs still mentioned it shouldn't call send_data(). After my patches, read_event() is reentrant-free and there are plenty of tests to catch it in the future.

I actually like the idea of removing the separate set in favor of a Peer scan with a flag to clean up a lot of the internal functions that pass it around or reference it through self. I can cherry-pick these commits and see how they look on top of my work.

I'll make sure to reference this PR when it makes it out of the initial review on top of #494 so they can be closed out together.

jkczyz · 2020-09-10T15:41:19Z

Thanks for the heads up @julianknutsen and for addressing the reentrancy. As noted in the PR description, I was primarily trying to understand the peer handling code. Feel free to grab 9f875f4 for your work.

Motivated by lightningdevkit#456, remove the peers_needing_send set in favor of just scanning the peers during process_events() and attempting to send data for those peers that have items in their outbound queue or pending sync items to be sent out.

`Julian Knutsen <[email protected]>` pointed out in a previous discussion that `read_event` can reenter user code despite the documentation stating explicitly that it will not. This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but its somewhat simpler to just drop the `do_attempt_write_data` call. Ideally we could land most of Julian's work, but its still in need of substantial git history cleanup to get it in a reviewable state and this solves the immediate issue.

TheBlueMatt · 2021-06-23T16:27:39Z

Done in #948.

jkczyz commented Jan 22, 2020

View reviewed changes

TheBlueMatt reviewed Jan 22, 2020

View reviewed changes

jkczyz added 2 commits February 1, 2020 18:24

Indicate outstanding writes to peers directly

9f875f4

Using the peers_needing_send HashSet to determine which peers have any outstanding writes adds complexity without much benefit. Instead, add a needing_send field to the Peer struct to accomplish the same thing.

Remove overloaded 'us' in lightning-net-tokio

7a03fe0

Variables named using the pronoun 'us' are confusing as they are overloaded for Connection and SocketDescriptor, often inconsistently. Use 'connection' and 'descriptor' instead.

jkczyz force-pushed the 2020-01-peers-needing-send branch from eb24218 to 7a03fe0 Compare February 4, 2020 00:17

jkczyz mentioned this pull request May 18, 2020

Extract encode_and_send_msg into a method. #624

Merged

TheBlueMatt closed this Jun 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove peers_needing_send set from peer_handling #456

Remove peers_needing_send set from peer_handling #456

Uh oh!

jkczyz commented Jan 22, 2020

Uh oh!

jkczyz Jan 22, 2020

Uh oh!

TheBlueMatt Jan 22, 2020

Uh oh!

TheBlueMatt left a comment •

edited

Loading

Uh oh!

TheBlueMatt Jan 22, 2020

Uh oh!

jkczyz commented Jan 22, 2020

Uh oh!

TheBlueMatt commented Jan 22, 2020

Uh oh!

jkczyz commented Jan 22, 2020

Uh oh!

TheBlueMatt commented Jan 22, 2020

Uh oh!

jkczyz commented Jan 23, 2020

Uh oh!

TheBlueMatt commented Jan 23, 2020

Uh oh!

jkczyz commented Feb 4, 2020

Uh oh!

julianknutsen commented Sep 10, 2020

Uh oh!

jkczyz commented Sep 10, 2020

Uh oh!

TheBlueMatt commented Jun 23, 2021

Uh oh!

Uh oh!

		peers.peers_needing_send.insert(peer_descriptor.clone());
		peer.needing_send = true;

Remove peers_needing_send set from peer_handling #456

Remove peers_needing_send set from peer_handling #456

Uh oh!

Conversation

jkczyz commented Jan 22, 2020

Uh oh!

jkczyz Jan 22, 2020

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt Jan 22, 2020

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TheBlueMatt Jan 22, 2020

Choose a reason for hiding this comment

Uh oh!

jkczyz commented Jan 22, 2020

Uh oh!

TheBlueMatt commented Jan 22, 2020

Uh oh!

jkczyz commented Jan 22, 2020

Uh oh!

TheBlueMatt commented Jan 22, 2020

Uh oh!

jkczyz commented Jan 23, 2020

Uh oh!

TheBlueMatt commented Jan 23, 2020

Uh oh!

jkczyz commented Feb 4, 2020

Uh oh!

julianknutsen commented Sep 10, 2020

Uh oh!

jkczyz commented Sep 10, 2020

Uh oh!

TheBlueMatt commented Jun 23, 2021

Uh oh!

Uh oh!

TheBlueMatt left a comment •

edited

Loading