Skip to content

Remove peers_needing_send set from peer_handling #456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jkczyz
Copy link
Contributor

@jkczyz jkczyz commented Jan 22, 2020

Examining the peer_handling module reveals that the peers_needing_send set either can be replaced by a needing_send field on Peer (to reduce complexity) or removed entirely. The set's documentation is unclear in that it claims that peers_needing_send is used by do_read_event() when a message is pushed onto the send buffer but do_attempt_write_data() is not called to avoid reentrancy.

However, do_attempt_write_data() is called unconditionally following the read loop within do_read_event(). It's possible that do_attempt_write_data() is not called in the case of an early return (e.g., when use of the try_potential_handleerror! macro sends an error message and returns). But if this is the only case, then the members of peers_needing_send would be overly broad.

Furthermore, removing the places where peers_needing_send is populated does not cause any tests to fail. Therefore, this PR is an attempt to understand the intended usage of peers_needing_send, determine if it can be simplified or removed, clarify the documentation, and add any missing tests for the intended behavior.

Comment on lines -623 to +649
peers.peers_needing_send.insert(peer_descriptor.clone());
peer.needing_send = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheBlueMatt Could you explain why this needs to be set here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to stream out our routing db, we (obviously) dont copy the whole thing into our outbound send buff (otherwise we'd just OOM ourselves...), so the logic in do_attempt_write_data has to be hit to add messages to the send buffer if the buffer is empty.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the comments definitely hint at a old world where we were trying to be really hard to avoid reentrancy in the peer_handler, and gave up cause it was only barely practical. AFAICT, its not needed at all anymore.

As you note, peer_handler is currently woefully under-tested, likely the least tested area of the code.

Comment on lines -623 to +649
peers.peers_needing_send.insert(peer_descriptor.clone());
peer.needing_send = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to stream out our routing db, we (obviously) dont copy the whole thing into our outbound send buff (otherwise we'd just OOM ourselves...), so the logic in do_attempt_write_data has to be hit to add messages to the send buffer if the buffer is empty.

@jkczyz
Copy link
Contributor Author

jkczyz commented Jan 22, 2020

AFAICT, its not needed at all anymore.

By "it" do you mean peers_needing_send or the need to avoid reentrancy (i.e., "it" from the previous sentence)? I'm assuming the former but want to make sure I understand.

@TheBlueMatt
Copy link
Collaborator

Right, yea, the whole thing. AFAICT (without digging into git history) I gave up on avoiding reentrancy and just did the Easy Thing, so the whole set seems to be completely useless?

@jkczyz
Copy link
Contributor Author

jkczyz commented Jan 22, 2020

Right, yea, the whole thing. AFAICT (without digging into git history) I gave up on avoiding reentrancy and just did the Easy Thing, so the whole set seems to be completely useless?

At very least removing the code doesn't cause any test failures.

Could you help me understand what sort of reentrancy you were attempting to avoid? I see that #217 removed the call to process_events at the end of do_read_event in order to avoid sending data while the do_read_event call is outstanding. More specifically: How are you defining reentrancy in this context? Could you give me a scenario that you were attempting to avoid but ultimately were unable to avoid?

@TheBlueMatt
Copy link
Collaborator

Specifically, both read_event and write_event may call back into user code (via the Descriptor objects), as indicated in their docs. In general, reentrancy scares me cause, as a user using a library, it can be very hard to get right, and very easy to add hard-to-identify bugs, but there also isn't tooo much that can be obviously done here.

@jkczyz
Copy link
Contributor Author

jkczyz commented Jan 23, 2020

Thanks for the clarification. My understanding then is that the warnings about reentrancy are essentially telling users to avoid calling PeerHandler methods from your SocketDescriptor methods.

And, if I'm reading 294ad32 correctly, the thought was that calls to read_event (through do_read_event) would not be reentrant since the call to process_events (which calls send_data via do_attempt_write_data) was removed. However, this did not have the desired effect since do_read_event still called do_attempt_write_data directly.

What I'm having trouble understanding is how we planned on avoiding reentrancy. Was the thought to remove any transitive calls of send_data from read_event and write_event and only call send_data via process_events?

@TheBlueMatt
Copy link
Collaborator

What I'm having trouble understanding is how we planned on avoiding reentrancy. Was the thought to remove any transitive calls of send_data from read_event and write_event and only call send_data via process_events?

Yea, I think that was the goal, at least for read_event, the whole point of write_event is to call back into send_data (as write_event is a "backpressure complete, buffer space available, please write" event, really), so that would be a bit confusing. I guess we could go both ways with this - either finish the work and remove the reentrancy, which sounds nice when you point out that maybe we're pretty close(?), or move just drop the whole hassle and live with it.

From a client perspective, I could totally see confusion in that you could put a lock in your SocketDescriptor and end up with a deadlock when reading messages trying to lock it just to write something?

Using the peers_needing_send HashSet to determine which peers have any
outstanding writes adds complexity without much benefit. Instead, add a
needing_send field to the Peer struct to accomplish the same thing.
Variables named using the pronoun 'us' are confusing as they are
overloaded for Connection and SocketDescriptor, often inconsistently.
Use 'connection' and 'descriptor' instead.
@jkczyz jkczyz force-pushed the 2020-01-peers-needing-send branch from eb24218 to 7a03fe0 Compare February 4, 2020 00:17
@jkczyz
Copy link
Contributor Author

jkczyz commented Feb 4, 2020

I've been reading through lightning-net-tokio to try to get a better sense of how a user would use PeerManager. It looks pretty complex, though I know you've alluded to the fact that this could be simplified. However, I'd like to explore whether we can provide a simpler interface for the user to implement. I have some ideas but don't have any concrete suggestions just yet.

First, I'd like to get a better sense behind some of the design decisions that went into PeerManager. I see that most of PeerManager's public methods take a SocketDescriptor to look up a Peer to act upon. Additionally, all peers use the same ChannelMessageHandler. Thus, in process_events the node_id_to_descriptor map is needed to look up which peer the event is associated with.

Rather than using PeerManager in this manner, could the user be given a Peer struct (for each peer) with similar methods and where each peer has their own ChannelMessageHandler? The peer itself could also have a reference to its SocketDescriptor. Thus, much of the PeerManager functionality could be pushed to Peer, removing some of the complexity around locking and lookup.

Feel free to discuss with me offline and I can circle back to this PR.

@julianknutsen
Copy link

@jkczyz @TheBlueMatt

Took me a while to find this PR, but I wanted to let you know that I've readded the reentrancy requirement as part of eae0904. I found the bug when doing the unit tests and seeing that the docs still mentioned it shouldn't call send_data(). After my patches, read_event() is reentrant-free and there are plenty of tests to catch it in the future.

I actually like the idea of removing the separate set in favor of a Peer scan with a flag to clean up a lot of the internal functions that pass it around or reference it through self. I can cherry-pick these commits and see how they look on top of my work.

I'll make sure to reference this PR when it makes it out of the initial review on top of #494 so they can be closed out together.

@jkczyz
Copy link
Contributor Author

jkczyz commented Sep 10, 2020

Thanks for the heads up @julianknutsen and for addressing the reentrancy. As noted in the PR description, I was primarily trying to understand the peer handling code. Feel free to grab 9f875f4 for your work.

julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 10, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 14, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 15, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 15, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 15, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 16, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 18, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 18, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Sep 24, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Oct 2, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
julianknutsen added a commit to julianknutsen/rust-lightning that referenced this pull request Oct 2, 2020
Motivated by lightningdevkit#456, remove the
peers_needing_send set in favor of just scanning the peers during
process_events() and attempting to send data for those peers that have items in
their outbound queue or pending sync items to be sent out.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jun 10, 2021
`Julian Knutsen <[email protected]>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.

This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.

Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jun 10, 2021
`Julian Knutsen <[email protected]>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.

This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.

Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jun 16, 2021
`Julian Knutsen <[email protected]>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.

This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.

Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jun 17, 2021
`Julian Knutsen <[email protected]>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.

This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.

Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jun 18, 2021
`Julian Knutsen <[email protected]>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.

This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.

Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jun 21, 2021
`Julian Knutsen <[email protected]>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.

This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.

Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jun 21, 2021
`Julian Knutsen <[email protected]>` pointed
out in a previous discussion that `read_event` can reenter user
code despite the documentation stating explicitly that it will not.

This was addressed in lightningdevkit#456 by simply codifying the reentrancy, but
its somewhat simpler to just drop the `do_attempt_write_data` call.

Ideally we could land most of Julian's work, but its still in need
of substantial git history cleanup to get it in a reviewable state
and this solves the immediate issue.
@TheBlueMatt
Copy link
Collaborator

Done in #948.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants