-
Notifications
You must be signed in to change notification settings - Fork 37.3k
Bugfix: don't make collision from "tried" a feeler #19906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: don't make collision from "tried" a feeler #19906
Conversation
If we're going to connect to a feeler (as decided by a timer), we should not "waste" this iteration on connecting to a collision. The collisions belong to the "tried" table, so connecting to them don't accomplish the goal of feelers.
Can you explain a bit more what this PR is trying to accomplish? Is this just about naming conventions, or something else? |
@sdaftuar does the first commit message not sound clear?
We are supposed to attempt a feeler "transition" every 2 minutes. But instead, when a timer hits, we might connect to a tried address (from tried collisions), instead of a feeler. And the next feeler will be attempted after 2 more minutes. I don't think it's exploitable: an attacker can't force us to do this all the time. It's just currently logically incorrect. |
Got it, thanks for clarifying -- I didn't realize that the concern was around the frequency that we pull things out of the new table to potentially move them to the tried table. I think the underlying question here is how often we want to be making outbound connections to the network (this is a question I was thinking about in the context of #19858 as well). Both uses of FEELER connections today (moving things from new to tried, as well as the test-before-evict logic of resolving tried-table collisions) have the same property that their connections last a very short time. So I think that we could ask ourselves, given a certain rate at which we're comfortable making new outbound connections, how many of those should go to the test-before-evict logic, and how many to the logic that is pulling entries from the new-table? In practice, it seems to me that tried-table collisions are relatively infrequent, probably less than a few times per day on nodes I've looked at, if I remember right. So the effect of this on the new-table FEELER logic is low. On the other hand, if it were to have a big effect somehow, then changing the logic so that we still make feeler connections to new-table peers at the once-every-2-minute rate will mean we're connecting to the network much more (assuming that we always make the test-before-evict connections when we need to). So my inclination is that we don't need this change? Or if we do want it, we should just put a rate limit in place so that we can reason about what bounds are achievable in the event of tried-table-collisions. BTW, the logic presented in this PR is incorrect, I think, because it only tries to resolve tried-table-collisions when we'd make a non-FEELER connection. However, once we have all our outbound peers in place, we never try to make new non-FEELER connections (well, at least until #19858), so I think this basically breaks the test-before-evict logic as drafted now. |
// It's much less relevant for feelers, because it's a frequent procedure | ||
// to sanitize AddrMan (not critical), and selecting from AddrMan is | ||
// sufficiently (for this case) random/difficult-to-Sybil due to the bucketing. | ||
if (conn_type != ConnectionType::FEELER) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's rare that we get to this line of code -- on startup, when we have not enough outbound peers, we'll hit this. But once we have made our initial 10 outbound connections, then at line 1916 above we'll abort the loop iteration, never getting to this code.
So what would happen is that our FEELERS generate tried-table-collisions which never get resolved, eventually resulting in the old tried table entry getting evicted without an attempt having been made, breaking test-before-evict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, making the test-before-evict connections no longer be FEELER breaks the logic in net_processing
around disconnecting these peers right when we connect.
addr = addrman.SelectTriedCollision(); | ||
if (!addr.IsValid()) { | ||
// If no collisions, consider an address from the "tried" table. | ||
addr = addrman.Select(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think true
means pull from the new table, not the tried table, so this line and the one below at line 1946 are backwards.
We currently have the following documentation:
It's not accurate since test-before-evict is not about new->tried, and it's not mentioned here. I'm pretty sure it may be inaccurate in other places as well, I didn't even know that this is done via feelers before you told me so here. Considering what's above, this PR attempted (not so well yet) to separate these two uses, and make feelers do new->tried exclusively. Now that I know that this other case was actually intended (at least by the author), there are 2 paths we may take:
@sdaftuar what do you think? Perhaps I'll sketch what would the latter option look like in the code later. |
I agree that it's confusing that the term "feeler" has encompassed both behaviors (perhaps @EthanHeilman has a view as well?) In Eclipse Attacks on Bitcoin’s Peer-to-Peer Network, the "feeler" and "test-before-evict" countermeasures do not seem to use overlapping terminology, so my suggestion is that our code be clearer about mapping to those countermeasure names as well. My suggestion would be that we rename I think we could also rename the feeler timer to be the "addrman-helper-timer", and differentiate the two behaviors in code comments better, if that would make the logic clearer. I don't think it makes sense to add a second conn_type that has exactly the same properties as the first, just with a different name. Perhaps if we thought those behaviors might diverge in the future, it might make sense; but I don't anticipate that myself. |
closing in favour of #19958 |
The purpose of feelers is to move addrs from "new" to "tried."
We should not select a node evicted from collisions, as they are already "tried".