Skip to content

Bugfix: don't make collision from "tried" a feeler #19906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

naumenkogs
Copy link
Member

The purpose of feelers is to move addrs from "new" to "tried."
We should not select a node evicted from collisions, as they are already "tried".

If we're going to connect to a feeler (as decided by a timer),
we should not "waste" this iteration on connecting to a collision.
The collisions belong to the "tried" table, so connecting to them
don't accomplish the goal of feelers.
@fanquake fanquake added the P2P label Sep 7, 2020
@naumenkogs naumenkogs closed this Sep 7, 2020
@naumenkogs naumenkogs reopened this Sep 7, 2020
@sdaftuar
Copy link
Member

Can you explain a bit more what this PR is trying to accomplish? Is this just about naming conventions, or something else?

@naumenkogs
Copy link
Member Author

@sdaftuar does the first commit message not sound clear?

If we're going to connect to a feeler (as decided by a timer),
we should not "waste" this iteration on connecting to a collision.
The collisions belong to the "tried" table, so connecting to them
don't accomplish the goal of feelers.

We are supposed to attempt a feeler "transition" every 2 minutes. But instead, when a timer hits, we might connect to a tried address (from tried collisions), instead of a feeler. And the next feeler will be attempted after 2 more minutes.

I don't think it's exploitable: an attacker can't force us to do this all the time. It's just currently logically incorrect.

@sdaftuar
Copy link
Member

Got it, thanks for clarifying -- I didn't realize that the concern was around the frequency that we pull things out of the new table to potentially move them to the tried table.

I think the underlying question here is how often we want to be making outbound connections to the network (this is a question I was thinking about in the context of #19858 as well). Both uses of FEELER connections today (moving things from new to tried, as well as the test-before-evict logic of resolving tried-table collisions) have the same property that their connections last a very short time. So I think that we could ask ourselves, given a certain rate at which we're comfortable making new outbound connections, how many of those should go to the test-before-evict logic, and how many to the logic that is pulling entries from the new-table?

In practice, it seems to me that tried-table collisions are relatively infrequent, probably less than a few times per day on nodes I've looked at, if I remember right. So the effect of this on the new-table FEELER logic is low. On the other hand, if it were to have a big effect somehow, then changing the logic so that we still make feeler connections to new-table peers at the once-every-2-minute rate will mean we're connecting to the network much more (assuming that we always make the test-before-evict connections when we need to).

So my inclination is that we don't need this change? Or if we do want it, we should just put a rate limit in place so that we can reason about what bounds are achievable in the event of tried-table-collisions.

BTW, the logic presented in this PR is incorrect, I think, because it only tries to resolve tried-table-collisions when we'd make a non-FEELER connection. However, once we have all our outbound peers in place, we never try to make new non-FEELER connections (well, at least until #19858), so I think this basically breaks the test-before-evict logic as drafted now.

// It's much less relevant for feelers, because it's a frequent procedure
// to sanitize AddrMan (not critical), and selecting from AddrMan is
// sufficiently (for this case) random/difficult-to-Sybil due to the bucketing.
if (conn_type != ConnectionType::FEELER) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's rare that we get to this line of code -- on startup, when we have not enough outbound peers, we'll hit this. But once we have made our initial 10 outbound connections, then at line 1916 above we'll abort the loop iteration, never getting to this code.

So what would happen is that our FEELERS generate tried-table-collisions which never get resolved, eventually resulting in the old tried table entry getting evicted without an attempt having been made, breaking test-before-evict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, making the test-before-evict connections no longer be FEELER breaks the logic in net_processing around disconnecting these peers right when we connect.

addr = addrman.SelectTriedCollision();
if (!addr.IsValid()) {
// If no collisions, consider an address from the "tried" table.
addr = addrman.Select(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think true means pull from the new table, not the tried table, so this line and the one below at line 1946 are backwards.

@naumenkogs
Copy link
Member Author

naumenkogs commented Sep 14, 2020

We currently have the following documentation:

    /**
     * Feeler connections are short lived connections used to increase the
     * number of connectable addresses in our AddrMan. Approximately every
     * FEELER_INTERVAL, we attempt to connect to a random address from the new
     * table. If successful, we add it to the tried table.
     */
    FEELER,

It's not accurate since test-before-evict is not about new->tried, and it's not mentioned here. I'm pretty sure it may be inaccurate in other places as well, I didn't even know that this is done via feelers before you told me so here.

Considering what's above, this PR attempted (not so well yet) to separate these two uses, and make feelers do new->tried exclusively.

Now that I know that this other case was actually intended (at least by the author), there are 2 paths we may take:

@sdaftuar what do you think? Perhaps I'll sketch what would the latter option look like in the code later.

@sdaftuar
Copy link
Member

Now that I know that this other case was actually intended (at least by the author), there are 2 paths we may take:

I agree that it's confusing that the term "feeler" has encompassed both behaviors (perhaps @EthanHeilman has a view as well?) In Eclipse Attacks on Bitcoin’s Peer-to-Peer Network, the "feeler" and "test-before-evict" countermeasures do not seem to use overlapping terminology, so my suggestion is that our code be clearer about mapping to those countermeasure names as well.

My suggestion would be that we rename ConnectionType::FEELER to something like ConnectionType::ADDRMAN_HELPER or ConnectionType::TEST_CONNECT or something to indicate either that the connection is short-lived, or just for the addrman, or both. And then document in our code that we use this brief-addrman-helper conn_type for both the feeler and test-before-evict peer selection logic.

I think we could also rename the feeler timer to be the "addrman-helper-timer", and differentiate the two behaviors in code comments better, if that would make the logic clearer.

I don't think it makes sense to add a second conn_type that has exactly the same properties as the first, just with a different name. Perhaps if we thought those behaviors might diverge in the future, it might make sense; but I don't anticipate that myself.

@naumenkogs
Copy link
Member Author

closing in favour of #19958

@naumenkogs naumenkogs closed this Sep 15, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants