Improve checking for p2p connection limits #2985
Conversation
Just had a thought about this - our seed nodes currently accept connection requests and send "peer lists" back to the recently connected peers. Seed nodes will eventually notice they have too many peers and will randomly pick some to shut down. I don't think we look at the age of the connection when picking which ones to close.
This change will alter this behavior - potentially making it harder for new peers joining the network to obtain peer lists from the initial seeds. If a seed is at capacity then it will simply close any new connection before it has a chance to send any messages to the new peer.
Edit: What if we checked against peer_max_count + 10%, or some other softer limit?
My initial thinking is that we want to consume as few resources as possible until we know the new connection is desirable (not from a banned peer, not from a known peer, and not one that would exceed our maximum configured connection limit). Otherwise "undesirable" connections could consume more resources than necessary (like sending peer lists), even after a ban. I don't think it makes sense to send a peer list to a banned peer, for example.
That's fine, we can check if they're banned before sending a peer list. But if we're disconnecting simply because we have too many incoming connections, we could have a real bootstrapping problem if we don't provide alternative peers.
Agreed, in its current form it would potentially impact bootstrapping via seed nodes. Do you see this remaining an issue if seed nodes don't follow the proposed rule of shutting down new incoming connections (without sending a peer list) while the connection limit is reached? Are there cases where non-seed nodes with default connection limits following this rule would likely impact bootstrapping?
Seed nodes would probably be enough, but what constitutes a seed node? We don't currently have a config setting for that, so we would have to add one.
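To make the ordering of those checks concrete, here is a minimal, hypothetical sketch (not the actual grin implementation; all type and field names are illustrative assumptions) of rejecting an undesirable inbound connection immediately after accept, before any resources are spent on it such as sending a peer list.

use std::collections::HashSet;
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Illustrative accept-time state; field names are assumptions, not grin's API.
struct AcceptPolicy {
    banned: HashSet<IpAddr>,
    connected: HashSet<SocketAddr>,
    inbound_count: usize,
    max_inbound: usize,
}

#[derive(Debug)]
enum Decision {
    Accept,
    RejectBanned,
    RejectAlreadyConnected,
    RejectAtCapacity,
}

impl AcceptPolicy {
    /// Cheapest and most decisive checks first, so a banned or excess
    /// connection is dropped before we send it anything (such as a peer list).
    fn check(&self, addr: SocketAddr) -> Decision {
        if self.banned.contains(&addr.ip()) {
            Decision::RejectBanned
        } else if self.connected.contains(&addr) {
            Decision::RejectAlreadyConnected
        } else if self.inbound_count >= self.max_inbound {
            Decision::RejectAtCapacity
        } else {
            Decision::Accept
        }
    }
}

fn main() {
    let policy = AcceptPolicy {
        banned: HashSet::new(),
        connected: HashSet::new(),
        inbound_count: 117,
        max_inbound: 117,
    };
    let addr = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(10, 0, 0, 1)), 3414);
    // At capacity, so the connection is rejected before any messages are sent.
    println!("{:?}", policy.check(addr));
}

A seed node could soften the capacity check (for example, the configured maximum plus a small buffer, as suggested above) so that a brand-new peer still receives a peer list before the connection is closed.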
After some more thought and further consideration of #2726: Goals:
Changes:
I'm updating the PR to [WIP] as this probably deserves more thought and testing, especially around limiting outgoing connections to 8. The changes above may be too much and can be rolled back in favor of simply having the check in the listener with a configurable 8-peer buffer.
I actually like what is being proposed here. Making a more explicit distinction between incoming and outgoing connection handling makes a lot of sense. They are different and probably should be handled differently. All nodes have outbound connections (I guess technically some nodes may not have any but that would be unusual). Only "public" nodes have inbound connections.
My only comment is this may be overkill short-term. We may want to tackle this in two stages -
Edit: Ahh I thought you were just talking about future changes. I did not realize you had actually made them...
LGTM, will test it. I like the clear separation of inbound and outbound connections; it was suggested recently because bitcoin uses the same approach.
#peer_max_inbound_count = 117

#maximum number of outbound peer connections
#peer_max_outbound_count = 8
👍
config/src/comments.rs
#maximum number of peers
#peer_max_count = 125
#maximum number of inbound peer connections
#peer_max_inbound_count = 117
We should just leave this as 125 or maybe even 128 for a nice round number.
It was arbitrary to begin with; we definitely don't need to tweak them to add up to 125 in total.
Increased the limit to 128 here.
p2p/src/types.rs
const PEER_MAX_INBOUND_COUNT: u32 = 117;

/// The max outbound peer count
const PEER_MAX_OUTBOUND_COUNT: u32 = 8;

/// min preferred peer count
const PEER_MIN_PREFERRED_COUNT: u32 = 8;
Should this one be inbound/outbound specific as well?
Removed PEER_MIN_PREFERRED_COUNT in favor of PEER_MIN_PREFERRED_OUTBOUND_COUNT.
p2p/src/types.rs
@@ -268,11 +287,24 @@ impl P2PConfig {
	}
}

-/// return peer_max_count
+/// return total maximum peer count (incoming + outgoing)
Do we still need this total max count anywhere?
It was still used when broadcasting compact blocks and transactions for Peers, but it should be possible to remove it in favor of making incoming and outgoing connections more explicit. Please see the unresolved question posted here: #2985 (comment)
This looks great. 👍
Changes in 7bfb54e:
We definitely don't want to just broadcast to outgoing peers. Non-public nodes would never receive new blocks or transactions.
That's where I ended up. I wasn't certain whether nodes would request those on their own from peers or not (even if so, the propagation delay could be ugly). It sounds like we definitely want to broadcast to all connected peers.
I wouldn't say definitely broadcast to all. Figuring out the optimal number is more a graph theory problem. @tromp could probably tell us how to figure that out. We'd probably have to take into account the expected percentage of public nodes, the average number of incoming and outgoing peers per public node, and the average number of outgoing peers per private node.
Agreed in the long term that a more complex mechanism will get us a more desirable result. My goal for this PR for now is to add a basic limit to incoming p2p connections by distinguishing incoming and outgoing connections without changing too much. I think as is, the broadcasting should behave the same as before, but the question in general could probably use more thought.
Yeah, I was actually wondering (briefly) about just broadcasting to outbound but then realized the same thing - some nodes only have outbound connections and would never receive anything. We should not distinguish between inbound/outbound when broadcasting (at least in the short term) and just broadcast to max peers.
👍
#until we get to at least this number
#peer_min_preferred_count = 8
#maximum number of outbound peer connections
#peer_max_outbound_count = 8
What's the point of this? If this is greater than peer_min_preferred_outbound_count, do we continue to connect to more outbound peers?
peer_max_outbound_count lets us put a configurable limit on how many outgoing connections we can have, and by extension on total connections. We continue trying to connect to more outbound peers until peer_min_preferred_outbound_count is reached (a "healthy peers mix"). By default these values are the same (we try to maintain connections on all available outgoing slots).
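As a rough sketch of those semantics (illustrative names only, not the actual config API): a node keeps dialing outbound peers while it is below the preferred minimum, and never dials past the hard maximum.

/// Illustrative outbound settings mirroring the config values discussed here.
struct OutboundConfig {
    peer_min_preferred_outbound_count: u32,
    peer_max_outbound_count: u32,
}

/// Actively seek new outbound peers only while below the preferred minimum,
/// and never dial past the hard maximum.
fn should_seek_outbound(cfg: &OutboundConfig, current_outbound: u32) -> bool {
    current_outbound < cfg.peer_min_preferred_outbound_count
        && current_outbound < cfg.peer_max_outbound_count
}

fn main() {
    // With the defaults discussed above both values are 8, so the node tries
    // to keep every available outbound slot filled.
    let cfg = OutboundConfig {
        peer_min_preferred_outbound_count: 8,
        peer_max_outbound_count: 8,
    };
    assert!(should_seek_outbound(&cfg, 5));
    assert!(!should_seek_outbound(&cfg, 8));
}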
But let's say you have a peer_min_preferred_outbound_count of 8 and a peer_max_outbound_count of 16 - how would you ever end up with more than 8 outbound connections?
We may not after future refactoring that better distinguishes incoming and outgoing connections. For now it happens when we broadcast compact blocks, for example. There are probably still cases where outgoing connections are opened for one reason or another, exceeding peer_max_outbound_count at least until clean_peers() is run to clean up any potential excess connections. In this case, if peer_max_outbound_count was set to 16, I think we would hang on to those connections instead of dropping them when cleaning up peers.
😕 You're saying we open outbound connections when broadcasting compact blocks? Where in the code does that take place, and what for?
After taking a closer look, new outbound connections are not opened when broadcasting - my mistake. During testing there were cases where the logic allowed opening new connections beyond the outgoing limit, which got closed shortly after during clean_peers(). You can set some test limits and watch the logs to see the outgoing connection limit be temporarily exceeded. Hopefully as we better define incoming and outgoing connections in future PRs, these cases can all be thoughtfully handled (as in the broadcast scenarios).
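As a hedged sketch of the trimming behavior described here (assumed names, not the actual clean_peers() implementation): when a periodic cleanup pass finds more outbound connections than the configured maximum, the excess ones are selected and closed.

/// Illustrative connection record; fields are assumptions for this sketch.
#[derive(Debug)]
struct Conn {
    id: u64,
    is_outbound: bool,
    connected_at_secs: u64,
}

/// Return the outbound connections beyond `max_outbound` that a cleanup pass
/// could close; this sketch prefers dropping the most recently opened ones.
fn excess_outbound(conns: &[Conn], max_outbound: usize) -> Vec<&Conn> {
    let mut outbound: Vec<&Conn> = conns.iter().filter(|c| c.is_outbound).collect();
    if outbound.len() <= max_outbound {
        return Vec::new();
    }
    // Oldest connections first; everything past the limit is excess.
    outbound.sort_by_key(|c| c.connected_at_secs);
    outbound.split_off(max_outbound)
}

fn main() {
    let conns = vec![
        Conn { id: 1, is_outbound: true, connected_at_secs: 100 },
        Conn { id: 2, is_outbound: true, connected_at_secs: 300 },
        Conn { id: 3, is_outbound: false, connected_at_secs: 200 },
    ];
    // With a limit of 1 outbound connection, the newer outbound conn is excess.
    for c in excess_outbound(&conns, 1) {
        println!("would drop {:?}", c);
    }
}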
@@ -300,11 +303,14 @@ fn listen_for_addrs(
		return;
I think we can remove the peers.peer_count() > peers.peer_outbound_count() check on line 302, since healthy_peers_mix() covers it.
That makes sense. They don't quite check the same thing, but I think we only want the healthy_peers_mix() check here, as you said. I'm pretty sure we don't care to check here that the total connected peers exceed the outbound connected peers - this would require (I think) that we have inbound connections - and I'm pretty sure we want to return here any time our outbound connections >= our configured preferred minimum outbound connections ("a healthy number of outbound peers").
Removed in fb9b06e unless there is a good reason to keep this as is.
p2p/src/peers.rs
pub fn enough_peers(&self) -> bool {
	self.peer_count() >= self.config.peer_min_preferred_count()
}

/// We have enough peers, both total connected and outbound connected
pub fn healthy_peers_mix(&self) -> bool {
I'd advise changing the name of this now, since "healthy_peers_mix" doesn't really represent what it's checking. Possible options are something like meets_preferred_outbound, or preferably reverse the logic and call it needs_outbound_connections. The logic for managing connections was already built piecemeal through a series of defect corrections, so it's probably best to keep it from getting any more complicated/confusing.
The naming here wasn't immediately intuitive to me either. My understanding is that it is a result of needing to maintain a healthy mix of outbound connections so that dandelion can have each peer select a random outbound relay to send txs along the stem. Without ensuring a "healthy peers mix", apparently some nodes were not left with enough slots for outbound connections.
With that context it makes a little more sense to me - a healthy peers mix is one that has at least the minimum number of preferred outbound connections. So while I agree it doesn't strictly represent what is checked (outbound connections >= configured min preferred outbound connections), I think for now it does an ok job of capturing what we are checking for (that we have a healthy mix of outgoing connections).
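Under that reading, a minimal sketch of the check (with assumed type and accessor names, not the actual grin code) looks something like this:

/// Illustrative stand-ins for the p2p config and the connected peer set.
struct P2PConfig {
    peer_min_preferred_outbound_count: u32,
}

struct Peers {
    outbound_count: u32,
    config: P2PConfig,
}

impl Peers {
    /// We have a "healthy peers mix" when our outbound connections meet the
    /// configured preferred minimum (e.g. so dandelion always has an outbound
    /// relay to choose from).
    fn healthy_peers_mix(&self) -> bool {
        self.outbound_count >= self.config.peer_min_preferred_outbound_count
    }
}

fn main() {
    let peers = Peers {
        outbound_count: 8,
        config: P2PConfig { peer_min_preferred_outbound_count: 8 },
    };
    assert!(peers.healthy_peers_mix());
}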
Yea, I agree that's where the name came from, but you've now just changed the functionality, so my claim is the name no longer makes sense. Before, it used to compare inbound and outbound count so it made sense, but it's no longer about having a healthy mix, and is now just about making sure we have a minimum number of outbound.
Agreed, it could make sense to use a more descriptive name about functionality rather than a prescriptive name about health. Updated in d06a8e1
It wouldn't be a more complex mechanism, I don't think. It's a complex formula to figure out by hand what the optimal number is, but then it's just a matter of broadcasting to n random peers. I'm ok with pushing this off though, just know that our network will be a whole lot noisier than it needs to be.
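For illustration, broadcasting to n random peers could be as simple as sampling from the connected set. This is a sketch assuming the rand crate; the peer type and the send closure are placeholders, not part of the actual codebase.

use rand::seq::SliceRandom;
use rand::thread_rng;

/// Send a message to at most `n` randomly chosen peers from `peers`.
fn broadcast_to_n_random<P>(peers: &[P], n: usize, send: impl Fn(&P)) {
    let mut rng = thread_rng();
    // choose_multiple never yields more than peers.len() items.
    for peer in peers.choose_multiple(&mut rng, n) {
        send(peer);
    }
}

fn main() {
    // Pretend each integer is a connected peer; "sending" just prints it.
    let peers: Vec<u32> = (0..20).collect();
    broadcast_to_n_random(&peers, 8, |p| println!("broadcasting to peer {}", p));
}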
I spoke with @tromp about this, and he found: https://www.quora.com/How-many-peers-is-a-Bitcoin-node-connected-to-at-all-times
I think I originally introduced the "broadcast to 8 of our peers regardless of total peer connections" logic and only for headers. We never did this for transactions as far as I'm aware. Currently we implement the following -
But I think there is a lot of misinformation and misunderstanding out there on the internet regarding this, and some of the confusion comes from the inbound/outbound terminology. This documentation implies nodes send to all their peers for block relay: https://bitcoin.org/en/p2p-network-guide#term-unsolicited-block-push
And this documentation suggests transactions are sent to all peers as well: https://en.bitcoin.it/wiki/Network#Standard_relaying
I'm not entirely sure what the correct interpretation of bitcoin's behavior is, but I'm less convinced that some magic 8 peers is used anywhere (even if we did this for header-first broadcast, which I think may be wrong behavior).
@antiochp Ok, I went to the actual source code, since documentation is inconsistent. Here's what I found. For compact blocks, looks like bitcoin actually only relays to up to 3 peers, but the whole system is different than ours (btc nodes request to receive compact blocks from specific peers). https://github.com/bitcoin/bips/blob/master/bip-0152.mediawiki and https://github.com/TheBlueMatt/bitcoin/blob/48efec82f3a18364c019577495914fcebb425e35/src/main.cpp#L470
For transactions, looks like it announces the availability of txs to all peers (similar to our tx kernel hash message): https://github.com/bitcoin/bitcoin/blob/67be6d7a177cb14ca8a68b9c6361dfaae9e64f2b/src/net_processing.cpp#L1311 It looks like it's the same for headers as well, but I could be reading the code wrong. One difference we should note is bitcoin keeps track of each peer's "Inventory", so it doesn't send transactions and headers to nodes that are known to already have them. So based on all of this, my recommendation is:
We do something similar at the peer level, tracking hashes of blocks and transactions that have passed through that particular peer connection. It is implemented in the
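As a rough illustration of that kind of per-connection tracking (a sketch with assumed names; the specific type referenced above is elided, and a real version would also bound the size of the set): each peer connection remembers the hashes it has already seen, and the broadcast path skips peers that are known to have a given hash.

use std::collections::HashSet;

/// Hash type used for blocks/transactions in this sketch.
type Hash = [u8; 32];

/// Per-connection view of what a peer has already seen.
struct TrackedPeer {
    seen: HashSet<Hash>,
}

impl TrackedPeer {
    fn new() -> Self {
        TrackedPeer { seen: HashSet::new() }
    }

    /// Record a hash that was sent to or received from this peer.
    fn track(&mut self, hash: Hash) {
        self.seen.insert(hash);
    }

    /// Skip re-broadcasting anything this peer is already known to have.
    fn needs(&self, hash: &Hash) -> bool {
        !self.seen.contains(hash)
    }
}

fn main() {
    let mut peer = TrackedPeer::new();
    let tx_hash: Hash = [0u8; 32];
    assert!(peer.needs(&tx_hash));
    peer.track(tx_hash);
    assert!(!peer.needs(&tx_hash));
}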
I had no idea. Ok, seems like we're good then.
Made a few cleanups based on feedback and removed the [WIP] tag. I think if we want to make more substantial changes beyond here, we may want to revert back to the listener check and buffer only for this PR and move the incoming and outgoing connection changes to a separate PR.
@j01tz This is good to go right? I'll merge if so.
Yes I think so, although I haven't heard from @hashmap since he started testing it.
Works on my machine!
Just merged this. Can someone else see if it works for them (from a fresh node, no local peers db)? I'm wondering if this is an issue with the current seeds not accepting connections (and not directly related to this PR). I see the following in my logs -
So it looks like there is an issue here with this PR merged to master where we are not successfully getting peer connections after hitting the initial DNS seed list. If someone else can reproduce, we should investigate and fix (I'm still not entirely convinced this is not a temporary issue with our seeds).
Going to hold off opening an issue for this until we can verify this is actually a problem.
This reverts commit 24f0a52.
@antiochp has anything changed with the seeds? I've been testing this for a few days and only started seeing this issue today. Something is definitely wrong though, I can reproduce the same conditions you are seeing now.
Can confirm that building with this PR included had me connecting to 0 peers.
Hm, this PR still works for me, got peers (I reverted the revert), but much slower than usual, with a 10-20 sec delay. It feels like we are slower in opening outbound connections, perhaps waiting for some delay.
See #2993 for the proposed fix and explanation (we are slower in opening outbound connections).
Connection limiting can be improved by adding a check against maximum allowed peer count immediately after a connection is accepted.