Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peer connection drop #2726

Open
hashmap opened this issue Apr 3, 2019 · 22 comments
Open

Peer connection drop #2726

hashmap opened this issue Apr 3, 2019 · 22 comments
Labels

Comments

@hashmap
Copy link
Contributor

hashmap commented Apr 3, 2019

It's not an issue per se, it was a design decision, however it may be worth to discuss one more time.

Grin node periodically kills the excess of connections https://github.com/mimblewimble/grin/blob/master/p2p/src/peers.rs#L463 which is understandable but annoying for counterparts. From my experience a node connects and almost immediately get dropped. There are couple ideas how to improve it:

Do we have some other ideas?

@JeremyRubin
Copy link
Contributor

@EthanHeilman is probably a good person to look at this.

@DavidBurkett
Copy link
Contributor

Yes, it'd be great to get an expert opinion on this. My thoughts are:

  • FIFO dropping could allow someone with access to a few dozen IPs to fill all of the node's connections.
  • More practically, FIFO dropping is annoying because you don't have any long-term, reliable peers.

I put some thought into it, and the best approach I can come up with to solve our current peer health problems, without sacrificing network security, would be to:

  1. Have separate incoming and outgoing connection quotas.
  2. If a node has filled its quota of incoming connections, any new connection requests should be responded to with a successful handshake (optional), a message with a list of peers (crucial on seed nodes for new peers who are trying to bootstrap), and then immediately dropped. The only exception would be if the peer has a higher chain height/difficulty, we would want to allow them to connect to make sure we're not being eclipse and/or sybil attacked.
  3. To combat the remote possibility of you and your peers becoming isolated from the main network, occasionally drop a random incoming connection or two when your incoming quota is full. Do the same with your outgoing connection pool, and immediately try to connect to another peer (preferably from a different subnet and/or geographical region).

Obviously, this doesn't have to all be implemented at once, but I'd prefer to work toward a system similar to this. I'd love to hear everyone's thoughts on this.

@garyyu
Copy link
Contributor

garyyu commented Apr 4, 2019

@hashmap

From my experience a node connects and almost immediately get dropped.

When dropping a connection because of excessing, does/could the remote peer try connecting again and again?
I suppose yes.
Perhaps we should duplicate the excess logic on the connection request part, instead of current accept / drop / accept / drop / ...

Send list of peers before dropping a peer #2725 (however some peers don't need it, is it ok to spam?)

IIRC, Peer request (send_peer_request(p2p::Capabilities::PEER_LIST)) should only happen when it's needed.

How about enhance the "Ping / Pong" ( or "Hand / Shake" ? I don't remember well) message? and clearly present the "Peer Request" on the "Hand" (for example) message if needed, means if the remote peer is full of connections, it should at least send back a peer list before refusing/dropping the connection request.

FIFO dropping - #2724 to support it (could be gameable)

Does that mean the new connections/nodes will be more unstable? ( new connection has bigger chance to be dropped)

@hashmap
Copy link
Contributor Author

hashmap commented Apr 4, 2019

@garyyu

does/could the remote peer try connecting again and again?

Only if it doesn't know any other peers, we have a queue of peers to connect, so we proceed with the next in this queue. So I don't think we should address it connection request code.

IIRC, Peer request (send_peer_request(p2p::Capabilities::PEER_LIST)) should only happen when it's needed.

Yeah, here is a problem that we don't know anything about this peer and this peer may not have time to request a peer list

How about enhance the "Ping / Pong" ( or "Hand / Shake" ? I don't remember well) message?

I'm not sure, not a big fan of a such design

if the remote peer is full of connections, it should at least send back a peer list before refusing/dropping the connection request.

that's what #2725 introduces

Does that mean the new connections/nodes will be more unstable? ( new connection has bigger chance to be dropped)

Vice versa, older connections would be dropped first (first in - first out)

@DavidBurkett
Copy link
Contributor

DavidBurkett commented Apr 4, 2019

Vice versa, older connections would be dropped first (first in - first out)

Just want to reiterate that FIFO doesn't solve any of the issues that we're having right now.

@antiochp
Copy link
Member

antiochp commented Apr 4, 2019

FIFO would work against any kind of network stability right? We'd drop older long-lived connection over newer connections.

@antiochp
Copy link
Member

antiochp commented Apr 4, 2019

Ideally we'd want (I think) a mix of old and new connections.
And seeds that "just seeded".

But we went with the simplest approach - all nodes are equal, all connections are equal.

Ideally we'd keep this as simple as possible (and not too simple) without introducing too much differentiation/precedence/complexity, while at the same time maintaining a healthy network.

@DavidBurkett
Copy link
Contributor

@antiochp Yes, FIFO would make the least stable network. I think my proposal[1] handles most of those concerns, while providing a very stable network (minimal peer dropping). Do you disagree?

[1] #2726 (comment)

@garyyu
Copy link
Contributor

garyyu commented Apr 4, 2019

Vice versa, older connections would be dropped first (first in - first out)

Long term connection peers (normally that means more stable peer) will be dropped first? always favor new connections?
I think current random dropping is much more fair :-) and simple.

@0xmichalis
Copy link
Contributor

0xmichalis commented Apr 4, 2019

My node log is full of the following errors and I wanted to confirm it's related to what is being discussed in this issue. If not, I can open a separate one.

https://pastebin.com/31q5426E

@DavidBurkett
Copy link
Contributor

@Kargakis Yes, that's the same issue.

@mcdallas
Copy link
Contributor

mcdallas commented Apr 4, 2019

Do we have some other ideas?

In Bitcoin you keep a banscore for each of your peers which you increase if they do stuff like:

  • Sending invalid blocks
  • Sending addr messages with more than 1000 addresses.

etc. If the peer reaches a threshold you ban him. When you need to evict a node (i.e when you are over capacity and receive a connection) you evict the peer with the highest banscore first

@DavidBurkett
Copy link
Contributor

@mcdallas I like that idea for long-term, but something like that would require a lot of work and even more testing. If done incorrectly, we could end up kicking slow peers off the network, or peers with older versions, etc.

@JeremyRubin
Copy link
Contributor

The banscore stuff in Bitcoin sucks and isn't actually really used -- if Bitcoin Core could get rid of it easily, we would. In general it's a binary thing at this point, modulo one or two use cases. Cases where banscore was incremented are being removed (e.g., no banning for any PoW valid block served). I don't think we want to emulate that here.

What is good (besides confusing terminology) is the notion of outbound vs inbound peers. I think thinking about this in terms of stable marriage terminology helps more: there are nodes that are suitors and there are nodes that you would like to be a suitor too (the node being suited can be called the 'wooed').

You place much lower value on your suitors than on the nodes that you want to woo. You should also be suiting those nodes, irrespective of the number of suitors you have. Otherwise it's easy to sybil your node.

The nodes that your suiting should either be ones that you configure, ones you've been connected to for a long time without fault, or that have provided you with high value information frequently (e.g., they propagate blocks to you you hadn't seen). Thus you can upgrade a suitor to a 'match' and add them to your woo-list if you're getting high quality information from them.

From your woo-list, you should select a fixed number (maybe let's say 8) to peer with. You can rotate these around. If a node on your woo-list connects to you as a suitor, you will let them stay on.

From your suitor list, you can treat that as a FIFO or whatever. You can also limit the bandwidth from those nodes more heavily as they are less important to you. Perhaps put heavier delays on propagating them blocks or transactions, or don't consider them for dandelion relay at all (probably a good idea!).

Treating all nodes equally sounds nice -- but in practice it makes the network easy to attack. Known good peers are very valuable! Something like the above preserves that structure.

I also don't think we should expose the peer list necessarily. Maybe expose your suitors but not your woo-list. It seems a bit sensitive given that this helps someone map the network.

@EthanHeilman should eventually chime in here (I'll ping him) about addressing Eclipse attacks under this -- pinging random IP addresses helps iirc.

@0xmichalis
Copy link
Contributor

  • Have separate incoming and outgoing connection quotas.

  • If a node has filled its quota of incoming connections, any new connection requests should be responded to with a successful handshake (optional), a message with a list of peers (crucial on seed nodes for new peers who are trying to bootstrap), and then immediately dropped. The only exception would be if the peer has a higher chain height/difficulty, we would want to allow them to connect to make sure we're not being eclipse and/or sybil attacked.

  • To combat the remote possibility of you and your peers becoming isolated from the main network, occasionally drop a random incoming connection or two when your incoming quota is full. Do the same with your outgoing connection pool, and immediately try to connect to another peer (preferably from a different subnet and/or geographical region).

@DavidBurkett do you think it's necessary for non-seeder nodes to share their peer list? Maybe share a part of it? Or ask an existing peer's list and send that back to the original request instead? Overall, I find your proposal reasonable. 👍

@lehnberg
Copy link
Collaborator

lehnberg commented Apr 6, 2019

Perhaps put heavier delays on propagating them blocks or transactions, or don't consider them for dandelion relay at all (probably a good idea!).

Yes, this is in line with the recommendations of the Dandelion++ paper. Section 4.3 explicitly touches on this topic and outlines an attack where an adversary is lying about how many outbound edges they create and could even connect to all honest nodes on the network this way. They mitigate against this by forcing a node Alice to only forward transactions via its outbound edges, ie to nodes Alice has chosen, rather than to nodes that have chosen Alice:

Creating many edges. Dandelion is naturally robust to nodes that create a disproportionate number of edges, because spies can only create outbound edges to honest nodes. This matters because in the stem phase, honest nodes only forward messages on outbound edges.

Inbound edges, i.e. from suitors
Outbound edges, i.e. to the woed

@ignopeverell
Copy link
Contributor

We already keep track of inbound and outbound peers and have a fixed preferred quota of outbounds. This seems like a good thing. Beyond this, I think we're going to have a hard time finding a solution that's not gameable and helps significantly.

Question is: are we trying to bend over backward to compensate for a lack of open ports on the network?

@hashmap
Copy link
Contributor Author

hashmap commented Apr 8, 2019

@ignopeverell lack of open slot could explain that but 2 facts are worth further check. I rarely see peers connected for more than 30 secs. There is a trend here (peeks are restarts)

https://grin.report/d/jeTROiqmz/peers?orgId=1&fullscreen&panelId=2&from=now-30d&to=now&refresh=1m

@EthanHeilman
Copy link

Thanks @JeremyRubin for letting me know about this thread.

I am ignorant of how Grin currently manages network connections, but I can provide a high level explanation of how Bitcoin does it which may be helpful. In Bitcoin connections are divided into two groups: incoming connections and outgoing connections.

Currently the default configuration for a Bitcoin node is:

  1. 125 connections total i.e. incoming connections + outgoing connections <=125
  2. At most 9 outgoing connections, however one of these connections is reserved for special tasks so for all intents and purposes 8 outgoing connections
  3. At most 116 incoming connections

The reason that there are so many more slots reserved for incoming connections is that not every client can accept incoming connections (clients behind a NAT, firewall etc).

We wrote a paper examining Bitcoin's P2P network and how gamable it was in 2015-2016. This work includes a range of countermeasures to make cryptocurrency P2P networks more robust. I have a project page with a link to paper and the PR's we merged into Bitcoin to strengthen BItcoin against these attacks.

I'm happy to discuss this further if anyone is interested but it would be very helpful to have a short summary of how Grin currently manages its P2P network.

@ignopeverell
Copy link
Contributor

Thank you very much for passing by @EthanHeilman. Currently Grin does the following:

  1. 25 connections max total. This appears to be much too low (would you concur?) and currently can make connectivity difficult at times, for the reasons you outlined around availability of incoming connections.
  2. At least 8 peers total...
  3. ...except if we have less than 4 outgoing connections, in which case we try to establish more to get to 4.

Ii looks like our main gap when compared to bitcoin is the low number of connections we allow overall, and the absence of restrictions on outgoing connections.

@EthanHeilman
Copy link

Thanks for providing some insight into how Grin works.

25 connections max total. This appears to be much too low (would you concur?)

Personally I think 25 connections max is probably not enough, although I believe ETH does get by with 25 connections. Increasing this number will probably get you some breathing room with regard to incoming connection slot exhaustion.

4 outgoing connections, in which case we try to establish more to get to 4.

Bitcoin always tries to use all outgoing connection slots so BTC will always attempt to make an outgoing connection until they have 8 outgoing connections. I'm not sure what the magic number here is although 8 seems to work.

Increasing outgoing connections Pros:

  1. The more outgoing connections the more connected the network, the lower the probability the network partitions itself by accident, this also leads to faster propagation because fewer hops between nodes.
  2. The more outgoing connections the harder an eclipse attack becomes because an eclipse attacker needs to monopolize all the outgoing connections.

Increasing outgoing connections Cons:

  1. Uses more resources, takes up more incoming connection slots.

I'm not sure it is still the case but blockchain surveillance/analysis companies used to attempt to connect to every node which allowed incoming connection slots in Bitcoin for the purposes of learning when a transaction was broadcasted and who broadcasted it. This doesn't work so well as a surveillance method in modern Bitcoin, so I'm not sure if they still do it. This behavior takes up 1 incoming connection slot on each node which if the number of incoming connection slots were low would be very problematic.

@ignopeverell How does Grin do peer seeding? Are you using DNS seeders?

@lehnberg
Copy link
Collaborator

See the Erlay paper for additional discussion on how Bitcoin network connectivity works today: https://arxiv.org/abs/1905.10518

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

10 participants