Skip connecting PeerWithSelf by checking ip address #2253

Merged
merged 4 commits into from
Dec 30, 2018

@garyyu garyyu commented Dec 29, 2018

My server no longer accepts connections, even though the number of connected peers (fewer than 30) is far below the configured peer_max_count. Checking with netstat:

$ netstat -tlna | grep -c 13414
235

On the 2nd server:
$ netstat -tlna | grep -c 13414
213

On the 3rd server:
$ netstat -tlna | grep 13414 | grep -c ESTABLISHED
200

$ netstat -tlna | grep 13414 | grep -c CLOSE_WAIT
40

My grin-server config:
peer_max_count = 256

Looking in detail at one server, I found 160 TCP connections to itself. I don't know how, but it does connect to itself and keeps these connections open.

This PR improves the current PeerWithSelf detection, which is nonce-based: the node stores the 100 most recent nonces it generated and sent in Hand.
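
For readers unfamiliar with the nonce check, here is a minimal sketch of the idea, assuming a bounded queue of recently sent nonces. `RecentNonces`, `record`, and `is_self` are hypothetical names for illustration, not the actual grin types; only the cap of 100 comes from this PR.

```rust
use std::collections::VecDeque;

// Cap taken from the PR description: keep the 100 most recent nonces.
const NONCES_CAP: usize = 100;

/// Hypothetical stand-in for the handshake state (not the real grin struct).
struct RecentNonces {
    sent: VecDeque<u64>,
}

impl RecentNonces {
    fn new() -> Self {
        RecentNonces {
            sent: VecDeque::with_capacity(NONCES_CAP),
        }
    }

    /// Remember a nonce we just sent in an outgoing Hand,
    /// evicting the oldest one once the cap is reached.
    fn record(&mut self, nonce: u64) {
        if self.sent.len() >= NONCES_CAP {
            self.sent.pop_front();
        }
        self.sent.push_back(nonce);
    }

    /// An incoming Hand carrying one of our own recent nonces
    /// means we have connected to ourselves.
    fn is_self(&self, nonce: u64) -> bool {
        self.sent.contains(&nonce)
    }
}
```

Because the queue is bounded, a nonce that has already been evicted would no longer be recognised, which is one reason an additional address-based check can help.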

It also adds one more step when PeerWithSelf is detected: shutting down the TcpStream.

[Updated]
I find it cleaner to skip connecting to self (on the requester side) than to double-check the nonce and IP address (on the listener side).
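
A rough sketch of the requester-side skip follows. It assumes the node remembers an address as its own whenever the listener-side nonce check trips; that is consistent with the logs later in this thread, but it is an assumption here. `SelfAddrs`, `remember`, and `should_skip` are hypothetical names; `ADDRS_CAP = 10` matches the constant visible in the diff.

```rust
use std::collections::VecDeque;
use std::net::SocketAddr;

// Same value as the ADDRS_CAP constant in the diff.
const ADDRS_CAP: usize = 10;

/// Hypothetical holder for addresses known to be our own.
struct SelfAddrs {
    known: VecDeque<SocketAddr>,
}

impl SelfAddrs {
    fn new() -> Self {
        SelfAddrs {
            known: VecDeque::with_capacity(ADDRS_CAP),
        }
    }

    /// Assumed to be called when the nonce check reports PeerWithSelf.
    fn remember(&mut self, addr: SocketAddr) {
        if self.known.contains(&addr) {
            return;
        }
        if self.known.len() >= ADDRS_CAP {
            self.known.pop_front();
        }
        self.known.push_back(addr);
    }

    /// Requester side: skip dialing an address that is known to be us.
    fn should_skip(&self, addr: &SocketAddr) -> bool {
        self.known.contains(addr)
    }
}
```

The point of checking on the requester side is that no TCP connection is opened at all, so nothing is left to clean up afterwards.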


@hashmap hashmap left a comment


I wonder why we need the second check. Perhaps in some cases we don't handle the error during the handshake?

@ignopeverell

@garyyu and did that PR improve your connection count? It seems odd you'd have to add another test that's based on the first. Perhaps also make sure the PeerWithSelf error results in a clean disconnect?


garyyu commented Dec 29, 2018

@hashmap @ignopeverell thanks for the review.

did that PR improve your connection count?

More tests will be done today. I tested one node yesterday, but the problem didn't recur after a restart (not as many PeerWithSelf errors as before).

It seems odd you'd have to add another test that's based on the first.

It seems odd to me too. I can't imagine what exactly makes this happen, so the only way is to run a real test and check.
Even 12 hours later (overnight), one of my nodes still has 237 TCP connections on port 13414 but only 31 peers.

$ netstat -tlna | grep -c 13414
237

$ curl -0 -XGET  -u grin:`cat ~/.grin/.api_secret` http://127.0.0.1:13413/v1/peers/connected 2>/dev/null | grep -o "\"direction\":\"Inbound\"" | wc -l
9

$ curl -0 -XGET  -u grin:`cat ~/.grin/.api_secret` http://127.0.0.1:13413/v1/peers/connected 2>/dev/null | grep -o "\"direction\":\"Outbound\"" | wc -l
22

And this node can't accept TCP connections on port 13414 anymore:

$ nc -v -t -z -w 3  139.162.150.184 13414
nc: connectx to 139.162.150.184 port 13414 (tcp) failed: Operation timed out

I think the nodes behind floonet.seed.grin-tech.org have the same problem as mine:

$ nc -v -t -z -w 3  204.48.26.36 13414
nc: connectx to 204.48.26.36 port 13414 (tcp) failed: Operation timed out

$ nc -v -t -z -w 3  198.245.50.26 13414
nc: connectx to 198.245.50.26 port 13414 (tcp) failed: Operation timed out

$ nc -v -t -z -w 3  109.74.202.16 13414
nc: connectx to 109.74.202.16 port 13414 (tcp) failed: Operation timed out

None of the 3 nodes in this seed can accept new connections at the moment.

Perhaps you could check and confirm with the commands above.

Perhaps also make sure the PeerWithSelf error results in a clean disconnect?

I added one more step to shut down the TcpStream when PeerWithSelf is detected. I'm not sure whether this shutdown alone is enough to fix the problem. I will test this first.
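
As an illustration of that extra step, a minimal sketch using only the standard library is shown below. `HandshakeError` and `close_on_self` are hypothetical names, not the project's types; `TcpStream::shutdown(Shutdown::Both)` is the real std call.

```rust
use std::net::{Shutdown, TcpStream};

/// Hypothetical error type standing in for the p2p error enum.
#[derive(Debug, PartialEq)]
enum HandshakeError {
    PeerWithSelf,
    Other,
}

/// If the handshake reported PeerWithSelf, close both directions of the
/// socket right away instead of waiting for the stream to be dropped.
fn close_on_self(
    stream: &TcpStream,
    outcome: Result<(), HandshakeError>,
) -> Result<(), HandshakeError> {
    if let Err(HandshakeError::PeerWithSelf) = outcome {
        // A failed shutdown is ignored: the connection is being abandoned anyway.
        let _ = stream.shutdown(Shutdown::Both);
    }
    outcome
}
```

Shutting down explicitly makes the remote end see end-of-stream immediately, which should help avoid connections lingering in ESTABLISHED or CLOSE_WAIT.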


garyyu commented Dec 30, 2018

More problems found:

  1. One peer can occupy multiple TCP connections, and both connections are kept. Among the 30 connected peers, 6 have 2 TCP connections each.
    For example:
{"capabilities":{"bits":15},"user_agent":"MW/Grin 0.5.0","version":1,"addr":"139.162.150.184:13414","direction":"Inbound","total_difficulty":86027740,"height":1519}

$ netstat -tlna | grep 139.162.150.184
tcp        0      0 45.118.135.254:13414    139.162.150.184:34348   ESTABLISHED
tcp        0     27 45.118.135.254:13414    139.162.150.184:34356   ESTABLISHED
  2. A disconnected peer still keeps some lingering TCP connections. For example:
$ grep "clean_peers V4(180.111.249.72:13414)" grin-server.log
20181229 02:41:52.773 DEBUG grin_p2p::peers - clean_peers V4(180.111.249.72:13414), not connected
20181229 03:27:14.259 DEBUG grin_p2p::peers - clean_peers V4(180.111.249.72:13414), not connected
20181229 16:45:04.576 DEBUG grin_p2p::peers - clean_peers V4(180.111.249.72:13414), not connected

$ netstat -tlna | grep 180.111.249.72
tcp        0      0 45.118.135.254:13414    180.111.249.72:8373     ESTABLISHED

I think that if we can fix case 1, case 2 will not happen either.
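
To spot peers holding more than one connection without reading netstat output by hand, something like the following sketch could be used. It assumes the Linux `netstat -tlna` column layout shown above (proto, recv-q, send-q, local address, remote address, state) and IPv4 `ip:port` addresses; `established_per_remote_ip` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Count ESTABLISHED connections per remote IP in netstat-style output.
fn established_per_remote_ip(netstat_output: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for line in netstat_output.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Skip headers and any line that is not an established connection.
        if cols.len() < 6 || cols[5] != "ESTABLISHED" {
            continue;
        }
        // The remote address is "ip:port"; keep the part before the last ':'.
        if let Some((ip, _port)) = cols[4].rsplit_once(':') {
            *counts.entry(ip.to_string()).or_insert(0) += 1;
        }
    }
    counts
}
```

Any remote IP with a count above 1 is a candidate for case 1.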

[Updated]:
I will split this into a separate issue and keep this PR simple, handling only the PeerWithSelf case.

@garyyu garyyu changed the title from "detect PeerWithSelf also by checking ip address" to "Skip connecting PeerWithSelf by checking ip address" Dec 30, 2018

garyyu commented Dec 30, 2018

After 24 hours of testing on 3 servers, there are no more self connections. The logs look like this:

$ grep PeerWithSelf grin-server.log
20181230 03:33:24.649 WARN grin_p2p::serv - Error accepting peer 35.227.48.174:52374: PeerWithSelf
20181230 03:33:25.649 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 03:33:30.651 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 03:33:38.778 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 03:38:04.292 WARN grin_p2p::serv - Error accepting peer 35.227.48.174:52642: PeerWithSelf
20181230 03:38:05.321 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 03:38:08.294 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 03:56:32.555 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 03:56:52.571 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 04:11:32.771 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 04:11:52.775 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 04:24:32.946 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 04:24:52.951 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 04:34:53.087 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 04:35:13.091 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 05:09:53.558 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414
20181230 05:10:13.562 DEBUG grin_p2p::serv - connect: ignore the connecting to PeerWithSelf, addr: 35.227.48.174:13414

It works as expected.

Merging now. I will then switch to issue #2258, which, after this fix, is now the main problem making seed nodes unreachable.

@garyyu garyyu merged commit ea7eea3 into mimblewimble:master Dec 30, 2018
@garyyu garyyu deleted the PeerWithSelf branch December 30, 2018 23:16
let addrs = hs.addrs.read();
if addrs.contains(&addr) {
debug!(
"connect: ignore the connecting to PeerWithSelf, addr: {}",
Contributor

"ignore connecting to {}" sounds more accurate

Contributor Author

👍 Thanks, I will correct this wording in the next PR.

@@ -29,13 +29,16 @@ use crate::peer::Peer;
use crate::types::{Capabilities, Direction, Error, P2PConfig, PeerInfo, PeerLiveInfo};

const NONCES_CAP: usize = 100;
const ADDRS_CAP: usize = 10;
Contributor

It would be nice to comment on why these magic numbers are chosen.

Contributor Author

agree~
