new mptcp connections stuck in SYN-SENT #431

Closed
daire-byrne opened this issue Aug 27, 2023 · 4 comments

@daire-byrne

Opening a new ticket in my ongoing "why do rsync transfers hang when using mptcp" series... :)

As we patch, filter and better understand the causes of hanging rsync commands (thanks!), this issue keeps recurring with our production workloads (though it is not yet reproducible) and is probably not connected to the previous issues.

My observation is that these "stuck" rsyncs hang in connect(), often occur in flurries around the same time, and mostly have the remote "server" (rsyncd) in common.

So I have seen 3 hang within the same minute on serverA and 2 in that same minute on serverB, all trying to connect to serverC. I am not able to ascertain if any TCP connections were ever started, but there are certainly no signs of it on the clients (serverA, serverB) or the server (serverC).

The hung rsync client processes have to be manually killed as they never time out.

mptcp SYN-SENT 0      0       10.25.20.251:52758 10.29.20.251:873   users:(("rsync",pid=2779080,fd=3)) ino:90648012 sk:144 cgroup:unreachable:1 <->
	 skmem:(r0,rb262144,t0,tb16384,f0,w0,o0,bl0,d0) add_addr_signal:1 subflows_max:4 add_addr_signal_max:2 add_addr_accepted_max:4 token:dad97478
mptcp SYN-SENT 0      0       10.25.20.251:52468 10.29.20.251:873   users:(("rsync",pid=2798914,fd=3)) ino:90975459 sk:145 cgroup:unreachable:1 <->
	 skmem:(r0,rb262144,t0,tb16384,f0,w0,o0,bl0,d0) add_addr_signal:1 subflows_max:4 add_addr_signal_max:2 add_addr_accepted_max:4 token:9669aebd
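
Since these hung clients have to be found and killed by hand, a small parser for socket listings in the format shown above can help spot them. This is only a sketch: the exact `ss` flags used for the report are not stated, so the function takes the captured text as input, and the names `RECORD`, `PID` and `stuck_syn_sent` are hypothetical.

```python
import re

# First line of a record in the listing above:
# netid, state, recv-q, send-q, local addr:port, peer addr:port, extras.
RECORD = re.compile(
    r"^(?P<netid>\S+)\s+(?P<state>\S+)\s+\d+\s+\d+\s+"
    r"(?P<local>\S+)\s+(?P<peer>\S+)\s*(?P<rest>.*)$"
)
PID = re.compile(r"pid=(\d+)")


def stuck_syn_sent(ss_text):
    """Return (pid, local, peer) for every MPTCP socket in SYN-SENT."""
    found = []
    for line in ss_text.splitlines():
        if line[:1].isspace():
            continue  # indented continuation line (skmem, token, ...)
        m = RECORD.match(line)
        if not m or m["netid"] != "mptcp" or m["state"] != "SYN-SENT":
            continue
        pid = PID.search(m["rest"])
        # pid is None when the listing carries no users:(...) field
        found.append((int(pid.group(1)) if pid else None, m["local"], m["peer"]))
    return found
```

Fed the two records above, this would report PIDs 2779080 and 2798914, both connecting to 10.29.20.251:873.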

strace: Process 845197 attached
connect(3, {sa_family=AF_INET, sin_port=htons(873), sin_addr=inet_addr("10.25.20.251")}, 16^Cstrace: Process 845197 detached

I have noticed the odd SYN flood message in the logs of various servers, but the timing of these never lines up with the rsync hangs.

[184728.689393] TCP: request_sock_subflow_v4: Possible SYN flooding on port 0.0.0.0:873. Dropping request.
[187038.497743] TCP: request_sock_subflow_v4: Possible SYN flooding on port 0.0.0.0:873. Dropping request.
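
To check the timing more systematically, the flood messages can be extracted from the kernel log and compared against the time of a hang. The sketch below assumes the log format shown above, where timestamps are seconds since boot; the hang time must be expressed on the same clock. The function names and the 120-second window are hypothetical choices, not something established in this report.

```python
import re

# Matches lines like:
# [184728.689393] TCP: request_sock_subflow_v4: Possible SYN flooding on port 0.0.0.0:873. Dropping request.
FLOOD = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+TCP: (?P<src>\w+): "
    r"Possible SYN flooding on port (?P<port>\S+)\."
)


def syn_flood_events(log_text):
    """Return (timestamp, source, listener) for each SYN flooding message."""
    events = []
    for line in log_text.splitlines():
        m = FLOOD.match(line.strip())
        if m:
            events.append((float(m["ts"]), m["src"], m["port"]))
    return events


def near(events, hang_ts, window=120.0):
    """Return the events within `window` seconds of `hang_ts` (same clock)."""
    return [e for e in events if abs(e[0] - hang_ts) <= window]
```

An empty result from `near()` for every recorded hang would back up the observation that the flood messages and the hangs do not coincide on the same host.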

I have also increased some sysctls and have not seen the flood messages since (but still see SYN-SENT hangs).

sysctl -w net.core.netdev_max_backlog=30000
sysctl -w net.core.somaxconn=65536
sysctl -w net.ipv4.tcp_max_syn_backlog=32768
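
Values set with `sysctl -w` do not survive a reboot, so it can be worth confirming that every host involved still has them. A minimal sketch, assuming the usual mapping of sysctl names to files under /proc/sys; `WANTED` and `check_sysctls` are hypothetical names, and the target values are simply the ones from the commands above.

```python
from pathlib import Path

# Target values taken from the sysctl commands above.
WANTED = {
    "net.core.netdev_max_backlog": 30000,
    "net.core.somaxconn": 65536,
    "net.ipv4.tcp_max_syn_backlog": 32768,
}


def check_sysctls(wanted=WANTED, root="/proc/sys"):
    """Return {name: (current, wanted)} for every setting that differs."""
    mismatches = {}
    for name, target in wanted.items():
        # net.core.somaxconn -> <root>/net/core/somaxconn
        path = Path(root, *name.split("."))
        try:
            current = int(path.read_text().strip())
        except (OSError, ValueError):
            current = None  # missing, unreadable, or not a single integer
        if current != target:
            mismatches[name] = (current, target)
    return mismatches
```

An empty dictionary means all three settings match; anything else lists the current value next to the expected one.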

My gut feeling is that this is also related to v6.3+, as I think I would have noticed it happening this often before (although I said that about #429 too...). Out of maybe 50,000 rsync connections per day, I'm seeing around 3-5 hang like this.

Despite a few connections hanging like this around the same time, the subsequent connections all seem to work fine again so whatever causes it seems pretty fleeting.

And maybe this would still happen with normal TCP + rsync? Although I don't think I've ever seen it happen in the wild.

I'll attach more info as I have it...

@pabeni

pabeni commented Aug 28, 2023

I have noticed the odd SYN flood message in the logs of various servers, but the timing of these never lines up with the rsync hangs.

[184728.689393] TCP: request_sock_subflow_v4: Possible SYN flooding on port 0.0.0.0:873. Dropping request.
[187038.497743] TCP: request_sock_subflow_v4: Possible SYN flooding on port 0.0.0.0:873. Dropping request.

Note that the SYN_SENT TCP subflow will time out (close) quite some time after the above event on the other end. The exact time depends on net.ipv4.tcp_syn_retries and net.ipv4.tcp_syn_linear_timeouts, but I can't extract a simple expression to compute the exact value off the top of my head. With default settings it should be very roughly ~1 minute. And all the TCP SYN retransmissions need to be dropped in between to really hit the TCP-level timeout.

I think the same scenario will happen when the TCP SYN (and its retransmissions) are dropped in between for whatever reason (e.g. a firewall, or conntrack exceeding its max entries).
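
The dependency on those two sysctls can be approximated with a short calculation. This is a simplified model rather than the kernel's exact logic. It assumes an initial RTO of 1 second, a 120-second RTO cap, no per-socket override of the retry count, and the behaviour described in the kernel's ip-sysctl documentation: the first `tcp_syn_linear_timeouts` retransmissions keep the initial RTO, after which the RTO doubles, and the number of linear timeouts is added to `tcp_syn_retries`. The function names and the default arguments (6 and 4) are assumptions of this sketch.

```python
def syn_schedule(syn_retries=6, linear_timeouts=4, initial_rto=1.0, rto_max=120.0):
    """Return the waits in seconds: one before each retransmission, plus the final one."""
    total_retransmits = syn_retries + linear_timeouts
    waits = []
    for i in range(total_retransmits + 1):
        # The RTO stays flat for the linear phase, then doubles each time.
        exponent = max(0, i - linear_timeouts)
        waits.append(min(initial_rto * 2 ** exponent, rto_max))
    return waits


def syn_timeout(**kwargs):
    """Return (time of last retransmission, time the connect attempt gives up)."""
    waits = syn_schedule(**kwargs)
    return sum(waits[:-1]), sum(waits)


print(syn_timeout())                   # (67.0, 131.0)
print(syn_timeout(linear_timeouts=0))  # (63.0, 127.0)
```

Under these assumptions the attempt is abandoned after roughly one to two minutes, which is in the same ballpark as the "very roughly" estimate above, and only if every SYN in the schedule is lost.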

@pabeni

pabeni commented Aug 28, 2023

@daire-byrne: I posted a couple of patches that should address the above (and possibly issues/430, as I think they are related/almost the same):

https://patchwork.kernel.org/project/mptcp/list/?series=779909

Could you please give them a run in your testbed?

@matttbe
Member

matttbe commented Aug 29, 2023

As discussed at the last weekly meeting (and mentioned by @pabeni above), it looks like this needs the same fix as #430. We can mark this one (#431) as a duplicate then.

@daire-byrne
Author

Yea, it is worth mentioning that I have not seen the SYN-SENT hung rsync processes today with production workloads and I think I would have expected to see one by now.

It does seem likely that this is fixed by the patches for #430.
