MPTCP connections are hanging when all subflows have been closed by timeout #430
Yeah, so I think I have an idea what is happening here. I tested lots of variations of reboots and killing the server process mid-transfer, but nothing could reproduce it. Then I started playing around with the systemd scripts that set up all the routes and MPTCP endpoints on boot. While doing this, I believe I was able to reproduce the hang. I think we may have a race condition between the startup of the network, rsyncd, and the script we use to set up the MPTCP endpoints and routing. Long story short, it looks like downing an interface (which is fine) and bringing it back up (without the routes) is possibly what caused the hangs in this case. I need to do some more tests and get the order of things right, but this may not be an MPTCP issue at all (unless it's still supposed to recover or fail gracefully). I'll report back tomorrow.
Okay, I'm not 100% sure this is the exact way to reproduce the problem we saw when a server was rebooted, but it's a reproduction of an rsync hang nonetheless. Our servers use a systemd service script to set up the MPTCP endpoints and routing on boot. This was missing an ordering dependency on the rsyncd systemd service, so the start/stop ordering is not guaranteed; both might even start at the same time. So let's set up the MPTCP networking on each server as we do in production with the systemd script:
And then the same but reverse routes on serverB:
And now let's do an rsync using serverA as the client to rsyncd running on serverB:
Everything works as expected, and we see data being transferred between the ens224 & ens256 interface pairs only. Now down ens256 on serverB:
The rsync continues uninterrupted as expected across the ens224 pair only, and the mptcp endpoints are still defined for all interfaces. Now bring ens256 back up, noting that the route between the ens256 interfaces is obviously now gone from serverB:
At this point the rsync client process on serverA is hung, and the rsyncd process on serverB is also hung. The TCP connections are no longer reported in the output of "ss", but we still see the "mptcp ESTAB" line:
Now I can kill either the rsync client on serverA or the rsyncd process on serverB, but the other end's process will never die until manually killed too. I can add the correct route back into serverB for ens256, but the hung rsync processes never recover. All new rsync processes work fine as expected. So I think something like this happened during reboot: the bringup or teardown of interfaces and routes hung lots of client rsyncs that were running on different remote hosts. Now I don't know if this is really an mptcp issue per se, but I'll have to think about how to improve the ordering of the network services and the bringup and teardown steps to avoid such hangs.
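The reproduction steps above can be sketched as a dry-run script. The interface names (ens224/ens256) come from this thread; the addresses and endpoint flags are invented placeholders, not the poster's real config. The script only prints the commands; change run() to execute "$@" (as root) to apply them for real.

```shell
#!/bin/sh
# Dry-run sketch of the boot-time MPTCP endpoint/route setup described above.
CMDS=""
run() { CMDS="$CMDS
+ $*"; echo "+ $*"; }

# serverA: advertise the transfer interfaces as MPTCP subflow endpoints
run ip mptcp endpoint add 192.0.2.10 dev ens224 subflow
run ip mptcp endpoint add 198.51.100.10 dev ens256 subflow

# pin routes so each subflow stays on its own interface pair
run ip route add 192.0.2.20/32 dev ens224
run ip route add 198.51.100.20/32 dev ens256

# the flap that triggered the hang: down/up silently loses the manual route
run ip link set ens256 down
run ip link set ens256 up   # note: the ens256 route is NOT restored
```

The key point the sketch captures is the last line: a plain link-up does not restore the manually-added route, which is what left the subflow path broken here.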
I could reproduce some of what you report above using a simplified, netns-based test-case.
but with the above the tcp-level subflows are still open and the transfer is not interrupted. I instead reproduced the mptcp-level hangup with this packetdrill:
Basically, TCP-level closes due to timeout are ignored at the MPTCP level. I have a patch for this latter problem, which is a little invasive. I'll try to share a simpler form soon.
According to RFC 8684 section 3.3:

    A connection is not closed unless [...] or an implementation-specific connection-level send timeout.

Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state.

Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allow removing similar existing inside the worker.

Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again.

Closes: multipath-tcp/mptcp_net-next#430
Signed-off-by: Paolo Abeni <[email protected]>
It turned out that the simplified patch I was hoping for has negative side effects. We should resort to the full solution I just posted on the ML (same link as #431).
Yep, got it and applied - thanks! I had to munge it ever so slightly to get it to apply cleanly to v6.4.11, but I think I got it right. I tested it with the "artificial" steps described above (down an endpoint interface and bring it back up again), but the same thing happens: the TCP connections close, the rsyncd process on serverB exits, but the client rsync on serverA hangs indefinitely (30 minutes and counting...). I will continue to monitor it with our production workloads and see if it has had any effect on the SYN-SENT hangs (#431).
@daire-byrne: I can't reproduce such behavior here. Could you please report the relevant part of the |
After rsync start (transfer working well and using both ens224 & ens256):
After "ifconfig ens256 down" on serverB (rsyncd sending to client on serverA):
After bringing back up ens256 on serverB (but without correct route to ens256 on serverA) - the client is now hung on serverA:
So this time I can see a TCP connection in CLOSE-WAIT on serverA, which I don't think I saw before (both servers just had the "mptcp ESTAB" entries with no TCP connections). Both the client rsync process (serverA) and the rsyncd process (serverB) hang indefinitely in this state. If I kill the client rsync process on serverA, the corresponding rsyncd process on serverB continues to hang indefinitely. I hope that helps. I am still waiting to see any actual hangs with our production workloads, and I am less concerned about the hangs caused by the steps in this ticket, as I am obviously breaking the mptcp connections in an avoidable way.
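When comparing these snapshots, it can help to tally socket states rather than eyeball full listings. A small awk sketch (the sample lines are invented; in practice pipe real `ss -tn` or `ss -Mtn` output in instead):

```shell
# Tally the first column (the state) of ss-style output.
summarize() { awk 'NF { count[$1]++ } END { for (s in count) print count[s], s }'; }

# invented sample resembling `ss -tn` output after the hang
sample='ESTAB      0 0 10.0.0.1:873 10.0.0.2:40000
CLOSE-WAIT 1 0 10.0.0.1:873 10.0.0.2:40002
CLOSE-WAIT 1 0 10.0.0.1:873 10.0.0.2:40004'

printf '%s\n' "$sample" | summarize
```

A lingering CLOSE-WAIT count here means the peer has sent its FIN but the local side has never closed the socket, which matches the stuck state described above.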
Thanks, indeed it does! There is (at least) one bug in the shared patches: specifically, the close timeout will never fire - or to be more accurate, it will fire in ~(LONG_MAX/(2*HZ) - ) seconds. I'll try to share a fixed version of the patch ASAP.
This version should address the bug mentioned above: https://patchwork.kernel.org/project/mptcp/list/?series=780262 - the 2 patches should replace the older ones. @daire-byrne: it would be great if you could spin them in your testbed. Side note: it's unclear to me why the TCP subflows close after the device goes up. I guess the subflow on top of the flipped device starts using the default route and a different IP-level path, and the peer could end up resetting the connection due to rp_filter and/or a firewall. But the other subflows should stay alive...
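The rp_filter hypothesis above can be checked quickly via the standard Linux reverse-path-filter knobs; this is a generic sketch, not the poster's actual config:

```shell
# Strict reverse-path filtering (1) drops packets arriving on an interface the
# routing table would not use to reach the source - exactly the situation once
# the ens256 route is gone. 0 = off, 1 = strict, 2 = loose; the effective
# value is the max of the "all" and per-interface knobs.
cat /proc/sys/net/ipv4/conf/all/rp_filter

# per-interface: /proc/sys/net/ipv4/conf/<dev>/rp_filter
# loose mode usually suffices for asymmetric multipath setups (needs root):
#   sysctl -w net.ipv4.conf.all.rp_filter=2
```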
New v2 patch applied, but I still see the same behaviour when the interface on serverB is brought back online:
Both the client and rsyncd processes hang indefinitely. I didn't catch it in CLOSE-WAIT this time... Because I had to munge one hunk of the patch to apply it to v6.4.11, let me just include my edited hunk in case I have mistranslated it (I have some doubts):
So we also managed to reproduce this type of hang a different way today on the production systems (also running the updated v2 patch). We had to do a firewall failover to upgrade firmware, which caused a momentary TCP disconnect between all transfer server interfaces (ens192, ens224, ens256) between two server pairs. All the client rsync processes in this case hung indefinitely, but the rsyncd processes seem to have died off by themselves.
So no playing with routes or upping and downing the server interfaces - just all active TCP connections being dropped by the firewall in between them.
@daire-byrne: thank you for the quick feedback, even if the results are not the ones I was looking for... Can you please confirm that:
Note: we could make such a timeout configurable, e.g. via sysctl, if it is just a matter of too long a delay - but I think there is another problem there: I expect that even in your setup the mptcp sockets will turn to close after ~60", but the rsync process will not be woken up.
This is consistent with my current understanding of this issue.
@daire-byrne: addendum, could you please additionally check which syscall is blocking the relevant rsync process?
should tell.
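Besides strace, one cheap way on Linux to see which syscall a task is parked in is /proc/&lt;pid&gt;/syscall (documented in proc(5)); a toy demo against a child process we know is blocking:

```shell
# Start something that blocks, then peek at the syscall it is sleeping in.
sleep 2 &
pid=$!
sleep 0.2   # give it time to enter the blocking syscall

# First field is the syscall number the task is blocked in (or "running").
# /proc/<pid>/stack (root only) shows the kernel-side stack for deeper digging.
syscall_line=$(cat "/proc/$pid/syscall")
echo "$syscall_line"
kill "$pid" 2>/dev/null
```

For the hung rsync, substituting its PID would show whether it is stuck in, say, a read/select-style syscall waiting on the dead MPTCP socket.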
Yeah, the client rsync is definitely still hung from this morning's "artificial" test - 6 hours ago now. However, it does look like the rsyncd process on serverB eventually died due to a configured rsync timeout:
And the firewall failover examples were noticed because they had been hung for 15+ minutes already. As for the client rsync strace - I think rsync actually spawns two processes (generator/sender), one for receiving from the remote host and one for operating locally:
Could it be that the communication between them is the thing that has broken, because it is using mptcp too (mptcpize/LD_PRELOAD)?
Or maybe I am reading that wrong...
It looks like the mptcp socket is not moving into the TCP_CLOSE state at all - which is unexpected with the tentative fix patch applied. Could you please double-check that with the
Requested output from another example:
@daire-byrne: thanks again for the feedback. I just noticed there is at least an integer overflow in the timeout logic in the proposed fix. That makes the patch ineffective for the first ~5' after boot - until jiffies becomes positive. Are you able to reproduce the hangup on a machine with the patched kernel and uptime >> 5'?
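The jiffies pitfall can be illustrated outside the kernel. This toy shell arithmetic (not the actual kernel code; the HZ value is arbitrary) contrasts a naive absolute deadline comparison with the wrap-safe time_after() idiom - the kernel deliberately starts jiffies about 5 minutes before wraparound (INITIAL_JIFFIES) so that exactly this class of bug surfaces early:

```shell
HZ=250                         # arbitrary tick rate for the demo
MASK=$(( (1 << 32) - 1 ))      # simulate 32-bit jiffies

jiffies=$(( (0 - 300 * HZ) & MASK ))          # just after boot, near wraparound
deadline=$(( (jiffies + 600 * HZ) & MASK ))   # 10 minutes out: wraps past zero

# Naive comparison: the wrapped deadline looks like it is already in the past.
if [ "$deadline" -gt "$jiffies" ]; then echo "naive: pending"; else echo "naive: expired"; fi

# time_after()-style check: test the sign bit of the 32-bit difference instead.
diff=$(( (deadline - jiffies) & MASK ))
if [ $(( diff & (1 << 31) )) -ne 0 ]; then echo "safe: expired"; else echo "safe: pending"; fi
```

Here the naive check misfires (the deadline is genuinely 10 minutes in the future, yet compares as already passed), while the sign-bit check gets it right regardless of where jiffies currently sits.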
I booted serverA and serverB and waited an hour before running through the steps - still the same hang (still going after 10 mins). Did my hunk edit look okay (above) for v6.4.11? That was my only doubt about whether I had applied the patch correctly; I have double-checked the build and the patches are definitely in there. For the similar-looking hangs we saw after the firewall failover, the hosts had been up for many hours by that point.
Thanks even more for the feedback. Yes, the hunk reported above looks correct. At this point I don't have any other ideas; to get to the bottom of it I need to understand why the msk never transitions to the TCP_CLOSE status. The 'perf' tool could help, but its usage is not trivial. You need to:
b) 2 probes for mptcp_worker:
Not trivial, as you can see, but possibly very useful :) The plus side is that no reboot is needed.
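Since the exact probe commands are not reproduced here, a hedged sketch of what the perf side might look like (the function names are the ones discussed in this thread; perf probe syntax varies across perf/kernel versions, and the real commands need root plus the running kernel's symbols). The sketch only prints the planned commands:

```shell
# Planned commands, printed rather than executed; run them by hand as root.
# `perf probe -a func%return` creates an event named probe:func__return.
PERF_CMDS='
perf probe -a mptcp_worker
perf probe -a mptcp_worker%return
perf probe -a mptcp_do_fastclose
perf record -e probe:mptcp_worker -e probe:mptcp_worker__return -e probe:mptcp_do_fastclose -aR sleep 300
perf script
'
printf '%s' "$PERF_CMDS"
```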
With such concise steps, how could I fail?! So I grabbed the perf probes on serverA, starting right before the "ifconfig ens256 up" on serverB. Then I proceeded to bring up the interface, which hung the in-progress mptcpize rsync. I waited a few minutes and then stopped the perf probe.
I hope that's useful. Let me know if you also need the same from serverB (with the hung rsyncd process).
Sorry, I should have attached that as a file - I didn't realise how many lines there were...
IMHO it was fairly non-trivial. Thanks a lot for the data collection!
1st subflow is closed.
2nd subflow is closed.
3rd subflow (main one) is closed, the close timeout is started [...]
the close timeout expires
the mptcp worker sees the timeout and invokes mptcp_do_fastclose...
... which walks the subflow list again. The main subflow is never deleted, so it is still there even if already closed. Nothing to do, all good... commit bbd49d1
which adds the state transition to mptcp_do_fastclose. I forgot/did not notice that the above was a pre-req for the tentative fix. Please additionally backport that commit, too, on top of the tentative fix. As I noted, the posted revision contains an integer overflow making the fix ineffective for the first 5'. I'll post a new revision soon, but even the already-shared one should be good 5' after boot.
Indeed I think it was! Thanks again for the debugging effort.
Side notes:
According to RFC 8684 section 3.3:

    A connection is not closed unless [...] or an implementation-specific connection-level send timeout.

Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state.

Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allow removing similar existing inside the worker.

Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again.

The issues is actually present since the beginning, but it basically impossible to solve without a long chain of functional pre-requisites topped by the blamed commit below.

Closes: multipath-tcp/mptcp_net-next#430
Fixes: bbd49d1 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()")
Signed-off-by: Paolo Abeni <[email protected]>
According to RFC 8684 section 3.3:

    A connection is not closed unless [...] or an implementation-specific connection-level send timeout.

Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state.

Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allow removing similar existing inside the worker.

Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again.

The issues is actually present since the beginning, but it basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d1 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()").

Closes: #430
Fixes: e16163b ("mptcp: refactor shutdown and close")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
It was still bugging me, so I did a little more testing. When I first started using v6.4 I had also applied a patch from yet another Cloudflare blog: https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it - I reapplied that to my stable v6.4.11 kernel plus your mptcp fixes, and I quickly started seeing hung rsync transfers again. This time they seem to be hanging at the TCP subflow level. That patch looks likely to end up in a future kernel release, so I will have to retest again then with the final merged version. So I think my flurry of bug reports was actually caused by the high level of failure introduced by this patch. I removed it quite early on, and then we were left with the actual mptcp bugs that had been present at a much lower frequency for quite some time. Either way, everything has been rock solid for the last week.
Thanks for the heads-up! The relevant patch is already in 6.5, so we will have to deal with it. AFAICS the new feature is protected by a sysctl knob, disabled by default. I guess you tested after explicitly setting
Off the top of my head, I think such a setting is not compatible at all with mptcp; IIRC the MPTCP RFC has some explicit discussion noting that shrinking the subflow receive window will cause stalls. Could you please double check that there are no problems with
Thanks!
According to RFC 8684 section 3.3:

    A connection is not closed unless [...] or an implementation-specific connection-level send timeout.

Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state.

Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker.

Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again.

This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d1 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well.

Closes: multipath-tcp/mptcp_net-next#430
Fixes: e16163b ("mptcp: refactor shutdown and close")
Cc: [email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
[ Upstream commit 27e5ccc ] According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d1 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: multipath-tcp/mptcp_net-next#430 Fixes: e16163b ("mptcp: refactor shutdown and close") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 27e5ccc ] According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d1 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: multipath-tcp/mptcp_net-next#430 Fixes: e16163b ("mptcp: refactor shutdown and close") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 27e5ccc ] According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d1 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: multipath-tcp/mptcp_net-next#430 Fixes: e16163b ("mptcp: refactor shutdown and close") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 27e5ccc ] According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d1 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: multipath-tcp/mptcp_net-next#430 Fixes: e16163b ("mptcp: refactor shutdown and close") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2046197 [ Upstream commit 27e5ccc2d5a50ed61bb73153edb1066104b108b3 ] According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d114d57 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: multipath-tcp/mptcp_net-next#430 Fixes: e16163b ("mptcp: refactor shutdown and close") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Stefan Bader <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2046197 [ Upstream commit 27e5ccc2d5a50ed61bb73153edb1066104b108b3 ] According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d114d57 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: multipath-tcp/mptcp_net-next#430 Fixes: e16163b ("mptcp: refactor shutdown and close") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Stefan Bader <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2045806 [ Upstream commit 27e5ccc ] According to RFC 8684 section 3.3: A connection is not closed unless [...] or an implementation-specific connection-level send timeout. Currently the MPTCP protocol does not implement such timeout, and connection timing-out at the TCP-level never move to close state. Introduces a catch-up condition at subflow close time to move the MPTCP socket to close, too. That additionally allows removing similar existing inside the worker. Finally, allow some additional timeout for plain ESTABLISHED mptcp sockets, as the protocol allows creating new subflows even at that point and making the connection functional again. This issue is actually present since the beginning, but it is basically impossible to solve without a long chain of functional pre-requisites topped by commit bbd49d1 ("mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current patch, please also backport this other commit as well. Closes: multipath-tcp/mptcp_net-next#430 Fixes: e16163b ("mptcp: refactor shutdown and close") Cc: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Roxana Nicolescu <[email protected]>
A server running rsyncd processes was accidentally rebooted while multiple "client" hosts had active rsync processes to it. On nearly all of those hosts, rsync was in the state described in ticket #429 - the TCP connections were gone, but the MPTCP connection was still hanging around and holding the rsync command open.
I also had one client rsync from exactly the same time that was stuck in the initial connect, with MPTCP-SYN as the only entry in the output of "ss".
I'm going to try running some more tests, killing the server's rsyncd processes and/or rebooting servers, to see if it is reproducible.
Again, this is not normal operation for us, and it may be a different issue from the one described in #429 (but with the same result).
Originally posted by @daire-byrne in #429 (comment)