TCP collapse in full duplex connections with MPTCP enabled #261
Comments
Hi @fciaccia, Thank you for this detailed bug report! Matt
Writing down some notes on this issue here as I am looking into it: There is packet loss happening, and from that moment on the connection stalls. The server is retransmitting a lot of out-of-order data, and ultimately only one segment is missing. However, this last segment is never acknowledged, although it is being retransmitted. This means that somewhere in the TCP input path we are dropping it. The nstat counters should indicate where, but I can't find which one that is. Besides that, maybe some other drop is taking place.
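To make the counter hunt above a bit more concrete, here is a minimal sketch of looking up a single counter in the header/value line pairs that Linux exposes in /proc/net/netstat (the same data nstat reads). The function name netstat_lookup is hypothetical, and TCPBacklogDrop is used only as an example name; which counter, if any, accounts for the drop in this issue is not known.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Look up one counter in a header/value line pair as found in
 * /proc/net/netstat, for example:
 *   "TcpExt: SyncookiesSent TCPBacklogDrop"
 *   "TcpExt: 0 17"
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 * netstat_lookup is a hypothetical helper, not part of any library. */
int netstat_lookup(const char *names, const char *values,
                   const char *wanted, long long *out)
{
    char nbuf[8192], vbuf[8192];
    char *nsave = NULL, *vsave = NULL;
    char *n, *v;

    assert(names && values && wanted && out);
    if (strlen(names) >= sizeof(nbuf) || strlen(values) >= sizeof(vbuf))
        return -1;
    strcpy(nbuf, names);
    strcpy(vbuf, values);

    /* The first token on both lines is the group prefix; it must match. */
    n = strtok_r(nbuf, " \n", &nsave);
    v = strtok_r(vbuf, " \n", &vsave);
    if (!n || !v || strcmp(n, v) != 0)
        return -1;

    /* Walk names and values in lockstep until the wanted name appears. */
    while ((n = strtok_r(NULL, " \n", &nsave)) &&
           (v = strtok_r(NULL, " \n", &vsave))) {
        if (strcmp(n, wanted) == 0) {
            *out = strtoll(v, NULL, 10);
            return 0;
        }
    }
    return -1;
}
```

Sampling the same counter before and after the stall and comparing the two values would show whether it moved while the missing segment was being retransmitted.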
I think the main problem here is that there is no (known? documented?) possibility to debug what's happening and what the internal states of MPTCP are. I played with MPTCP for two days and saw only strange behaviour, looking the same as all the bugs around here, but with no way to tie it to internal states or anything like that. Without this, no debugging is possible, nor bug reports that could help, don't you think?
When storing a pointer to a dst_metrics structure in dst_entry._metrics, two flags are added in the least significant bits of the pointer value. Hence this assumes all pointers to dst_metrics structures have at least 4-byte alignment.

However, on m68k, the minimum alignment of 32-bit values is 2 bytes, not 4 bytes. Hence in some kernel builds, dst_default_metrics may be only 2-byte aligned, leading to obscure boot warnings like:

    WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x44/0x9a
    refcount_t: underflow; use-after-free.
    Modules linked in:
    CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: G W 5.5.0-rc2-atari-01448-g114a1a1038af891d-dirty multipath-tcp#261
    Stack from 10835e6c:
            10835e6c 0038134f 00023fa6 00394b0f 0000001c 00000009 00321560 00023fea
            00394b0f 0000001c 001a70f8 00000009 00000000 10835eb4 00000001 00000000
            04208040 0000000a 00394b4a 10835ed4 00043aa8 001a70f8 00394b0f 0000001c
            00000009 00394b4a 0026aba8 003215a4 00000003 00000000 0026d5a8 00000001
            003215a4 003a4361 003238d6 000001f0 00000000 003215a4 10aa3b00 00025e84
            003ddb00 10834000 002416a8 10aa3b00 00000000 00000080 000aa038 0004854a
    Call Trace:
     [<00023fa6>] __warn+0xb2/0xb4
     [<00023fea>] warn_slowpath_fmt+0x42/0x64
     [<001a70f8>] refcount_warn_saturate+0x44/0x9a
     [<00043aa8>] printk+0x0/0x18
     [<001a70f8>] refcount_warn_saturate+0x44/0x9a
     [<0026aba8>] refcount_sub_and_test.constprop.73+0x38/0x3e
     [<0026d5a8>] ipv4_dst_destroy+0x5e/0x7e
     [<00025e84>] __local_bh_enable_ip+0x0/0x8e
     [<002416a8>] dst_destroy+0x40/0xae

Fix this by forcing 4-byte alignment of all dst_metrics structures.

Fixes: e5fd387 ("ipv6: do not overwrite inetpeer metrics prematurely")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
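The alignment assumption described in this commit message can be illustrated with a small user-space sketch. It is not the kernel code: struct metrics, pack, unpack_ptr, unpack_flags and tag_roundtrips are hypothetical names, and the aligned attribute is the GCC/Clang spelling of the forced alignment the commit describes.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_MASK ((uintptr_t)0x3)   /* the two low bits carry flags */

/* Hypothetical stand-in for a metrics structure, forced to 4-byte
 * alignment so that the two low bits of its address are always zero. */
struct metrics {
    uint32_t values[4];
} __attribute__((aligned(4)));

/* Store two flag bits in the low bits of the pointer value. */
uintptr_t pack(const struct metrics *p, unsigned flags)
{
    assert(((uintptr_t)p & FLAG_MASK) == 0);   /* needs 4-byte alignment */
    return (uintptr_t)p | ((uintptr_t)flags & FLAG_MASK);
}

/* Recover the pointer by clearing the flag bits. */
struct metrics *unpack_ptr(uintptr_t v)
{
    return (struct metrics *)(v & ~FLAG_MASK);
}

/* Recover the flags by keeping only the flag bits. */
unsigned unpack_flags(uintptr_t v)
{
    return (unsigned)(v & FLAG_MASK);
}

/* Integer-only check: does tagging the given address with the given
 * flags give back both the same address and the same flags? */
int tag_roundtrips(uintptr_t addr, unsigned flags)
{
    uintptr_t v = addr | ((uintptr_t)flags & FLAG_MASK);
    return (v & ~FLAG_MASK) == addr && (unsigned)(v & FLAG_MASK) == flags;
}
```

With a 4-byte-aligned address such as 0x1000 the round trip preserves both parts; with an address that is only 2-byte aligned, such as 0x1002, bit 1 of the address is read back as a flag and the recovered pointer is off by two bytes.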
TL;DR
I found that MPTCP behaves differently than legacy TCP for duplex TCP connections: under some circumstances the connection completely stalls when sending data in both directions.
Hi all,
I have a specific use case where I am transferring a large amount of data over the same TCP connection in both directions. I am experiencing a full connection collapse after both directions reach high throughput; the connection never recovers.
A consistent reduction in throughput is expected when both directions are carrying full payload data (due to ACKs being piggybacked more slowly), and the TCP flow is more likely to collapse when the rmem and wmem maximum values are set higher than the Linux defaults. Specifically, on both endpoints I raised the rmem maximum to 8MB and the wmem maximum to 6MB (from the defaults of 6MB and 4MB respectively).
So far, so good: this behaviour is expected. However, when MPTCP is enabled under identical conditions (single subflow, same windows and CC algorithm), the symptoms differ: throughput drops to zero and the connection stalls.
My impression is that MPTCP is not handling this use case (i.e., full duplex TCP) correctly.
The test environment consists of two Debian Jessie VMs, only one of which is multi-homed, with two access links. The connection goes over the Internet, and the RTT is quite high, ~130ms. VM1 is deployed on a KVM host; VM2 is an Amazon EC2 instance. Aggregated capacity at VM1 is 1.2Gbps (1x200Mbps link + 1x1Gbps link), whereas the Amazon instance can get up to 2Gbps over a single link. Both interfaces in VM1 have public IPv4 addresses configured. VM2 in Amazon is behind a NAT.
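For orientation, the link rates and RTT quoted above can be turned into a rough bandwidth-delay product, i.e. the amount of data that has to be in flight to keep a path full. This is only a back-of-the-envelope sketch; bdp_bytes is a hypothetical helper, and it ignores protocol overhead and the way the kernel splits socket buffer memory between window and bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes:
 *   rate (bit/s) * RTT (ms) / 1000 (ms per s) / 8 (bits per byte)
 * bdp_bytes is a hypothetical helper name used for illustration. */
uint64_t bdp_bytes(uint64_t rate_bps, uint32_t rtt_ms)
{
    assert(rate_bps <= UINT64_MAX / (rtt_ms ? rtt_ms : 1)); /* no overflow */
    return rate_bps * rtt_ms / 8000;
}
```

At a 130ms RTT this gives about 3.25MB for the 200Mbps link (bdp_bytes(200000000, 130)) and about 16.25MB for the 1Gbps link (bdp_bytes(1000000000, 130)), which can be compared against the 8MB rmem and 6MB wmem maximums used in the test.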
I can reproduce the misbehaviour with both MPTCP kernel v0.94 and v0.93, installed directly from the apt repo. I can reproduce the collapse with both the fullmesh path manager (which opens two subflows) and the default path manager. I am using CUBIC as the congestion control algorithm.
Uname -a:
The kernel log with mptcp_debug enabled shows the connection establishment and nothing more until I force-close the connection by interrupting the client/server (public IPs are mangled on purpose; the following log was taken with the default path manager):
On force close:
[13529.223071] mptcp_close: Close of meta_sk with tok 0x4bf4afb9
My sysctl configuration:
I am attaching a simple C program that reproduces the kind of traffic causing the error. It sends 2GB of random data in both directions over the same connection.
To compile:
gcc -Wall -O2 -pthread -o duplextcp duplextcp.c
Usage server:
./duplextcp -s localIP localPort
Usage client:
./duplextcp destIP destPort
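For readers without the attachment, the traffic pattern can be sketched roughly as follows. This is not the attached duplextcp.c: all names here are hypothetical, the payload is zeros rather than random data, and the self-check runs over a local socketpair instead of a TCP connection. The point is only that each endpoint sends from one thread while receiving in another on the same socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct endpoint {
    int fd;           /* connected stream socket */
    size_t total;     /* bytes to send, and bytes expected back */
    size_t received;  /* bytes actually read so far */
};

/* Sender thread: push `total` bytes, then signal end of data. */
static void *send_all(void *arg)
{
    struct endpoint *ep = arg;
    char buf[65536] = {0};   /* zeros here; the report uses random data */
    size_t left = ep->total;

    while (left > 0) {
        size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
        ssize_t n = send(ep->fd, buf, chunk, MSG_NOSIGNAL);
        if (n <= 0)
            break;
        left -= (size_t)n;
    }
    shutdown(ep->fd, SHUT_WR);   /* peer's recv() will return 0 */
    return NULL;
}

/* One endpoint: send in a helper thread while receiving in this one. */
static void *run_endpoint(void *arg)
{
    struct endpoint *ep = arg;
    char buf[65536];
    pthread_t tx;
    ssize_t n;

    assert(ep->fd >= 0);
    if (pthread_create(&tx, NULL, send_all, ep) != 0) {
        shutdown(ep->fd, SHUT_RDWR);   /* unblock the peer on failure */
        return NULL;
    }
    while ((n = recv(ep->fd, buf, sizeof(buf), 0)) > 0)
        ep->received += (size_t)n;
    pthread_join(tx, NULL);
    return NULL;
}

/* Local self-check: returns 0 if both sides received `total` bytes. */
int duplex_selftest(size_t total)
{
    int sv[2];
    pthread_t peer;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    struct endpoint a = { sv[0], total, 0 }, b = { sv[1], total, 0 };
    if (pthread_create(&peer, NULL, run_endpoint, &b) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    run_endpoint(&a);
    pthread_join(peer, NULL);
    close(sv[0]);
    close(sv[1]);
    return (a.received == total && b.received == total) ? 0 : -1;
}
```

To exercise a real path, run_endpoint would be called on a connected TCP socket at each end instead of on the socketpair; like the original program, the sketch needs -pthread when compiling.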
Any help would be much appreciated.
Thank you very much!
duplextcp.zip