TCP: tcp_ack resetting flow #243

berezoka · 2018-03-02T10:13:38Z

Hello,
On the server side I am getting some errors:

[Thu Mar  1 23:19:24 2018] mptcp_verif_dss_csum csum is wrong: 0xc5e9 data_seq 1924841623 dss_csum_added 1 overflowed 0 iterations 1
[Thu Mar  1 23:49:31 2018] TCP: mptcp_fallback_infinite 0x806a1f10 will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:56279 rcv_nxt 2923808224 from tcp_rcv_state_process+0x1de/0x820
[Thu Mar  1 23:49:31 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 00:27:24 2018] TCP: mptcp_fallback_infinite 0xa29a980d will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:17413 rcv_nxt 3298219397 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 00:27:24 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 00:56:25 2018] TCP: mptcp_fallback_infinite 0xe5f938ee will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:19303 rcv_nxt 2875906644 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 00:56:25 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 02:15:31 2018] TCP: mptcp_fallback_infinite 0x54a53d7 will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:41031 rcv_nxt 2204363645 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 02:15:31 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 03:10:47 2018] TCP: mptcp_fallback_infinite 0x521a016c will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:44595 rcv_nxt 1751121396 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 03:10:47 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 03:11:33 2018] TCP: mptcp_fallback_infinite 0x83322e3a will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:34951 rcv_nxt 3543815736 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 03:11:33 2018] TCP: tcp_ack resetting flow
[Fri Mar  2 03:38:25 2018] TCP: mptcp_fallback_infinite 0x8cfa64bd will fallback - pi 2, src 11.111.11.22:1090 dst 185.2.229.214:35017 rcv_nxt 1180093270 from tcp_rcv_state_process+0x1de/0x820
[Fri Mar  2 03:38:25 2018] TCP: tcp_ack resetting flow

The client side uses LAN+LTE and shows no errors. After the "TCP: tcp_ack resetting flow" error, client-side traffic does not return to the LTE link. I have to stop and restart the stream before it uses LTE again.

MPTCP version 0.93.1 (https://github.com/multipath-tcp/mptcp/archive/mptcp_v0.93.zip)
net.mptcp.mptcp_checksum = 1
net.mptcp.mptcp_debug = 0
net.mptcp.mptcp_enabled = 1
net.mptcp.mptcp_path_manager = fullmesh
net.mptcp.mptcp_scheduler = default
net.mptcp.mptcp_syn_retries = 3
net.mptcp.mptcp_version = 0

Congestion control:

net.ipv4.tcp_allowed_congestion_control = balia reno
net.ipv4.tcp_available_congestion_control = balia reno cubic lia olia wvegas
net.ipv4.tcp_congestion_control = balia

Where might the problem be?

matttbe · 2018-03-02T11:02:30Z

Hi @berezoka,

@cpaasch sent some patches on mptcp-dev. They may fix this bug. Can you test them?

Note that they should be on GitHub soon.

berezoka · 2018-03-02T17:23:59Z

@matttbe are you talking about this patch?
DATA_ACK.patch.txt

berezoka · 2018-03-03T14:38:24Z

Hi,
after applying the patch the situation is the same:

[Sat Mar 3 16:23:16 2018] mptcp_verif_dss_csum csum is wrong: 0x63dd data_seq 1369751649 dss_csum_added 1 overflowed 0 iterations 1
[Sat Mar 3 18:53:44 2018] mptcp_verif_dss_csum csum is wrong: 0x800 data_seq 3335471871 dss_csum_added 1 overflowed 0 iterations 1

On the client side I have to run:
ip link set dev lte_link0 multipath off
ip link set dev lte_link0 multipath on
to get traffic back onto the LTE link.

Is there another solution?

cpaasch · 2018-03-04T00:54:07Z

Hmmm... Can you double-check that the kernel you compiled has the commit f2a4860 ("mptcp: Update 64-bit receiver indexes after processing ofo-queue") ?

If that's the case, can you capture a packet-trace and wait until you see one of these messages pop up? Thanks!

berezoka · 2018-03-04T01:17:10Z

About an hour ago I compiled a new kernel with the latest patches, and the situation is the same:

[Sun Mar 4 02:58:15 2018] mptcp_verif_dss_csum csum is wrong: 0x10 data_seq 1890498936 dss_csum_added 1 overflowed 0 iterations 1

f2a4860 is applied

I will capture packets tomorrow.

cpaasch · 2018-03-04T01:38:05Z

Thanks, I'm waiting for your capture!

cpaasch · 2018-03-04T01:41:29Z

Please, also store the mptcp_verif_dss_csum message you see together with the packet-trace. I need both to correlate the flows.

berezoka · 2018-03-04T02:03:41Z

I couldn't sleep, so I made a packet capture:
214.log.txt

error was:
[Sun Mar 4 03:55:13 2018] mptcp_verif_dss_csum csum is wrong: 0x4517 data_seq 1105057857 dss_csum_added 1 overflowed 0 iterations 1

one more:
[Sun Mar 4 21:18:51 2018] mptcp_verif_dss_csum csum is wrong: 0xfbff data_seq 371472309 dss_csum_added 1 overflowed 0 iterations 1
packets_02.zip

cpaasch · 2018-03-05T17:48:28Z

Hmmm... This is weird. Looking at the pcap, it really seems like commit f2a4860 is not in the kernel that you are booting.

How much data are you transmitting here? (it looks like you are transmitting huge amounts of data)
How frequent is the error? Do you also see it happening when transmitting only small amounts on a single connection?

berezoka · 2018-03-05T18:55:10Z

The booted kernel is: Linux b 4.9.80 #1 SMP Sun Mar 4 17:35:08 EET 2018 x86_64 GNU/Linux
f2a4860 really is applied. The errors occur randomly and I can't find a pattern. Traffic is small: ~5 Mbps via LAN plus ~2 Mbps via LTE.
Fullmesh parameters:
num_subflows = 1
create_on_err = 1

Any ideas?

cpaasch · 2018-03-05T23:48:51Z

Traffic is small? I see more than 700MB transmitted on a single subflow:

I will get you a debug-patch that you can apply for testing.

cpaasch · 2018-03-05T23:50:11Z

Also - just to be sure - the other host is booting the same kernel, right?

cpaasch · 2018-03-06T00:12:38Z

And, which scheduler are you using?

cpaasch · 2018-03-06T00:14:17Z

Here is the debug-patch that would be good to apply:

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index c642b683abc7..9ac1254ef555 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -1131,6 +1131,8 @@ static inline void mptcp_check_sndseq_wrap(struct tcp_sock *meta_tp, int inc)
 		struct mptcp_cb *mpcb = meta_tp->mpcb;
 		mpcb->snd_hiseq_index = mpcb->snd_hiseq_index ? 0 : 1;
 		mpcb->snd_high_order[mpcb->snd_hiseq_index] += 2;
+
+		pr_err("%s %#x wrapped around at %u\n", __func__, meta_tp->mpcb->mptcp_loc_token, meta_tp->snd_nxt);
 	}
 }
 
@@ -1141,6 +1143,8 @@ static inline void mptcp_check_rcvseq_wrap(struct tcp_sock *meta_tp,
 		struct mptcp_cb *mpcb = meta_tp->mpcb;
 		mpcb->rcv_high_order[mpcb->rcv_hiseq_index] += 2;
 		mpcb->rcv_hiseq_index = mpcb->rcv_hiseq_index ? 0 : 1;
+
+		pr_err("%s %#x wrapped around at %u\n", __func__, meta_tp->mpcb->mptcp_loc_token, meta_tp->rcv_nxt);
 	}
 }
 
diff --git a/net/mptcp/mptcp_input.c b/net/mptcp/mptcp_input.c
index fb3d99379e11..0442ad051cec 100644
--- a/net/mptcp/mptcp_input.c
+++ b/net/mptcp/mptcp_input.c
@@ -349,8 +349,8 @@ static int mptcp_verif_dss_csum(struct sock *sk)
 
 	/* Now, checksum must be 0 */
 	if (unlikely(csum_fold(csum_tcp))) {
-		pr_err("%s csum is wrong: %#x data_seq %u dss_csum_added %d overflowed %d iterations %d\n",
-		       __func__, csum_fold(csum_tcp), TCP_SKB_CB(last)->seq,
+		pr_err("%s %#x csum is wrong: %#x TCP-seq %u dss_csum_added %d overflowed %d iterations %d\n",
+		       __func__, tp->mpcb->mptcp_loc_token, csum_fold(csum_tcp), TCP_SKB_CB(last)->seq,
 		       dss_csum_added, overflowed, iter);
 
 		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_CSUMFAIL);
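
The "checksum must be 0" comment in the hunk above relies on the one's-complement arithmetic that the MPTCP DSS checksum shares with the regular TCP checksum: summing the covered bytes together with the transmitted checksum and then folding and complementing yields 0 when nothing was corrupted. Below is a minimal userspace sketch of that property in C. It is not the kernel's implementation: the names `csum_add_sketch`, `csum_fold_sketch` and `csum_verify_sketch` are hypothetical, and the sketch leaves out the DSS pseudo-header that the real check also covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a buffer to a running one's-complement sum, reading it as big-endian
 * 16-bit words. An odd trailing byte is padded with a zero byte, so every
 * chunk except the last one must have an even length. */
static uint32_t csum_add_sketch(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2) {
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
		sum = (sum & 0xffff) + (sum >> 16);	/* end-around carry */
	}
	if (i < len) {
		sum += (uint32_t)buf[i] << 8;
		sum = (sum & 0xffff) + (sum >> 16);
	}
	return sum;
}

/* Fold a 32-bit accumulator to 16 bits and complement it. */
static uint16_t csum_fold_sketch(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Receiver-side check: returns 0 when the data matches the stored checksum,
 * and a non-zero value (what the log line prints as "csum is wrong")
 * otherwise. */
static uint16_t csum_verify_sketch(const uint8_t *buf, size_t len,
				   uint16_t stored)
{
	return csum_fold_sketch(csum_add_sketch(buf, len, 0) + stored);
}
```

A sender would store `csum_fold_sketch(csum_add_sketch(data, len, 0))`; the receiver then expects `csum_verify_sketch(data, len, stored)` to return 0. Any non-zero result means the bytes or the values fed into the sum differ between the two ends.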

berezoka · 2018-03-06T07:36:33Z

Traffic runs 24/7 :) and about 240 GB per day is sent via LAN+LTE. The same kernel is booted on both sides :). For congestion control I currently use "balia". I have tried both the "default" and "redundant" schedulers, but the error is always the same :(. As for kernel panics, "redundant" showed no problems like those in #214.
I am now compiling the kernel with the debug patch and found one warning:

berezoka · 2018-03-06T09:43:21Z

[Tue Mar 6 11:38:45 2018] mptcp_verif_dss_csum 0x8e8828af csum is wrong: 0x7ae7 TCP-seq 4059809018 dss_csum_added 1 overflowed 0 iterations 1

berezoka · 2018-03-06T15:34:31Z

new catch:

[Tue Mar 6 16:19:23 2018] mptcp_detect_mapping Mappings do not match!
[Tue Mar 6 16:19:23 2018] mptcp_detect_mapping dseq 3709291268 mdseq 3709267468, sseq 4233560286 msseq 4233558885 dlen 1400 mdlen 25976 dfin 0 mdfin 0
packets_02.pcap.zip

berezoka · 2018-03-06T22:01:43Z

errors from server side:
[Tue Mar 6 21:57:19 2018] mptcp_check_rcvseq_wrap 0x5f5e5aa0 wrapped around at 214
[Tue Mar 6 22:30:46 2018] mptcp_verif_dss_csum 0x5f5e5aa0 csum is wrong: 0x76ff TCP-seq 1797150116 dss_csum_added 1 overflowed 0 iterations 1
server_03.pcap.zip

There were no errors on the client side. Here are just the packets:
client_03.pcap.zip

berezoka · 2018-03-11T08:20:49Z

I ran some tests with version 0.94, but I get the same error:
[Sun Mar 11 01:38:24 2018] mptcp_verif_dss_csum csum is wrong: 0x77d8 data_seq 4005789404 dss_csum_added 1 overflowed 0 iterations 1

One more thing: I can't load the balia module:
~# modprobe mptcp_balia
modprobe: ERROR: could not insert 'mptcp_balia': Invalid argument
[Sun Mar 11 10:14:05 2018] TCP: balia does not implement required ops

The kernel was installed from the Debian repo (v4.14.24.mptcp).

cpaasch · 2018-03-11T16:06:55Z

Hmmm... I have an idea as to what might be going wrong.
I wonder if we handle wrap-around of the data-sequence number correctly on the client-side. I have to do some digging here.
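
The kind of wrap-around in question can be illustrated with a small example. This is a minimal sketch in C, assuming only that an endpoint tracks a 64-bit data sequence number while a packet may carry just the low 32 bits. `dsn_expand_sketch` and `dsn_expand_example` are hypothetical names and not functions from the MPTCP tree, which tracks the high-order bits differently (see `rcv_high_order` and `rcv_hiseq_index` in the debug patch). The values are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Expand a received 32-bit data sequence number to 64 bits by picking the
 * value closest to a 64-bit reference, e.g. the next expected sequence
 * number. The cast to int32_t assumes two's-complement conversion, which
 * holds on the platforms Linux runs on. */
static uint64_t dsn_expand_sketch(uint64_t expected, uint32_t dsn32)
{
	/* Signed 32-bit distance from the reference to the received value. */
	int32_t delta = (int32_t)(dsn32 - (uint32_t)expected);

	return expected + (uint64_t)(int64_t)delta;
}

static void dsn_expand_example(void)
{
	/* Reference just below the 32-bit boundary. */
	uint64_t expected = 0xfffffb10ULL;

	/* A segment just past the wrap belongs to the next 2^32 epoch. */
	assert(dsn_expand_sketch(expected, 214) == 0x1000000d6ULL);

	/* Reusing the old high-order bits places it 2^32 too early. */
	assert(((expected & ~0xffffffffULL) | 214) == 214);

	/* A late segment from before the wrap maps back to the old epoch. */
	assert(dsn_expand_sketch(0x1000000d6ULL, 0xfffffb10u) == 0xfffffb10ULL);
}
```

If the two ends disagree about the high-order bits after such a wrap, any computation that includes the full 64-bit sequence number gives different results on each side, even though the low 32 bits match.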

As for the mptcp_balia-issue, this congestion control is not very well maintained and was more of a research-project. I would suggest you use the more proven-out congestion controls like Cubic, BBR,...

berezoka · 2018-03-11T16:58:33Z

Strangely, with Cubic this error occurs only after 3-4 hours of running with a live session.

cpaasch · 2018-03-11T18:40:52Z

So, you mean that with Cubic it happens less often? What was the frequency with Balia?

berezoka · 2018-03-11T18:57:13Z

Yes, it happens less often with Cubic than with Balia. With Balia it was 2-3 times per hour.

berezoka · 2018-04-26T06:22:13Z

Hello,
is there perhaps a fix for this error: "mptcp_verif_dss_csum 0x7a6e6987 csum is wrong ..."?

berezoka · 2018-04-26T19:05:34Z

Here are some logs:

Apr 25 17:26:28 b0 kernel: [ 5385.614813] mptcp_check_rcvseq_wrap 0x7ddc4547 wrapped around at 7064
Apr 25 17:26:28 b1 kernel: [ 8947.190313] mptcp_check_sndseq_wrap 0x5978af0e wrapped around at 4294966032
Apr 25 18:03:03 b0 kernel: [ 7580.169429] mptcp_verif_dss_csum 0x504d92ac csum is wrong: 0xf7b3 TCP-seq 2881437209 dss_csum_added 1 overflowed 0 iterations 1
Apr 25 18:08:07 b1 kernel: [11447.359862] mptcp_check_sndseq_wrap 0xa53b2d21 wrapped around at 4294966263
Apr 25 18:08:08 b0 kernel: [ 7884.860242] mptcp_check_rcvseq_wrap 0xeac5c8e wrapped around at 355
Apr 25 18:24:45 b1 kernel: [12445.481329] mptcp_check_sndseq_wrap 0x36be8f7d wrapped around at 4294967123
Apr 25 18:24:46 b0 kernel: [ 8883.103881] mptcp_check_rcvseq_wrap 0x7956b5e4 wrapped around at 1215
Apr 25 18:31:22 b0 kernel: [ 9279.222064] mptcp_verif_dss_csum 0x7956b5e4 csum is wrong: 0x8603 TCP-seq 947887266 dss_csum_added 1 overflowed 0 iterations 1
Apr 25 19:03:02 b0 kernel: [11179.662805] mptcp_verif_dss_csum 0x262dcfc8 csum is wrong: 0x93bb TCP-seq 2031246860 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 08:49:35 b0 kernel: [60771.282519] mptcp_check_rcvseq_wrap 0x7a6e6987 wrapped around at 979
Apr 26 08:49:35 b1 kernel: [ 4455.636286] mptcp_check_sndseq_wrap 0x30754331 wrapped around at 4294966887
Apr 26 09:02:07 b0 kernel: [61523.620784] mptcp_verif_dss_csum 0x7a6e6987 csum is wrong: 0xcfff TCP-seq 1601541607 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 09:07:28 b0 kernel: [61843.893513] mptcp_verif_dss_csum 0x7a6e6987 csum is wrong: 0xf754 TCP-seq 1921988663 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 09:09:00 b1 kernel: [ 5620.810096] mptcp_check_sndseq_wrap 0x4eff5b8e wrapped around at 4294967165
Apr 26 09:09:00 b0 kernel: [61936.236414] mptcp_check_rcvseq_wrap 0x1311fcdc wrapped around at 1257
Apr 26 09:53:08 b0 kernel: [64584.210484] mptcp_verif_dss_csum 0x1311fcdc csum is wrong: 0x7fd6 TCP-seq 1822176941 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 10:21:29 b1 kernel: [ 9970.842549] mptcp_check_sndseq_wrap 0x6de99c18 wrapped around at 4294966892
Apr 26 10:21:29 b0 kernel: [66285.500103] mptcp_check_rcvseq_wrap 0x7cffcbf4 wrapped around at 984
Apr 26 10:24:42 b0 kernel: [66477.884652] mptcp_verif_dss_csum 0x7cffcbf4 csum is wrong: 0x87 TCP-seq 2654382263 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 10:51:43 b0 kernel: [68099.464041] mptcp_verif_dss_csum 0x24355f43 csum is wrong: 0xdfff TCP-seq 1614789752 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 10:57:54 b0 kernel: [68470.180733] mptcp_check_rcvseq_wrap 0x24355f43 wrapped around at 168
Apr 26 10:57:54 b1 kernel: [12155.987283] mptcp_check_sndseq_wrap 0x89f8804c wrapped around at 4294966076
Apr 26 11:39:14 b0 kernel: [70949.807786] mptcp_check_rcvseq_wrap 0x88daffc2 wrapped around at 1128
Apr 26 11:39:14 b1 kernel: [14636.069070] mptcp_check_sndseq_wrap 0xdc965297 wrapped around at 4294967036
Apr 26 12:04:16 b0 kernel: [72451.819008] mptcp_verif_dss_csum 0x88daffc2 csum is wrong: 0x8411 TCP-seq 3327185381 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 12:20:06 b0 kernel: [73402.127478] mptcp_check_rcvseq_wrap 0xc256549e wrapped around at 951
Apr 26 12:20:06 b1 kernel: [17088.863686] mptcp_check_sndseq_wrap 0x6ea198ad wrapped around at 4294966859
Apr 26 13:48:59 b0 kernel: [78734.337969] mptcp_check_rcvseq_wrap 0xc256549e wrapped around at 1317
Apr 26 13:48:59 b1 kernel: [22422.080083] mptcp_check_sndseq_wrap 0x6ea198ad wrapped around at 4294967225
Apr 26 14:04:23 b0 kernel: [79659.148000] mptcp_verif_dss_csum 0xd3c5671e csum is wrong: 0xc10 TCP-seq 422445808 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 14:25:47 b1 kernel: [24631.323512] mptcp_check_sndseq_wrap 0xa494c1cb wrapped around at 4294966697
Apr 26 14:25:48 b0 kernel: [80943.223952] mptcp_check_rcvseq_wrap 0xee581318 wrapped around at 4953
Apr 26 16:04:24 b0 kernel: [86859.398228] mptcp_verif_dss_csum 0x31ec85b5 csum is wrong: 0xffef TCP-seq 2230254616 dss_csum_added 1 overflowed 1 iterations 1
Apr 26 17:57:25 b0 kernel: [93640.042730] mptcp_verif_dss_csum 0x5910e9ca csum is wrong: 0xec7e TCP-seq 2441195065 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 18:05:51 b0 kernel: [94146.346244] mptcp_verif_dss_csum 0x5910e9ca csum is wrong: 0x4079 TCP-seq 1227387143 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 19:12:27 b0 kernel: [98142.133871] mptcp_check_rcvseq_wrap 0xc4ba4f09 wrapped around at 733
Apr 26 20:13:54 b0 kernel: [101829.308346] mptcp_verif_dss_csum 0x57312bda csum is wrong: 0x4bf8 TCP-seq 1497321443 dss_csum_added 1 overflowed 0 iterations 1
Apr 26 20:40:06 b1 kernel: [ 8841.292427] mptcp_check_sndseq_wrap 0x58bc2176 wrapped around at 4294967104
Apr 26 20:40:06 b0 kernel: [103401.275533] mptcp_check_rcvseq_wrap 0x7b150f76 wrapped around at 1196

b0 - server
b1 - client
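
To correlate the messages above, the lines can be grouped by the token printed by the debug patch. Below is a minimal sketch in C of a parser for the three message types. The names `mptcp_evt` and `parse_mptcp_evt` are hypothetical, and the format strings are taken from the `pr_err` calls in the debug patch. Since the printed value is `mptcp_loc_token`, it presumably only matches between lines logged on the same host.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum mptcp_evt_kind { EVT_NONE, EVT_RCV_WRAP, EVT_SND_WRAP, EVT_CSUM_FAIL };

struct mptcp_evt {
	enum mptcp_evt_kind kind;
	unsigned int token;	/* value printed from mptcp_loc_token */
	unsigned int value;	/* wrap position, or TCP-seq for a csum failure */
	unsigned int csum;	/* folded checksum, EVT_CSUM_FAIL only */
};

/* Returns 1 and fills *out if the line is one of the three debug-patch
 * messages, 0 otherwise. On 0, out->kind is EVT_NONE and the other fields
 * should not be used. */
static int parse_mptcp_evt(const char *line, struct mptcp_evt *out)
{
	const char *p;

	assert(line != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	p = strstr(line, "mptcp_check_rcvseq_wrap ");
	if (p && sscanf(p, "mptcp_check_rcvseq_wrap %x wrapped around at %u",
			&out->token, &out->value) == 2) {
		out->kind = EVT_RCV_WRAP;
		return 1;
	}

	p = strstr(line, "mptcp_check_sndseq_wrap ");
	if (p && sscanf(p, "mptcp_check_sndseq_wrap %x wrapped around at %u",
			&out->token, &out->value) == 2) {
		out->kind = EVT_SND_WRAP;
		return 1;
	}

	/* Lines from an unpatched kernel carry no token and are rejected. */
	p = strstr(line, "mptcp_verif_dss_csum ");
	if (p && sscanf(p, "mptcp_verif_dss_csum %x csum is wrong: %x TCP-seq %u",
			&out->token, &out->csum, &out->value) == 3) {
		out->kind = EVT_CSUM_FAIL;
		return 1;
	}

	return 0;
}
```

Feeding each log line through `parse_mptcp_evt` and keeping, per token, the time of the last wrap event makes it possible to list every checksum failure next to the preceding wrap on the same connection, if there was one.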

lenormf · 2018-10-15T08:47:36Z

Hi,

I'm seeing the same type of errors on a trivial setup with 2 VLANs:

[…]
[ 2373.544290] TCP: mptcp_fallback_infinite 0x2e0ed3e4 will fallback - pi 1, src 192.168.2.20:51454 dst 192.168.2.1:5201 rcv_nxt 3490125391 from tcp_rcv_established+0x564/0x818
[ 2383.424398] TCP: mptcp_fallback_infinite 0x17435639 will fallback - pi 2, src 192.168.3.20:39195 dst 192.168.3.1:5201 rcv_nxt 1385930753 from tcp_rcv_state_process+0x240/0x8ec
[ 2383.438635] TCP: tcp_ack resetting flow
[ 2404.467775] TCP: mptcp_fallback_infinite 0xe518971 will fallback - pi 1, src 192.168.2.20:51494 dst 192.168.2.1:5201 rcv_nxt 1353534289 from tcp_rcv_established+0x564/0x818
[ 2404.482715] TCP: mptcp_fallback_infinite 0xfe6974a will fallback - pi 1, src 192.168.2.20:51498 dst 192.168.2.1:5201 rcv_nxt 3353088233 from tcp_rcv_established+0x564/0x818
[ 2404.497820] TCP: mptcp_fallback_infinite 0xb8dff011 will fallback - pi 1, src 192.168.2.20:51502 dst 192.168.2.1:5201 rcv_nxt 3432046503 from tcp_rcv_established+0x564/0x818
[ 2443.949321] TCP: mptcp_fallback_infinite 0x5f0ba4b9 will fallback - pi 2, src 192.168.3.20:42939 dst 192.168.3.1:5201 rcv_nxt 1032229633 from tcp_rcv_state_process+0x240/0x8ec
[ 2443.963526] TCP: tcp_ack resetting flow
[…]

What do the error messages mean? Would it help if I patched my kernel with debugging prints and sent the output back to you?

lenormf · 2018-10-15T09:30:23Z

Also getting weird traces from time to time:

[ 7012.855984] ------------[ cut here ]------------
[ 7012.859199] WARNING: CPU: 2 PID: 15 at net/mptcp/mptcp_ctrl.c:705 mptcp_sock_def_error_report+0xec/0x13c
[ 7012.868667] Meta already closed i_rcv 1 i_snd 1 send_i 0 flags 0x2004301
[ 7012.875306] Modules linked in: DECT_paging cosic drv_timer drv_vmmc pppoe ppp_async mac_violation_mirror ltq_mpe_hal_drv ltq_directpath_datapath l2tp_ppp iptable_nat cdc_mbim pppox ppp_generic ppa_api_tmplbuf ppa_api_sw_accel_mod ppa_api nf_nat_pptp nf_nat_ipv4 nf_nat_amanda nf_conntrack_pptp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack_amanda ltq_tmu_hal_drv ltq_pae_hal ltq_eth_drv_xrx500 ipt_TRIGGER ipt_REJECT ipt_MASQUERADE ebtable_nat ebtable_filter ebtable_broute dc_mode0_xrx500 cdc_ncm cdc_ether xt_time xt_tcpmss xt_statistic xt_state xt_socket xt_recent xt_policy xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_extmark xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TPROXY xt_TCPMSS xt_REDIRECT xt_NFQUEUE xt_NETMAP xt_LOG xt_HL xt_DSCP xt_CLASSIFY xrx500_phy_fw usbnet usblp ts_kmp ts_fsm ts_bm slhc ppa_drv_stack_al phy_grx500_usb pecostat_noIRQ nfnetlink_queue nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast nf_conntrack macvlan ltq_voip_timer_driver ltq_crypto iptable_mangle iptable_filter ipt_ah ipt_ECN ip_tables ebtables ebt_vlan ebt_stp ebt_snat ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_ip ebt_extmark_m ebt_extmark ebt_dnat ebt_arpreply ebt_arp ebt_among ebt_802_3 dwc3_grx500 directconnect_datapath crc_ccitt cpuload cdc_wdm cdc_acm br_netfilter fuse sch_teql em_nbyte sch_prio sch_dsmark sch_pie em_meta sch_gred cls_basic act_ipt em_text sch_codel sch_red sch_fq sch_sfq act_police em_cmp act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress configs 
xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables msdos ip6_gre ip_gre gre sit l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipcomp6 xfrm6_tunnel xfrm6_mode_tunnel xfrm6_mode_transport xfrm6_mode_beet esp6 ah6 ipcomp xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 ip6_tunnel tunnel6 tunnel4 ip_tunnel veth af_key xfrm_user xfrm_ipcomp xfrm_algo vfat fat hfsplus[ 7013.113392] cdc_ether 2-1:2.0 wwan0: kevent 11 may have been dropped
[ 7013.119934]  hfs autofs4 vrx518 drv_kpi2udp nls_utf8 nls_iso8859_1 nls_cp437 drv_sdd_mbx drv_tapi drv_ifxos sha512_generic sha1_generic md5 echainiv des_generic cmac cbc authenc usb_storage xhci_plat_hcd xhci_pci xhci_hcd dwc3 sd_mod scsi_mod ext4 jbd2 mbcache exfat usbcore nls_base usb_common mii crc32c_generic
[ 7013.147533] CPU: 2 PID: 15 Comm: ksoftirqd/2 Tainted: G        W       4.9.109+ #0
[ 7013.155077] Stack : 00000006 00000000 00000000 00000000 00000000 00000000 60927e9a 00000046
[ 7013.163407]         00000000 00000000 00000000 60920000 607a0000 6079ee07 606e5e3c 00000002
[ 7013.171740]         0000000f 60923a44 607d6200 726d55c0 00600000[ 7013.177370] cdc_ether 2-1:2.0 wwan0: kevent 11 may have been dropped
[ 7013.183895]  6008fd18 00000001 60920000
[ 7013.187713]         607a0000 607a58d4 606eabac 7dc998ac 60923a44 600df91c 607d6200 726d55c0
[ 7013.196046]         68475d48 602f6d40 7dc998ac 00040900 00000000 00000000 00000000 00000000
[ 7013.204380]         ...
[ 7013.206813] Call Trace:
[ 7013.209280] [<6002d450>] show_stack+0x88/0xb8
[ 7013.213612] [<6024c39c>] dump_stack+0x8c/0xc0
[ 7013.217931] [<600470c4>] __warn+0x110/0x118
[ 7013.222097] [<6004710c>] warn_slowpath_fmt+0x40/0x64
[ 7013.227056] [<60552cb8>] mptcp_sock_def_error_report+0xec/0x13c
[ 7013.232949] [<604afd6c>] tcp_reset+0x60/0x84
[ 7013.237201] [<604b00cc>] tcp_validate_incoming+0x33c/0x4b8
[ 7013.241370] cdc_ether 2-1:2.0 wwan0: kevent 11 may have been dropped
[ 7013.249008] [<604b2ee0>] tcp_rcv_state_process+0x22c/0x8ec
[ 7013.254490] [<604bec68>] tcp_v4_do_rcv+0x2b4/0x2c4
[ 7013.259251] [<604bfa28>] tcp_v4_rcv+0xc4c/0x11a0
[ 7013.263856] [<60493678>] ip_local_deliver_finish+0x388/0x398
[ 7013.269493] [<60493de4>] ip_local_deliver+0x68/0x10c
[ 7013.274441] [<60494374>] ip_rcv+0x4ec/0x688
[ 7013.278626] [<6044c808>] __netif_receive_skb_core+0x760/0x980
[ 7013.284342] [<6044d354>] netif_receive_skb_internal+0xcc/0xe4
[ 7013.290090] [<60574e28>] br_pass_frame_up+0xf4/0x17c
[ 7013.295017] [<60575008>] br_handle_frame_finish+0x100/0x534
[ 7013.300572] [<605757b8>] br_handle_frame+0x37c/0x414
[ 7013.305369] cdc_ether 2-1:2.0 wwan0: kevent 11 may have been dropped
[ 7013.311857] [<6044c6ec>] __netif_receive_skb_core+0x644/0x980
[ 7013.317587] [<6044e640>] process_backlog+0x9c/0x164
[ 7013.322447] [<6044e3e4>] net_rx_action+0x158/0x318
[ 7013.327232] [<6004bd38>] __do_softirq+0x194/0x2d0
[ 7013.331909] [<6004beac>] run_ksoftirqd+0x38/0x6c
[ 7013.336523] [<60069794>] smpboot_thread_fn+0x1b4/0x1dc
[ 7013.341645] [<600657f4>] kthread+0xf8/0x100
[ 7013.345800] [<600274b8>] ret_from_kernel_thread+0x14/0x1c
[ 7013.351214] ---[ end trace 953e5dd640ff0e93 ]---

cpaasch · 2018-10-15T16:41:32Z

@lenormf - do you also have this error with either mptcp_v0.94, or the latest branch of mptcp_v0.93 ?

Having sporadic warnings à la mptcp_fallback_infinite or tcp_ack resetting flow is fine, because these are just indicating that the MPTCP-connection is falling back to regular TCP.

The bigger warning that you are getting shouldn't appear though.

lenormf · 2018-10-15T16:57:28Z

I have warnings with v0.93.1, I've also cherry-picked some bug-fixes over from v0.94.

matttbe · 2018-10-16T08:19:03Z

> I've also cherry-picked some bug-fixes over from v0.94.

Are there fixes missing in the mptcp_v0.93 branch? I was looking at creating a new release for this branch which should contain all needed fixes but please tell me if it is not the case!

Do you have these warnings with the latest version of the mptcp_v0.93 branch as well?

lenormf · 2018-10-16T09:43:33Z

I misspoke: I cherry-picked commits from the development branch that seemed important:

877fa7d mptcp: avoid removing useful skbs from the reinject queue.
80671d2 mptcp: Iterate over subflow-list while holding the lock in tcp_splice_read
1569564 mptcp: Restart subflow-selection when we force a re-evaluation

I haven't picked f2632fa ("mptcp: Use tcp_abort correctly for MPTCP"), however, so I should try with this commit first and get back to you.

matttbe · 2018-10-16T11:05:17Z

@lenormf Could you try with the mptcp_v0.93 branch? https://github.com/multipath-tcp/mptcp/tree/mptcp_v0.93
It also contains all these fixes, e.g. the last one you mention: 678ebe0
It also contains fixes from Linux upstream.

lenormf · 2018-10-16T11:06:44Z

I'm using v0.93.1 already.

matttbe · 2018-10-16T14:19:14Z

@lenormf yes but v0.93.1 is a tag created in January: https://github.com/multipath-tcp/mptcp/releases/tag/v0.93.1

A tag (v0.93.1) should not be modified, whereas the branch (mptcp_v0.93) is not fixed and has been updated in the meantime: v0.93.1...mptcp_v0.93

git checkout mptcp_v0.93
git pull
make (...)

cpaasch self-assigned this Mar 11, 2018

cpaasch added bug Level: Normal labels Mar 11, 2018
