TCP: tcp_ack resetting flow #243
Comments
@matttbe are you talking about this patch?
Hi, [Sat Mar 3 16:23:16 2018] mptcp_verif_dss_csum csum is wrong: 0x63dd data_seq 1369751649 dss_csum_added 1 overflowed 0 iterations 1 On the client side I have to do: What other solution is there?
Hmmm... Can you double-check that the kernel you compiled has the commit f2a4860 ("mptcp: Update 64-bit receiver indexes after processing ofo-queue")? If that's the case, can you capture a packet-trace and wait until you see one of these messages pop up? Thanks!
About an hour ago I compiled a new kernel with the latest patches and the situation is the same: [Sun Mar 4 02:58:15 2018] mptcp_verif_dss_csum csum is wrong: 0x10 data_seq 1890498936 dss_csum_added 1 overflowed 0 iterations 1 f2a4860 is applied. I will capture packets tomorrow.
Thanks, I'm waiting for your capture!
Please also store the mptcp_verif_dss_csum message you see together with the packet-trace. I need both to correlate the flows.
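To make it easier to line the kernel messages up with a packet-trace, the log lines can be turned into structured records. The following is only a sketch: the regular expression is derived from the two message shapes quoted in this thread (with and without a hex value after the function name), the field name `tag` is made up for that hex value because the thread does not say what it identifies, and `parse_csum_errors` is a hypothetical helper, not part of any MPTCP tooling.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the quoted logs, e.g. "Sat Mar 3 16:23:16 2018".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# Matches both message shapes seen in this thread. The optional hex value
# right after the function name is captured as "tag" (hypothetical name).
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+mptcp_verif_dss_csum\s+"
    r"(?:(?P<tag>0x[0-9a-fA-F]+)\s+)?"
    r"csum is wrong:\s+(?P<csum>0x[0-9a-fA-F]+)\s+"
    r"(?P<seq_kind>data_seq|TCP-seq)\s+(?P<seq>\d+)"
)


def parse_csum_errors(lines):
    """Return one dict per mptcp_verif_dss_csum message found in `lines`."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a checksum message, skip it
        events.append({
            "time": datetime.strptime(match["ts"].strip(), TS_FORMAT),
            "tag": match["tag"],            # None when the log has no hex value
            "csum": int(match["csum"], 16),
            "seq_kind": match["seq_kind"],  # "data_seq" or "TCP-seq"
            "seq": int(match["seq"]),
        })
    return events
```

The `time` and `seq` fields are the ones most likely to be useful when searching the capture for the matching segment, assuming the clocks of the capture host and the kernel log agree.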
I couldn't sleep, so I made a packet capture. The error was: One more:
Hmmm... This is weird. Looking at the pcap, it really seems like commit f2a4860 is not in the kernel that you are booting. How much data are you transmitting here? (it looks like you are transmitting huge amounts of data)
Also - just to be sure - the other host is booting the same kernel, right?
And, which scheduler are you using?
Here is the debug-patch that would be good to apply:
Traffic is flowing 24/7 :) and about 240 GB per day is sent via LAN+LTE. The booted kernel is the same on both sides :). For congestion control I currently use "balia"; I tried the "default" and "redundant" schedulers, but the error is always the same :(. As for kernel panics, with "redundant" there were no problems like in #214.
[Tue Mar 6 11:38:45 2018] mptcp_verif_dss_csum 0x8e8828af csum is wrong: 0x7ae7 TCP-seq 4059809018 dss_csum_added 1 overflowed 0 iterations 1
New catch: [Tue Mar 6 16:19:23 2018] mptcp_detect_mapping Mappings do not match!
Errors from the server side: On the client side there were no errors. Here are only the packets:
I ran some tests with version 0.94 but I get the same error: One more thing, I can't load the balia module: The kernel was installed from the Debian repo, v4.14.24.mptcp.
Hmmm... I have an idea as to what might be going wrong. As for the mptcp_balia-issue, this congestion control is not very well maintained and was more of a research-project. I would suggest you use the more proven-out congestion controls like Cubic, BBR,...
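Before switching congestion controls, it can help to confirm which one is active and which ones the running kernel has registered. A minimal sketch, assuming the standard Linux sysctl files `tcp_congestion_control` and `tcp_available_congestion_control` under `/proc/sys/net/ipv4`; the function name `read_congestion_control` and its `proc_root` parameter are hypothetical and exist only so the code can be pointed at a test directory.

```python
from pathlib import Path


def read_congestion_control(proc_root="/proc/sys/net/ipv4"):
    """Return (current, available) TCP congestion controls.

    `proc_root` defaults to the usual procfs location; it is a parameter
    only so the function can be exercised against a scratch directory.
    """
    root = Path(proc_root)
    current = (root / "tcp_congestion_control").read_text().strip()
    available = (root / "tcp_available_congestion_control").read_text().split()
    return current, available


if __name__ == "__main__":
    current, available = read_congestion_control()
    print("current:", current)
    print("available:", ", ".join(available))
```

If a name such as `balia` is missing from the available list, that only shows the algorithm is not currently registered with the running kernel; it does not explain why loading the module failed.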
Strangely, though, with Cubic this error occurs after 3-4 hours of running with a live session.
So, you mean that with Cubic it happens less often? What was the frequency with Balia?
Yes, it is much less frequent with Cubic than with Balia. With Balia it was 2-3 times per hour.
Hello,
Here are some logs: Apr 25 17:26:28 b0 kernel: [ 5385.614813] mptcp_check_rcvseq_wrap 0x7ddc4547 wrapped around at 7064 (b0 is the server)
commit aebfa52 upstream. The ipv4 nf_ct code currently skips the nf_conntrak_in() call for fragmented packets. As a results later matches/target can end up manipulating template ct entry instead of 'real' ones. Exploiting the above, syzbot found a way to trigger the following splat: WARNING: CPU: 1 PID: 4242 at net/netfilter/xt_cluster.c:55 xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 4242 Comm: syzkaller027971 Not tainted 4.16.0-rc2+ #243 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x211/0x2d0 lib/bug.c:184 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x58/0x80 arch/x86/entry/entry_64.S:957 RIP: 0010:xt_cluster_hash net/netfilter/xt_cluster.c:55 [inline] RIP: 0010:xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127 RSP: 0018:ffff8801d2f6f2d0 EFLAGS: 00010293 RAX: ffff8801af700540 RBX: 0000000000000000 RCX: ffffffff84a2d1e1 RDX: 0000000000000000 RSI: ffff8801d2f6f478 RDI: ffff8801cafd336a RBP: ffff8801d2f6f2e8 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b03b3d18 R13: ffff8801cafd3300 R14: dffffc0000000000 R15: ffff8801d2f6f478 ipt_do_table+0xa91/0x19b0 net/ipv4/netfilter/ip_tables.c:296 iptable_filter_hook+0x65/0x80 net/ipv4/netfilter/iptable_filter.c:41 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] raw_send_hdrinc.isra.17+0xf39/0x1880 net/ipv4/raw.c:432 raw_sendmsg+0x14cd/0x26b0 
net/ipv4/raw.c:669 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xca/0x110 net/socket.c:639 SYSC_sendto+0x361/0x5c0 net/socket.c:1748 SyS_sendto+0x40/0x50 net/socket.c:1716 do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x441b49 RSP: 002b:00007ffff5ca8b18 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441b49 RDX: 0000000000000030 RSI: 0000000020ff7000 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 000000002066354c R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000403470 R13: 0000000000403500 R14: 0000000000000000 R15: 0000000000000000 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. Instead of adding checks for template ct on every target/match manipulating skb->_nfct, simply drop the template ct when skipping nf_conntrack_in(). Fixes: 7b4fdf7 ("netfilter: don't track fragmented packets") Reported-and-tested-by: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Acked-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Hi, I'm seeing the same type of errors on a trivial setup with 2 VLANs:
What do the error messages mean? Do I need to patch my kernel with debugging prints that I could send back to you, to help?
Also getting weird traces from time to time:
@lenormf - do you also have this error with either mptcp_v0.94, or the latest branch of mptcp_v0.93? Having sporadic warnings à la The bigger warning that you are getting shouldn't appear though.
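As background on what a "csum is wrong" message refers to: RFC 6824 defines the DSS checksum as the standard Internet (ones' complement) checksum over the payload covered by a mapping plus a pseudo-header made of the 64-bit data sequence number, the 32-bit relative subflow sequence number, the 16-bit data-level length, and a 16-bit field that is zero during computation. The sketch below follows that description; it is not a copy of the kernel's mptcp_verif_dss_csum, and the function names are hypothetical.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Fold `data` into a 16-bit ones' complement sum (Internet checksum style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def dss_checksum(data_seq: int, subflow_seq: int, payload: bytes) -> int:
    """Checksum a sender would place in the DSS option (per RFC 6824)."""
    pseudo = struct.pack("!QIHH", data_seq, subflow_seq, len(payload), 0)
    return ~ones_complement_sum(pseudo + payload) & 0xFFFF


def dss_checksum_ok(data_seq: int, subflow_seq: int,
                    payload: bytes, received_csum: int) -> bool:
    """Receiver-side check: with the checksum included, the sum folds to 0xFFFF."""
    pseudo = struct.pack("!QIHH", data_seq, subflow_seq,
                         len(payload), received_csum)
    return ones_complement_sum(pseudo + payload) == 0xFFFF
```

Under this scheme, a mismatch means the bytes or the mapping seen by the receiver differ from what the sender checksummed. That can come from payload modification on the path or from the receiver applying the wrong mapping to the data. The sketch does not tell which of these applies to the traces in this thread.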
I have warnings with v0.93.1, I've also cherry-picked some bug-fixes over from v0.94.
Are there fixes missing in the mptcp_v0.93 branch? I was looking at creating a new release for this branch which should contain all needed fixes but please tell me if it is not the case! Do you have these warnings with the latest version of the mptcp_v0.93 branch as well?
I misspoke, I cherry-picked commits from the development branch, which seemed to be important:
I haven't picked f2632fa
@lenormf Could you try with the mptcp_v0.93 branch? https://github.com/multipath-tcp/mptcp/tree/mptcp_v0.93
I'm using v0.93.1 already.
@lenormf yes, but v0.93.1 is a tag created in January: https://github.com/multipath-tcp/mptcp/releases/tag/v0.93.1 A tag (v0.93.1) should not be modified, whereas the branch (mptcp_v0.93) is not frozen and has been updated in between: v0.93.1...mptcp_v0.93
git checkout mptcp_v0.93
git pull
make (...)
Hello,
On the server side I got some errors:
The client side was using LAN+LTE and had no errors. After the error "TCP: tcp_ack resetting flow", client-side traffic does not go back to LTE. I need to stop and restart the stream so that it can go back to LTE.
Congestion control:
Where might the problem be?