This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Redundant scheduler currently stable? #354

Open
Marctraider opened this issue Aug 10, 2019 · 7 comments

@Marctraider

Marctraider commented Aug 10, 2019

Hi!

I'm using the latest OpenMPTCProuter at the moment, with the latest MPTCP version (0.95).

Sadly, using the redundant scheduler often causes a kernel panic on the VPS side. Is it considered stable, or are there known issues?
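For reference, this is roughly how the scheduler is selected on my setup; a minimal sketch assuming the standard sysctls of the out-of-tree MPTCP kernel (OpenMPTCProuter normally manages these itself):

```sh
# Check that MPTCP is enabled and which scheduler is currently active
sysctl net.mptcp.mptcp_enabled
sysctl net.mptcp.mptcp_scheduler

# Load the redundant scheduler module and select it
modprobe mptcp_redundant
sysctl -w net.mptcp.mptcp_scheduler=redundant

# Confirm the module is in place
lsmod | grep mptcp_redundant
```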

Thanks

@Marctraider
Author

Marctraider commented Aug 11, 2019

Update: I now run Debian 9 (not sure if it matters). I still get call traces (warnings) from the redundant module, but the VPS does not freeze and redundant mode still seems to work (I can pull out 2 of the 3 network cables and the TCP stream recovers within a second).

[  229.284419] ------------[ cut here ]------------
[  229.284422] pvqspinlock: lock 0xffff88806653cc88 has corrupted value 0x0!
[  229.284443] WARNING: CPU: 0 PID: 0 at __pv_queued_spin_unlock_slowpath+0xbb/0xc0
[  229.284444] Modules linked in: mptcp_redundant xt_nat ip6t_MASQUERADE cls_u32 cls_flow cls_fw sch_sfq sch_prio xt_recent ipt_REJECT nf_reject_ipv4 iptable_mangle iptable_raw nf_log_ipv4 xt_comment nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp ip6table_nat nf_nat_ipv6 nf_nat_amanda ip6t_REJECT nf_reject_ipv6 xt_addrtype ip6t_rpfilter xt_mark ip6table_mangle nf_conntrack_snmp xt_tcpudp xt_CT ip6table_raw xt_multiport xt_conntrack nfnetlink_log xt_NFLOG xt_LOG nf_log_ipv6 nf_log_common nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda iptable_filter
[  229.284486]  ip6table_filter ip6_tables sit tunnel4 ipt_MASQUERADE ip_tunnel iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc joydev hid_generic bochs_drm snd_pcm aesni_intel snd_timer ttm snd aes_x86_64 crypto_simd virtio_rng soundcore usbhid cryptd glue_helper pcspkr hid rng_core serio_raw drm_kms_helper drm sg button evdev sch_fq tun tcp_bbr mptcp_balia mptcp_wvegas mptcp_olia ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 virtio_blk virtio_net net_failover failover sr_mod cdrom ata_generic crc32c_intel ata_piix libata psmouse scsi_mod floppy virtio_pci virtio_ring virtio uhci_hcd ehci_hcd usbcore i2c_piix4
[  229.284548] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         4.19.56-mptcp #1
[  229.284549] Hardware name: Tilaa VPS, BIOS 1.11.0-2.el7 04/01/2014
[  229.284553] RIP: 0010:__pv_queued_spin_unlock_slowpath+0xbb/0xc0
[  229.284555] Code: 00 48 63 7a 40 e8 45 4e fa ff 66 90 c3 8b 05 a4 e3 24 01 85 c0 74 02 f3 c3 8b 17 48 89 fe 48 c7 c7 00 77 d3 81 e8 15 dd fb ff <0f> 0b c3 0f 0b e8 3b fa ff ff 66 90 fb 66 66 90 66 66 90 c3 90 48
[  229.284556] RSP: 0018:ffff88807fc039d8 EFLAGS: 00010292
[  229.284558] RAX: 000000000000003d RBX: ffff88807cad40c0 RCX: 0000000000000006
[  229.284559] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff88807fc165d0
[  229.284560] RBP: ffff88807fc03b30 R08: 0000000000000001 R09: 0000000000023ce3
[  229.284561] R10: ffff88807fc03990 R11: 0000000000023ce3 R12: ffff88806653cc00
[  229.284562] R13: ffff88806653d4c0 R14: ffffffff82318e60 R15: ffff8880790e2640
[  229.284564] FS:  0000000000000000(0000) GS:ffff88807fc00000(0000) knlGS:0000000000000000
[  229.284565] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  229.284566] CR2: 00007f96b5db3010 CR3: 0000000001e0a002 CR4: 00000000000606f0
[  229.284571] Call Trace:
[  229.284575]  <IRQ>
[  229.284578]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
[  229.284582]  .slowpath+0x9/0xe
[  229.284585]  tcp_conn_request+0x9c0/0xc20
[  229.284592]  ? nf_conntrack_tuple_taken+0x50/0x270 [nf_conntrack]
[  229.284597]  ? mptcp_conn_request+0x131/0x160
[  229.284599]  mptcp_conn_request+0x131/0x160
[  229.284602]  ? __inet_lookup_listener+0x1ef/0x320
[  229.284604]  tcp_rcv_state_process+0x1d2/0x990
[  229.284607]  ? sk_filter_trim_cap+0x28/0x2c0
[  229.284610]  tcp_v4_do_rcv+0x14a/0x230
[  229.284612]  tcp_v4_rcv+0xc11/0xc80
[  229.284614]  ip_local_deliver_finish+0xa5/0x1d0
[  229.284616]  ip_local_deliver+0x56/0xc0
[  229.284618]  ? ip_rcv_core.isra.20+0x290/0x290
[  229.284620]  ip_rcv+0x3d/0xb0
[  229.284622]  ? ip_rcv_finish_core.isra.18+0x370/0x370
[  229.284624]  __netif_receive_skb_one_core+0x3d/0x50
[  229.284626]  netif_receive_skb_internal+0x1f/0xb0
[  229.284628]  napi_gro_receive+0x6a/0x80
[  229.284631]  receive_buf+0x62d/0x1140 [virtio_net]
[  229.284635]  ? kvm_sched_clock_read+0xd/0x20
[  229.284637]  ? sched_clock+0x5/0x10
[  229.284640]  ? sched_clock_cpu+0xc/0xa0
[  229.284643]  ? vring_unmap_one+0xc/0x60 [virtio_ring]
[  229.284645]  ? detach_buf+0x63/0x110 [virtio_ring]
[  229.284648]  virtnet_poll+0xca/0x306 [virtio_net]
[  229.284650]  net_rx_action+0x22f/0x350
[  229.284655]  __do_softirq+0xe8/0x20c
[  229.284657]  irq_exit+0xbd/0xd0
[  229.284659]  do_IRQ+0x45/0xd0
[  229.284661]  common_interrupt+0xf/0xf
[  229.284663]  </IRQ>
[  229.284666] RIP: 0010:native_safe_halt+0xe/0x10
[  229.284668] Code: 8b 00 a8 08 74 80 eb c2 f3 c3 90 90 e9 07 00 00 00 0f 00 2d e6 b9 5f 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d6 b9 5f 00 fb f4 <c3> 90 53 e8 ca eb ab ff 65 8b 05 93 14 a0 7e fb 66 66 90 66 66 90
[  229.284669] RSP: 0018:ffffffff81e03ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd8
[  229.284671] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807fc1a660
[  229.284673] RDX: ffffffff81e42238 RSI: ffff88807fc1a660 RDI: 000000355f81f242
[  229.284674] RBP: ffffffff81eba260 R08: 00000043e90181bf R09: 0000000000000000
[  229.284675] R10: 0000000000000008 R11: 0000000000000000 R12: ffffffff81e13480
[  229.284676] R13: ffffffff81e13480 R14: 0000000000000000 R15: 0000000000000000
[  229.284680]  default_idle+0xc/0x20
[  229.284682]  do_idle+0x191/0x240
[  229.284684]  cpu_startup_entry+0x5a/0x60
[  229.284686]  start_kernel+0x44f/0x45a
[  229.284690]  secondary_startup_64+0xa4/0xb0
[  229.284692] ---[ end trace 74d7482191bac297 ]---

@Marctraider
Author

Nvm, got a freeze with some load.
(screenshot: panic)

@matttbe
Member

matttbe commented Aug 12, 2019

Hi

Thank you for the bug report.

For the last screenshot you took, could you please get the trace from the beginning (from the "cut here" line, around the 550th second)? A bit like what you did in your previous comment.
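Something along these lines should capture the full trace; just a sketch, assuming dmesg/journalctl are available on the Debian VPS:

```sh
# Full kernel ring buffer with human-readable timestamps
dmesg -T > dmesg-full.txt

# Or, on systemd-based systems, the kernel log for the current boot
journalctl -k -b > kernel-log.txt

# The interesting part starts at the "cut here" marker
grep -n "cut here" dmesg-full.txt
```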

By chance, could @chrpinedo and/or @AlexanderFroemmgen (or others) have a look at this, please?

@matttbe matttbe added the bug label Aug 12, 2019
@Marctraider
Author

Marctraider commented Aug 12, 2019

@matttbe

I'll try to replicate the panic today. I consistently get dmesg spammed with call traces with either Redundant or BLEST; the panic is a bit less common, but not too hard to trigger.

It might be related to Shadowsocks: when I disable that and run MPTCP over Glorytun TCP, at least the call-trace spam is gone.

@matttbe
Member

matttbe commented Aug 12, 2019

@Marctraider: Feel free to open a new ticket for the warnings/panics you have with BLEST. Then we can bring the different authors into the loop.

@AlexanderFroemmgen
Contributor

By chance, could @chrpinedo and/or @AlexanderFroemmgen (or others) have a look at this, please?

I will try to have a look at the weekend.

@Marctraider
Author

Marctraider commented Aug 13, 2019

Alright, just a heads up. I've switched from a VPS (KVM) to a dedicated server, and there is no call-trace spam and no kernel panic to be seen after almost a day of uptime (with the redundant scheduler).

The maintainer of OpenMPTCProuter suggested that too many VPSes may be sharing the same hardware/connection, which can make MPTCP behave badly. (I had already upgraded the VPS from 1 to 2 GB and from 1 to 2 cores, but that didn't seem to help.)

Also, I no longer think the kernel panic and the call traces in dmesg were related to redundant/BLEST per se, as the spam also occurred on a clean install with just the regular default scheduler.

Still, I find it kind of weird that MPTCP apparently has no tolerance for variable hardware resources. Is this an inherent flaw, or something that simply cannot be dealt with and is basically "by design"?

Yesterday I tried for hours to reproduce a kernel panic for you guys, but sadly I only got call-trace spam in dmesg. I'm sure this can be replicated in a test environment, though, with variable (limited) hardware resources.
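For whoever wants to try: a sketch of one way to emulate the "oversubscribed VPS" conditions on a libvirt/KVM host (the domain name mptcp-test-vm and the quota values are just placeholders, and stress-ng is assumed to be installed):

```sh
# Cap the guest's CPU time so it behaves like an oversubscribed VPS
# (100000 us period, 20000 us quota -> roughly 20% of one core per vCPU)
virsh schedinfo mptcp-test-vm --set vcpu_period=100000 --set vcpu_quota=20000

# Or create contention on the host while the guest handles MPTCP traffic
stress-ng --cpu 4 --vm 2 --vm-bytes 1G --timeout 600s
```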
