This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Slow (very slow) connection with 3 connections but not with 2 #282

Open
dur3x opened this issue Sep 8, 2018 · 20 comments

@dur3x

dur3x commented Sep 8, 2018

I used to have three ADSL boxes from the same internet provider. More recently I replaced one of them (WAN2) with a box from another provider, both for provider redundancy and because its bandwidth was better.
With the three old ADSL boxes from the same ISP I got good results using wan1+wan2+wan3, but since I replaced the box on wan2 the aggregate is simply not usable.
So currently I'm running with wan1+wan3, or just wan2.
I'm open to any suggestion or test :) I really don't understand why I get such poor results with 3 connections now when I did not in the past with my old ISP.
Thanks in advance for your help.
dump.pcap.zip

WAN1 (without mptcp)

# curl --interface wan1 http://multipath-tcp.org/snapshots/mptcp_2016_04_18.tar.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 12  120M   12 15.5M    0     0   787k      0  0:02:36  0:00:20  0:02:16  930k^C

WAN2 (without mptcp)

# curl --interface wan2 http://multipath-tcp.org/snapshots/mptcp_2016_04_18.tar.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 34  120M   34 41.5M    0     0  2119k      0  0:00:58  0:00:20  0:00:38  917k

WAN3 (without mptcp)

# curl --interface wan3 http://multipath-tcp.org/snapshots/mptcp_2016_04_18.tar.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 31  120M   31 37.6M    0     0  1895k      0  0:01:05  0:00:20  0:00:45 1947k

WAN1 + WAN3

# curl http://multipath-tcp.org/snapshots/mptcp_2016_04_18.tar.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 14  120M   14 17.9M    0     0   881k      0  0:02:20  0:00:20  0:02:00 1182k

WAN3 + WAN2

# curl http://multipath-tcp.org/snapshots/mptcp_2016_04_18.tar.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 44  120M   44 54.0M    0     0  2727k      0  0:00:45  0:00:20  0:00:25 1900k

WAN1 + WAN2

# curl http://multipath-tcp.org/snapshots/mptcp_2016_04_18.tar.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 10  120M   10 12.8M    0     0   611k      0  0:03:22  0:00:21  0:03:01  738k

WAN1 + WAN2 + WAN3

# curl http://multipath-tcp.org/snapshots/mptcp_2016_04_18.tar.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  120M    0  996k    0     0  50071      0  0:42:08  0:00:20  0:41:48 13686

WAN1 + WAN2 + WAN3 (tcpdump pcap attached to this issue)

# curl http://multipath-tcp.org/snapshots/mptcp_2016_04_18.tar.gz > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  1  120M    1 1669k    0     0  79329      0  0:26:36  0:00:21  0:26:15 12882
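
As a rough sanity check on the figures above (a sketch, not part of the original report): if the three paths aggregated perfectly, the combined rate would approach the sum of the single-path averages. The helper name `pct_of_ideal` is hypothetical. The inputs are the Average Dload values from the single-path runs and from the first three-path run, taking curl's `k` suffix as 1024 bytes.

```shell
#!/bin/sh
# pct_of_ideal OBSERVED_BYTES_PER_S RATE_KIB...
# Prints the observed rate as a percentage of the sum of the single-path rates.
pct_of_ideal() {
    observed=$1
    shift
    sum_kib=0
    for rate in "$@"; do
        sum_kib=$((sum_kib + rate))
    done
    awk -v obs="$observed" -v sum="$sum_kib" \
        'BEGIN { printf "%.1f\n", obs / 1024 / sum * 100 }'
}

# WAN1 + WAN2 + WAN3 measured 50071 B/s; single paths averaged 787k, 2119k, 1895k.
pct_of_ideal 50071 787 2119 1895
```

With these inputs the three-path transfer reaches roughly 1% of the naive sum, far below even the slowest single path.
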
@rstanislav

Have you tested the RTT (ping) of each connection, and also run the MPTCP-capable tests (on the MPTCP status page)? A difference in ping can have a big impact on the results.

@dur3x
Author

dur3x commented Sep 19, 2018

Indeed, my results are not always stable (high ping or link disruption), but I didn't think it could have such a big impact (dropping from 2 MB/s to around 0 KB/s).

WAN1

--- 8.8.8.8 ping statistics ---
101 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 127.688/327.328/1250.665 ms

WAN2

--- 8.8.8.8 ping statistics ---
53 packets transmitted, 53 packets received, 0% packet loss
round-trip min/avg/max = 14.931/30.788/67.409 ms

WAN3

--- 8.8.8.8 ping statistics ---
134 packets transmitted, 134 packets received, 0% packet loss
round-trip min/avg/max = 16.170/19.121/67.605 ms
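
To compare paths quickly, the average can be pulled out of each ping summary line. This is a minimal sketch: the function name `rtt_avg` is hypothetical, and the commented live invocation assumes that `wan1` is a network interface name accepted by `ping -I`.

```shell
#!/bin/sh
# rtt_avg: read ping output on stdin and print the average RTT in ms.
# Matches the "min/avg/max = a/b/c ms" summary line shown above.
rtt_avg() {
    awk -F' = ' '/min\/avg\/max/ { split($2, v, "/"); print v[2] }'
}

# Example with the WAN1 summary line from this comment:
printf '%s\n' 'round-trip min/avg/max = 127.688/327.328/1250.665 ms' | rtt_avg

# On a live system (interface name and target are assumptions):
#   ping -c 20 -I wan1 8.8.8.8 | rtt_avg
```

On the numbers above, WAN1's average RTT is more than ten times that of WAN2 or WAN3.
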

@rstanislav

From my experience that can have a huge impact. Even with 2 LTE modems, if one gets 20 Mb/s and the second gets 5-10 Mb/s but with 3 times the ping, the result will not be even close to 20 Mb/s. If the connections have around the same RTT (ping), then it works fine.

Also, why is the ping on the WAN1 ADSL so high? That looks like a problem, maybe hardware?

@dur3x
Author

dur3x commented Sep 19, 2018

Thanks for your feedback. In fact I'm not connected to these three ADSL lines directly by Ethernet cable; I reach them over Wi-Fi. The three ADSL modems actually belong to my neighbours, who share access with me.
The RTT (ping) of each connection varies. Sometimes it is stable and all connections have a similar ping, but when I download a file I can confirm that the RTTs differ on every connection each time, so that may indeed be the root cause.

@rstanislav

For example, I'm currently testing with 4 LTE modems installed in a vehicle. In the city, if all 4 have good signal/RTT, I can push the setup to its limit. I'm using a Raspberry Pi 3B+ connected to a router (for Wi-Fi) via a 100 Mb/s link (4 Ethernet wires); from what I found on the internet, 91-92 Mb/s is the most an RPi 3B+ can push over 100 Mb Ethernet. In many cases I get 60 to 91 Mb/s.

I was using 2 SIM cards from the same operator (Megafon) and am currently testing with 4 different ones (I replaced one Megafon with Tele2, so I now have 4 different operators: MTS, MEGAFON, BEELINE, TELE2). In most cases the results are much lower, but I don't need high speed; I need wide coverage and redundancy outside the city, and Tele2 has advantages outside the city and between cities. As a result I get a lower but more stable speed: where the other 3 operators sometimes have no signal, Tele2 works and I have internet. The downside is the poor speed/quality of Tele2 3G/LTE, so the overall speed is much lower in most cases. So yes, 1 poor link can have a huge impact on the final aggregated speed of the connection.

In your case, maybe you can use directional antennas pointed at your neighbours to improve the quality? Wi-Fi is not the best way to connect, and under high load with a poor signal it usually results in packet loss, which affects speed greatly.

@dur3x
Author

dur3x commented Sep 19, 2018

Interesting :-) I have already done many tests in the past and never observed this behaviour. In general, the worst case in all my tests was that my speed equalled that of the worst path/ADSL line. But here, as you can see, my worst path is around 900 KB/s, and during tests with 3 paths the connection is in most cases around 0 and finally interrupted (voluntarily or not).
So I agree that my current setup is not optimal and that it can of course be improved, but behaviour this bad still seems too severe to me.

But anyway, I'm currently using 2 paths with 1 backup all the time, and that is clearly sufficient.
I don't know if we will ever get a solution for this because, as you said, it is perhaps linked to one of the remote components (hardware, provider, noisy signal, ...).

@pRiVi

pRiVi commented Sep 25, 2018

I do not think so. As mentioned in bug #283, a similar setup shows that the project is not in a usable state; the situation has been the same for years.

pabeni pushed a commit to pabeni/mptcp that referenced this issue Apr 15, 2019
This adds mutex to guard against update of global ppgtt mm LRU list.
To resolve error found as below warning.

[73130.012162] ------------[ cut here ]------------
[73130.012168] list_add corruption. prev->next should be next (ffff995f970cca50), but was 0000000000000000. (prev=ffff995f0dc5bdf8).
[73130.012181] WARNING: CPU: 3 PID: 82 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
[73130.012183] Modules linked in: btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E) raid6_pq(E) dm_mod(E) kvmgt(E) fuse(E) xt_addrtype(E) nft_compat(E) xt_conntrack(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) devlink(E) nf_tables(E) nfnetlink(E) loop(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) mei_me(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) cryptd(E) glue_helper(E) intel_cstate(E) intel_uncore(E) mei(E) intel_pch_thermal(E) intel_rapl_perf(E) pcspkr(E) iTCO_wdt(E) iTCO_vendor_support(E) idma64(E) sg(E) virt_dma(E) acpi_pad(E) evdev(E) binfmt_misc(E) ip_tables(E) x_tables(E) ipv6(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) xhci_pci(E) sdhci_pci(E) cqhci(E) intel_lpss_pci(E) intel_lpss(E) crc32c_intel(E) xhci_hcd(E) sdhci(E) i2c_i801(E) e1000e(E) mmc_core(E)
[73130.012218]  ptp(E) pps_core(E) usbcore(E) mfd_core(E) sd_mod(E) fan(E) thermal(E)
[73130.012227] CPU: 3 PID: 82 Comm: gvt workload 0 Tainted: G        W   E     5.0.0-rc7-staging-190226+ multipath-tcp#282
[73130.012228] Hardware name:  /NUC6i5SYB, BIOS SYSKLi35.86A.0039.2016.0316.1747 03/16/2016
[73130.012232] RIP: 0010:__list_add_valid+0x4d/0x70
[73130.012234] Code: c3 48 89 d1 48 c7 c7 e0 82 91 bb 48 89 c2 e8 44 8a cc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 30 83 91 bb e8 2d 8a cc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 80 83 91 bb e8
[73130.012236] RSP: 0018:ffffa4924107fdd0 EFLAGS: 00010286
[73130.012238] RAX: 0000000000000000 RBX: ffff995d8a5ccf00 RCX: 0000000000000006
[73130.012240] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff995faad96680
[73130.012241] RBP: 0000000000000000 R08: 0000000000213a28 R09: 0000000000000084
[73130.012243] R10: 0000000000000000 R11: ffffa4924107fc70 R12: ffff995d8a5ccf78
[73130.012245] R13: ffff995f970c8000 R14: ffff995f0dc5bdf8 R15: ffff995f970cca50
[73130.012247] FS:  0000000000000000(0000) GS:ffff995faad80000(0000) knlGS:0000000000000000
[73130.012249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[73130.012250] CR2: 00000222e1891000 CR3: 0000000116848002 CR4: 00000000003626e0
[73130.012252] Call Trace:
[73130.012258]  intel_vgpu_pin_mm+0x7a/0xa0
[73130.012262]  workload_thread+0x683/0x12a0
[73130.012266]  ? do_wait_intr_irq+0xb0/0xb0
[73130.012269]  ? finish_wait+0x80/0x80
[73130.012271]  ? intel_vgpu_clean_workloads+0x110/0x110
[73130.012274]  kthread+0x116/0x130
[73130.012276]  ? kthread_bind+0x30/0x30
[73130.012280]  ret_from_fork+0x35/0x40
[73130.012285] WARNING: CPU: 3 PID: 82 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
[73130.012286] ---[ end trace 458a2e792eec21c0 ]---

v2:
- simplify lock handling

Reviewed-by: Xiong Zhang <[email protected]>
Cc: Xiong Zhang <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>
@suyuan168

suyuan168 commented May 19, 2019

Raspberry Pi 3B+: doesn't the official specification say the network speed can reach 300 Mb/s? I spent half a day on it and the speed always stays locked at 100 Mb/s; I cannot get past 100 Mb/s. Either that differs from the official claims, or something in the code locks the NIC speed.

@rstanislav

In my case it was the Wi-Fi router that had a 100 Mb link, so in the end I get about 93 Mb/s. In theory, using all the USB ports on the RPi 3B+, I think about 130-150 Mb/s is possible with a Gigabit Ethernet link, considering that all USB ports go through a USB hub chip that is shared with the RPi Ethernet (which is also a USB-to-Ethernet chip on the RPi) and is connected to the RPi CPU via a single USB link.

@suyuan168

I can't exceed 100 Mb/s on the RPi 3B+ in any case. My local computer shows that I am using a 1000M network card for the link to the RPi 3B+. I use the RPi 3B+ as an OpenWrt router and the tested speed never exceeds 100 Mb/s. I think there is a problem.
Thank you for your answer.

@matttbe
Member

matttbe commented May 20, 2019

Bad performance might be due to many things: CPU, hardware and, of course, bugs. The best approach is certainly to analyse traces to find out which side is blocking and then analyse what's wrong on that device, e.g. check CPU utilisation, etc.

But now that I see you are using a RPI 3B+, it might be due to this "low-end" device: https://www.raspberrypi.org/forums/viewtopic.php?t=208512

@suyuan168

I just tried my x86 soft router, which has 6 Gigabit Ethernet ports and USB 3.0. I used 6 4G network cards plus 200M fiber. With the ORM test the speed is only 110M, which is very low. But when I removed the 4G cards and used only the 200M fiber, the speed reached its full rate: the maximum speed was 200M, which is very fast. So I think MPTCP is very unfriendly to USB 4G. Not only does the speed not improve, it actually drops.
My conclusion is that it seems to average the network speeds.
Thank you everyone.

@pRiVi

pRiVi commented May 20, 2019

No no no!

It is as I already said in a different bug that was just closed without any attention: if you have latency, MPTCP fails completely.

They only tested in their lab, without packet loss and without (changing) latency, so you got what they developed: a local-switch-only solution.

@suyuan168

There may be delays and slow links that affect the overall speed, so the result is not 1+1+1=3 but maybe 1+1+1=1.5. But I am still very grateful to the MPTCP community team for their contributions; they help the network become more stable. If this problem could be solved, it would be perfect. Even 1+1+1=2.5 would be great. I am just describing the network speed. On a moving network we don't know which WAN's speed will change, but we still hope for more bandwidth with less delay. Thank you everyone. I hope someone can think of a better solution to this question.
Thank you very much.

@matttbe
Member

matttbe commented May 20, 2019

Don't hesitate to look at the comments in #334. In short, this use case probably requires another MPTCP packet scheduler, and traces need to be analysed to understand what's wrong, then to work out why one side doesn't accept more or the other side doesn't push more. Maybe you are "simply" limited by the window size: because of the latency, you might need to buffer more. Some schedulers might use fewer buffers.
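
To make the buffering remark concrete, here is a back-of-the-envelope sketch. It assumes the commonly cited MPTCP sizing rule of thumb, receive buffer ≈ 2 × (sum of path bandwidths) × (highest path RTT). The function name `mptcp_buf_bytes` is hypothetical, and the inputs are simply the single-path rates and worst ping reported earlier in this thread, so treat the result as an order of magnitude only.

```shell
#!/bin/sh
# mptcp_buf_bytes RTT_MAX_MS RATE_KIB...
# Rule-of-thumb buffer size in bytes: 2 * sum(rates) * highest RTT.
mptcp_buf_bytes() {
    rtt_ms=$1
    shift
    sum_kib=0
    for rate in "$@"; do
        sum_kib=$((sum_kib + rate))
    done
    echo $((2 * sum_kib * 1024 * rtt_ms / 1000))
}

# Single-path rates reported earlier (KiB/s) and WAN1's worst ping (about 1251 ms).
mptcp_buf_bytes 1251 787 2119 1895
```

For those inputs the estimate is about 12 MB, so it may be worth comparing it with the limits configured in `net.ipv4.tcp_rmem` and `net.core.rmem_max` on both ends.
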

@suyuan168

Thank you very much. I have tried BBR, OLIA, BALIA and wVegas, with no improvement. I will keep trying.

@matttbe
Member

matttbe commented May 20, 2019

I don't think the TCP CC (net.ipv4.tcp_congestion_control sysctl) will change the situation much. The MPTCP packet scheduler (net.mptcp.mptcp_scheduler sysctl) might, if you have a recent (development version) MPTCP kernel.
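
A minimal sketch for checking what a box is currently using: the helper `show_sysctl` is hypothetical and simply reads the two sysctls named above from `/proc/sys`, printing `n/a` when the running kernel does not expose them (for example on a kernel without the out-of-tree MPTCP patches).

```shell
#!/bin/sh
# show_sysctl KEY: print a sysctl value by reading /proc/sys, or "n/a" if absent.
show_sysctl() {
    path="/proc/sys/$(printf '%s' "$1" | tr . /)"
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "n/a"
    fi
}

echo "congestion control: $(show_sysctl net.ipv4.tcp_congestion_control)"
echo "MPTCP scheduler:    $(show_sysctl net.mptcp.mptcp_scheduler)"

# To switch scheduler (as root). The value is an assumption; it must name a
# scheduler that is built into or loaded by the running MPTCP kernel:
#   sysctl -w net.mptcp.mptcp_scheduler=roundrobin
```
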

@suyuan168

Thank you.

@pRiVi

pRiVi commented May 20, 2019

There is so much work to be done for this project to be useful in most use cases, if in any...

@matttbe
Member

matttbe commented Jun 26, 2019

For those here who are using more than 2 subflows and see issues when one subflow is bad, could you please look at my last message in #334?

Of course, if you think that this project, used by millions of people, is not useful, there is no need to read this message or test anything.

dreibh pushed a commit to dreibh/mptcp that referenced this issue Oct 29, 2020
[ Upstream commit a7a12b5 ]

the following command

 # tc action add action tunnel_key \
 > set src_ip 2001:db8::1 dst_ip 2001:db8::2 id 10 erspan_opts 1:6789:0:0

generates the following splat:

 BUG: KASAN: slab-out-of-bounds in tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
 Write of size 4 at addr ffff88813f5f1cc8 by task tc/873

 CPU: 2 PID: 873 Comm: tc Not tainted 5.9.0+ multipath-tcp#282
 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
 Call Trace:
  dump_stack+0x99/0xcb
  print_address_description.constprop.7+0x1e/0x230
  kasan_report.cold.13+0x37/0x7c
  tunnel_key_copy_opts+0xcc9/0x1010 [act_tunnel_key]
  tunnel_key_init+0x160c/0x1f40 [act_tunnel_key]
  tcf_action_init_1+0x5b5/0x850
  tcf_action_init+0x15d/0x370
  tcf_action_add+0xd9/0x2f0
  tc_ctl_action+0x29b/0x3a0
  rtnetlink_rcv_msg+0x341/0x8d0
  netlink_rcv_skb+0x120/0x380
  netlink_unicast+0x439/0x630
  netlink_sendmsg+0x719/0xbf0
  sock_sendmsg+0xe2/0x110
  ____sys_sendmsg+0x5ba/0x890
  ___sys_sendmsg+0xe9/0x160
  __sys_sendmsg+0xd3/0x170
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f872a96b338
 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
 RSP: 002b:00007ffffe367518 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 000000005f8f5aed RCX: 00007f872a96b338
 RDX: 0000000000000000 RSI: 00007ffffe367580 RDI: 0000000000000003
 RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000001c
 R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
 R13: 0000000000686760 R14: 0000000000000601 R15: 0000000000000000

 Allocated by task 873:
  kasan_save_stack+0x19/0x40
  __kasan_kmalloc.constprop.7+0xc1/0xd0
  __kmalloc+0x151/0x310
  metadata_dst_alloc+0x20/0x40
  tunnel_key_init+0xfff/0x1f40 [act_tunnel_key]
  tcf_action_init_1+0x5b5/0x850
  tcf_action_init+0x15d/0x370
  tcf_action_add+0xd9/0x2f0
  tc_ctl_action+0x29b/0x3a0
  rtnetlink_rcv_msg+0x341/0x8d0
  netlink_rcv_skb+0x120/0x380
  netlink_unicast+0x439/0x630
  netlink_sendmsg+0x719/0xbf0
  sock_sendmsg+0xe2/0x110
  ____sys_sendmsg+0x5ba/0x890
  ___sys_sendmsg+0xe9/0x160
  __sys_sendmsg+0xd3/0x170
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

 The buggy address belongs to the object at ffff88813f5f1c00
  which belongs to the cache kmalloc-256 of size 256
 The buggy address is located 200 bytes inside of
  256-byte region [ffff88813f5f1c00, ffff88813f5f1d00)
 The buggy address belongs to the page:
 page:0000000011b48a19 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13f5f0
 head:0000000011b48a19 order:1 compound_mapcount:0
 flags: 0x17ffffc0010200(slab|head)
 raw: 0017ffffc0010200 0000000000000000 0000000d00000001 ffff888107c43400
 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff88813f5f1b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88813f5f1c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff88813f5f1c80: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
                                               ^
  ffff88813f5f1d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88813f5f1d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

using IPv6 tunnels, act_tunnel_key allocates a fixed amount of memory for
the tunnel metadata, but then it expects additional bytes to store tunnel
specific metadata with tunnel_key_copy_opts().

Fix the arguments of __ipv6_tun_set_dst(), so that 'md_size' contains the
size previously computed by tunnel_key_get_opts_len(), like it's done for
IPv4 tunnels.

Fixes: 0ed5269 ("net/sched: add tunnel option support to act_tunnel_key")
Reported-by: Shuang Li <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Acked-by: Cong Wang <[email protected]>
Link: https://lore.kernel.org/r/36ebe969f6d13ff59912d6464a4356fe6f103766.1603231100.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
cpaasch pushed a commit that referenced this issue Sep 24, 2021
commit f68aa10 upstream.

A Xen PV guest doesn't have a legacy RTC device, so reset the legacy
RTC flag. Otherwise the following WARN splat will occur at boot:

[    1.333404] WARNING: CPU: 1 PID: 1 at /home/gross/linux/head/drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1be/0x210
[    1.333404] Modules linked in:
[    1.333404] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-default+ #282
[    1.333404] RIP: e030:mc146818_get_time+0x1be/0x210
[    1.333404] Code: c0 64 01 c5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ec 01 b8 02 00 00 00 44 89 63 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 48 c7 c7 30 0e ef 82 4c 89 e6 e8 71 2a 24 00 48 c7 c0 ff ff
[    1.333404] RSP: e02b:ffffc90040093df8 EFLAGS: 00010002
[    1.333404] RAX: 00000000000000ff RBX: ffffc90040093e34 RCX: 0000000000000000
[    1.333404] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000d
[    1.333404] RBP: ffffffff82ef0e30 R08: ffff888005013e60 R09: 0000000000000000
[    1.333404] R10: ffffffff82373e9b R11: 0000000000033080 R12: 0000000000000200
[    1.333404] R13: 0000000000000000 R14: 0000000000000002 R15: ffffffff82cdc6d4
[    1.333404] FS:  0000000000000000(0000) GS:ffff88807d440000(0000) knlGS:0000000000000000
[    1.333404] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.333404] CR2: 0000000000000000 CR3: 000000000260a000 CR4: 0000000000050660
[    1.333404] Call Trace:
[    1.333404]  ? wakeup_sources_sysfs_init+0x30/0x30
[    1.333404]  ? rdinit_setup+0x2b/0x2b
[    1.333404]  early_resume_init+0x23/0xa4
[    1.333404]  ? cn_proc_init+0x36/0x36
[    1.333404]  do_one_initcall+0x3e/0x200
[    1.333404]  kernel_init_freeable+0x232/0x28e
[    1.333404]  ? rest_init+0xd0/0xd0
[    1.333404]  kernel_init+0x16/0x120
[    1.333404]  ret_from_fork+0x1f/0x30

Cc: <[email protected]>
Fixes: 8d152e7 ("x86/rtc: Replace paravirt rtc check with platform legacy quirk")
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
matttbe pushed a commit that referenced this issue Jan 7, 2022
commit f68aa10 upstream.

dreibh pushed a commit to dreibh/mptcp that referenced this issue Feb 7, 2022
commit f68aa10 upstream.

5 participants