Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tcp: use RFC6298 compliant TCP RTO calculation #3

Merged
merged 1 commit into from
Jun 13, 2016

Conversation

danielmgit
Copy link

This patch adjusts Linux RTO calculation to be RFC6298 Standard
compliant. MinRTO is no longer added to the computed RTO, RTO damping
and overestimation are decreased.

In RFC 6298 Standard TCP Retransmission Timeout (RTO) calculation the
calculated RTO is rounded up to the Minimum RTO (MinRTO), if it is
less. The Linux implementation as a discrepancy to the Standard
basically adds the defined MinRTO to the calculated RTO. When
comparing both approaches, the Linux calculation seems to perform
worse for sender limited TCP flows like Telnet, SSH or constant bit
rate encoded transmissions, especially for Round Trip Times (RTT) of
50ms to 800ms.

Compared to the Linux implementation the RFC 6298 proposed RTO
calculation performs better and more precise in adapting to current
network characteristics. Extensive measurements for bulk data did not
show a negative impact of the adjusted calculation.

Exemplary Performance Comparison for sender-limited-flows:

  • Rate: 10Mbit/s

  • Delay: 200ms, Delay Variation: 10ms

  • Time between each scheduled segment: 1s

  • Amount Data Segments: 300

  • Mean of 11 runs

     Mean Response Waiting Time [milliseconds]
    

PER [%] | 0.5 1 1.5 2 3 5 7 10
--------+-------------------------------------------------------
old | 206.4 208.6 218.0 218.6 227.2 249.3 274.7 308.2
new | 203.9 206.0 207.0 209.9 217.3 225.6 238.7 259.1

Detailed Analysis:

https://docs.google.com/document/d/1pKmPfnQb6fDK4qpiNVkN8cQyGE4wYDZukcuZfR-BnnM/

Signed-off-by: Daniel Metz [email protected]

This patch adjusts Linux RTO calculation to be RFC6298 Standard
compliant. MinRTO is no longer added to the computed RTO, RTO damping
and overestimation are decreased.

In RFC 6298 Standard TCP Retransmission Timeout (RTO) calculation the
calculated RTO is rounded up to the Minimum RTO (MinRTO), if it is
less.  The Linux implementation as a discrepancy to the Standard
basically adds the defined MinRTO to the calculated RTO. When
comparing both approaches, the Linux calculation seems to perform
worse for sender limited TCP flows like Telnet, SSH or constant bit
rate encoded transmissions, especially for Round Trip Times (RTT) of
50ms to 800ms.

Compared to the Linux implementation the RFC 6298 proposed RTO
calculation performs better and more precise in adapting to current
network characteristics. Extensive measurements for bulk data did not
show a negative impact of the adjusted calculation.

Exemplary Performance Comparison for sender-limited-flows:
- Rate: 10Mbit/s
- Delay: 200ms, Delay Variation: 10ms
- Time between each scheduled segment: 1s
- Amount Data Segments: 300
- Mean of 11 runs

         Mean Response Waiting Time [milliseconds]

PER [%] |   0.5      1    1.5      2      3      5      7     10
--------+-------------------------------------------------------
old     | 206.4  208.6  218.0  218.6  227.2  249.3  274.7  308.2
new     | 203.9  206.0  207.0  209.9  217.3  225.6  238.7  259.1

Detailed Analysis:

https://docs.google.com/document/d/1pKmPfnQb6fDK4qpiNVkN8cQyGE4wYDZukcuZfR-BnnM/

Signed-off-by: Daniel Metz <[email protected]>
@hgn hgn merged commit 3245ac3 into hgn:davem-net-next-master Jun 13, 2016
hgn pushed a commit that referenced this pull request Jun 14, 2016
…offline_kmem()

memcg_offline_kmem() may be called from memcg_free_kmem() after a css
init failure.  memcg_free_kmem() is a ->css_free callback which is
called without cgroup_mutex and memcg_offline_kmem() ends up using
css_for_each_descendant_pre() without any locking.  Fix it by adding rcu
read locking around it.

    mkdir: cannot create directory `65530': No space left on device
    ===============================
    [ INFO: suspicious RCU usage. ]
    4.6.0-work+ torvalds#321 Not tainted
    -------------------------------
    kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
     [  527.243970] other info that might help us debug this:
     [  527.244715]
    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by kworker/0:5/1664:
     #0:  ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
     #1:  ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
     [  527.248098] stack backtrace:
    CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ torvalds#321
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
    Workqueue: cgroup_destroy css_free_work_fn
    Call Trace:
      dump_stack+0x68/0xa1
      lockdep_rcu_suspicious+0xd7/0x110
      css_next_descendant_pre+0x7d/0xb0
      memcg_offline_kmem.part.44+0x4a/0xc0
      mem_cgroup_css_free+0x1ec/0x200
      css_free_work_fn+0x49/0x5e0
      process_one_work+0x1c5/0x4a0
      worker_thread+0x49/0x490
      kthread+0xea/0x100
      ret_from_fork+0x1f/0x40

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>	[4.5+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
hgn pushed a commit that referenced this pull request Jan 29, 2020
With zpci_disable() working, lockdep detected a potential deadlock
(lockdep output at the end).

The deadlock is between recovering a PCI function via the

/sys/bus/pci/devices/<dev>/recover

attribute vs powering it off via

/sys/bus/pci/slots/<slot>/power.

The fix is analogous to the changes in commit 0ee223b ("scsi: core:
Avoid that SCSI device removal through sysfs triggers a deadlock")
that fixed a potential deadlock on removing a SCSI device via sysfs.

[  204.830107] ======================================================
[  204.830109] WARNING: possible circular locking dependency detected
[  204.830111] 5.5.0-rc2-06072-gbc03ecc9a672 #6 Tainted: G        W
[  204.830112] ------------------------------------------------------
[  204.830113] bash/1034 is trying to acquire lock:
[  204.830115] 0000000192a1a610 (kn->count#200){++++}, at: kernfs_remove_by_name_ns+0x5c/0xa8
[  204.830122]
               but task is already holding lock:
[  204.830123] 00000000c16134a8 (pci_rescan_remove_lock){+.+.}, at: pci_stop_and_remove_bus_device_locked+0x26/0x48
[  204.830128]
               which lock already depends on the new lock.

[  204.830129]
               the existing dependency chain (in reverse order) is:
[  204.830130]
               -> #1 (pci_rescan_remove_lock){+.+.}:
[  204.830134]        validate_chain+0x93a/0xd08
[  204.830136]        __lock_acquire+0x4ae/0x9d0
[  204.830137]        lock_acquire+0x114/0x280
[  204.830140]        __mutex_lock+0xa2/0x960
[  204.830142]        mutex_lock_nested+0x32/0x40
[  204.830145]        recover_store+0x4c/0xa8
[  204.830147]        kernfs_fop_write+0xe6/0x218
[  204.830151]        vfs_write+0xb0/0x1b8
[  204.830152]        ksys_write+0x6c/0xf8
[  204.830154]        system_call+0xd8/0x2d8
[  204.830155]
               -> #0 (kn->count#200){++++}:
[  204.830187]        check_noncircular+0x1e6/0x240
[  204.830189]        check_prev_add+0xfc/0xdb0
[  204.830190]        validate_chain+0x93a/0xd08
[  204.830192]        __lock_acquire+0x4ae/0x9d0
[  204.830193]        lock_acquire+0x114/0x280
[  204.830194]        __kernfs_remove.part.0+0x2e4/0x360
[  204.830196]        kernfs_remove_by_name_ns+0x5c/0xa8
[  204.830198]        remove_files.isra.0+0x4c/0x98
[  204.830199]        sysfs_remove_group+0x66/0xc8
[  204.830201]        sysfs_remove_groups+0x46/0x68
[  204.830204]        device_remove_attrs+0x52/0x90
[  204.830207]        device_del+0x182/0x418
[  204.830208]        pci_remove_bus_device+0x8a/0x130
[  204.830210]        pci_stop_and_remove_bus_device_locked+0x3a/0x48
[  204.830212]        disable_slot+0x68/0x100
[  204.830213]        power_write_file+0x7c/0x130
[  204.830215]        kernfs_fop_write+0xe6/0x218
[  204.830217]        vfs_write+0xb0/0x1b8
[  204.830218]        ksys_write+0x6c/0xf8
[  204.830220]        system_call+0xd8/0x2d8
[  204.830221]
               other info that might help us debug this:

[  204.830223]  Possible unsafe locking scenario:

[  204.830224]        CPU0                    CPU1
[  204.830225]        ----                    ----
[  204.830226]   lock(pci_rescan_remove_lock);
[  204.830227]                                lock(kn->count#200);
[  204.830229]                                lock(pci_rescan_remove_lock);
[  204.830231]   lock(kn->count#200);
[  204.830233]
                *** DEADLOCK ***

[  204.830234] 4 locks held by bash/1034:
[  204.830235]  #0: 00000001b6fbc498 (sb_writers#4){.+.+}, at: vfs_write+0x158/0x1b8
[  204.830239]  #1: 000000018c9f5090 (&of->mutex){+.+.}, at: kernfs_fop_write+0xaa/0x218
[  204.830242]  #2: 00000001f7da0810 (kn->count#235){.+.+}, at: kernfs_fop_write+0xb6/0x218
[  204.830245]  #3: 00000000c16134a8 (pci_rescan_remove_lock){+.+.}, at: pci_stop_and_remove_bus_device_locked+0x26/0x48
[  204.830248]
               stack backtrace:
[  204.830250] CPU: 2 PID: 1034 Comm: bash Tainted: G        W         5.5.0-rc2-06072-gbc03ecc9a672 #6
[  204.830252] Hardware name: IBM 8561 T01 703 (LPAR)
[  204.830253] Call Trace:
[  204.830257]  [<00000000c05e10c0>] show_stack+0x88/0xf0
[  204.830260]  [<00000000c112dca4>] dump_stack+0xa4/0xe0
[  204.830261]  [<00000000c0694c06>] check_noncircular+0x1e6/0x240
[  204.830263]  [<00000000c0695bec>] check_prev_add+0xfc/0xdb0
[  204.830264]  [<00000000c06971da>] validate_chain+0x93a/0xd08
[  204.830266]  [<00000000c06994c6>] __lock_acquire+0x4ae/0x9d0
[  204.830267]  [<00000000c069867c>] lock_acquire+0x114/0x280
[  204.830269]  [<00000000c09ca15c>] __kernfs_remove.part.0+0x2e4/0x360
[  204.830270]  [<00000000c09cb5c4>] kernfs_remove_by_name_ns+0x5c/0xa8
[  204.830272]  [<00000000c09cee14>] remove_files.isra.0+0x4c/0x98
[  204.830274]  [<00000000c09cf2ae>] sysfs_remove_group+0x66/0xc8
[  204.830276]  [<00000000c09cf356>] sysfs_remove_groups+0x46/0x68
[  204.830278]  [<00000000c0e3dfe2>] device_remove_attrs+0x52/0x90
[  204.830280]  [<00000000c0e40382>] device_del+0x182/0x418
[  204.830281]  [<00000000c0dcfd7a>] pci_remove_bus_device+0x8a/0x130
[  204.830283]  [<00000000c0dcfe92>] pci_stop_and_remove_bus_device_locked+0x3a/0x48
[  204.830285]  [<00000000c0de7190>] disable_slot+0x68/0x100
[  204.830286]  [<00000000c0de6514>] power_write_file+0x7c/0x130
[  204.830288]  [<00000000c09cc846>] kernfs_fop_write+0xe6/0x218
[  204.830290]  [<00000000c08f3480>] vfs_write+0xb0/0x1b8
[  204.830291]  [<00000000c08f378c>] ksys_write+0x6c/0xf8
[  204.830293]  [<00000000c1154374>] system_call+0xd8/0x2d8
[  204.830294] INFO: lockdep is turned off.

Signed-off-by: Niklas Schnelle <[email protected]>
Reviewed-by: Peter Oberparleiter <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
hgn pushed a commit that referenced this pull request Jan 29, 2020
We don't need to hold the local pinctrl lock here to set irq wake on the
summary irq line. Doing so only leads to lockdep warnings instead of
protecting us from anything. Remove the locking.

 WARNING: possible circular locking dependency detected
 5.4.11 #2 Tainted: G        W
 ------------------------------------------------------
 cat/3083 is trying to acquire lock:
 ffffff81f4fa58c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94

 but task is already holding lock:
 ffffff81f4880c18 (&pctrl->lock){-.-.}, at: msm_gpio_irq_set_wake+0x48/0x7c

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&pctrl->lock){-.-.}:
        _raw_spin_lock_irqsave+0x64/0x80
        msm_gpio_irq_ack+0x68/0xf4
        __irq_do_set_handler+0xe0/0x180
        __irq_set_handler+0x60/0x9c
        irq_domain_set_info+0x90/0xb4
        gpiochip_hierarchy_irq_domain_alloc+0x110/0x200
        __irq_domain_alloc_irqs+0x130/0x29c
        irq_create_fwspec_mapping+0x1f0/0x300
        irq_create_of_mapping+0x70/0x98
        of_irq_get+0xa4/0xd4
        spi_drv_probe+0x4c/0xb0
        really_probe+0x138/0x3f0
        driver_probe_device+0x70/0x140
        __device_attach_driver+0x9c/0x110
        bus_for_each_drv+0x88/0xd0
        __device_attach+0xb0/0x160
        device_initial_probe+0x20/0x2c
        bus_probe_device+0x34/0x94
        device_add+0x35c/0x3f0
        spi_add_device+0xbc/0x194
        of_register_spi_devices+0x2c8/0x408
        spi_register_controller+0x57c/0x6fc
        spi_geni_probe+0x260/0x328
        platform_drv_probe+0x90/0xb0
        really_probe+0x138/0x3f0
        driver_probe_device+0x70/0x140
        device_driver_attach+0x4c/0x6c
        __driver_attach+0xcc/0x154
        bus_for_each_dev+0x84/0xcc
        driver_attach+0x2c/0x38
        bus_add_driver+0x108/0x1fc
        driver_register+0x64/0xf8
        __platform_driver_register+0x4c/0x58
        spi_geni_driver_init+0x1c/0x24
        do_one_initcall+0x1a4/0x3e8
        do_initcall_level+0xb4/0xcc
        do_basic_setup+0x30/0x48
        kernel_init_freeable+0x124/0x1a8
        kernel_init+0x14/0x100
        ret_from_fork+0x10/0x18

 -> #0 (&irq_desc_lock_class){-.-.}:
        __lock_acquire+0xeb4/0x2388
        lock_acquire+0x1cc/0x210
        _raw_spin_lock_irqsave+0x64/0x80
        __irq_get_desc_lock+0x64/0x94
        irq_set_irq_wake+0x40/0x144
        msm_gpio_irq_set_wake+0x5c/0x7c
        set_irq_wake_real+0x40/0x5c
        irq_set_irq_wake+0x70/0x144
        cros_ec_rtc_suspend+0x38/0x4c
        platform_pm_suspend+0x34/0x60
        dpm_run_callback+0x64/0xcc
        __device_suspend+0x310/0x41c
        dpm_suspend+0xf8/0x298
        dpm_suspend_start+0x84/0xb4
        suspend_devices_and_enter+0xbc/0x620
        pm_suspend+0x210/0x348
        state_store+0xb0/0x108
        kobj_attr_store+0x14/0x24
        sysfs_kf_write+0x4c/0x64
        kernfs_fop_write+0x15c/0x1fc
        __vfs_write+0x54/0x18c
        vfs_write+0xe4/0x1a4
        ksys_write+0x7c/0xe4
        __arm64_sys_write+0x20/0x2c
        el0_svc_common+0xa8/0x160
        el0_svc_handler+0x7c/0x98
        el0_svc+0x8/0xc

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&pctrl->lock);
                                lock(&irq_desc_lock_class);
                                lock(&pctrl->lock);
   lock(&irq_desc_lock_class);

  *** DEADLOCK ***

 7 locks held by cat/3083:
  #0: ffffff81f06d1420 (sb_writers#7){.+.+}, at: vfs_write+0xd0/0x1a4
  #1: ffffff81c8935680 (&of->mutex){+.+.}, at: kernfs_fop_write+0x12c/0x1fc
  #2: ffffff81f4c322f0 (kn->count#337){.+.+}, at: kernfs_fop_write+0x134/0x1fc
  #3: ffffffe89a641d60 (system_transition_mutex){+.+.}, at: pm_suspend+0x108/0x348
  #4: ffffff81f190e970 (&dev->mutex){....}, at: __device_suspend+0x168/0x41c
  #5: ffffff81f183d8c0 (lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
  #6: ffffff81f4880c18 (&pctrl->lock){-.-.}, at: msm_gpio_irq_set_wake+0x48/0x7c

 stack backtrace:
 CPU: 4 PID: 3083 Comm: cat Tainted: G        W         5.4.11 #2
 Hardware name: Google Cheza (rev3+) (DT)
 Call trace:
  dump_backtrace+0x0/0x174
  show_stack+0x20/0x2c
  dump_stack+0xc8/0x124
  print_circular_bug+0x2ac/0x2c4
  check_noncircular+0x1a0/0x1a8
  __lock_acquire+0xeb4/0x2388
  lock_acquire+0x1cc/0x210
  _raw_spin_lock_irqsave+0x64/0x80
  __irq_get_desc_lock+0x64/0x94
  irq_set_irq_wake+0x40/0x144
  msm_gpio_irq_set_wake+0x5c/0x7c
  set_irq_wake_real+0x40/0x5c
  irq_set_irq_wake+0x70/0x144
  cros_ec_rtc_suspend+0x38/0x4c
  platform_pm_suspend+0x34/0x60
  dpm_run_callback+0x64/0xcc
  __device_suspend+0x310/0x41c
  dpm_suspend+0xf8/0x298
  dpm_suspend_start+0x84/0xb4
  suspend_devices_and_enter+0xbc/0x620
  pm_suspend+0x210/0x348
  state_store+0xb0/0x108
  kobj_attr_store+0x14/0x24
  sysfs_kf_write+0x4c/0x64
  kernfs_fop_write+0x15c/0x1fc
  __vfs_write+0x54/0x18c
  vfs_write+0xe4/0x1a4
  ksys_write+0x7c/0xe4
  __arm64_sys_write+0x20/0x2c
  el0_svc_common+0xa8/0x160
  el0_svc_handler+0x7c/0x98
  el0_svc+0x8/0xc

Fixes: 6aced33 ("pinctrl: msm: drop wake_irqs bitmap")
Cc: Douglas Anderson <[email protected]>
Cc: Brian Masney <[email protected]>
Cc: Lina Iyer <[email protected]>
Cc: Maulik Shah <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Bjorn Andersson <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
hgn pushed a commit that referenced this pull request Jan 29, 2020
We need to initialise the struct ourselves, else we expose tcp-specific
callbacks such as tcp_splice_read which will then trigger splat because
the socket is an mptcp one:

BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478

CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3
Call Trace:
 tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
 tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612
 tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674
 tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791
 do_splice+0x1259/0x1560 fs/splice.c:1205

To prevent build error with ipv6, add the recv/sendmsg function
declaration to ipv6.h.  The functions are already accessible "thanks"
to retpoline related work, but they are currently only made visible
by socket.c specific INDIRECT_CALLABLE macros.

Reported-by: Christoph Paasch <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
hgn pushed a commit that referenced this pull request Jan 29, 2020
Ido Schimmel says:

====================
mlxsw: Offload TBF

Petr says:

In order to allow configuration of shapers on Spectrum family of
machines, recognize TBF either as root Qdisc, or as a child of ETS or
PRIO. Configure rate of maximum shaper according to TBF rate setting,
and maximum shaper burst size according to TBF burst setting.

- Patches #1 and #2 make the TBF shaper suitable for offloading.
- Patches #3, #4 and #5 are refactoring aimed at easier support of leaf
  Qdiscs in general.
- Patches #6 to torvalds#10 gradually introduce TBF offload.
- Patches torvalds#11 to torvalds#14 add selftests.
====================

Signed-off-by: David S. Miller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants