
Commit 99ad210

[Mellanox] backport kernel patches for hw-management 7.0100.2303 (sonic-net#211)
0028-thermal-Fix-deadlock-in-thermal-thermal_zone_device_.patch
  torvalds/linux@163b00c
0029-hwmon-pmbus-Add-support-for-MPS-Multi-phase-mp2975-c.patch
  torvalds/linux@2c6fcbb
0030-hwmon-Add-convience-macro-to-define-simple-static-se.patch
  torvalds/linux@c43a113
0031-backport-nvme-Add-hardware-monitoring-support.patch
  torvalds/linux@400b6a7
0032-platform-mellanox-mlxreg-hotplug-Use-capability-regi.patch
  torvalds/linux@0a43f7b

Regression tests have been performed against these patches on the Mellanox platform; no issues were found.
1 parent deddc61 commit 99ad210

7 files changed: +1428 -0 lines changed
@@ -0,0 +1,83 @@
+From 163b00cde7cf2206e248789d2780121ad5e6a70b Mon Sep 17 00:00:00 2001
+From: Wei Wang <[email protected]>
+Date: Tue, 12 Nov 2019 12:42:23 -0800
+Subject: [PATCH] thermal: Fix deadlock in thermal thermal_zone_device_check
+
+1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone
+device") changed cancel_delayed_work to cancel_delayed_work_sync to avoid
+a use-after-free issue. However, cancel_delayed_work_sync could be called
+inside the WQ, causing a deadlock.
+
+[54109.642398] c0 1162 kworker/u17:1 D 0 11030 2 0x00000000
+[54109.642437] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check
+[54109.642447] c0 1162 Call trace:
+[54109.642456] c0 1162 __switch_to+0x138/0x158
+[54109.642467] c0 1162 __schedule+0xba4/0x1434
+[54109.642480] c0 1162 schedule_timeout+0xa0/0xb28
+[54109.642492] c0 1162 wait_for_common+0x138/0x2e8
+[54109.642511] c0 1162 flush_work+0x348/0x40c
+[54109.642522] c0 1162 __cancel_work_timer+0x180/0x218
+[54109.642544] c0 1162 handle_thermal_trip+0x2c4/0x5a4
+[54109.642553] c0 1162 thermal_zone_device_update+0x1b4/0x25c
+[54109.642563] c0 1162 thermal_zone_device_check+0x18/0x24
+[54109.642574] c0 1162 process_one_work+0x3cc/0x69c
+[54109.642583] c0 1162 worker_thread+0x49c/0x7c0
+[54109.642593] c0 1162 kthread+0x17c/0x1b0
+[54109.642602] c0 1162 ret_from_fork+0x10/0x18
+[54109.643051] c0 1162 kworker/u17:2 D 0 16245 2 0x00000000
+[54109.643067] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check
+[54109.643077] c0 1162 Call trace:
+[54109.643085] c0 1162 __switch_to+0x138/0x158
+[54109.643095] c0 1162 __schedule+0xba4/0x1434
+[54109.643104] c0 1162 schedule_timeout+0xa0/0xb28
+[54109.643114] c0 1162 wait_for_common+0x138/0x2e8
+[54109.643122] c0 1162 flush_work+0x348/0x40c
+[54109.643131] c0 1162 __cancel_work_timer+0x180/0x218
+[54109.643141] c0 1162 handle_thermal_trip+0x2c4/0x5a4
+[54109.643150] c0 1162 thermal_zone_device_update+0x1b4/0x25c
+[54109.643159] c0 1162 thermal_zone_device_check+0x18/0x24
+[54109.643167] c0 1162 process_one_work+0x3cc/0x69c
+[54109.643177] c0 1162 worker_thread+0x49c/0x7c0
+[54109.643186] c0 1162 kthread+0x17c/0x1b0
+[54109.643195] c0 1162 ret_from_fork+0x10/0x18
+[54109.644500] c0 1162 cat D 0 7766 1 0x00000001
+[54109.644515] c0 1162 Call trace:
+[54109.644524] c0 1162 __switch_to+0x138/0x158
+[54109.644536] c0 1162 __schedule+0xba4/0x1434
+[54109.644546] c0 1162 schedule_preempt_disabled+0x80/0xb0
+[54109.644555] c0 1162 __mutex_lock+0x3a8/0x7f0
+[54109.644563] c0 1162 __mutex_lock_slowpath+0x14/0x20
+[54109.644575] c0 1162 thermal_zone_get_temp+0x84/0x360
+[54109.644586] c0 1162 temp_show+0x30/0x78
+[54109.644609] c0 1162 dev_attr_show+0x5c/0xf0
+[54109.644628] c0 1162 sysfs_kf_seq_show+0xcc/0x1a4
+[54109.644636] c0 1162 kernfs_seq_show+0x48/0x88
+[54109.644656] c0 1162 seq_read+0x1f4/0x73c
+[54109.644664] c0 1162 kernfs_fop_read+0x84/0x318
+[54109.644683] c0 1162 __vfs_read+0x50/0x1bc
+[54109.644692] c0 1162 vfs_read+0xa4/0x140
+[54109.644701] c0 1162 SyS_read+0xbc/0x144
+[54109.644708] c0 1162 el0_svc_naked+0x34/0x38
+[54109.845800] c0 1162 D 720.000s 1->7766->7766 cat [panic]
+
+Fixes: 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device")
+
+Signed-off-by: Wei Wang <[email protected]>
+Signed-off-by: Zhang Rui <[email protected]>
+---
+ drivers/thermal/thermal_core.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
+index 7b0ffc1c0..f5a84db55 100644
+--- a/drivers/thermal/thermal_core.c
++++ b/drivers/thermal/thermal_core.c
+@@ -296,7 +296,7 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
+ 		mod_delayed_work(system_freezable_wq, &tz->poll_queue,
+ 				 msecs_to_jiffies(delay));
+ 	else
+-		cancel_delayed_work_sync(&tz->poll_queue);
++		cancel_delayed_work(&tz->poll_queue);
+ }
+ 
+ static void monitor_thermal_zone(struct thermal_zone_device *tz)
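
The deadlock described in the 0028 commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct toy_work`, `toy_cancel`, `toy_cancel_sync_would_block`, and `toy_handler` are hypothetical names made up for this illustration. It assumes the behavior the message and traces describe: a synchronous cancel has to wait until a running handler returns, so calling it from inside that same handler waits on itself, while the plain cancel only drops a pending instance and returns.

```c
#include <stdbool.h>

/* Hypothetical toy model of a delayed work item (not struct delayed_work). */
struct toy_work {
	bool pending;	/* queued, timer has not fired yet */
	bool running;	/* handler is executing right now */
};

/* Models the non-blocking cancel: drop a pending instance, never wait. */
static bool toy_cancel(struct toy_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/*
 * Models the wait condition of a synchronous cancel: the caller must
 * block until a running handler has returned. Returns true if the
 * caller would have to wait.
 */
static bool toy_cancel_sync_would_block(const struct toy_work *w)
{
	return w->running;
}

/*
 * Handler that tries to stop its own polling, mirroring the call chain
 * thermal_zone_device_check -> ... -> __cancel_work_timer in the traces.
 * Returns true if a synchronous cancel issued here would wait forever.
 */
static bool toy_handler(struct toy_work *w)
{
	bool would_deadlock;

	w->running = true;
	/* Waiting for "running" to clear cannot succeed: we are the handler. */
	would_deadlock = toy_cancel_sync_would_block(w);
	/* The non-blocking cancel returns immediately instead. */
	toy_cancel(w);
	w->running = false;
	return would_deadlock;
}
```

Calling `toy_handler()` on a pending work item reports that the synchronous variant would block on itself, while `toy_cancel()` completes and leaves the item neither pending nor running. The model leaves out timers, locking, and the use-after-free concern that motivated 1851799e1d29.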
