Skip to content

Commit a459eeb

Browse files
Michal Hockotorvalds
Michal Hocko
authored andcommitted
mm, page_alloc: do not depend on cpu hotplug locks inside the allocator
Dmitry has reported the following lockdep splat lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753 __mutex_lock_common kernel/locking/mutex.c:521 [inline] mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621 pcpu_alloc+0xbda/0x1280 mm/percpu.c:896 __alloc_percpu+0x24/0x30 mm/percpu.c:1075 smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:44 cpuhp_invoke_callback+0x254/0x1480 kernel/cpu.c:136 cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:493 _cpu_up+0x1e3/0x2a0 kernel/cpu.c:1057 do_cpu_up+0x73/0xa0 kernel/cpu.c:1087 cpu_up+0x18/0x20 kernel/cpu.c:1095 smp_init+0xe9/0xee kernel/smp.c:564 kernel_init_freeable+0x439/0x690 init/main.c:1010 kernel_init+0x13/0x180 init/main.c:941 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 cpu_hotplug_begin cpu_hotplug.lock pcpu_alloc pcpu_alloc_mutex get_online_cpus+0x62/0x90 kernel/cpu.c:248 drain_all_pages+0xf8/0x710 mm/page_alloc.c:2385 __alloc_pages_direct_reclaim mm/page_alloc.c:3440 [inline] __alloc_pages_slowpath+0x8fd/0x2370 mm/page_alloc.c:3778 __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3980 __alloc_pages include/linux/gfp.h:426 [inline] __alloc_pages_node include/linux/gfp.h:439 [inline] alloc_pages_node include/linux/gfp.h:453 [inline] pcpu_alloc_pages mm/percpu-vm.c:93 [inline] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282 pcpu_alloc+0xe01/0x1280 mm/percpu.c:998 __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1062 bpf_array_alloc_percpu kernel/bpf/arraymap.c:34 [inline] array_map_alloc+0x532/0x710 kernel/bpf/arraymap.c:99 find_and_alloc_map kernel/bpf/syscall.c:34 [inline] map_create kernel/bpf/syscall.c:188 [inline] SYSC_bpf kernel/bpf/syscall.c:870 [inline] SyS_bpf+0xd64/0x2500 kernel/bpf/syscall.c:827 entry_SYSCALL_64_fastpath+0x1f/0xc2 pcpu_alloc pcpu_alloc_mutex drain_all_pages get_online_cpus cpu_hotplug.lock cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:304 _cpu_up+0xca/0x2a0 kernel/cpu.c:1011 do_cpu_up+0x73/0xa0 kernel/cpu.c:1087 cpu_up+0x18/0x20 kernel/cpu.c:1095 smp_init+0xe9/0xee kernel/smp.c:564 kernel_init_freeable+0x439/0x690 init/main.c:1010 kernel_init+0x13/0x180 init/main.c:941 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 cpu_hotplug_begin cpu_hotplug.lock Pulling cpu hotplug locks inside the page allocator is just too dangerous. Let's remove the dependency by dropping get_online_cpus() from drain_all_pages. This is not so simple though because now we do not have a protection against cpu hotplug which means 2 things: - the work item might be executed on a different cpu in worker from unbound pool so it doesn't run on pinned on the cpu - we have to make sure that we do not race with page_alloc_cpu_dead calling drain_pages_zone Disabling preemption in drain_local_pages_wq will solve the first problem drain_local_pages will determine its local CPU from the WQ context which will be stable after that point, page_alloc_cpu_dead is pinned to the CPU already. The later condition is achieved by disabling IRQs in drain_pages_zone. Fixes: mm, page_alloc: drain per-cpu pages from workqueue context Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
1 parent 0ccce3b commit a459eeb

File tree

1 file changed

+9
-6
lines changed

1 file changed

+9
-6
lines changed

mm/page_alloc.c

+9-6
Original file line numberDiff line numberDiff line change
@@ -2341,7 +2341,16 @@ void drain_local_pages(struct zone *zone)
23412341

23422342
static void drain_local_pages_wq(struct work_struct *work)
23432343
{
2344+
/*
2345+
* drain_all_pages doesn't use proper cpu hotplug protection so
2346+
* we can race with cpu offline when the WQ can move this from
2347+
* a cpu pinned worker to an unbound one. We can operate on a different
2348+
* cpu which is allright but we also have to make sure to not move to
2349+
* a different one.
2350+
*/
2351+
preempt_disable();
23442352
drain_local_pages(NULL);
2353+
preempt_enable();
23452354
}
23462355

23472356
/*
@@ -2366,11 +2375,6 @@ void drain_all_pages(struct zone *zone)
23662375
if (current->flags & PF_WQ_WORKER)
23672376
return;
23682377

2369-
/*
2370-
* As this can be called from reclaim context, do not reenter reclaim.
2371-
* An allocation failure can be handled, it's simply slower
2372-
*/
2373-
get_online_cpus();
23742378
works = alloc_percpu_gfp(struct work_struct, GFP_ATOMIC);
23752379

23762380
/*
@@ -2421,7 +2425,6 @@ void drain_all_pages(struct zone *zone)
24212425
flush_work(&work);
24222426
}
24232427
}
2424-
put_online_cpus();
24252428
}
24262429

24272430
#ifdef CONFIG_HIBERNATION

0 commit comments

Comments
 (0)