Memory leak within `rayon_core::registry::ThreadSpawn` #870

Hywan · 2021-07-05T14:51:21Z

Hello,

I'm maintaining the https://github.com/wasmerio/wasmer project. Two of our users have reported memory leaks that seem to come from Rayon. I'm quoting the example from @chenyukang at wasmerio/wasmer#2404 (comment), which doesn't involve Wasmer at all: it is purely Rayon, and it illustrates the memory leak:

use rayon::prelude::*;

struct Demo {
    count: i32,
}

impl Demo {
    pub fn new() -> Self {
        Self { count: 0 }
    }

    // Adds `v` to the running total and returns the new total.
    pub fn add(&mut self, v: i32) -> i32 {
        self.count += v;
        self.count
    }
}

fn run_rayon() {
    let input: Vec<i32> = (1..1100).collect();
    // The first parallel call lazily starts Rayon's global thread pool.
    let res: i32 = input
        .par_iter()
        .map_init(Demo::new, |demo, &v| demo.add(v))
        .sum();
    println!("res: {}", res);
}

fn main() {
    run_rayon();
}

Here is the Valgrind report:

==28104== Memcheck, a memory error detector
==28104== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==28104== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==28104== Command: ./target/debug/rayon-debug
==28104==
res: 5477109
==28104==
==28104== HEAP SUMMARY:
==28104==     in use at exit: 53,128 bytes in 150 blocks
==28104==   total heap usage: 227 allocs, 77 frees, 80,701 bytes allocated
==28104==
==28104== 2,304 bytes in 8 blocks are possibly lost in loss record 20 of 24
==28104==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==28104==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==28104==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==28104==    by 0x4879322: allocate_stack (allocatestack.c:622)
==28104==    by 0x4879322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==28104==    by 0x1BD774: std::sys::unix::thread::Thread::new (thread.rs:50)
==28104==    by 0x179AB0: std::thread::Builder::spawn_unchecked (mod.rs:498)
==28104==    by 0x17B53C: std::thread::Builder::spawn (mod.rs:381)
==28104==    by 0x12A689: <rayon_core::registry::DefaultSpawn as rayon_core::registry::ThreadSpawn>::spawn (registry.rs:100)
==28104==    by 0x12B4D7: rayon_core::registry::Registry::new (registry.rs:256)
==28104==    by 0x12A976: rayon_core::registry::global_registry::{{closure}} (registry.rs:168)
==28104==    by 0x12AB87: rayon_core::registry::set_global_registry::{{closure}} (registry.rs:195)
==28104==    by 0x143FAC: std::sync::once::Once::call_once::{{closure}} (once.rs:261)
==28104==    by 0x11DC47: std::sync::once::Once::call_inner (once.rs:418)
==28104==
==28104== LEAK SUMMARY:
==28104==    definitely lost: 0 bytes in 0 blocks
==28104==    indirectly lost: 0 bytes in 0 blocks
==28104==      possibly lost: 2,304 bytes in 8 blocks
==28104==    still reachable: 50,824 bytes in 142 blocks
==28104==         suppressed: 0 bytes in 0 blocks
==28104== Reachable blocks (those to which a pointer was found) are not shown.
==28104== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==28104==
==28104== For lists of detected and suppressed errors, rerun with: -s
==28104== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Thoughts?

cuviper · 2021-07-07T23:29:33Z

I suspect this is simply because we never shut down the global thread pool, by design. See also #688.

That "possibly lost" record is just an allocation for thread-local storage, and I suspect the rest will be related to the thread pool's work queues. It seems valgrind is already filtering out the thread stacks, because those would also be 2MB each by default.

All of that should be bounded though. If you find some memory use that increases over time, that may be evidence of a real leak.
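To illustrate why such memory stays bounded, here is a minimal std-only sketch of a lazily initialized global pool whose workers are never joined. It is not Rayon's implementation: `global_pool`, `sum_on_pool`, `spawned_workers`, and the `WORKERS` count are hypothetical names chosen for this example. The point is only that the workers are spawned once, so repeated calls reuse them and the per-thread allocations do not grow.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

// A unit of work sent to the pool (hypothetical type for this sketch).
type Job = Box<dyn FnOnce() + Send + 'static>;

// Assumed fixed worker count; Rayon picks its own default.
const WORKERS: usize = 4;

// Counts how many worker threads were ever spawned.
static SPAWNED: AtomicUsize = AtomicUsize::new(0);

// The global pool: created on first use and never torn down.
static POOL: OnceLock<Mutex<Sender<Job>>> = OnceLock::new();

fn global_pool() -> &'static Mutex<Sender<Job>> {
    POOL.get_or_init(|| {
        let (tx, rx) = channel::<Job>();
        let rx = Arc::new(Mutex::new(rx));
        for _ in 0..WORKERS {
            let rx = Arc::clone(&rx);
            SPAWNED.fetch_add(1, Ordering::SeqCst);
            // The JoinHandle is dropped, so the thread is detached and
            // is still alive (blocked in recv) when main returns.
            thread::spawn(move || loop {
                // The lock guard is released at the end of this statement,
                // before the job runs.
                let job = rx.lock().unwrap().recv();
                match job {
                    Ok(job) => job(),
                    Err(_) => break,
                }
            });
        }
        Mutex::new(tx)
    })
}

// Runs a sum on one of the pool's workers and waits for the result.
fn sum_on_pool(values: Vec<i32>) -> i32 {
    let (done_tx, done_rx) = channel();
    let job: Job = Box::new(move || {
        done_tx.send(values.iter().sum::<i32>()).unwrap();
    });
    global_pool().lock().unwrap().send(job).unwrap();
    done_rx.recv().unwrap()
}

fn spawned_workers() -> usize {
    SPAWNED.load(Ordering::SeqCst)
}

fn main() {
    for _ in 0..100 {
        let total = sum_on_pool((1..1100).collect());
        assert_eq!(total, 604450);
    }
    // Still only WORKERS threads, no matter how many calls were made.
    println!("workers spawned: {}", spawned_workers());
}
```

Because the workers are still blocked on the channel at exit, a tool like Valgrind would presumably see their thread-local storage as unreleased, but the amount is fixed by the worker count rather than by the number of calls.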

chenyukang · 2021-07-08T14:22:04Z

The leak size does not grow; the lost memory is always:
2,304 bytes in 8 blocks are possibly lost in loss record 20 of 2

I guess you are right. If we spawn a thread in Rust and don't join it, Valgrind reports the same kind of leak:

fn run_thread() {
    use std::thread;
    let builder = thread::Builder::new();

    let handler = builder
        .spawn(|| {
            // thread code
            println!("hello");
        })
        .unwrap();

    //handler.join().unwrap();
}

fn main() {
    run_thread();
}

The result is:

==16491== HEAP SUMMARY:
==16491==     in use at exit: 1,472 bytes in 7 blocks
==16491==   total heap usage: 21 allocs, 14 frees, 3,616 bytes allocated
==16491==
==16491== 288 bytes in 1 blocks are possibly lost in loss record 6 of 7
==16491==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==16491==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==16491==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==16491==    by 0x4879322: allocate_stack (allocatestack.c:622)
==16491==    by 0x4879322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==16491==    by 0x12E0C4: std::sys::unix::thread::Thread::new (thread.rs:50)
==16491==    by 0x10FC11: std::thread::Builder::spawn_unchecked (mod.rs:498)
==16491==    by 0x11034B: std::thread::Builder::spawn (mod.rs:381)
==16491==    by 0x1106C0: rayon_demo::run_thread (main.rs:31)
==16491==    by 0x110735: rayon_demo::main (main.rs:42)
==16491==    by 0x11086A: core::ops::function::FnOnce::call_once (function.rs:227)
==16491==    by 0x110C5D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
==16491==    by 0x110B70: std::rt::lang_start::{{closure}} (rt.rs:66)
==16491==    by 0x12C489: call_once<(),Fn<()>> (function.rs:259)
==16491==    by 0x12C489: do_call<&Fn<()>,i32> (panicking.rs:379)
==16491==    by 0x12C489: try<i32,&Fn<()>> (panicking.rs:343)
==16491==    by 0x12C489: catch_unwind<&Fn<()>,i32> (panic.rs:431)
==16491==    by 0x12C489: std::rt::lang_start_internal (rt.rs:51)
==16491==
==16491== LEAK SUMMARY:
==16491==    definitely lost: 0 bytes in 0 blocks
==16491==    indirectly lost: 0 bytes in 0 blocks
==16491==      possibly lost: 288 bytes in 1 blocks
==16491==    still reachable: 1,184 bytes in 6 blocks
==16491==         suppressed: 0 bytes in 0 blocks
==16491== Reachable blocks (those to which a pointer was found) are not shown.
==16491== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==16491==
==16491== For lists of detected and suppressed errors, rerun with: -s
==16491== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
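For comparison, a minimal sketch of the joined variant of the thread example. `run_thread_joined` and the returned value `42` are hypothetical, added only so the result can be checked. I have not run this under Valgrind, so the expectation that the "possibly lost" record goes away once the thread is joined is an assumption.

```rust
use std::thread;

// Spawns a thread and waits for it, so the thread has finished
// before this function (and therefore main) returns.
fn run_thread_joined() -> i32 {
    let handle = thread::Builder::new()
        .spawn(|| {
            // thread code
            println!("hello");
            42
        })
        .expect("failed to spawn thread");

    // Unlike the detached version, block until the thread exits.
    handle.join().expect("thread panicked")
}

fn main() {
    let value = run_thread_joined();
    println!("thread returned: {}", value);
}
```

Rayon's global pool cannot be handled this way, because its workers are intentionally kept alive for the lifetime of the process.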

cuviper · 2021-07-08T21:13:31Z

Ok -- you'd have to talk to glibc and/or valgrind folks about recognizing that particular allocation, as it's not under Rust's control, let alone rayon. Either way, I don't think there's anything to be concerned about here, so I'm going to close. Feel free to let us know if you find something else!

Hywan mentioned this issue Jul 8, 2021

Memory leak when using Rayon wasmerio/wasmer#2404

Closed

cuviper closed this as completed Jul 8, 2021

djkoloski mentioned this issue Jun 20, 2022

Memory leak after deserializing in multiple threads rkyv/rkyv#277

Closed

Anders429 mentioned this issue Aug 19, 2022

Unit test the bulk of the public API. Anders429/brood#88

Merged

poszu mentioned this issue Apr 28, 2023

atxs: possible memory leak on atxs/post code path spacemeshos/go-spacemesh#4296

Closed

hishamhm mentioned this issue Sep 13, 2023

multiple wasmer processes created? wasmerio/wasmer#4113

Open
