
worker_threads consuming so much memory and crash #32265

Closed
khoanguyen-3fc opened this issue Mar 14, 2020 · 12 comments

Comments

@khoanguyen-3fc

What steps will reproduce the bug?

const { Worker, isMainThread } = require('worker_threads');

if (isMainThread) {
    for (let i = 0; i < 2000; i++) {
        new Worker(__filename);
    }
} else {
    console.log(JSON.stringify(process.memoryUsage()));

    setInterval(() => {
        // Keep thread alive
    }, 1000);
}

How often does it reproduce? Is there a required condition?

This problem always occurs.

What is the expected behavior?

The script should be able to run at least 2000 worker threads at the same time.

What do you see instead?

The script crashes with a random GC error.

Additional information

I need to run at least 2000 threads at the same time, but there are two problems that I encounter:

  • The worker threads consume a lot of memory, about 5 MB of RSS for an empty thread, so I end up with 1500 threads and about 8 GB of RAM, and more if the threads do any work. But that isn't the real problem, because my server has a large amount of RAM (>100 GB).
  • The main problem is that the script crashes at about 8 GB of RSS. I also tried --max-old-space-size=81920 --max-semi-space-size=81920, but the error is still there when RSS reaches 8 GB.

Output of script

// 1486 lines, 1487th line below
{"rss":8157556736,"heapTotal":4190208,"heapUsed":2382936,"external":802056}

[Several worker threads crashed concurrently, so their stack traces were interleaved in the raw output; one full trace is reassembled below. Note that RSS stays around 8.15 GB while each thread's heap is only ~2.4 MB.]

<--- Last few GCs --->
[19127:0x7f5f4442fa80]    26606 ms: Scavenge 2.0 (2.7) -> 1.6 (3.7) MB, 1.8 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure 

<--- JS stacktrace --->
Cannot get stack trace in GC.
FATAL ERROR: NewSpace::Rebalance Allocation failed - JavaScript heap out of memory
 1: 0x9ef190 node::Abort() [node]
 2: 0x9f13b2 node::OnFatalError(char const*, char const*) [node]
 3: 0xb5da9e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb5de19 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd0a765  [node]
 6: 0xd545ee  [node]
 7: 0xd58797 v8::internal::MarkCompactCollector::CollectGarbage() [node]
 8: 0xd16c39 v8::internal::Heap::MarkCompact() [node]
 9: 0xd179a3 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
10: 0xd18515 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
11: 0xd1afcc v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
12: 0xce7cae v8::internal::Factory::NewMap(v8::internal::InstanceType, int, v8::internal::ElementsKind, int) [node]
13: 0xede9db v8::internal::Map::RawCopy(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, int, int) [node]
14: 0xedf104 v8::internal::Map::CopyDropDescriptors(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>) [node]
15: 0xedf1a6 v8::internal::Map::ShareDescriptor(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::DescriptorArray>, v8::internal::Descriptor*) [node]
16: 0xedfcae v8::internal::Map::CopyAddDescriptor(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Descriptor*, v8::internal::TransitionFlag) [node]
17: 0xedfe29 v8::internal::Map::CopyWithField(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::FieldType>, v8::internal::PropertyAttributes, v8::internal::PropertyConstness, v8::internal::Representation, v8::internal::TransitionFlag) [node]
18: 0xee15f2 v8::internal::Map::TransitionToDataProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::PropertyConstness, v8::internal::StoreOrigin) [node]
19: 0xed1cdf v8::internal::LookupIterator::PrepareTransitionToDataProperty(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::StoreOrigin) [node]
20: 0xf05566 v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::StoreOrigin) [node]
21: 0xeb08e0 v8::internal::JSObject::DefineOwnPropertyIgnoreAttributes(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::JSObject::AccessorInfoHandling) [node]
22: 0xeb0bec v8::internal::JSObject::SetOwnPropertyIgnoreAttributes(v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes) [node]
23: 0x10283ef  [node]
24: 0x102c399  [node]
25: 0x102d303 v8::internal::Runtime_CreateObjectLiteral(int, unsigned long*, v8::internal::Isolate*) [node]
26: 0x13a71b9  [node]

A second crash variant also appeared in the interleaved output:

<--- Last few GCs --->
[19127:0x7f60a8001010]    47768 ms: Scavenge 2.4 (4.2) -> 2.1 (4.0) MB, 1.6 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure 

<--- JS stacktrace --->
FATAL ERROR: Committing semi space failed. Allocation failed - JavaScript heap out of memory
(partial trace; differs from the one above in frames such as)
 6: 0xd182ee v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
11: 0xd1a959 v8::internal::Heap::ReserveSpace(std::vector<v8::internal::Heap::Chunk, std::allocator<v8::internal::Heap::Chunk> >*, std::vector<unsigned long, std::allocator<unsigned long> >*) [node]
@gireeshpunathil
Member

able to recreate. the report data shows this:

  "javascriptHeap": {
    "totalMemory": 4059136,
    "totalCommittedMemory": 3299464,
    "usedMemory": 2861680,
    "availableMemory": 104855004168,
    "memoryLimit": 104857600000,
    "heapSpaces": {
      "read_only_space": {
        "memorySize": 262144,
        "committedMemory": 33328,
        "capacity": 33040,
        "used": 33040,
        "available": 0
      },
      "new_space": {
        "memorySize": 1048576,
        "committedMemory": 1047944,
        "capacity": 1047424,
        "used": 633768,
        "available": 413656
      },
      "old_space": {
        "memorySize": 1654784,
        "committedMemory": 1602320,
        "capacity": 1602528,
        "used": 1600304,
        "available": 2224
      },
      "code_space": {
        "memorySize": 430080,
        "committedMemory": 170720,
        "capacity": 154336,
        "used": 154336,
        "available": 0
      },
      "map_space": {
        "memorySize": 528384,
        "committedMemory": 309984,
        "capacity": 309120,
        "used": 309120,
        "available": 0
      },
      "large_object_space": {
        "memorySize": 135168,
        "committedMemory": 135168,
        "capacity": 131112,
        "used": 131112,
        "available": 0
      },
      "code_large_object_space": {
        "memorySize": 0,
        "committedMemory": 0,
        "capacity": 0,
        "used": 0,
        "available": 0
      },
      "new_large_object_space": {
        "memorySize": 0,
        "committedMemory": 0,
        "capacity": 1047424,
        "used": 0,
        "available": 1047424
      }
    }
  }

and top (just before the crash):

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                            
 10138 root      20   0  0.319t 7.563g  13644 S 350.8  0.8   1:46.31 node                               

@gireeshpunathil
Member

there are many spaces seen as exhausted - such as code_space and map_space. how do I increase those? I am not sure which flags in node --v8-options to use

@nodejs/v8

@addaleax
Member

I think the big hint there might actually be VIRT reporting as 0.319t – maybe the process is running out of virtual memory? (That would be somewhat related to #25933)

@gireeshpunathil
Member

but I have unlimited virtual memory:

virtual memory (kbytes, -v) unlimited

plus the failing stack in the referenced issue has node::NewIsolate in it, which is not the case here; it looks like we are failing during GC?

@oh-frontend1 - what does your ulimit -v show?

@khoanguyen-3fc
Author

khoanguyen-3fc commented Mar 16, 2020

@gireeshpunathil

> ulimit -v
unlimited

and my report.json

  "javascriptHeap": {
    "totalMemory": 4452352,
    "totalCommittedMemory": 3517904,
    "usedMemory": 1448464,
    "availableMemory": 85947560576,
    "memoryLimit": 85949677568,
    "heapSpaces": {
      "read_only_space": {
        "memorySize": 262144,
        "committedMemory": 33088,
        "capacity": 32808,
        "used": 32808,
        "available": 0
      },
      "new_space": {
        "memorySize": 2097152,
        "committedMemory": 1683416,
        "capacity": 1047456,
        "used": 188368,
        "available": 859088
      },
      "old_space": {
        "memorySize": 1396736,
        "committedMemory": 1368440,
        "capacity": 1064504,
        "used": 897832,
        "available": 166672
      },
      "code_space": {
        "memorySize": 430080,
        "committedMemory": 170400,
        "capacity": 154016,
        "used": 154016,
        "available": 0
      },
      "map_space": {
        "memorySize": 266240,
        "committedMemory": 262560,
        "capacity": 175440,
        "used": 175440,
        "available": 0
      },
      "large_object_space": {
        "memorySize": 0,
        "committedMemory": 0,
        "capacity": 0,
        "used": 0,
        "available": 0
      },
      "code_large_object_space": {
        "memorySize": 0,
        "committedMemory": 0,
        "capacity": 0,
        "used": 0,
        "available": 0
      },
      "new_large_object_space": {
        "memorySize": 0,
        "committedMemory": 0,
        "capacity": 1047456,
        "used": 0,
        "available": 1047456
      }
    }
  },

@gireeshpunathil
Member

thanks @oh-frontend1 - so our failing contexts seem to match. Let me see if I can figure out what caused the gc to fail

@gireeshpunathil
Member

$ grep "ENOMEM" strace.txt | grep "mmap" | wc -l
1372184

there are a huge number of mmap calls that fail with ENOMEM. Looking at the manual, the second probable cause listed is exhaustion of the process's maximum number of mappings.

$ sysctl vm.max_map_count
vm.max_map_count = 65530
$ sysctl -w vm.max_map_count=655300
vm.max_map_count = 655300

$ node --max-heap-size=100000 foo

{"rss":11898519552,"heapTotal":62894080,"heapUsed":32157136,"external":940898,"arrayBuffers":9386}
{"rss":14497640448,"heapTotal":62894080,"heapUsed":32192184,"external":940938,"arrayBuffers":9386}
{"rss":15572897792,"heapTotal":62894080,"heapUsed":32200208,"external":940938,"arrayBuffers":9386}
{"rss":16316686336,"heapTotal":62894080,"heapUsed":32203928,"external":940938,"arrayBuffers":9386}
...

$ top

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                            
 25537 root      20   0  0.575t 0.014t  13636 S 209.0  1.4   2:25.30 node     

so by increasing the mapping count, I am able to create 4K threads and consume up to 0.5t of virtual memory and ~15G of RSS. So it looks like adjusting the kernel's maximum mapping count (vm.max_map_count) is the solution for this. @oh-frontend1 - can you pls verify?

@khoanguyen-3fc
Author

@gireeshpunathil thank you, this solution also works on my real code.

@jasnell
Member

jasnell commented Mar 25, 2020

@oh-frontend1 ... I was wondering if you wouldn't mind expanding on why you need a worker thread pool of several thousand workers. What is the scenario / app use case you're exploring here? The reason I'm asking is that we (NearForm) are doing some investigation into worker thread perf diagnostics, and the dynamics of profiling small worker pools (in the 4-50 range) are much different from profiling pools in the 2k-4k range, so we'd like to understand the use case a bit more.

@khoanguyen-3fc
Author

@jasnell the application is confidential.

Nothing much, really: in the future I have to monitor a large number of IoT devices. Having 500 network I/O streams on the same thread causes a large CPU bottleneck, but splitting across child_process is hard to manage and to communicate with from the master, so I decided to use worker_threads.

The simple case is one I/O per thread; if I cannot resolve this problem, I would increase the number of I/O per thread instead, but that increases code complexity.

In the real application, as I benchmarked it, I could only create about ~200 threads before this error happened, so I created a minimal source to reproduce it (in that case, the number of threads reached ~1500 before the error occurred).

@jasnell
Member

jasnell commented Mar 27, 2020

Ok thank you! That is super helpful information @oh-frontend1!

@puzpuzpuz
Member

@oh-frontend1 what do you mean by "500 network IO"? Is it 500 client connections? If so, and your application is I/O-bound, Node should be able to handle many more than that. In most cases you just need to follow the golden rule of Node: don't block the event loop.

And if it's CPU-bound, then it's better to keep the number of worker threads close to the number of CPU cores and queue tasks when all workers are busy (just like ThreadPoolExecutor does in Java). Otherwise, if you run CPU-bound tasks on a large number of threads, you will waste memory (due to the per-worker footprint) and CPU time on OS-level context switching.

Sorry in advance if I misunderstood your needs.
