FATAL ERROR: NewSpace::Rebalance Allocation failed - JavaScript heap out of memory #2698

Closed
drouarb opened this issue May 13, 2020 · 6 comments


drouarb commented May 13, 2020

  • Node.js Version:
    v14.2.0 (also tried in v13.8.0)
  • OS:
    Ubuntu 18.04 Server
  • Scope (install, code, runtime, meta, other?):
    Runtime
  • Module (and version) (if relevant):
TypeScript, Sequelize, node-pg (all latest)

I'm running a script that does heavy data processing in worker threads. It runs until Node crashes with the following error:

FATAL ERROR: NewSpace::Rebalance Allocation failed - JavaScript heap out of memory

Writing Node.js report to file: report.20200512.172528.47517.24.011.json
Node.js report completed
 1: 0xa295e0 node::Abort() [node]
 2: 0x9782df node::FatalError(char const*, char const*) [node]
 3: 0xb99c2e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb99fa7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd3a3b5  [node]
 6: 0xd74f27  [node]
 7: 0xd84707 v8::internal::MarkCompactCollector::CollectGarbage() [node]
 8: 0xd481b9 v8::internal::Heap::MarkCompact() [node]
 9: 0xd48f0b v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
10: 0xd499a5 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
11: 0xd4aebf v8::internal::Heap::HandleGCRequest() [node]
12: 0xcf5f97 v8::internal::StackGuard::HandleInterrupts() [node]
13: 0x104b803 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [node]
14: 0x13a5a99  [node]
Aborted (core dumped)

This error appears randomly after a few hours of processing.
The workers are arranged in the following architecture:

Main thread (idle) => Coordination Thread (Dispatch Job) => Worker Thread
                                                         => Worker Thread
                                                         => Worker Thread
                                                         => ... (32x)

Here is the end of my log: https://gist.github.com/drouarb/28f3b411a3088bc85bf82e65d0def25d

>>>>> Task Memory
>>>>> rss: 19209M
>>>>> heapTotal: 80M
>>>>> heapUsed: 78M
>>>>> external: 5M
>>>>> arrayBuffers: 4M

This is a log I added in a setInterval in the main thread.
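
For context, a minimal sketch of that logger (not my exact code; the interval and the rounding to megabytes are illustrative):

// Simplified sketch of the main-thread memory logger.
const toMB = (bytes) => `${Math.round(bytes / 1024 / 1024)}M`;

setInterval(() => {
  const m = process.memoryUsage();
  console.log('>>>>> Task Memory');
  console.log(`>>>>> rss: ${toMB(m.rss)}`);
  console.log(`>>>>> heapTotal: ${toMB(m.heapTotal)}`);
  console.log(`>>>>> heapUsed: ${toMB(m.heapUsed)}`);
  console.log(`>>>>> external: ${toMB(m.external)}`);
  console.log(`>>>>> arrayBuffers: ${toMB(m.arrayBuffers)}`);
}, 30000); // interval value is arbitrary here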

==================================================
rss: 18572M
heapTotal: 758M
heapUsed: 725M
external: 196M
arrayBuffers: 195M
=====================After GC=====================
rss: 18628M
heapTotal: 274M
heapUsed: 88M
external: 196M
arrayBuffers: 195M
==================================================

This is a log of memory before and after GC in the worker threads.
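
Roughly, the measurement in each worker looks like this (simplified sketch; it assumes node was started with --expose-gc so that global.gc is defined):

// Simplified sketch of the before/after GC log inside a worker thread.
function logMem(header) {
  const toMB = (b) => `${Math.round(b / 1024 / 1024)}M`;
  const m = process.memoryUsage();
  console.log(header);
  console.log(`rss: ${toMB(m.rss)}`);
  console.log(`heapTotal: ${toMB(m.heapTotal)}`);
  console.log(`heapUsed: ${toMB(m.heapUsed)}`);
  console.log(`external: ${toMB(m.external)}`);
  console.log(`arrayBuffers: ${toMB(m.arrayBuffers)}`);
}

logMem('==================================================');
global.gc(); // only defined when node is started with --expose-gc
logMem('=====================After GC=====================');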

I added the following flags to node: --expose-gc --report-on-fatalerror --report-uncaught-exception --report-on-signal, which let me get a report once node crashes.
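
For reference, the same kind of diagnostic report can also be written on demand from code, independently of those flags (minimal sketch, Node >= 11.8):

// Write a diagnostic report immediately instead of waiting for a crash or a signal.
const file = process.report.writeReport();
console.log(`Diagnostic report written to ${file}`);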

Here is a diagnostic generated with a SIGUSR2 while running:
https://gist.github.com/drouarb/2f919f5152eb44dc569ea1ce6bf4accf

Here is the diagnostic when node crashed:
https://gist.github.com/drouarb/875ac90e074fca3b9b029be5633f1e09
I converted all sizes to human-readable units.

It appears that my workers had already disappeared when node generated the last crash report.

My worker threads send data to the coordination thread as ArrayBuffers, which are transferred. Could this be the source of the memory leak?
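
The hand-off looks roughly like this (a simplified sketch; the message shape and function name are illustrative, not the real code). Passing the buffer in the transfer list moves it to the receiving thread instead of copying it, and the sender's reference becomes detached:

// Worker side: post a result and transfer the underlying ArrayBuffer.
const { parentPort } = require('worker_threads');

function sendResult(buffer /* ArrayBuffer */) {
  parentPort.postMessage({ type: 'result', buffer }, [buffer]);
  // After this call, `buffer` is detached (unusable) in this worker.
}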

On crash we can see that old_space memorySize has grown, but not the capacity.
Also, new_space => memorySize jumped from 1.05 MB to 33.6 MB; is that normal?
In the main thread nothing is happening except logging process.memoryUsage().

If it is a worker going out of memory, why don't I get an ERR_WORKER_OUT_OF_MEMORY?

EDIT 1:
After looking at the report, it appears that the log comes from thread 24 ("threadId": 24 in diagnostic report no. 2).
EDIT 2:
I'm running my app again and logging Node diagnostic reports to Grafana every 5 seconds. I hope I will find something interesting.


chhu commented May 15, 2021

I have a similar setup and a similar problem: up to 64 workers hammering a pg database with huge await loops. I set all possible memory adjustments to 256 GB, with no luck. When the processes crash (with an almost identical error) they are only using around 36 GB in total. The machine has 512 GB.

Solved (for me): my mistake was that when I hit OOM I blindly increased all limits, ignoring how heap allocation actually works. NewSpace / the young generation should not be increased; otherwise newly allocated chunks simply blow through the available memory.
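
For illustration (the numbers and the entry file name are arbitrary): raise the old-generation limit if you need the headroom, but leave the new-space / semi-space size at its default:

node --max-old-space-size=16384 my-app.js   # raise old space only; leave --max-semi-space-size at its default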


drouarb commented May 16, 2021

Yes, I found a solution. This was actually not a Node.js bug but a problem with the OS: Linux limits the number of virtual memory mappings a process can have, and you can raise that limit with
sysctl -w vm.max_map_count=655300
Source: nodejs/node#32265
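
To check the current value and make the change persistent across reboots, something like the following should work (the config file name is just an example):

sysctl vm.max_map_count                                        # show the current limit
echo "vm.max_map_count=655300" | sudo tee /etc/sysctl.d/99-map-count.conf
sudo sysctl --system                                           # reload sysctl settings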


co2nut commented Apr 22, 2022

@drouarb
I was wondering, is it sufficient to just apply this change?
Is there anything else that needs to be taken care of?


drouarb commented Apr 23, 2022

For me, this fix worked. This was not a pure Node.js error, but an error caused by V8 failing to allocate more memory because the kernel's virtual memory map was full. The command above tells the kernel to allow a larger virtual memory map, so more malloc() calls can succeed.
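
If you want to check how close a process is to this limit before it crashes, a rough Linux-only sketch is to count the entries in /proc/self/maps (one line per mapping) and compare with vm.max_map_count:

// Linux-only sketch: compare this process's mapping count with vm.max_map_count.
const fs = require('fs');

const mappings = fs.readFileSync('/proc/self/maps', 'utf8')
  .split('\n')
  .filter(Boolean).length;
const limit = parseInt(fs.readFileSync('/proc/sys/vm/max_map_count', 'utf8'), 10);

console.log(`memory mappings: ${mappings} / ${limit}`);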

github-actions bot commented:

It seems there has been no activity on this issue for a while, and it is being closed in 30 days. If you believe this issue should remain open, please leave a comment.
If you need further assistance or have questions, you can also search for similar issues on Stack Overflow.
Make sure to look at the README file for the most updated links.

@github-actions github-actions bot added the stale label May 13, 2024
github-actions bot commented:

It seems there has been no activity on this issue for a while, and it is being closed. If you believe this issue should remain open, please leave a comment.
If you need further assistance or have questions, you can also search for similar issues on Stack Overflow.
Make sure to look at the README file for the most updated links.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 12, 2024