Stream memory leak in 20.11.1 #51936

Closed
kshaa opened this issue Mar 1, 2024 · 9 comments
Labels
confirmed-bug (Issues with confirmed bugs), stream (Issues and PRs related to the stream subsystem)

Comments

@kshaa

kshaa commented Mar 1, 2024

Version

v20.11.1

Platform

Linux xxx SMP PREEMPT Tue Jan 9 09:56:30 CST 2024 aarch64 aarch64 aarch64 GNU/Linux (Debian-based distribution)

Subsystem

No response

What steps will reproduce the bug?

Use winston with multiple custom transports and a large number of logger instances. Reproducing the leak also requires a fairly large volume of logs being produced; see the sketch below.
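For illustration, a minimal sketch of that kind of setup, assuming hypothetical transport names and arbitrary volumes (this is not the actual application code): many logger instances, each with several transports, logging in a tight loop.

```js
// Hypothetical reproduction sketch: many winston loggers, each with multiple
// transports, producing a high volume of logs. Names and volumes are assumptions.
const winston = require('winston');
const Transport = require('winston-transport');

// A custom transport that accepts every record and discards it.
class NoopTransport extends Transport {
  log(info, callback) {
    setImmediate(() => this.emit('logged', info));
    callback();
  }
}

// A large number of logger instances, each with several transports.
const loggers = Array.from({ length: 100 }, () =>
  winston.createLogger({
    transports: [
      new NoopTransport(),
      new NoopTransport(),
      new winston.transports.File({ filename: '/tmp/app.log' }),
    ],
  })
);

// Produce a fairly large volume of logs.
setInterval(() => {
  for (const logger of loggers) {
    for (let i = 0; i < 100; i++) {
      logger.info('some log line with a payload', { i });
    }
  }
}, 10);
```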

How often does it reproduce? Is there a required condition?

It seems to happen every time we run our code.
It doesn't happen on v20.10.0; the bug requires Node > v20.10.0.
We tested changing the UV_USE_IO_URING value and it doesn't fix the bug (it seems to trade CPU usage for RAM leakage, but performance has regressed either way).

What is the expected behavior? Why is that the expected behavior?

The memory leak should not happen when logging a huge volume of logs. That's the expected behavior because it worked that way before upgrading from v20.10.0 to v20.11.1 (and disabling io_uring doesn't seem to be at fault).

What do you see instead?

Here's a RAM consumption graph from a staging environment where we reproduced the memory leak.
The first three runs in this graph, which show the memory leak, were on stock Node.js v20.11.1.
The last two runs were on a custom Node.js built from the v20.11.1 source with commit dd52068 reverted.
[Screenshot from 2024-03-01 17-47-49]
[Screenshot from 2024-03-01 17-51-25]

Additional information

No response

@marco-ippolito added the stream label (Issues and PRs related to the stream subsystem) on Mar 1, 2024
@bienzaaron

Did you run on v20.11.0 at all? I've noticed what I believe to be the same thing on a very similar service I work on (a fastify server with a pino logger and file transports configured, which also processes a very high log volume). I only see the issue on v20.11.1, though, and not on v20.11.0, which seems odd to me, since the commit you linked landed in v20.11.0.

v20.10.0 performs as expected, like you mentioned.

@mcollina
Member

mcollina commented Mar 3, 2024

Thanks for reporting. Can you include a script to reproduce?

@kshaa
Author

kshaa commented Mar 4, 2024

@bienzaaron In my case, the memory leak can also be observed in v20.11.0, so it really does seem like the aforementioned commit from v20.11.0 might be at fault. Here's a screenshot of my memory graphs in Grafana: the first part is v20.11.1 with the commit in question reverted, the second part is stock v20.11.0, where the memory leak is still present.
[Screenshot from 2024-03-04 09-09-34]

@mcollina
Member

mcollina commented Mar 4, 2024

Does it crash with an OOM? We really need a script to reproduce.

It could be that the optimization in question unveiled a bug somewhere in Winston.

@kshaa
Author

kshaa commented Mar 4, 2024

@mcollina

Does it crash with an OOM?

Yes, it does. I let the code run its course this time, and it ends up crashing as follows:

# ./memleak.sh
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 unpipe listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(Use `node20 --trace-warnings ...` to show where the warning was created)
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 unpipe listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 finish listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [File]. Use emitter.setMaxListeners() to increase limit
(node:373378) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 finish listeners added to [File]. Use emitter.setMaxListeners() to increase limit

<--- Last few GCs --->

[373378:0x308ebee0]  1331902 ms: Scavenge 4061.9 (4140.6) -> 4060.6 (4140.8) MB, 5.90 / 0.00 ms  (average mu = 0.703, current mu = 0.013) task; 
[373378:0x308ebee0]  1331959 ms: Scavenge 4062.1 (4140.8) -> 4060.7 (4140.8) MB, 5.48 / 0.00 ms  (average mu = 0.703, current mu = 0.013) task; 
[373378:0x308ebee0]  1332002 ms: Scavenge 4062.2 (4140.8) -> 4060.9 (4145.1) MB, 6.06 / 0.00 ms  (average mu = 0.703, current mu = 0.013) task; 


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xc8640c node::Abort() [node20]
 2: 0xb755c8 void node::FPrintF<>(_IO_FILE*, char const*) [node20]
 3: 0xe8450c v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node20]
 4: 0xe846dc v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node20]
 5: 0x108c3ec  [node20]
 6: 0x10a28c4 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node20]
 7: 0x10f36b0 v8::internal::MinorGCJob::Task::RunInternal() [node20]
 8: 0xcf47f8  [node20]
 9: 0xcf72ec node::PerIsolatePlatformData::FlushForegroundTasksInternal() [node20]
10: 0x17c1ad0  [node20]
11: 0x17d5414  [node20]
12: 0x17c288c uv_run [node20]
13: 0xbaeb58 node::SpinEventLoopInternal(node::Environment*) [node20]
14: 0xccc3d0  [node20]
15: 0xcccd00 node::NodeMainInstance::Run() [node20]
16: 0xc4615c node::Start(int, char**) [node20]
17: 0xffff82c20e10 __libc_start_main [/lib/aarch64-linux-gnu/libc.so.6]
18: 0xbacfa0  [node20]
./memleak.sh: line 3: 373378 Aborted                 node20 dist/index.js

For reference, memleak.sh is just:

#!/usr/bin/env bash

node20 dist/index.js

where node20 --version is v20.11.1.
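
The MaxListenersExceededWarning lines at the top of that output are consistent with many streams being piped into the same File transport. For illustration only, a minimal Node-core sketch of how that warning arises (an assumption about the mechanism, not code from this project):

```js
// Each readable piped into the same destination adds 'unpipe', 'error',
// 'drain', 'close' and 'finish' listeners to it; the default limit is 10
// listeners per event, so the 11th pipe triggers MaxListenersExceededWarning.
const { PassThrough, Writable } = require('node:stream');

const sharedDest = new Writable({
  write(chunk, encoding, callback) {
    callback();
  },
});

for (let i = 0; i < 11; i++) {
  const source = new PassThrough();
  source.pipe(sharedDest, { end: false });
  source.end(`line from source ${i}\n`);
}
// The warning itself is only a diagnostic; calling
// sharedDest.setMaxListeners(0) would silence it without fixing any leak.
```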

Afterwards I ran it again with the revert patch applied; here are the memory/CPU usage graphs:
[Screenshot from 2024-03-04 13-42-41]

We really need a script to reproduce.

The following snippets of code seem to be enough to reproduce the problem - link.

On one hand, perhaps this code simply produces logs faster than winston can process them, so they accrue in memory and the process eventually crashes with an OOM (a sketch of that hypothesis follows below).
On the other hand, I wonder why this memory leak happens only in v20.11.1 and does not occur when I revert that commit.
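
For illustration, a minimal sketch of that producer-outruns-consumer hypothesis using only Node core streams (an assumed example, not the reporter's code): when write() return values are ignored and the destination is slow, chunks pile up in the writable's internal buffer and memory grows until the process dies.

```js
// Assumed illustration of unbounded buffering: the producer ignores
// backpressure (the boolean returned by write()), while the sink drains
// far more slowly than data is written, so writableLength keeps growing.
const { Writable } = require('node:stream');

const slowSink = new Writable({
  highWaterMark: 16 * 1024,
  write(chunk, encoding, callback) {
    setTimeout(callback, 5); // drain one chunk every 5 ms
  },
});

setInterval(() => {
  for (let i = 0; i < 1000; i++) {
    // Ignoring the return value means no backpressure is ever applied.
    slowSink.write(Buffer.alloc(1024, 'x'));
  }
  console.log('buffered bytes:', slowSink.writableLength);
}, 10);
```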

@yannis-tseng

@mcollina Could you please provide any updates? Thank you.

@mcollina added the confirmed-bug label (Issues with confirmed bugs) on Mar 30, 2024
@mcollina
Member

I spent some time playing with your demo, and it leaks memory on Node v20.11.1 but not on v20.10.0.
I have not bisected Node.js core yet.

Moreover, I haven't identified the cause of the leak or why it shows up in this convoluted example. More importantly, several EventEmitter leaks show up on both versions.

Can you simplify your example to use only Node.js core?
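
For illustration, a hedged sketch of what such a core-only harness might look like (an assumption about the shape of a simplified reproduction, not code from this thread, and not verified to reproduce the leak): pump a high volume of log-like lines through core streams into a file and watch heap usage over time.

```js
// Assumed core-only harness: a fast Readable piped into a file write stream,
// with periodic heap usage reporting. File path and volumes are arbitrary.
const fs = require('node:fs');
const { Readable } = require('node:stream');

const out = fs.createWriteStream('/tmp/leak-test.log');
let produced = 0;

const source = new Readable({
  read() {
    // Keep producing log-like lines as fast as the pipe will accept them.
    this.push(`${Date.now()} log line ${produced++} with some padding payload\n`);
  },
});

source.pipe(out);

setInterval(() => {
  const heapMb = (process.memoryUsage().heapUsed / (1024 * 1024)).toFixed(1);
  console.log(`produced=${produced} heapUsed=${heapMb} MB`);
}, 5000);
```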

@kshaa
Author

kshaa commented Apr 4, 2024

Unfortunately, I don't have a more concrete example that doesn't use winston. However, it seems the memory leak is no longer present in Node v20.12.1.

Tested on my staging environment:
[Screenshot from 2024-04-04 13-18-50]

Also tested with the simplified test case which only uses node and winston:
[Screenshot from 2024-04-04 13-22-12]

@mcollina
Member

mcollina commented Apr 4, 2024

Then... closing.

@mcollina closed this as completed Apr 4, 2024