
GC is not able to instantly free up memory of request after Response finish ==> possible OOM on sequential requests #2836

Closed
viyaha opened this issue Jul 9, 2020 · 13 comments

@viyaha

viyaha commented Jul 9, 2020

What steps will reproduce the bug?

const {createServer, get} = require("http");

const server = createServer((req, res) => {
    // simulate compression middleware
    const _write = res.write;
    res.write = function(content) {
        return _write.call(this, content);
    }

    setImmediate(() => {
        // simulate request-body middleware reading large body
        req.body = [];
        for (let i = 0; i < 18000000; i++) {
            req.body.push({
                a: "" + Math.random(),
            });
        }

        // write response
        for (let i = 0; i < 2000; i++) {
            res.write(``)
        }
        res.end();
    });
});

server.listen(3000, () => {
    console.log('Example app listening on port 3000!');
    client().catch(console.error);
});


async function client() {
    for(let i = 0; ; i++) {
        await new Promise(resolve => {
            get("http://localhost:3000", res => {
                res.on("data", () => {});
                res.on("end", () => {
                    resolve();
                })
            });
        })
        console.log(`Finished request ${i}`);        
        // Pause makes it less likely, but it can still happen
        // await new Promise(res => setTimeout(res, 1000));
    }
}

Log:

Starting request 0
Finished request 0
Starting request 1
Finished request 1
Starting request 2
Finished request 2
Starting request 3

<--- Last few GCs --->

[18960:00000169F26E3530]    34919 ms: Mark-sweep 2035.2 (2038.5) -> 2034.9 (2038.5) MB, 1395.3 / 0.0 ms  (+ 50.2 ms in 9 steps since start of marking, biggest step 10.1 ms, walltime since start of marking 1475 ms) (average mu = 0.230, current mu = 0.200) [18960:00000169F26E3530]    35207 ms: Scavenge 2039.5 (2042.5) -> 2038.8 (2042.5) MB, 3.8 / 0.0 ms  (average mu = 0.230, current mu = 0.200) allocation failure


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 00007FF6589B463D]
Security context: 0x017df0a008a1 <JSObject>
    1: /* anonymous */ [000000AA53C6D841] [\index.js:~4] [pc=00000286E3DCE432](this=0x00aa53c6d879 <Immediate map = 000000FE427D3DF1>)
    2: processImmediate [000003127DBC2351] [internal/timers.js:439] [bytecode=0000036B0460F7E9 offset=262](this=0x03127dbc2391 <process map = 000002EA82075631>)
    3: InternalFrame [pc: 00007FF65893...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Writing Node.js report to file: report.20200709.160202.18960.0.001.json
Node.js report completed
 1: 00007FF657DE232F napi_wrap+124543
 2: 00007FF657D836A6 public: bool __cdecl v8::base::CPU::has_sse(void)const __ptr64+34502
 3: 00007FF657D84366 public: bool __cdecl v8::base::CPU::has_sse(void)const __ptr64+37766
 4: 00007FF658588C5E private: void __cdecl v8::Isolate::ReportExternalAllocationLimitReached(void) __ptr64+94
 5: 00007FF658570CA1 public: class v8::SharedArrayBuffer::Contents __cdecl v8::SharedArrayBuffer::Externalize(void) __ptr64+833
 6: 00007FF65843E56C public: static void __cdecl v8::internal::Heap::EphemeronKeyWriteBarrierFromCode(unsigned __int64,unsigned __int64,class v8::internal::Isolate * __ptr64)+1436
 7: 00007FF658449910 public: void __cdecl v8::internal::Heap::ProtectUnprotectedMemoryChunks(void) __ptr64+1312
 8: 00007FF658446444 public: static bool __cdecl v8::internal::Heap::PageFlagsAreConsistent(class v8::internal::HeapObject)+3204
 9: 00007FF65843BCD3 public: bool __cdecl v8::internal::Heap::CollectGarbage(enum v8::internal::AllocationSpace,enum v8::internal::GarbageCollectionReason,enum v8::GCCallbackFlags) __ptr64+1283
10: 00007FF65843A4A4 public: void __cdecl v8::internal::Heap::AddRetainedMap(class v8::internal::Handle<class v8::internal::Map>) __ptr64+2356
11: 00007FF65845B775 public: class v8::internal::Handle<class v8::internal::HeapObject> __cdecl v8::internal::Factory::NewFillerObject(int,bool,enum v8::internal::AllocationType) __ptr64+53
12: 00007FF6581C78D7 public: class v8::internal::interpreter::JumpTableTargetOffsets::iterator & __ptr64 __cdecl v8::internal::interpreter::JumpTableTargetOffsets::iterator::operator=(class v8::internal::interpreter::JumpTableTargetOffsets::iterator && __ptr64) __ptr64+3687
13: 00007FF6589B463D public: virtual bool __cdecl v8::internal::SetupIsolateDelegate::SetupHeap(class v8::internal::Heap * __ptr64) __ptr64+567949
14: 00000286E3DCE432

How often does it reproduce? Is there a required condition?

It seems to reproduce always; most of the time it does not end in an OOM (with lower memory consumption), but the request object cannot be disposed of for some time.
The following conditions seem to be required:

  • Data needs to be added to the request
  • Something needs to be written to the response (fewer calls ==> less likely to result in an OOM)
  • It needs to be async (setImmediate or Promises)
  • There needs to be a function with res/req in its closure (here: the overwritten write function; a callback function would be possible too)
  • Not "too much time" should pass: sometimes the memory is freed instantly, sometimes after a few seconds, ... ==> it does not seem to be a "real" leak, because the memory is freed eventually, but quite some time can pass before the GC is able to free it...

What is the expected behavior?

After the request finishes, the GC should be able to dispose of all request-related objects (freeing the memory).

@bnoordhuis
Member

This has to do with how V8 allocates closures. Specifically, the way you monkey-patch res.write() creates a closure (because it uses _write from the outer scope) that retains more than just the _write variable, because it shares the closure with the setImmediate() callback that retains req and res.

If you rewrite your example, it'll keep running:

// ...
const patch = f => function(s) { return f.call(this, s) };
const server = createServer((req, res) => {
    // simulate compression middleware
    res.write = patch(res.write);  // ok, no longer shares a closure
// ...

This is longstanding behavior (as in: I think V8 has always worked this way) and not something I expect to change because 99% of the time it's the desired behavior: it drastically reduces the number of allocated closures.
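For reference, here is a complete, runnable version of the rewritten example (a sketch assembled from the original repro plus the patch() helper above; drive it with the client() loop from the report):

const {createServer} = require("http");

// Module-scope helper: the returned wrapper closes over `f` only,
// so it does not share a context with the request handler's scope.
const patch = f => function(s) { return f.call(this, s) };

const server = createServer((req, res) => {
    // simulate compression middleware, without capturing req/res in a shared closure
    res.write = patch(res.write);

    setImmediate(() => {
        // simulate request-body middleware reading large body
        req.body = [];
        for (let i = 0; i < 18000000; i++) {
            req.body.push({
                a: "" + Math.random(),
            });
        }

        // write response
        for (let i = 0; i < 2000; i++) {
            res.write(``)
        }
        res.end();
    });
});

server.listen(3000, () => {
    console.log('Example app listening on port 3000!');
    // client().catch(console.error); // client() as in the original report
});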

@devsnek
Member

devsnek commented Jul 9, 2020

FWIW this is an open bug: v8:7610

@bnoordhuis
Member

@viyaha I'm going to close this as a wontfix on our side but I can move it to nodejs/help if you have more questions.

@mika-fischer

@bnoordhuis This is interesting info, thank you.

However, I still fail to understand why the GC cannot reclaim the memory. Client requests are serial, and after one client request has finished, surely all work scheduled by res.write and res.end is done, and nothing should still hold a live reference to anything related to the request or response in the server. So why does the GC fail to free that memory?

Also, please note that this came out of an OOM crash we encountered in production, and we spent a lot of time breaking it down into a small self-contained example in order to report it here. It is a real problem for us and we would like to prevent it in the future (although we have mitigated it through other means in the meantime).

In reality the problem is introduced by the express compression middleware: https://github.com/expressjs/compression/blob/master/index.js Do you see a glaring problem with the way it monkey-patches the response? We'd be willing to work on and test a fix, but we still don't really understand what exactly to change, i.e. what exactly is keeping the request alive...

Or are you saying that nothing can be done here?

@bnoordhuis bnoordhuis transferred this issue from nodejs/node Jul 10, 2020
@bnoordhuis bnoordhuis reopened this Jul 10, 2020
@bnoordhuis
Member

@mika-fischer Just so we're on the same page, the basic issue is this:

function t() {
  var x = small()
  var y = big()
  function f() { return x }
  function g() { return y }
  return [f, g] // sharing a single [x,y] closure
}
var [f] = t() // discards g but still retains y
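A minimal sketch that makes this retention observable (illustrative only, not from the thread; assumes Node >= 14.6 for WeakRef and the --expose-gc flag so gc() is available):

function t() {
  var x = "small";
  var y = new Array(1000000).fill(0); // stands in for big()
  function f() { return x }
  function g() { return y }
  return [f, g, new WeakRef(y)]; // f and g share one context holding both x and y
}

let [f, , yRef] = t(); // g is discarded, but y is still reachable through f's shared context

setTimeout(() => {
  globalThis.gc();
  console.log("f alive, y collected?", yRef.deref() === undefined);     // false: retained
  f = null; // drop the last closure created in t()'s scope
  setTimeout(() => {
    globalThis.gc();
    console.log("f dropped, y collected?", yRef.deref() === undefined); // true (typically)
  }, 0);
}, 0);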

Circling back to the example @viyaha posted:

    setImmediate(() => {
        // simulate request-body middleware reading large body
        req.body = [];
        for (let i = 0; i < 18000000; i++) {
            req.body.push({
                a: "" + Math.random(),
            });
        }

        // write response
        for (let i = 0; i < 2000; i++) {
            res.write(``)
        }
        res.end();
    });

req becomes unreachable/collectable after the first for loop, but only if it's not also retained by the closure (which it was, and no longer is when you rewrite the example).

So, long story short: yes, the problem is in how expressjs/compression creates a closure that retains more than you'd at first glance expect.
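To make that concrete, here is a hypothetical sketch of the hoisted-wrapper pattern (not the actual expressjs/compression code): the factory lives at module scope, so the patched res.write closes only over the arguments it is given and never over the middleware's (req, res, next) scope:

function wrapWrite(originalWrite) {
    return function write(chunk, encoding, callback) {
        // compression / bookkeeping work on `chunk` would go here
        return originalWrite.call(this, chunk, encoding, callback);
    };
}

function compressionLike() {
    return function middleware(req, res, next) {
        res.write = wrapWrite(res.write); // does not retain req via a shared context
        next();
    };
}

// Usage (assuming an Express-style app): app.use(compressionLike());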

@mika-fischer

@bnoordhuis That's how I understood your first response, yes. And I understand that part. [Incidentally, I don't think it's a big problem here, since in our use-case with express, the response obviously needs to be kept alive until it is finished and it holds a reference to the request. So this behavior is not surprising. Still very interesting to know though, so thanks for that.]

But from our perspective the issue is not that req is not released early within the handling of the HTTP request. The issue is that after the HTTP request is completely finished (i.e. the client has read the response), the GC still cannot free the memory, and if another large request comes in then node terminates with an OOM. And it's totally unclear to us what is keeping the memory alive after the setImmediate callback, including all the write()s and the end(), has finished.

@bnoordhuis
Member

@mika-fischer That needs more explaining than I can probably fit into a single-page comment but the executive summary is that those 2,000 res.write(``) calls create 2,000 write requests that can't be collected until the next tick of the event loop. The combined memory pressure of those write requests and the retained req was enough to push it over the memory limit.

(There are reasons it works that way: it lets Node coalesce writes into fewer TCP packets, in this case just one. Without that optimization, the example would send 2,000 + 1 packets.)
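As an aside, application code can keep the number of queued write requests down by assembling the payload before writing; an illustrative sketch, not something suggested in the thread:

// instead of 2,000 res.write() calls, buffer the pieces and write once
const chunks = [];
for (let i = 0; i < 2000; i++) {
    chunks.push(""); // whatever each res.write() call would have sent
}
res.end(chunks.join(""));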

The good news is that Node master is a bit smarter about it. The example runs without OOM errors there.

@mika-fischer

@bnoordhuis Thanks for your explanations, but I think we're still talking somewhat past each other.

The OOM does not happen within the handling of a request! I would understand the issue in that case, but that's not what happens here. Instead, the first request is completely finished (i.e. it went over the network and was received by the client), then a second request comes in and that triggers the OOM, because the GC for some reason cannot free the memory associated with the first request (it is not deterministic, and might happen with the third or fourth request, etc.).

At the point in time, when the client has finished reading the first response, obviously all write and end requests have already been done (otherwise the client would not have finished reading the response) and so the GC should be able to free them if needed. But this is not happening.

Note that in the extreme this even happens with only a single res.end(), but that makes it somewhat less likely to trigger, so we opted to post the more reliable variant here.

So let me pose my confusion as a question: At the point in time when the client has finished receiving the response to the first request (i.e. all writes and the end request were handled, went over the network, and were received by the client), is there any valid reason that the GC could fail to free the memory associated with this completed first request in the server? If so, what exactly is keeping these things alive? From my perspective, it can't be the write()s or the end(), because they had already been handled at that point; the client already received the end of the response.

We tried with Node 14 and will also try with master. But note that with Node 14 we already had to use --max-old-space-size or increase the request size in order to trigger the issue. I assume something changed there between 12 and 14, and probably the same has to be done when testing with master.

@bnoordhuis
Member

At the point in time when the client has finished receiving the response to the first request (i.e. all writes and the end request were handled, went over the network, got recieved by the client), is there any valid reason that the GC could fail to free the memory associated with this completed first request in the server?

The client can receive the response and fire off a new request before the next event loop tick happens on the server, when the write requests become eligible for garbage collection.

Put another way: there's a short time frame where the memory footprint of two sequential client requests may co-exist.

@mika-fischer

The client can receive the response and fire off a new request before the next event loop tick happens on the server, when the write requests become eligible for garbage collection.

Put another way: there's a short time frame where the memory footprint of two sequential client requests may co-exist.

But that is not what is happening here, as the behavior can be triggered even with a pause of 1 second between requests (see the commented-out code in the initial example).

@mika-fischer

@bnoordhuis All right, I've prepared a new version, this time with only 100 writes, a 5-second(!) pause between requests, and some more logging, tested with current master. It still OOMs reliably.

I still fail to understand why the data of the first request cannot be GCed, 20 seconds(!) after the response was received by the client.

const {createServer, get} = require("http");
const log = msg => console.log(`[${process.pid}:${''.padStart(16)}]${(process.uptime() * 1000).toFixed(0).padStart(9)} ms: ${msg}`);
let n = 0;
createServer((req, res) => {
    setImmediate(() => { // setImmediate or Promises are necessary
        log(`S: Got new request ${n}`);
        // simulate compression middleware
        const _write = res.write;
        res.write = function (data) {
            return _write.call(this, data);
        }
        log(`S: Reading request body ${n}`);
        req.body = [];
        for (let i = 0; i < 19 * 1000 * 1000; i++) {
            req.body.push({a: `${i}`});
        }
        log(`S: Writing response ${n}`);
        for (let i = 0; i < 100; i++) {
            res.write("");
        }
        res.end(() => log(`S: Finished writing response ${n++} (res.end() cb)`));
    });
}).listen(3000, () => { client().catch(console.error) });
async function client() {
    for (let i = 0; ; i++) {
        log(`C: Starting request ${i}`);
        await new Promise(resolve => {
            get("http://localhost:3000", res => {
                res.on("data", () => {});
                res.on("end", resolve);
            });
        });
        log(`C: Finished reading response to request ${i}`);
        log(`C: Pausing for 5 seconds`);
        await new Promise(res => setTimeout(res, 5000));
    }
}

gives me:

> C:\Users\mfischer\Desktop\node-v15.0.0-nightly20200714c7627da837-win-x64\node.exe --trace-gc --trace-gc-ignore-scavenger --max-old-space-size=2048 .\index.js
[4336:                ]       62 ms: C: Starting request 0
[4336:                ]       83 ms: S: Got new request 0
[4336:                ]       84 ms: S: Reading request body 0
[4336:0000026EDF24B090]      220 ms: Mark-sweep 36.8 (68.1) -> 32.1 (65.5) MB, 1.8 / 0.0 ms  (+ 10.6 ms in 26 steps since start of marking, biggest step 4.3 ms, walltime since start of marking 46 ms) (average mu = 1.000, current mu = 1.000) finalize incremental marking via stack guard GC in old space requested
[4336:0000026EDF24B090]      724 ms: Mark-sweep 189.4 (207.1) -> 153.4 (186.4) MB, 13.9 / 0.0 ms  (+ 37.3 ms in 179 steps since start of marking, biggest step 0.4 ms, walltime since start of marking 125 ms) (average mu = 0.899, current mu = 0.899) finalize incremental marking via stack guard GC in old space requested
[4336:0000026EDF24B090]     3016 ms: Mark-sweep 709.6 (740.2) -> 603.3 (636.9) MB, 4.4 / 0.0 ms  (+ 150.1 ms in 731 steps since start of marking, biggest step 0.7 ms, walltime since start of marking 787 ms) (average mu = 0.927, current mu = 0.933) finalize incremental marking via stack guard GC in old space requested
[4336:                ]     5627 ms: S: Writing response 0
[4336:                ]     5631 ms: S: Finished writing response 0 (res.end() cb)
[4336:                ]     5635 ms: C: Finished reading response to request 0
[4336:                ]     5636 ms: C: Pausing for 5 seconds
[4336:0000026EDF24B090]     6168 ms: Mark-sweep 1376.2 (1402.2) -> 1187.0 (1221.3) MB, 10.3 / 0.0 ms  (+ 611.5 ms in 1043 steps since start of marking, biggest step 1.3 ms, walltime since start of marking 986 ms) (average mu = 0.841, current mu = 0.803) finalize incremental marking via task GC in old space requested
[4336:                ]    10637 ms: C: Starting request 1
[4336:                ]    10651 ms: S: Got new request 1
[4336:                ]    10651 ms: S: Reading request body 1
[4336:0000026EDF24B090]    14055 ms: Mark-sweep 1670.5 (1695.6) -> 1578.5 (1613.2) MB, 1639.8 / 0.0 ms  (+ 95.6 ms in 481 steps since start of marking, biggest step 1.0 ms, walltime since start of marking 2044 ms) (average mu = 0.794, current mu = 0.780) finalize incremental marking via stack guard GC in old space requested
[4336:0000026EDF24B090]    16699 ms: Mark-sweep (reduce) 1822.1 (1828.5) -> 1771.8 (1778.4) MB, 1613.3 / 0.0 ms  (+ 43.9 ms in 215 steps since start of marking, biggest step 0.3 ms, walltime since start of marking 1881 ms) (average mu = 0.650, current mu = 0.373) allocation failure GC in old space requested
[4336:0000026EDF24B090]    18366 ms: Mark-sweep (reduce) 1910.3 (1913.5) -> 1834.8 (1839.4) MB, 1228.3 / 0.0 ms  (+ 0.0 ms in 7 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1398 ms) (average mu = 0.533, current mu = 0.263) allocation failure GC in old space requested
[4336:0000026EDF24B090]    21222 ms: Mark-sweep (reduce) 1943.9 (1948.4) -> 1943.6 (1953.7) MB, 2123.7 / 0.0 ms  (+ 7.7 ms in 33 steps since start of marking, biggest step 0.3 ms, walltime since start of marking 2166 ms) (average mu = 0.391, current mu = 0.253) allocation failure GC in old space requested
[4336:0000026EDF24B090]    23087 ms: Mark-sweep (reduce) 1998.7 (2002.7) -> 1998.5 (2007.9) MB, 1362.3 / 0.0 ms  (+ 0.0 ms in 33 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1510 ms) (average mu = 0.343, current mu = 0.270) allocation failure GC in old space requested
[4336:0000026EDF24B090]    24894 ms: Mark-sweep (reduce) 2026.4 (2038.4) -> 2026.2 (2040.2) MB, 1393.5 / 0.0 ms  (+ 0.0 ms in 36 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1651 ms) (average mu = 0.293, current mu = 0.229) allocation failure GC in old space requested
[4336:0000026EDF24B090]    26530 ms: Mark-sweep (reduce) 2039.8 (2042.7) -> 2039.6 (2048.2) MB, 1299.6 / 0.0 ms  (+ 0.0 ms in 32 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1571 ms) (average mu = 0.254, current mu = 0.206) allocation failure GC in old space requested

<--- Last few GCs --->
= [4336:0000026EDF24B090]    24894 ms: Mark-sweep (reduce) 2026.4 (2038.4) -> 2026.2 (2040.2) MB, 1393.5 / 0.0 ms  (+ 0.0 ms in 36 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1651 ms) (average mu = 0.293, current mu = [4336:0000026EDF24B090]    26530 ms: Mark-sweep (reduce) 2039.8 (2042.7) -> 2039.6 (2048.2) MB, 1299.6 / 0.0 ms  (+ 0.0 ms in 32 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1571 ms) (average mu = 0.254, current mu =

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 00007FF63D16113F napi_wrap+109951
 2: 00007FF63D107296 v8::base::CPU::has_sse+61622
 3: 00007FF63D108196 v8::base::CPU::has_sse+65462
 4: 00007FF63D9BB8DE v8::Isolate::ReportExternalAllocationLimitReached+94
 5: 00007FF63D9A066D v8::SharedArrayBuffer::Externalize+781
 6: 00007FF63D85EE7C v8::internal::Heap::EphemeronKeyWriteBarrierFromCode+1516
 7: 00007FF63D86A05A v8::internal::Heap::ProtectUnprotectedMemoryChunks+1258
 8: 00007FF63D867219 v8::internal::Heap::PageFlagsAreConsistent+2457
 9: 00007FF63D85BF21 v8::internal::Heap::CollectGarbage+2033
10: 00007FF63D85A115 v8::internal::Heap::AllocateExternalBackingStore+1365
11: 00007FF63D874326 v8::internal::Factory::AllocateRaw+166
12: 00007FF63D88887B v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString+75
13: 00007FF63D882BF9 v8::internal::Factory::NewStringFromOneByte+121
14: 00007FF63D35A823 v8::internal::Factory::NewStringFromAsciiChecked+51
15: 00007FF63D88521D v8::internal::Factory::SmiToString+349
16: 00007FF63D884401 v8::internal::Factory::NumberToString+33
17: 00007FF63D5A0BEE v8::internal::Runtime::SetObjectProperty+43070
18: 00007FF63DA444AD v8::internal::SetupIsolateDelegate::SetupHeap+463949
19: 00007FF63DA6D244 v8::internal::SetupIsolateDelegate::SetupHeap+631268
20: 000003861574CFFE

@github-actions

There has been no activity on this issue for 3 years and it may no longer be relevant. It will be closed 1 month after the last non-automated comment.

@github-actions github-actions bot added the stale label Jul 15, 2023
@github-actions

There has been no activity on this issue and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.
