
Workers affected by V8 aborting on virtual allocation failure #25933

Open
trevnorris opened this issue Feb 4, 2019 · 11 comments
Labels
confirmed-bug Issues with confirmed bugs. v8 engine Issues and PRs related to the V8 dependency. worker Issues and PRs related to Worker support.

Comments

@trevnorris
Contributor

  • Version: 4b6e4c1
  • Platform: Linux 4.19.15 x86_64
  • Subsystem: worker_threads

A process aborts when V8 fails to allocate virtual memory. This is detrimental to using Workers because:

  1. If a Worker fails to allocate virtual memory during initialization the process aborts.
  2. V8's virtual memory allocator doesn't play nicely with ulimit.

To demonstrate this issue:

  1. Set ulimit -v to some value smaller than the physically available memory (I set it to 8GB on a 16GB machine).
  2. Run the following script that spawns a set number of Workers (the number to spawn may need adjusting depending on the system):
const WORKER_COUNT = 60; // Value may need adjustment

const { Worker, parentPort, workerData, threadId } = require('worker_threads');
const worker_array = [];

// Worker branch: report in, then idle so the thread stays alive.
if (workerData) {
  process._rawDebug(workerData, threadId, process.pid, process.ppid);
  parentPort.postMessage(42);
  setTimeout(() => {}, 1000000);
  return;
}

// Main-thread branch: spawn workers one at a time until WORKER_COUNT is reached.
(function runner(n) {
  if (++n > WORKER_COUNT) return setTimeout(killAll, 10);
  const w = new Worker(__filename, { workerData: n });
  w.on('message', () => {
    process._rawDebug(JSON.stringify(getMemUsage()));
    runner(n);
  });
  w.on('exit', c => process._rawDebug(`${n} exited with ${c}`));
  worker_array.push(w);
})(0);

function killAll() {
  for (const w of worker_array)
    w.terminate();
}

function getMemUsage() {
  const o = process.memoryUsage();
  for (const i in o) o[i] = o[i] / 1024; // report values in KiB
  return o;
}

This results in a Fatal process OOM in CodeRange setup: allocate virtual memory error, with the following stack trace:

 * thread #1: tid = 4094, 0x0000000003eab702 node_g`v8::base::OS::Abort() at platform-posix.cc:399, name = 'node_g', stop reason = signal SIGILL: illegal instruction operand
  * frame #0: 0x0000000003eab702 node_g`v8::base::OS::Abort() at platform-posix.cc:399
    frame #1: 0x0000000002954ac2 node_g`v8::Utils::ReportOOMFailure(isolate=0x0000000005c13170, location="CodeRange setup: allocate virtual memory", is_heap_oom=false) at api.cc:460
    frame #2: 0x000000000295492d node_g`v8::internal::V8::FatalProcessOutOfMemory(isolate=0x0000000005c13170, location="CodeRange setup: allocate virtual memory", is_heap_oom=false) at api.cc:428
    frame #3: 0x000000000322e4a6 node_g`v8::internal::MemoryAllocator::InitializeCodePageAllocator(this=0x0000000005c076d0, page_allocator=0x00000000045aa030, requested=134217728) at spaces.cc:168
    frame #4: 0x000000000322e083 node_g`v8::internal::MemoryAllocator::MemoryAllocator(this=0x0000000005c076d0, isolate=0x0000000005c13170, capacity=1526909922, code_range_size=0) at spaces.cc:132
    frame #5: 0x000000000318d6da node_g`v8::internal::Heap::SetUp(this=0x0000000005c13190) at heap.cc:4349
    frame #6: 0x000000000331aa77 node_g`v8::internal::Isolate::Init(this=0x0000000005c13170, des=0x00007fffffff6db8) at isolate.cc:3176
    frame #7: 0x00000000036e558c node_g`v8::internal::Snapshot::Initialize(isolate=0x0000000005c13170) at snapshot-common.cc:55
    frame #8: 0x0000000002994b0d node_g`v8::Isolate::Initialize(isolate=0x0000000005c13170, params=0x00007fffffff7170) at api.cc:8224
    frame #9: 0x00000000025db63b node_g`node::NewIsolate(allocator=0x00007fff34002c80, event_loop=0x0000000005bf2530) at node.cc:1420
    frame #10: 0x000000000270d9a6 node_g`node::worker::Worker::Worker(this=0x0000000005bf2500, env=0x00007fffffffcd08, wrap=(val_ = 0x00007fffffff84a0), url="\xb0?"..., per_isolate_opts=nullptr) at node_worker.cc:106

Examination of the strace log shows:

31103 mmap(0x129b53343000, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
31103 mmap(0x129b53343000, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
31103 write(2, "\n#\n# Fatal process OOM in CodeRa"..., 70) = 70

However, if I run the same script on a machine with 8GB physical memory and ulimit -v unlimited, the script runs without a problem.

In the end, this is probably something that'd need to be resolved by V8: both to allocate virtual memory more intelligently and to allow the Isolate to signal when no more memory can be allocated. That signal could then be hooked into the Worker's 'error' event.

@Fishrock123 Fishrock123 added confirmed-bug Issues with confirmed bugs. v8 engine Issues and PRs related to the V8 dependency. worker Issues and PRs related to Worker support. labels Feb 4, 2019
@bnoordhuis
Member

this is probably something that'd need to be resolved by V8

It's come up on the v8-users mailing list more than once and I believe there are multiple V8 developers on record saying graceful handling of OOM errors is an explicit non-goal.

But cc @nodejs/v8 just in case. :)

@hashseed
Member

hashseed commented Feb 5, 2019

@ulan

there is a way to hook into OOM

@ulan
Contributor

ulan commented Feb 5, 2019

The OOM hook works when the OOM happens due to heap object allocations.

In this case we are exhausting the virtual memory on isolate initialization. The reason is that on 64-bit platforms, an isolate pre-allocates 128MB virtual range for code objects.

The workaround would be to reduce the code range size for workers using ResourceConstraints::set_code_range_size [1]
https://cs.chromium.org/chromium/src/v8/include/v8.h?rcl=576d82eeec75847d0095eca2d58dd08e23a58c89&l=6553
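For an embedder, that workaround looks roughly like this (a sketch against the V8 7.x API linked above; the 32 MB value is an arbitrary example and NewWorkerIsolate is a hypothetical helper, not Node's actual code):

```cpp
#include <v8.h>

// Sketch: shrink the per-isolate code range so each worker isolate reserves
// less virtual address space up front (the default on x64 is 128 MB).
v8::Isolate* NewWorkerIsolate(v8::ArrayBuffer::Allocator* allocator) {
  v8::Isolate::CreateParams params;
  params.array_buffer_allocator = allocator;
  // set_code_range_size() takes a size in MB in this API revision;
  // 32 is an arbitrary example value, not a recommendation.
  params.constraints.set_code_range_size(32);
  return v8::Isolate::New(params);
}
```

This trades virtual-address-space headroom for a lower ceiling on generated code, so the value would need tuning per workload.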

We could also move the code range construction to the beginning of the isolate initialization and return some indication of error, so that the embedder can handle that gracefully.

@bnoordhuis
Member

I'm aware of v8::Isolate::SetOOMErrorHandler() but that doesn't help when V8 runs into a call to new or malloc() that fails, right?

Put another way, there's room to improve the status quo but no complete fix is possible. Is that a correct summary?
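For reference, installing that handler looks roughly like this (a sketch; in the V8 API of this era the callback receives the failure location and a heap-OOM flag, and as noted it does not cover a failed malloc()/new inside V8):

```cpp
#include <cstdio>
#include <cstdlib>
#include <v8.h>

// Sketch: log where V8 ran out of memory before the process dies. This only
// fires for failures V8 routes through its own OOM reporting, and the
// process still cannot continue afterwards.
static void OnV8OOM(const char* location, bool is_heap_oom) {
  std::fprintf(stderr, "V8 OOM at %s (heap OOM: %d)\n", location, is_heap_oom);
  std::abort();
}

void InstallOOMHandler(v8::Isolate* isolate) {
  isolate->SetOOMErrorHandler(OnV8OOM);
}
```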

@ulan
Contributor

ulan commented Feb 5, 2019

@bnoordhuis that's right

@trevnorris
Contributor Author

@ulan Returning an error if the v8::Isolate fails to allocate would be very helpful. I foresee plenty of attempts to do things such as spawning too many threads on an n1-highcpu-2.

Is there a way to get around the ulimit -v issue? When set to unlimited (16GB) I can create over 500 threads, but when it's set to 8GB the allocation fails at 38 threads.

@bnoordhuis Now that worker_threads is no longer behind a flag, I'm seeing more people starting to experiment and to ask when it will graduate from experimental. Would an uncontrollable abort from creating a thread be a blocker?

@ulan
Contributor

ulan commented Feb 5, 2019

@trevnorris limiting the resident set size with ulimit -m (instead of limiting the virtual memory) might help in this case.

@ulan
Contributor

ulan commented Feb 5, 2019

@hashseed do you know if the spec allows throwing an error in new Worker()? If so, propagating the error from the worker isolate initialization seems like a useful feature (both for Chrome and NodeJS).

@Fishrock123
Contributor

We don't necessarily follow the spec in regards to all specifics around the workers api, fwiw.

@trevnorris
Contributor Author

trevnorris commented Feb 6, 2019

@ulan Looking through the Web Workers spec, this is the most relevant thing I can find:

User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.

I interpret that as it being node's job to make sure the application doesn't crash on OOM. Though, because the Worker runs in parallel, the OOM error would need to be passed to onerror() (or in node's case the 'error' event).

In short, the spec doesn't specify how OOM should be handled, but it does seem to give "user agents" control over how to handle it (though my spec interpretation skills are average at best, and need confirmation by someone else).

Note: ulimit -m doesn't limit the amount of memory the process is allowed to use (RLIMIT_RSS is not enforced on modern Linux). The only other option I can think of now is to containerize it or place it in a VM.

@addaleax
Member

We could also move the code range construction to the beginning of the isolate initialization and return some indication of error, so that the embedder can handle that gracefully.

@ulan Yeah, I think that would be best…
