Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vm: add experimental NodeRealm implementation #47855

Closed
wants to merge 30 commits into from

Conversation

mcollina
Copy link
Member

@mcollina mcollina commented May 4, 2023

A NodeRealm is a complete Node.js environment that runs within the same thread. This feature is added behind a --experimental-noderealm flag.

This work is based on #45018, which adds the synchronous-worker module initially written by @addaleax. and is incorporated here with Anna's permission as stated in #45018 by @jasnell (listed as a co-author of this change). The main difference from the synchronous-worker module is that I removed all the "synchronous" parts: it's now impossible to spin the event loop manually.

The reason for including this inside Node.js instead of keeping it as a separate module is access to internal API that would make stop() possible without crashing the process.

This implementation still suffers from a few bugs that I would aim to solve before removing the experimental flags:

  • leaks memory like ShadowRealm (memory leak in shadow realms #47353);
  • avoid polling during stop();
  • ESM support using internals (instead of the current hack);
  • complete inspector support (currently, it crashes with --inspect-brk due to an async_hooks issue).

A LocalWorker is a Node.js environemnt that runs within the same thread.
This feature is added behind a `--experimental-localworker` flag.

This work is based on The synchronous-worker module as originally
written by Anna Henningsen and is incorporated here with
Anna's permission.

Signed-off-by: Matteo Collina <[email protected]>
Co-Authored-By: James M Snell <[email protected]>
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/tsc

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels May 4, 2023
@mcollina mcollina requested a review from joyeecheung May 4, 2023 10:36
Signed-off-by: Matteo Collina <[email protected]>
@legendecas
Copy link
Member

legendecas commented May 4, 2023

The removal of the ability to spin the loop sounds great!

This change allows us to model it in a node::Realm when node::Realm is stable with ShadowRealm. It would be good to ensure the API exposed can be migrated to built on top of node::Realm so that we can share the infrastructure between the two (e.g. integration of async hooks as the LocalWorker can create async continuation in the outer environment as well).

Name bikeshed: what do you think about naming the API as vm.Realm or vm.NodeRealm? If I understand correctly, its execution blocks the thread. The name of Realm indicates it doesn't create any thread but rather a Context with Node.js built-ins. Also, this would suggest it is an API contrast to the ECMAScript built-in API ShadowRealm.

@legendecas legendecas added the vm Issues and PRs related to the vm subsystem. label May 4, 2023
@targos
Copy link
Member

targos commented May 4, 2023

I agree that having "Worker" in the name is confusing, since the API looks nothing like a Worker.

@joyeecheung
Copy link
Member

To me "Worker" already implies it's on a different thread (and a different V8 isolate), so yeah calling this worker is a bit weird as it's essentially just a new context. I think a name with the word "Context" or "Realm" would really be more intuitive.

@mcollina
Copy link
Member Author

mcollina commented May 4, 2023

I'm happy to call it NodeReam.

src/node_contextify.cc Outdated Show resolved Hide resolved
@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label May 4, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label May 4, 2023
@nodejs-github-bot
Copy link
Collaborator

Signed-off-by: Matteo Collina <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m only reviewing the documentation for now but this seems very promising. I wonder if this can replace the unfinished --experimental-vm-modules, at least for the most urgent use cases discussed in #37648. The support for ESM fills a gap that is frustrating for a lot of library authors.

doc/api/vm.md Outdated
Comment on lines 1588 to 1589
A `LocalWorker` is effectively a Node.js environment that runs within the
same thread.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a bit more detail than this. A Node.js environment . . . with its own global scope? That can have separate NODE_OPTIONS? Is it CommonJS or ESM, or either?

To others’ points, how does this differ from Realm?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Realm (in specs terms) does not support ESM, require and all Node.js core modules. It's like a Realm, but with all the Node.js stuff.

doc/api/vm.md Outdated Show resolved Hide resolved
doc/api/vm.md Outdated Show resolved Hide resolved
doc/api/vm.md Outdated
added: REPLACEME
-->

* `filename` {string}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really any module specifier that can be passed to require, right? So not just filenames but also bare specifiers like lodash? Or node:fs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just require, yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the same as module.createRequire(), we should copy some of the docs from that API. For example, filename can also be a URL.

doc/api/vm.md Outdated
Comment on lines 1652 to 1653
Create a dynamic `import()` function that can be used for loading EcmaScript
modules inside the inner Node.js instance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the signature of the returned function? Does it return a promise? Does it match the signature of import(), where the second argument is an options bag (like { assert: { type: 'json' } })?

@mcollina
Copy link
Member Author

mcollina commented May 4, 2023

@GeoffreyBooth the implementation of the ESM support is at best hacky. It works but it involves writing a file, requiring it, and exporting the dynamic import in there 🥶. I would likely need some guidance on the internals of ESM to make it stables

Ultimately, I think this is want to use for testing, hot reloading, and generic isolation within the same thread.

@GeoffreyBooth
Copy link
Member

It works but it involves writing a file, requiring it, and exporting the dynamic import in there 🥶.

That . . . is pretty hacky. This part? https://github.com/nodejs/node/pull/47855/files#diff-5b84120ee7ea85a4b78330964d42d04d9d7ab1bfa12983e6b379cb33f796a10eR115-R136

It seems like you’re doing this because you need to use require via createRequire? This is already an async function, presumably you could use import() here but I assume there’s something you need from this.#module.require?

This would be a bit of a wild solution but one thing we could do is give require support for data: URIs. Then the source you’re generating here could be passed directly to require similarly to my example above, along the lines of require(`data:application/node,${encodeUriComponent(source)}`) (where application/node is the MIME type for CommonJS JavaScript). That would at least eliminate the “write to disk” hackiness.

We could perhaps just make a new internal API for “require a source string” rather than actually ship this as a new public-facing feature, to get around this immediate need without signing up for a potentially big expansion of what we support CommonJS being able to do. I also suspect that “require a source string” might already exist in the codebase somewhere, like within the CommonJS loader as the step that happens after the source is loaded from disk.

if (process.listenerCount('uncaughtException') === 1) {
// If we are stopping, silence all errors
if (!this.#stoppedPromise) {
this.emit('error', err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.emit('error', err);
this.emit('uncaughtException', err);

nitpick. makes more sense IMO

src/node_contextify.cc Outdated Show resolved Hide resolved
ThreadId thread_id = AllocateEnvironmentThreadId();
auto inspector_parent_handle = GetInspectorParentHandle(
outer_env, thread_id, "file:///synchronous-worker.js");
env_ = CreateEnvironment(isolate_data_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be a good idea - I think so far we've been assuming in the code base that there is a 1:1 relationship between the environment and the isolate, things can be subtly broken once we have n:1. It would be cleaner if we follow the Realm approach and just split out states in the Environment that are unique to each context/realm to a subclass of node::Realm (or maybe it is already just node::PrincipalRealm).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so far we've been assuming in the code base that there is a 1:1 relationship between the environment and the isolate, things can be subtly broken once we have n:1.

Quite a few of the crashes I've been fighting with this approach are due to that. I'm using the reference to env to clean up all the handles. Otherwise, we will have bad crashes after stop() is called (which is why I'm doing this in core). Using a node::PrincipalReam would require removing the deliberate stop() method: how would we shut it down?

The other question I have if we dith the CreateEnvironment call, how do we guarantee some level of isolation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just move the handles into the Realm instead. Basically, just move things that should belong to individual realms instead of being shared across them to the realm, and do the setup/cleanup on a per-realm basis, which is what we've been trying to do with the ShadowRealm integration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just move the handles into the Realm instead.

How would you do that?

As an example HandleWrap add things to the Environment here:

env()->handle_wrap_queue()->PushBack(this);
(there are a few other places too).

Should I try to move that list from the Env to the Realm?

Copy link
Member

@joyeecheung joyeecheung May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just make that realm()->handle_wrap_queue()->PushBack(this). That's what we've been doing to support other BaseObjects in ShadowRealms, we just haven't got to HandleWrap yet. If our intent is to make individual realms have their own sets of handles etc., we'd have to move them into realms and stop mixing them in one giant Environment that contains per-thread information anyway.

Copy link
Member

@legendecas legendecas May 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, tracking handle wraps and request wraps by realms is the plan for shadow realm integration. It's not the current focus yet.

However, moving the list from the Environment to the Realm breaks the postmortem diagnostics since tools like llnode are built on top of the Environment::handle_wrap_queue structure: https://github.com/nodejs/node/blob/main/src/node_postmortem_metadata.cc#L23

An option can be tracking the handle wraps and request wraps by both the Env and its creation Realm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to worry much about the postmortem diagnostics data, we only need to leave the offset of the queues within Realms and the tools would then figure out how to adapt to newer versions of Node.js. It's never guaranteed that we'd never change the layout of our internals, only that when we do, we still leave some information in the binary (the offsets) for these tools to figure out how to extract information from a core dump / process memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that we won't be able to have a process object without a new Node.js Environment, right?

May we land this PR and work to remove CreateEnvironment in follow-on work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so far we've been assuming in the code base that there is a 1:1 relationship between the environment and the isolate

That that's not the case is precisely the difference between Environment and IsolateData, though. Just because the Node.js CLI doesn't create multiple Environment instances per Isolate doesn't mean that it's not semantically correct to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That that's not the case is precisely the difference between Environment and IsolateData, though. Just because the Node.js CLI doesn't create multiple Environment instances per Isolate doesn't mean that it's not semantically correct to do so.

I don't think in practice we have really been writing the code that way. There are some places where we configure V8 isolate hooks (e.g. the ones in Environment::InitializeDiagnostics) with the current Environment as data to be used in the callback. The async hooks where things like Environment::GetCurrent(isolate)->trigger_async_id() is used a lot don't look particularly robust against a multi-Environment architecture either, and there are probably more places than I can think of off the top of my head..

lib/internal/vm/localworker.js Outdated Show resolved Hide resolved
const w = new LocalWorker();
const myAsyncFunction = w.createRequire(fileURLToPath(import.meta.url))('my-module');
console.log(await myAsyncFunction());
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think the docs should clarify the difference between this and a ShadowRealm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to understand the differences (and similarities) between this and a worker. Because they look very similar. For example, does a realm have an event loop? Does it share globals? (I'm assuming yes and no?)

doc/api/vm.md Outdated Show resolved Hide resolved
mcollina and others added 5 commits May 8, 2023 14:45
Co-authored-by: Shrujal Shah <[email protected]>
Co-authored-by: Antoine du Hamel <[email protected]>
Co-authored-by: Antoine du Hamel <[email protected]>
Co-authored-by: Moshe Atlow <[email protected]>
Co-authored-by: cjihrig <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
doc/api/vm.md Outdated Show resolved Hide resolved
Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth I’ve removed the file hack for createImport() and removed createRequire().

Thank you so much! I really appreciate the support 😄

There are bunch of little questions that need resolving but I agree with the general intent and API of this, and I’m happy to approve once we sort through the details.

doc/api/cli.md Outdated Show resolved Hide resolved
doc/api/vm.md Outdated
```mjs
import { NodeRealm } from 'node:vm';
const noderealm = new NodeRealm();
const myAsyncFunction = noderealm.createImport(import.meta.url)('my-module');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be simplified to noderealm.import('my-module')? As in, are we able to infer the module parent (import.meta.url) from within createImport, and if so, is there ever a reason we wouldn’t want to import relative to that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think retrieving a function caller's location reliably is possible.

doc/api/vm.md Outdated Show resolved Hide resolved
doc/api/vm.md Outdated
added: REPLACEME
-->

#### `noderealm.stop()`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### `noderealm.stop()`
#### `nodeRealm.stop()`

And for all following.

// We might want to change this in the future to use a callback,
// but at this point it seems like a premature optimization.
// TODO(@mcollina): refactor to use a close callback
setTimeout(tryClosing, 100).unref();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 100 and not, say, 10?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment above. Completely arbitrary, it could have been 42. I feel retrying 10 times per second is a good compromise, retrying 100 times per second might be too much.

I will refactor this to have a better solution for this active wait.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(After this lands)

lib/internal/vm/noderealm.js Outdated Show resolved Hide resolved
lib/internal/vm/noderealm.js Outdated Show resolved Hide resolved
mcollina and others added 6 commits May 8, 2023 18:37
Signed-off-by: Matteo Collina <[email protected]>
Co-authored-by: Geoffrey Booth <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
Copy link
Member

@GeoffreyBooth GeoffreyBooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some grammar fixes, but 👍 from me!

doc/api/errors.md Outdated Show resolved Hide resolved
doc/api/vm.md Outdated Show resolved Hide resolved
doc/api/vm.md Outdated Show resolved Hide resolved
outer_context_.Reset(env->isolate(), outer_context);
}

NodeRealm* NodeRealm::Unwrap(const FunctionCallbackInfo<Value>& args) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should NodeRealm inherit from BaseObject? This seems pretty similar to BaseObject's work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Only reason I didn't do this in synchronous-worker is because it was built outside of Node.js core.

return MaybeLocal<Value>();
}

NodeRealmScope worker_scope(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NodeRealmScope worker_scope(this);
NodeRealmScope realm_scope(this);

doc/api/vm.md Outdated
associated with this Node.js instance are released. This promise resolves on
the event loop of the _outer_ Node.js instance.

#### `nodeRealm.createImport(filename)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create the NodeRealm with a specifier?

interface NodeRealm {
  constructor(specifier: string);
  import(specifier, importAssertions): ModuleNamespace;
}

In this way, we can get rid of the higher order function createImport and makes the class method more aligned with ShadowRealm.prototype.importValue:

import { NodeRealm } from 'node:vm';
const nodeRealm = new NodeRealm(import.meta.url);
const { myAsyncFunction } = await nodeRealm.import('my-module');
console.log(await myAsyncFunction());

doc/api/vm.md Outdated Show resolved Hide resolved
doc/api/vm.md Outdated Show resolved Hide resolved
~NodeRealmScope();

private:
NodeRealm* w_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking nit: not really a fan of w_ as a member variable name. Something more descriptive would be ideal. realm_ perhaps?

EnvironmentFlags::kTrackUnmanagedFds),
thread_id,
std::move(inspector_parent_handle));
assert(env_ != nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert()s should be turned into CHECKs

Context::Scope context_scope(outer_context);
Isolate::SafeForTerminationScope termination_scope(isolate_);
Local<Value> onexit_v;
if (!self->Get(outer_context, String::NewFromUtf8Literal(isolate_, "onexit"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be using onexit_string()

Co-authored-by: Geoffrey Booth <[email protected]>
Co-authored-by: Chengzhong Wu <[email protected]>
Co-authored-by: Antoine du Hamel <[email protected]>
Co-authored-by: James M Snell <[email protected]>
@ThisIsMissEm
Copy link

This would be a bit of a wild solution but one thing we could do is give require support for data: URIs. Then the source you’re generating here could be passed directly to require similarly to my #47855 (comment) above, along the lines of require(data:application/node,${encodeUriComponent(source)}) (where application/node is the MIME type for CommonJS JavaScript). That would at least eliminate the “write to disk” hackiness.

@GeoffreyBooth if anything, wouldn't it be application/node+javascript ? i.e., node.js is a subset of javascript.

@aduh95
Copy link
Contributor

aduh95 commented May 11, 2024

This needs a rebase.

@mcollina mcollina closed this May 12, 2024
@mcollina
Copy link
Member Author

Closing this for now. Will reopen when we can do this without issues (roughly after the Olipan implementation).

@joyeecheung
Copy link
Member

FWIW if what you are concerned about are AsyncWrap I think there is still a long way to go before we can properly migrate those (as they often have complex references to other native objects). For other simpler objects we are basically blocked by the next V8 upgrade for something like #52295 to avoid using deprecated APIs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. notable-change PRs with changes that should be highlighted in changelogs. semver-minor PRs that contain new features and should be released in the next minor version. vm Issues and PRs related to the vm subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.