node-api: use c-based api for libnode embedding #54660

Draft: wants to merge 17 commits into main

vmoroz
Member

@vmoroz vmoroz commented Aug 30, 2024

This is a temporary spin-off from PR #43542.
This separate PR was created to simplify merging and rebasing onto the latest code while we discuss the new API design.
When the code is ready, it should be merged back into PR #43542.

The goal of the original PR is to enable a C API and Node-API for embedding scenarios.
The C API allows using the shared libnode from runtimes that do not interop with C++ such as WASM, C#, Java, etc.
This PR works towards the same goal with some changes to the original code.

This is the related issue #23265.

The API design principles

  • Follow the best practices of the Node-API design and provide a way to interop with it.
  • Prefix the new API constructs with node_embedding_.
  • Design the API to be ABI-safe and future-proof against new requirements.
    • Follow the Builder pattern for the API design.
    • The typical use is to create an object, configure it, initialize it based on the configuration, use it, and then delete it. The configuration changes are prohibited after the object is initialized.
    • What if the initialization sequence must be customized? It means that we add a new configuration function and insert a customization hook into the initialization sequence. Thus, we can evolve the API by adding new configuration functions, and occasionally deprecating the old functions.
    • All behavior changes must be associated with a new API version number.
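
The create, configure, initialize, use, delete lifecycle described above can be sketched in plain C. This is only a minimal illustration under stated assumptions: every `demo_` name and the toy `demo_platform` struct are hypothetical and are not the `node_embedding_` functions from this PR, whose signatures are still under discussion. The point it shows is that configuration calls are rejected once the object is initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the node_embedding_ API. */
typedef enum {
  demo_ok = 0,
  demo_invalid_arg,
  demo_invalid_state,
  demo_out_of_memory
} demo_status;

typedef struct {
  unsigned flags;   /* configuration, frozen by demo_platform_initialize */
  bool initialized;
} demo_platform;

/* Create: allocate an unconfigured object. */
demo_status demo_create_platform(demo_platform** result) {
  if (result == NULL) return demo_invalid_arg;
  *result = calloc(1, sizeof(**result));
  return *result != NULL ? demo_ok : demo_out_of_memory;
}

/* Configure: allowed only before initialization. */
demo_status demo_platform_set_flags(demo_platform* platform, unsigned flags) {
  if (platform == NULL) return demo_invalid_arg;
  if (platform->initialized) return demo_invalid_state;
  platform->flags = flags;
  return demo_ok;
}

/* Initialize: consume the configuration and freeze it. */
demo_status demo_platform_initialize(demo_platform* platform) {
  if (platform == NULL) return demo_invalid_arg;
  if (platform->initialized) return demo_invalid_state;
  platform->initialized = true;
  return demo_ok;
}

/* Delete: release the object. */
void demo_delete_platform(demo_platform* platform) {
  free(platform);
}

/* Walks through the lifecycle and returns the status of a configuration
 * attempt made after initialization. */
demo_status demo_configure_after_initialize(void) {
  demo_platform* platform = NULL;
  demo_status status = demo_create_platform(&platform);
  if (status != demo_ok) return status;

  status = demo_platform_set_flags(platform, 1u << 12); /* accepted */
  assert(status == demo_ok);
  status = demo_platform_initialize(platform);
  assert(status == demo_ok);

  status = demo_platform_set_flags(platform, 0); /* rejected: too late */
  demo_delete_platform(platform);
  return status;
}
```

In this sketch a new customization point would be one more `demo_platform_set_...` function, so existing callers and the struct layout visible to them stay untouched.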

The API usage

  • To use the C embedding API, we must create, configure, and initialize the global node_embedding_platform. It initializes Node.js and the V8 JS engine once per process and parses the CLI arguments.
  • Then, we create, configure, and initialize one or more node_embedding_runtime instances. A runtime is responsible for running JavaScript code.
  • The runtime CLI arguments are initialized by default with the args and exec_args from the result of the platform initialization. They can be overridden while configuring the runtime.
  • A runtime can run in its own thread, several runtimes can share the same thread, or the same runtime can be run from multiple threads.
  • The runtime event loop APIs provide control over the runtime execution. These functions can be called many times because they do not destroy the runtime in the end.
  • The runtime lets the embedder specify the Node-API version and retrieve the associated napi_env instance. Any Node-API code that uses the napi_env must be run in the runtime scope controlled by the node_embedding_runtime_open_scope and node_embedding_runtime_close_scope functions.

The API overview

Based on the use scenarios, the API can be split up into six groups.

Node.js CLI API

  • node_embedding_run_nodejs_main runs Node.js CLI without any customizations.

Error handling API

  • node_embedding_on_error sets the global error handling hook.

Global platform API

  • node_embedding_create_platform
  • node_embedding_delete_platform
  • node_embedding_platform_is_initialized
  • node_embedding_platform_set_flags
  • node_embedding_platform_set_args
  • node_embedding_platform_initialize
  • node_embedding_platform_get_parsed_args

Runtime API

  • node_embedding_create_runtime
  • node_embedding_delete_runtime
  • node_embedding_runtime_is_initialized
  • node_embedding_runtime_set_flags
  • node_embedding_runtime_set_args
  • node_embedding_runtime_on_preload
  • node_embedding_runtime_add_module
  • node_embedding_runtime_initialize

Runtime API to run event loops

  • node_embedding_runtime_on_event_loop_run_request
  • node_embedding_runtime_run_event_loop
  • node_embedding_runtime_complete_event_loop

Runtime API to interop with Node-API

  • node_embedding_runtime_set_node_api_version
  • node_embedding_runtime_invoke_node_api

Documentation

  • The new C embedding API is added to the existing embedding.md file after the C++ embedding API description.
  • The index.md is changed to indicate that the embedding.md has docs for C++ and C APIs.
  • TODO: complete the examples section.

Tests

  • The new C embedding API tests pass the same scenarios as the C++ embedding API tests.
  • The embedtest executable can be run in several modes controlled by the first CLI argument. It effectively contains several main functions for different test scenarios.
  • The JS test code is changed to provide the test mode argument based on the scenario.
  • Added several new test scenarios:
    • run several Node.js runtimes each in its own thread;
    • run several Node.js runtimes all in the same thread;
    • run Node.js runtime from different threads.
    • test that preload callback is called for the main and worker threads.

The PR status

The code is not 100% complete yet. There are still a few TODO items, but I would like to start a discussion with the Node-API team about the new API.

  • Address outstanding TODOs
    • The main_script must be an option for the runtime configuration.
    • Add startup callback with process and require parameters.
    • Consider generating the main script based on runtime configuration.
    • Allow setting the global inspector for a selected runtime (there can be only one inspectable runtime per process).
    • Start worker threads from C++.
    • Worker threads to inherit parent inspector.
    • Allow cancelling pending loop tasks on runtime deletion.
    • Can we init the platform again if it returns early?
    • Add simplified runtime mode without explicit open/close scope.
    • Simplify API use for simple default cases.
    • Enable embedded native modules that are a part of the main executable.
    • Test passing the V8 thread pool size.
    • Implement better error handling for function arguments.
    • Clear up usage of args vs exec_args. It all looks quite confusing.
  • Review the API design
  • Write docs

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/node-api

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Aug 30, 2024
@vmoroz vmoroz marked this pull request as draft August 30, 2024 14:58
@legendecas legendecas added the node-api Issues and PRs related to the Node-API. label Aug 30, 2024
// Skip printing output for --help, --version, --v8-options.
node_api_platform_no_print_help_or_version_output = 1 << 12,
// Initialize the process for predictable snapshot generation.
node_api_platform_generate_predictable_snapshot = 1 << 14,
Member

I think we should have an option which is something like

node_api_platform_nodejs_binary_default

which gives you the same configuration that is present for the node.js binary

typedef struct node_api_env_options__* node_api_env_options;

typedef enum {
node_api_platform_no_flags = 0,
Member

Since a bunch of them seem to disable specific features, should there be an all_flags value, or are they all on by default so that only no/disable flags exist?

Member Author

@vmoroz vmoroz Sep 3, 2024

I have changed the approach since our last Node-API meeting. These flags now map 1-to-1 to the flags defined in node.h. The default is the no_flags configuration. Embedders can then disable some default Node.js features.
We can add default_flags as an alias for no_flags.
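
A small sketch of what that could look like, assuming hypothetical `demo_platform_` names. Only the two bit positions are taken from the diff context quoted in this thread; the alias and the helper functions are illustrations, not code from the PR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum for illustration. Everything is enabled by default,
 * and each flag switches one default behavior off. */
typedef enum {
  demo_platform_no_flags = 0,
  /* Alias, so that callers can spell out "the defaults" explicitly. */
  demo_platform_default_flags = demo_platform_no_flags,
  /* Skip printing output for --help, --version, --v8-options. */
  demo_platform_no_print_help_or_version_output = 1 << 12,
  /* Initialize the process for predictable snapshot generation. */
  demo_platform_generate_predictable_snapshot = 1 << 14
} demo_platform_flags;

/* Returns true when `flag` is set in `flags`. */
bool demo_has_flag(demo_platform_flags flags, demo_platform_flags flag) {
  return (flags & flag) != 0;
}

/* An embedder that starts from the defaults and opts out of one feature. */
demo_platform_flags demo_embedder_flags(void) {
  demo_platform_flags flags = (demo_platform_flags)(
      demo_platform_default_flags |
      demo_platform_no_print_help_or_version_output);
  assert(demo_has_flag(flags, demo_platform_no_print_help_or_version_output));
  return flags;
}
```

With this shape, an all_flags value is not needed: the zero value already means "behave like the defaults", and each set bit is an explicit opt-out.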

src/node_api_embedding.cc (outdated, resolved)
return napi_ok;
}

napi_status NAPI_CDECL
Member

If an engine does not support snapshots, can it just do nothing in the snapshot functions?

Member Author

I guess so. Maybe we can change it so that the snapshot can be just JS text. JSI uses the term "prepared JavaScript" for the same purpose. The only question is whether we want this API to be Node-specific or runtime/engine independent. E.g. I use this API with the jsr_ prefix across the V8 and Hermes JS engines (it is also based on Node-API): https://github.com/microsoft/v8-jsi/blob/master/src/node-api/js_runtime_api.h

Member

Since you are already using it, being cross-runtime might make sense. We just need to make sure it is easy for a platform to not support it and still have the same code run.

return std::move(env_setup_);
}

napi_status OpenScope() {
Member

I assume that a scope is something different than a handle_scope - https://nodejs.org/api/n-api.html#napi_handle_scope, just wondering if there might be confusion between the concepts?

Member Author

@vmoroz vmoroz Sep 3, 2024

Yes, it is different. When we are inside of a module we already have some current v8::Isolate and v8::Context. We do not have them when we are outside and operating with the environment. So, we must establish them to use any V8/Node API. In the standalone v8-jsi project I used a function jsr_run_task that opens/closes the scope internally. (edit: I see that the v8-jsi also has the open/close scope. It is convenient to use when we do not want to create a lot of lambdas.)

doc/api/embedding.md (outdated, resolved)
doc/api/n-api.md (outdated, resolved)
src/node_api_embedding.h (outdated, resolved)
return napi_ok;
}

napi_status NAPI_CDECL node_api_open_env_scope(napi_env env) {
Member

@mhdawson mhdawson Sep 3, 2024

I'm wondering if all functions which are only to be called as part of embedding, as opposed to in an add-on implementation, should have some extra bit in the name. For example, in this method: node_api_embed_open_env_scope

Member Author

This open/close scope API gives us a lot of flexibility, but it is difficult to use and, as you said, quite confusing.
I am currently considering replacing it with a function that receives a lambda (a C function plus a void* state), so that the napi_env is available only to that lambda. Other APIs will change from using napi_env to something like node_embedding_env or node_embedded_env.
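
A rough sketch of the "C function + void state" idea, assuming hypothetical `demo_` names. The `demo_env` struct only plays the role of napi_env here, and the depth counter stands in for entering and leaving the V8 isolate and context; none of this is the PR's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the PR's types. */
typedef struct {
  int scope_depth; /* greater than 0 while a scope is open */
} demo_runtime;

typedef struct {
  demo_runtime* runtime;
} demo_env; /* plays the role of napi_env */

typedef void (*demo_env_callback)(demo_env* env, void* state);

/* Opens the scope, runs the callback, and always closes the scope again,
 * so the env never escapes the region where it is valid. */
bool demo_runtime_invoke(demo_runtime* runtime,
                         demo_env_callback callback,
                         void* state) {
  if (runtime == NULL || callback == NULL) return false;
  runtime->scope_depth++;
  demo_env env = {runtime};
  callback(&env, state);
  runtime->scope_depth--;
  return true;
}

/* Example callback: stores the scope depth it observes into `state`. */
void demo_record_depth(demo_env* env, void* state) {
  *(int*)state = env->runtime->scope_depth;
}

/* Returns the depth seen inside the callback. */
int demo_depth_seen_by_callback(void) {
  demo_runtime runtime = {0};
  int seen = -1;
  bool ok = demo_runtime_invoke(&runtime, demo_record_depth, &seen);
  assert(ok);
  assert(runtime.scope_depth == 0); /* scope is closed after the call */
  return seen;
}
```

The trade-off is the one mentioned above: the caller cannot forget to close the scope, at the cost of writing a callback for each piece of work.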

return napi_ok;
}

napi_status NAPI_CDECL node_api_await_promise(napi_env env,
Member

At first I thought this might be an extension to the promise support we already have - https://nodejs.org/api/n-api.html#promises

This is a good example where I think we need embed or something similar in the name, as otherwise people might get confused and think it could be called from an addon.

Member

or maybe the prefix should be node_embedding_api_XXX

Member Author

What if we shorten it to the node_embedding_?

Member

I'm good with node_embedding_

Member Author

All APIs are changed to use the node_embedding_ prefix.

@vmoroz vmoroz added the embedding Issues and PRs related to embedding Node.js in another project. label Sep 11, 2024
@vmoroz
Member Author

vmoroz commented Sep 13, 2024

This PR was discussed today, 9/13/2024, in the Node-API meeting.
This is the summary as I recall it. @mhdawson , @legendecas , @KevinEady , @gabrielschulhof , feel free to augment this comment in case I missed or misunderstood something.

  • The global error handling callback.
    • The initial suggestion from the team was to use the "get-the-last-error" approach as in the Node-API.
    • The counterargument was that while the last-error approach works well in the single-threaded case, it may not work in a multi-threaded environment.
    • Since the detailed error info is mostly used for logging, implementing it in a single place is much simpler.
    • The default C embedding API error handler prints the error message to stderr and exits the process. It is intended to handle "non-recoverable" errors such as a wrong argument value passed to the API, or wrong CLI arguments.
    • The related question was what to do if a V8 Isolate runs out of memory. It needs to be investigated, but I guess the answer is that it will be handled by Node.js as it is handled today. The C embedding API does not currently participate in the process. If Node.js typically recovers from that condition, then it must continue doing it.
  • Does the new C-based embedding API have the goal of doing the same as the C++ embedding API?
    • The answer is "yes" and "no", or better to say "it depends".
    • While we want to have the same functionality, there is no goal to wrap up all existing C++ embedding APIs.
    • The new C embedding API is going to grow based on the scenarios, and we hope that the Builder pattern lets us evolve the API without ABI-breaking changes.
    • The C embedding API is going to be implemented on top of the existing C++ embedding API.
  • The API growth based on the Builder pattern aims to inject various callbacks into different parts of the initialization process when needed. E.g. if Electron needs to do some extra work between the CLI args parsing and V8 platform initialization, then we can add a callback that is called between these steps.
    • The concern is that such hooks may bloat the C embedding API. Would it be better to use the V8 API instead such as rusty_v8?
    • The answer is that hopefully we are not going to have too many hooks.
    • Providing C wrappers around the whole V8 API seems to be outside the scope of this PR. One of the goals is to see if we can implement the API in a way that might be useful for other JS runtimes and engines, though it is not strictly necessary.
    • Another approach is to see if the whole initialization process can be represented as a pipeline connecting various tasks, and then the embedder can configure the sequence of the tasks in the pipeline.
  • Why create the new C-based embedder API if the C++ embedder API provides much more freedom?
    • The main goal is to provide access to shared libnode from languages that do not support C++ interop. E.g. C#.
  • Will the new API make it more difficult to support and change the C++ embedding API?
    • In many cases the C API is just a thin wrapper on top of the C++ API. Hopefully it will not introduce too many issues.
  • It is worth focusing on specific use cases.
    • It is a good point, and it should help us introduce only a bare minimum API to start with. Then we can grow it based on new scenarios.
    • We discussed whether we should start with the single-threaded case.
    • For one of my use cases it is not enough: we want to use libnode from ASP.NET where we must run multiple threads.
    • Should we have one primary Runtime and others are just the worker threads?
      • The node::Environment was introduced in Node.js to implement worker threads in Node.js.
      • Unlike the worker thread created from JS, the embedder has a control over the thread where the node::Environment is executed.
      • It may make sense to have a single "root" node::Environment and have the others depend on it. That design must address the issue with the Inspector, which currently can only be attached to a single node::Environment or its child worker threads.
  • Should we support Node.js experimental features such as snapshots and ES6 modules?
    • Since the C embedding API is also an experimental feature, I do not see big drawbacks as long as the experimental status of the C API is aligned with the experimental status of these features.
  • We have discussed the node_embedding_runtime_add_module function.
    • The function allows adding native modules that are implemented in the same executable that embeds libnode.
    • The implementation simply wraps the existing linked modules implementation available in the C++ embedder API.
    • We should consider renaming it to reduce confusion.
  • Why do we need to invoke the Node-API code inside of a callback for the node_embedding_runtime_invoke_node_api?
    • Unlike the use of Node-API inside native modules, embedders must explicitly establish the V8 Isolate context, etc. and then handle the Node-API and JS errors. This function is responsible for taking care of these tasks.
    • The callbacks for the node_embedding_runtime_on_preload and node_embedding_runtime_add_module functions use the same internal Node-API CallIntoModule function.
    • As an alternative, we can bring back the node_embedding_runtime_open_scope and node_embedding_runtime_close_scope functions.
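
To illustrate the first item in this summary (a single global error hook instead of "get-the-last-error"), here is a minimal sketch in C. All `demo_` names and the handler signature are hypothetical and are not the signature of node_embedding_on_error. The default handler follows the behavior described above: print to stderr and exit. The sketch does no synchronization; a real multi-threaded implementation would need to protect the handler registration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handler signature, for illustration only. */
typedef void (*demo_error_handler)(void* data,
                                   const char* message,
                                   int exit_code);

/* Default behavior: log the message to stderr and exit the process. */
static void demo_default_handler(void* data,
                                 const char* message,
                                 int exit_code) {
  (void)data;
  fprintf(stderr, "%s\n", message);
  exit(exit_code);
}

static demo_error_handler g_handler = demo_default_handler;
static void* g_handler_data = NULL;

/* Installs a process-wide handler; NULL restores the default one. */
void demo_on_error(demo_error_handler handler, void* data) {
  g_handler = handler != NULL ? handler : demo_default_handler;
  g_handler_data = data;
}

/* Called by the API implementation wherever an error is detected. */
void demo_report_error(const char* message, int exit_code) {
  g_handler(g_handler_data, message, exit_code);
}

/* Example embedder handler: counts errors instead of exiting. */
static void demo_count_errors(void* data,
                              const char* message,
                              int exit_code) {
  (void)message;
  (void)exit_code;
  ++*(int*)data;
}

/* Reports two errors through the custom handler and returns the count. */
int demo_count_reported_errors(void) {
  int count = 0;
  demo_on_error(demo_count_errors, &count);
  demo_report_error("wrong CLI argument", 9);
  demo_report_error("runtime argument is NULL", 1);
  demo_on_error(NULL, NULL); /* restore the default handler */
  assert(count == 2);
  return count;
}
```

Because every error funnels through one function, logging lives in a single place, which is the simplification argued for in the meeting.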

Labels
c++ Issues and PRs that require attention from people who are familiar with C++. embedding Issues and PRs related to embedding Node.js in another project. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. node-api Issues and PRs related to the Node-API.
Projects
Status: In Progress
4 participants