Optimization: share a single napi_threadsafe_function for all EventQueues #727

Closed
jrose-signal opened this issue May 7, 2021 · 3 comments · Fixed by #739
Comments

@jrose-signal
Contributor

Today each EventQueue has its own napi_threadsafe_function to send tasks to the Node event loop. However, nodejs/node#38506 (by @indutny-signal) makes it faster to run multiple calls on the same napi_threadsafe_function than to hop between them. If Neon saves a single napi_threadsafe_function in its instance data, it can be shared among all EventQueues for a particular instance.

One complication with this is the reference/unref/has_ref API on EventQueue, which uses the lifetime of the napi_threadsafe_function to keep the main event loop alive. This would have to be implemented with refcounting alongside the single napi_threadsafe_function:

  1. The napi_threadsafe_function starts unreferenced, with a refcount of 0.
  2. Creating an EventQueue adds one to the refcount.
  3. Going from 0 to 1 calls napi_ref_threadsafe_function.
  4. reference and unref modify the refcount only if they change the local has_ref flag.
  5. Going from 1 to 0 calls napi_unref_threadsafe_function.
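The refcounting scheme above could be sketched roughly as follows. This is a hypothetical mock, not Neon's implementation: `SharedTsf`, the two stub functions, and the struct layout are stand-ins for the real `napi_threadsafe_function` and its `napi_ref_threadsafe_function`/`napi_unref_threadsafe_function` calls, so the 0↔1 transitions can be observed directly.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Stand-ins for napi_ref_threadsafe_function / napi_unref_threadsafe_function;
// they only record how often they were called.
static REF_CALLS: AtomicUsize = AtomicUsize::new(0);
static UNREF_CALLS: AtomicUsize = AtomicUsize::new(0);

fn napi_ref_threadsafe_function_stub() {
    REF_CALLS.fetch_add(1, Ordering::SeqCst);
}
fn napi_unref_threadsafe_function_stub() {
    UNREF_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Shared wrapper around the single napi_threadsafe_function.
/// Starts unreferenced with a refcount of 0 (step 1).
struct SharedTsf {
    refcount: Mutex<usize>,
}

impl SharedTsf {
    fn new() -> Self {
        SharedTsf { refcount: Mutex::new(0) }
    }

    /// Steps 2-3: bump the count; only the 0 -> 1 transition
    /// refs the underlying threadsafe function.
    fn reference(&self) {
        let mut count = self.refcount.lock().unwrap();
        *count += 1;
        if *count == 1 {
            napi_ref_threadsafe_function_stub();
        }
    }

    /// Step 5: only the 1 -> 0 transition unrefs it.
    fn unreference(&self) {
        let mut count = self.refcount.lock().unwrap();
        *count -= 1;
        if *count == 0 {
            napi_unref_threadsafe_function_stub();
        }
    }
}

/// Each EventQueue keeps a local has_ref flag; reference/unref touch
/// the shared refcount only when that flag actually flips (step 4).
struct EventQueue<'a> {
    shared: &'a SharedTsf,
    has_ref: bool,
}

impl<'a> EventQueue<'a> {
    fn new(shared: &'a SharedTsf) -> Self {
        shared.reference(); // step 2: creation adds one to the refcount
        EventQueue { shared, has_ref: true }
    }

    fn reference(&mut self) {
        if !self.has_ref {
            self.has_ref = true;
            self.shared.reference();
        }
    }

    fn unref(&mut self) {
        if self.has_ref {
            self.has_ref = false;
            self.shared.unreference();
        }
    }
}

impl Drop for EventQueue<'_> {
    fn drop(&mut self) {
        self.unref();
    }
}

fn main() {
    let shared = SharedTsf::new();
    let mut a = EventQueue::new(&shared); // 0 -> 1: one ref call
    let b = EventQueue::new(&shared);     // 1 -> 2: no extra ref call
    a.unref();                            // 2 -> 1: no unref call
    a.unref();                            // no-op: has_ref already false
    drop(b);                              // 1 -> 0: one unref call
    println!(
        "ref calls: {}, unref calls: {}",
        REF_CALLS.load(Ordering::SeqCst),
        UNREF_CALLS.load(Ordering::SeqCst)
    );
    // prints: ref calls: 1, unref calls: 1
}
```

The key point the mock shows is that creating many queues (or toggling one queue's `has_ref` repeatedly) produces at most one ref/unref call on the underlying threadsafe function per 0↔1 transition.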
indutny-signal pushed a commit to indutny/neon that referenced this issue May 18, 2021
Node.js optimizes subsequent ThreadsafeFunction invocations to happen
during the same event loop tick, but only if the same instance of
ThreadsafeFunction is used. The performance improvement is most
noticeable when used in Electron, because scheduling a new UV tick in
Electron is very costly.

With this change, EventQueue will use an
existing instance of ThreadsafeTrampoline (a wrapper around
ThreadsafeFunction) when compiled with the napi-6 feature, or it will fall back
to creating a new ThreadsafeFunction per EventQueue instance.

Fix: neon-bindings#727
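The feature gating the commit message describes could be sketched with `cfg!`. Everything here is hypothetical: the function name is invented, and `napi-6` is assumed to be a cargo feature of the crate (it is not defined in this standalone snippet, so the fallback branch is taken).

```rust
// Hypothetical sketch of the feature-gated selection: with the assumed
// "napi-6" cargo feature the queue would reuse the shared trampoline held
// in instance data; otherwise it builds its own threadsafe function.
fn backing_description() -> &'static str {
    if cfg!(feature = "napi-6") {
        "shared ThreadsafeTrampoline from instance data"
    } else {
        "fresh ThreadsafeFunction per EventQueue"
    }
}

fn main() {
    println!("{}", backing_description());
}
```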
indutny added a commit to indutny/neon that referenced this issue May 19, 2021
@kjvalencik
Member

This is an interesting idea, and I can see how it could be beneficial in many use cases. If this feature were added, I would want to maintain both behaviors, because there are valid use cases for maintaining multiple queues (for example, pre-empting events in other queues).

One ergonomic way to handle this might be for cx.queue() to return the shared queue (the more common and better choice in most cases), while EventQueue::new() creates an independent queue.

The design space is probably large enough that an RFC would be beneficial.
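The split proposed above could be modeled concretely. This is a hypothetical mock, not Neon's API: `Context`, the unit `ThreadsafeFunction` marker, and the `Rc`-based sharing are stand-ins used only to show that `cx.queue()` hands out queues backed by one shared threadsafe function while `EventQueue::new()` gets its own.

```rust
use std::rc::Rc;

// Stand-in for napi_threadsafe_function.
struct ThreadsafeFunction;

struct EventQueue {
    tsf: Rc<ThreadsafeFunction>,
}

impl EventQueue {
    /// Independent queue: a fresh backing threadsafe function.
    fn new() -> Self {
        EventQueue { tsf: Rc::new(ThreadsafeFunction) }
    }
}

struct Context {
    // The single shared function, as if stored in instance data.
    shared_tsf: Rc<ThreadsafeFunction>,
}

impl Context {
    fn new() -> Self {
        Context { shared_tsf: Rc::new(ThreadsafeFunction) }
    }

    /// Shared queue: every call returns a queue backed by the
    /// same threadsafe function.
    fn queue(&self) -> EventQueue {
        EventQueue { tsf: Rc::clone(&self.shared_tsf) }
    }
}

fn main() {
    let cx = Context::new();
    let q1 = cx.queue();
    let q2 = cx.queue();
    let independent = EventQueue::new();
    println!("q1/q2 share backing: {}", Rc::ptr_eq(&q1.tsf, &q2.tsf));
    // prints: q1/q2 share backing: true
    println!("independent shares: {}", Rc::ptr_eq(&q1.tsf, &independent.tsf));
    // prints: independent shares: false
}
```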

@indutny-signal
Contributor

(Copy-pasting my response from Slack for posterity)

@kjvalencik I see what you mean, but the behavior of a threadsafe function in N-API is intentionally underspecified. It provides a way to invoke a function on the JS thread from another, non-JS thread, and that's it. The fact that different instances use different uv_async_t handles under the hood is an implementation detail, not a feature!

On a more technical side, though, users can't preempt one event queue by managing several different instances. What would happen is in fact the opposite: call one event queue fast enough, and the other is going to wait for the first one's callbacks to complete.

With all of this in mind, I would say that the PR might be best viewed as a performance optimization and not as a behavior or API change. FWIW, it even creates new EventQueue instances, so the API surface is exactly the same before and after this change.

Let me know if I can elaborate more on it.

@kjvalencik
Member

the behavior of threadsafe function in n-api is intentionally underspecified

It does specify that each threadsafe function is a distinct queue. This is especially critical for managing both unbounded and bounded queues. Neon does not currently support bounded queues, but I think we need to be careful not to close off this design space in the future.

On a more technical side, though, users won't preempt event queue by managing several different instances.

This appears to be a fairly large change in how N-API operates, and it will depend on the Node version.

I do not consider this purely a performance optimization, because it fundamentally changes the way the event queue operates and also impacts the future API design of bounded queues.

If you feel strongly about this feature, I encourage you to open up a PR against the RFCs repo where the design can be discussed in more depth. https://github.com/neon-bindings/rfcs

indutny-signal pushed a commit to signalapp/neon that referenced this issue May 19, 2021
indutny added a commit to indutny/neon that referenced this issue Jun 1, 2021
Node.js optimizes subsequent ThreadsafeFunction invocations to happen
during the same event loop tick, but only if the same instance of
ThreadsafeFunction is used. The performance improvement is most
noticeable when used in Electron, because scheduling a new UV tick in
Electron is very costly.

With this change, `cx.queue()` will use an
existing instance of ThreadsafeTrampoline (a wrapper around
ThreadsafeFunction) when compiled with the napi-6 feature, or it will fall back
to creating a new ThreadsafeFunction per EventQueue instance.

Fix: neon-bindings#727