Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker thread aborts on terminate if native node module has an open async handle #34567

Closed
implausible opened this issue Jul 30, 2020 · 6 comments
Assignees
Labels
addons Issues and PRs related to native addons. feature request Issues that request new features to be added to Node.js. worker Issues and PRs related to Worker support.

Comments

@implausible
Copy link
Contributor

implausible commented Jul 30, 2020

  • Version: v14.6.0, v12.14.1
  • Platform: 4.15.0-1091-oem #101-Ubuntu SMP Thu Jun 25 17:55:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: worker_threads

What steps will reproduce the bug?

I have opened up a repository here that can be used to reliably crash node. Follow the steps in README.md to replicate.

How often does it reproduce? Is there a required condition?

This reproduces so long as a native node module has an active async handle for async completion.

What is the expected behavior?

Worker thread should not terminate until all handles are cleaned up. Modules should be able to exit gracefully when terminate is requested.

npm test

> [email protected] test /home/tylerw/github/test/native-node-tests
> node -e "w = new (require('worker_threads').Worker)('./runDoAsyncThing.js'); setTimeout(w.terminate.bind(w), 2000);"

Entering execute thread

What do you see instead?

npm test

> [email protected] test /home/tylerw/github/test/native-node-tests
> node -e "w = new (require('worker_threads').Worker)('./runDoAsyncThing.js'); setTimeout(w.terminate.bind(w), 2000);"

Entering execute thread
uv loop at [0x7f18151e1ad8] has open handles:
uv loop at [0x7f18151e1ad8] has 0 open handles in total
node[23076]: ../src/debug_utils.cc:322:void node::CheckedUvLoopClose(uv_loop_t*): Assertion `0 && "uv_loop_close() while having open handles"' failed.
 1: 0x9fb000 node::Abort() [node]
 2: 0x9fb07e  [node]
 3: 0x98e9f1  [node]
 4: 0xab2e44 node::worker::Worker::Run() [node]
 5: 0xab2e88  [node]
 6: 0x7f1817de06db  [/lib/x86_64-linux-gnu/libpthread.so.0]
 7: 0x7f1817b09a3f clone [/lib/x86_64-linux-gnu/libc.so.6]
Aborted (core dumped)
npm ERR! Test failed.  See above for more details.
@addaleax addaleax added worker Issues and PRs related to Worker support. addons Issues and PRs related to native addons. feature request Issues that request new features to be added to Node.js. labels Jul 30, 2020
@addaleax
Copy link
Member

addaleax commented Jul 30, 2020

I think this is basically looking for async versions of AddEnvironmentCleanupHook()/RemoveEnvironmentCleanupHook(). That’s possible, but demand has been pretty low since mixing Workers with libuv features is not something that usually makes sense.

@addaleax addaleax self-assigned this Jul 30, 2020
@implausible
Copy link
Contributor Author

implausible commented Jul 30, 2020

Hopefully I can represent demand from 2 of my larger native projects 😄.

NodeGit

I maintain https://github.com/nodegit/nodegit which I think would still overall benefit from async environment clean up hooks. In that library we leverage a separate thread pool from libuv (more information can be found here).

We kind of have 2 options for enabling nodegit in worker threads:

  1. we can disable the threadpool for nodegit and run it single threaded when it is on a worker thread. This I assume would try to reconfigure the use of AsyncWorker to instead queue work in the main thread instead of off thread.
  2. we can allow nodegit to have a thread pool internally, this works really well with it's model already and allows users of the library to do post-processing of data retrieved from nodegit off main thread. Imagine you were an Electron app and wanted to build the git graph from commit data retrieved from nodegit. That might be a fairly substantial synchronous workload - which would block the main thread / render thread any time that's being calculated. If you could do that work on a worker thread, you would definitely still benefit from doing that work in a worker thread while also leveraging a threadpool for nodegit (retrieving commit data and the like).

NSFW (Node Sentinel File Watcher)

I maintain https://github.com/axosoft/nsfw.git. This is a custom built recursive file watcher that replaces the standard node capabilities off thread and at an OS level. It implements linux, osx, and windows file watching and requires at least one internal thread.

When file events are arriving, depending on the scale of the operation occurring in the folder you're watching, you might have to do a lot of synchronous work in javascript to find an important event. It would be nice if we could leverage NSFW in a worker thread instead of on the main thread, so that we could handle those events and do more javascript parsing of what's important off main thread and only communicate interesting changes back to the main thread.

Again my example here is imagine you're a text editor built in Electron. If you're watching an entire directory tree, you might care about events like changes to .gitignore. As well as active file changes. You might want to parse changes to the gitignore and calculate changes to full directory structures on a worker thread and have those results be posted to the main thread when the heavy lifting is done.

I think eliminating the potential double hop of retrieving events in the main thread, shipping them off to a worker thread to be parsed, and then having them come back from the worker thread for response can be limiting architecturally.

NSFW also suffers a similar abort crash through it's use of N-API. I'd love to see that resolved as well.


Hopefully these help establish why it could make sense to leverage async behavior even in worker threads. I believe that even though I'm mentioning Electron, providing the async clean up at this level would allow WebWorker in Electron to handle these concerns as well in a future release.

addaleax added a commit to addaleax/node that referenced this issue Jul 31, 2020
Sometimes addons need to perform cleanup actions, for example
closing libuv handles or waiting for requests to finish, that
cannot be performed synchronously.

Add C++ API and N-API functions that allow providing such
asynchronous cleanup hooks.

Fixes: nodejs#34567
@addaleax
Copy link
Member

#34572 should provide APIs that enable this, feel free to take a look :)

@implausible
Copy link
Contributor Author

First I just want to thank you for such a quick response and an unexpectedly quick solution.


Ok I have read over those changes. If I'm understanding everything correctly, the async cleanup hook will provide a finalize function pointer to FinishAsyncCleanupHook to the user of the library via a callback hook provided to the library from the user. When the environment is going to shut down, the user's hook will be called. When the user of the library has completed their async work safely and is ready to shutdown, the user of the library should call FinishAsyncCleanupHook which will complete termination of the environment?

If that's what we're looking at I think that would solve my use-case.

One more question for you, will this keep the v8 context alive until the FinishAsyncCleanupHook has been called?

Part of doing async work is frequently built around persistent handles of v8 objects that have external pointers associated with them (like ObjectWrap). The usual flow is to persist the object that the pointer is attached to long enough for the async work to complete the execute thread. If any of those objects were to be marked for GC while the execute thread is in motion, but before the library has finished shutting down its async work, that would definitely lead to a segfault.

@addaleax
Copy link
Member

@implausible Just to avoid misunderstandings, FinishAsyncCleanupHook() is not a public API, and not exposed in any way. The intent is that the user library should call the callback (which currently is named FinishAsyncCleanupHook in the PR but might be another function at some point in the future) once it’s done, yes.

One more question for you, will this keep the v8 context alive until the FinishAsyncCleanupHook has been called?

Yes. However, you are not allowed to perform any calls into JS anymore.

Part of doing async work is frequently built around persistent handles of v8 objects that have external pointers associated with them (like ObjectWrap). The usual flow is to persist the object that the pointer is attached to long enough for the async work to complete the execute thread. If any of those objects were to be marked for GC while the execute thread is in motion, but before the library has finished shutting down its async work, that would definitely lead to a segfault.

The persistent handles remain alive, but keep in mind that the cleanup hook should also tear all of them down at some point before it is finished, otherwise you’ll end up with a memory leak.

@implausible
Copy link
Contributor Author

Perfect, I can understand all of that 😄! Thank you again for your help.

addaleax added a commit that referenced this issue Aug 8, 2020
Sometimes addons need to perform cleanup actions, for example
closing libuv handles or waiting for requests to finish, that
cannot be performed synchronously.

Add C++ API and N-API functions that allow providing such
asynchronous cleanup hooks.

Fixes: #34567

PR-URL: #34572
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
codebytere pushed a commit that referenced this issue Aug 11, 2020
Sometimes addons need to perform cleanup actions, for example
closing libuv handles or waiting for requests to finish, that
cannot be performed synchronously.

Add C++ API and N-API functions that allow providing such
asynchronous cleanup hooks.

Fixes: #34567

PR-URL: #34572
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
addaleax added a commit that referenced this issue Sep 22, 2020
Sometimes addons need to perform cleanup actions, for example
closing libuv handles or waiting for requests to finish, that
cannot be performed synchronously.

Add C++ API and N-API functions that allow providing such
asynchronous cleanup hooks.

Fixes: #34567

PR-URL: #34572
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Gabriel Schulhof <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addons Issues and PRs related to native addons. feature request Issues that request new features to be added to Node.js. worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants