Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fatal error when using N-API async code from process exit hooks #591

Closed
rolftimmermans opened this issue Nov 6, 2019 · 9 comments
Closed
Labels

Comments

@rolftimmermans
Copy link
Contributor

rolftimmermans commented Nov 6, 2019

When calling N-API based async code (in particular written with none-addon-api) from a process exit hook process.on("exit") it leads to a fatal error.

This is very easy to reproduce. For example, take the asyncworker.js tests and put the last block within a process.on("exit") callback. Like this

Running as node --expose-gc test/asyncworker.js the process is aborted and I see the following stack trace. Tested with Node.js 12.12.0 on macOS 10.14.

FATAL ERROR: Error::Error napi_create_reference
 1: 0x100b6a52a node::Abort() (.cold.1) [/usr/local/bin/node]
 2: 0x100081f70 node::FatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x100082098 node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 4: 0x100081f79 node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 5: 0x100060554 napi_open_callback_scope [/usr/local/bin/node]
 6: 0x104c01746 Napi::Error::Fatal(char const*, char const*) [/Users/rolftimmermans/Code/vendor/node-addon-api/test/build/Release/binding.node]
 7: 0x104c0588b Napi::Error::New(napi_env__*) [/Users/rolftimmermans/Code/vendor/node-addon-api/test/build/Release/binding.node]
 8: 0x104c066db Napi::FunctionReference::Call(napi_value__*, std::initializer_list<napi_value__*> const&) const [/Users/rolftimmermans/Code/vendor/node-addon-api/test/build/Release/binding.node]
 9: 0x104c06e42 Napi::AsyncWorker::OnError(Napi::Error const&) [/Users/rolftimmermans/Code/vendor/node-addon-api/test/build/Release/binding.node]
10: 0x104c05d59 Napi::AsyncWorker::OnWorkComplete(napi_env__*, napi_status, void*)::'lambda'()::operator()() const [/Users/rolftimmermans/Code/vendor/node-addon-api/test/build/Release/binding.node]
11: 0x104c04191 Napi::AsyncWorker::OnWorkComplete(napi_env__*, napi_status, void*) [/Users/rolftimmermans/Code/vendor/node-addon-api/test/build/Release/binding.node]
12: 0x100061e1d (anonymous namespace)::uvimpl::Work::AfterThreadPoolWork(int) [/usr/local/bin/node]
13: 0x100691aec uv__work_done [/usr/local/bin/node]
14: 0x100695037 uv__async_io [/usr/local/bin/node]
15: 0x1006a4426 uv__io_poll [/usr/local/bin/node]
16: 0x100695472 uv_run [/usr/local/bin/node]
17: 0x10003ddb2 node::Environment::CleanupHandles() [/usr/local/bin/node]
18: 0x10003df13 node::Environment::RunCleanup() [/usr/local/bin/node]
19: 0x1000b6f8a node::NodeMainInstance::Run() [/usr/local/bin/node]
20: 0x10005fd0c node::Start(int, char**) [/usr/local/bin/node]
21: 0x7fff6dd953d5 start [/usr/lib/system/libdyld.dylib]
[1]    32869 abort      node --expose-gc test/asyncworker.js

I would expect:

  1. the code to remain working as is, or
  2. any async code to not be called anymore and the process to exit cleanly, or
  3. (edit) a user-friendly JS exception to be thrown

I am running into this issue with zeromq.js, where I can't control users calling into the native code from an exit hook and this leads to similar fatal errors. Maybe there could be a way to check if the process/thread is exiting from within the native code?

@KevinEady
Copy link
Contributor

Hi @rolftimmermans ,

I don't think you can perform asynchronous cleanup on the exit event, see:

➜  ~ node -e 'process.on("exit", ()=>{console.log("exit"); setTimeout( ()=>{ console.log("timeout!") }, 10000); });'
exit

@rolftimmermans
Copy link
Contributor Author

That makes sense. I just prefer to avoid the crash. 🙂

@KevinEady
Copy link
Contributor

Yeaaah that makes sense 😄 Well, I think this is an underlying napi issue. I would imagine the napi_create_async_work should return an error if the env is cleaned up, but I don't know the internals too much. Thoughts @gabrielschulhof ?

@rolftimmermans
Copy link
Contributor Author

rolftimmermans commented Nov 6, 2019

It might be bigger than only napi_create_async_work, I posted this just as an example. FWIW I first ran into this behaviour with various N-API methods inside uv_poll_t/uv_poll_start callbacks within the on-exit handler. (I use libuv directly here because there is no N-API equivalent.) Some appear to work (creating objects?) and others don't (resolving promises, setting object properties...?), but the resulting fatal error is the same as above.

Thinking about this more I'd be really happy if node-addon-api could detect this situation and throw a sensible JS exception.

@mhdawson
Copy link
Member

mhdawson commented Jan 9, 2020

I took a quick look and in node_main_instance.cc were the main loop is and EmitExit is called to trigger the "exit" event I don't see code that sets a bit etc that we could check to see that the process is existing.

EmitExit does as boolean property _exiting on the process object but checking that from C code would not make sense.

Even if we added something that could be easily checked in the C N-API methods (or by node-addon-api through a new method) I'm not sure the performance cost doing that check would be acceptable to provide a better error in this case.

@rolftimmermans
Copy link
Contributor Author

Is it possible to check such a condition after an error occurs? It's just that the crash that happens now is very unclear. For a consumer of an N-API library it's not easy to figure out why this is happening. I don't expect this to work, just a clearer (maaaybe, if possible, even a custom) error message...

@mhdawson
Copy link
Member

There would have to be some sort of crash handler which would be able to figure out

  • code was running as a result of the exit hook
  • async code had caused the crash

I don't have any good ideas of how to do that easily.

@github-actions
Copy link
Contributor

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Dec 16, 2020
@jupp0r
Copy link

jupp0r commented Jan 11, 2021

I'm also seeing this from native code that's calling back into Javascript via a ThreadSafeFunction. It's really hard to synchronize cleanup of these callbacks with the env context going away during shutdown. Having an API for querying the liveliness of env would really help here (env->can_call_into_js() being exposed to native code).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants