Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: fix race on modpending by migrating it to a thread local #21611

Closed
wants to merge 2 commits into from

Conversation

rpetrich
Copy link
Contributor

@rpetrich rpetrich commented Jun 30, 2018

Fixes a rare race condition on modpending when two native modules are loaded simultaneously on different threads by migrating modpending to be a thread local.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows [commit guidelines]

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Jun 30, 2018
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Just fyi, there are some alternative approaches that address this problem more completely – #20759 goes a bit into this direction (or at least touches a lot of this code), and I wanted to fix up addaleax/node@541ec98 to work as a “proper” solution to this problem.

In particular, Node shouldn’t load addons from Workers that don’t explicitly state that they support being loaded in such an environment – Many addons create some kind of libuv handles, and don’t install proper cleanup hooks for when the Worker shuts down that close those handles

If you want to help with that, that would be appreciated a lot 💙

src/node.cc Outdated
@@ -1144,7 +1143,7 @@ extern "C" void node_module_register(void* m) {
mp->nm_link = modlist_linked;
modlist_linked = mp;
} else {
modpending = mp;
Environment::GetThreadLocalEnv()->modpending = mp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost any use of Environment::GetThreadLocalEnv() is going to be buggy, because Node.js generally seeks to support embedders who create multiple Environments per thread :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax Thanks, I was unaware of that. I've migrated to using a thread local instead, which should properly support multiple Environments per thread.

Fixes a rare race condition on modpending when two native modules are
loaded simultaneously on different threads by migrating modpending to
be a member of Environment.
@rpetrich rpetrich changed the title src: fix race on modpending by migrating it to Environment src: fix race on modpending by migrating it to a thread local Jun 30, 2018
@gabrielschulhof
Copy link
Contributor

@addaleax modules which rely on mp are not multiple init anyway, because relying on mp is equivalent to relying on library constructors, which are once-per-process-per-module.

Assuming the node environment on such a thread is correctly set up to be indistinguishable from a single-threaded node environment such addons will work correctly because the one thread where they do load is the only thread where they need to work and they will not subsequently load because the library constructors will not fire again.

In contrast, when using a special symbol a module must support multiple init and module-instance-local storage, because such a symbol will be found every time the module is loaded, no matter what thread it's on and no matter that it's multiple times on a single thread.

So, I think landing this PR will help us, and will not cause any more breakage than the current inability to load a module more than once already causes.

src/node.cc Outdated
@@ -185,7 +185,8 @@ static int v8_thread_pool_size = v8_default_thread_pool_size;
static bool prof_process = false;
static bool v8_is_profiling = false;
static bool node_is_initialized = false;
static node_module* modpending;
static uv_once_t init_once = UV_ONCE_INIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this variable should indicate that it refers to thread_local_modpending – so, like, maybe init_modpending_once?

src/node.cc Outdated
@@ -1258,6 +1259,10 @@ inline napi_addon_register_func GetNapiInitializerCallback(DLib* dlib) {
reinterpret_cast<napi_addon_register_func>(dlib->GetSymbolAddress(name));
}

void InitDLOpenOnce() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be InitModpendingOnce, not InitDLOpenOnce.

@gabrielschulhof
Copy link
Contributor

gabrielschulhof commented Jul 19, 2018

I was able to cause node --experimental-worker to abort with the following message:

node[25666]: ../src/node.cc:1783:void node::DLOpen(const v8::FunctionCallbackInfo<v8::Value>&): Assertion `(modpending) == nullptr' failed.
 1: 0x89a0c0 node::Abort() [node]
 2: 0x89a1a5  [node]
 3: 0x89b38a  [node]
 4: 0xb14459  [node]
 5: 0xb14fc9 v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) [node]
 6: 0x685b99041bd 

by running the following code:

/* index.js */
const assert = require('assert');
const {
  Worker, isMainThread, parentPort
} = require('worker_threads');

if (isMainThread) {
  let robinHood;
  (new Worker(__filename)).on('message', (data) => {
    if (data === 'ready') {
      robinHood = JSON.stringify(require('./build/Release/robin_hood'));
    } else {
      console.log(data + '\nvs.\n' + robinHood);
      assert.notStrictEqual(data, robinHood);
    }
  });
} else {
  parentPort.postMessage('ready');
  parentPort.postMessage(JSON.stringify(require('./build/Release/friar_tuck')));
}

where

/* robin_hood.c */
#include <assert.h>
#include <node_api.h>

static napi_value Init(napi_env env, napi_value exports) {
  napi_status status;
  napi_value result;

  status = napi_create_uint32(env, 42, &result);
  assert(status == napi_ok);

  status = napi_set_named_property(env, exports, "answer", result);
  assert(status == napi_ok);

  return exports;
}

NAPI_MODULE(NODE_GYP_MODULE_NAME, Init)

and

/* friar_tuck.c */
#include <assert.h>
#include <node_api.h>

static napi_value Init(napi_env env, napi_value exports) {
  napi_status status;
  napi_value result;

  status = napi_create_string_utf8(env, "good", NAPI_AUTO_LENGTH, &result);
  assert(status == napi_ok);

  status = napi_set_named_property(env, exports, "question", result);
  assert(status == napi_ok);

  return exports;
}

NAPI_MODULE(NODE_GYP_MODULE_NAME, Init)

in a shell loop with

while node --experimental-worker ./index.js; do echo -n ''; done

It's not deterministic though so it doesn't serve as a good test.

@gabrielschulhof
Copy link
Contributor

@addaleax I believe that using NODE_MODULE_INIT() or NAPI_MODULE_INIT() instead of NODE_MODULE() and NAPI_MODULE() constitutes opt-in, especially given the NODE_MODULE_INIT() documentation we recently added.

In conclusion, using NODE_MODULE() or NAPI_MODULE() will result in a single-load module, unless the symbol given as the nm_register_func is the well-known symbol (which is most often not the case in the wild), and using NODE_MODULE_INIT() or NAPI_MODULE_INIT() will result in a contex-aware, multiple-load module. With this change, both scenarios will work correctly AFAICT.

@mhdawson
Copy link
Member

@gabrielf that all makes sense to me. I guess we just need to get this PR updated based on your comments and the it looks good to me.

@gabrielschulhof
Copy link
Contributor

@rpetrich do you think you might be able to find the time to address my comments so we can move the PR forward?

@gabrielschulhof
Copy link
Contributor

@mhdawson, @addaleax I fixed the naming. Could you please have another look?

@rpetrich
Copy link
Contributor Author

@gabrielschulhof Thanks for pushing to the PR. I'd missed the earlier comments about naming.

@gabrielschulhof
Copy link
Contributor

@rpetrich happy to help!

@gabrielschulhof
Copy link
Contributor

@mhdawson could you PTAL?

@gabrielschulhof
Copy link
Contributor

@nodejs/addon-api could someone else please also take a look?

@gabrielschulhof
Copy link
Contributor

@nodejs/collaborators could someone please have a look?

@Trott
Copy link
Member

Trott commented Aug 14, 2018

@TimothyGu TimothyGu removed their request for review August 16, 2018 00:39
@gabrielschulhof
Copy link
Contributor

@gabrielschulhof
Copy link
Contributor

@gabrielschulhof
Copy link
Contributor

@Trott
Copy link
Member

Trott commented Aug 16, 2018

I think FreeBSD is back to green. Let's try a Resume Build: https://ci.nodejs.org/job/node-test-pull-request/16501/

@gabrielschulhof
Copy link
Contributor

Landed in 3814534.

gabrielschulhof pushed a commit that referenced this pull request Aug 17, 2018
Fixes a rare race condition on modpending when two native modules are
loaded simultaneously on different threads by storing it thread-
locally.

PR-URL: #21611
Reviewed-By: Gabriel Schulhof <[email protected]>
Reviewed-By: Anatoli Papirovski <[email protected]>
targos pushed a commit that referenced this pull request Aug 19, 2018
Fixes a rare race condition on modpending when two native modules are
loaded simultaneously on different threads by storing it thread-
locally.

PR-URL: #21611
Reviewed-By: Gabriel Schulhof <[email protected]>
Reviewed-By: Anatoli Papirovski <[email protected]>
targos pushed a commit that referenced this pull request Sep 3, 2018
Fixes a rare race condition on modpending when two native modules are
loaded simultaneously on different threads by storing it thread-
locally.

PR-URL: #21611
Reviewed-By: Gabriel Schulhof <[email protected]>
Reviewed-By: Anatoli Papirovski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants