Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src,worker: fix race of WorkerHeapSnapshotTaker #44745

Merged
merged 1 commit into from
Oct 6, 2022

Conversation

ywave620
Copy link
Contributor

@ywave620 ywave620 commented Sep 22, 2022

Fix: #44515 #39686
maybe more... there are serval crash reports about worker heap snapshot

With one extra line of code, this concurency bug can be easily demonstrated and reproduced.
Since BaseObjectPtr is not thread safe(no synchronization around checking and manipulating the refcnt), the fix is to ensure WorkerHeapSnapshotTaker is not shared by worker thread and main thread

--- a/src/node_worker.cc
+++ b/src/node_worker.cc
@@ -771,28 +771,45 @@ void Worker::TakeHeapSnapshot(const FunctionCallbackInfo<Value>& args) {
       ->NewInstance(env->context()).ToLocal(&wrap)) {
     return;
   }
   BaseObjectPtr<WorkerHeapSnapshotTaker> taker =
       MakeDetachedBaseObject<WorkerHeapSnapshotTaker>(env, wrap);
 
   // Interrupt the worker thread and take a snapshot, then schedule a call
   // on the parent thread that turns that snapshot into a readable stream.
   bool scheduled = w->RequestInterrupt([taker, env](Environment* worker_env) {
+     // executed in worker thread
+    // now worker thread holds one strong BaseObjectPtr

      heap::HeapSnapshotPointer snapshot {
         worker_env->isolate()->GetHeapProfiler()->TakeHeapSnapshot() };
      CHECK(snapshot);
      env->SetImmediateThreadsafe(
+        // executed in main thread
+       // now main thread holds another strong BaseObjectPtr

         [taker, snapshot = std::move(snapshot)](Environment* env) mutable {
           HandleScope handle_scope(env->isolate());
           Context::Scope context_scope(env->context());
 
           AsyncHooks::DefaultTriggerAsyncIdScope trigger_id_scope(taker.get());
           BaseObjectPtr<AsyncWrap> stream = heap::CreateHeapSnapshotStream(
               env, std::move(snapshot));
           Local<Value> args[] = { stream->object() };
           taker->MakeCallback(env->ondone_string(), arraysize(args), args);
         }, CallbackFlags::kUnrefed);
+      
+    // wait for the main thread to finish executing the callback above
+    // remember to include <thread> and <chrono> in the head of this file
+    std::this_thread::sleep_for(std::chrono::milliseconds(2000));
+    
+    // now the worker thread holds the only one strong BaseObjectPtr refering to the
+    // WorkerHeapSnapshotTaker instance and it is destructed here, as a result, that
+    // WorkerHeapSnapshotTaker instance is destructed here, too.
+    // The destructor try to access the associated JS object, however, since that JS object 
+    // is owned by main thread's Isolate, the v8 complains:
+    // HandleScope::HandleScope Entering the V8 API without proper locking in place
   });
   args.GetReturnValue().Set(scheduled ? taker->object() : Local<Object>());
 }
 
 void Worker::LoopIdleTime(const FunctionCallbackInfo<Value>& args) {

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. worker Issues and PRs related to Worker support. labels Sep 22, 2022
@ywave620
Copy link
Contributor Author

test/sequential/test-watch-mode.mjs failed. However, it is a known issue, fixed in #44739

@ywave620
Copy link
Contributor Author

@addaleax Could you help moving this PR forward?

Any WorkerHeapSnapshotTaker instance should be fully owned
by main thread. Remove buggy access to it from the worker thread.
@ywave620
Copy link
Contributor Author

ywave620 commented Sep 29, 2022

@mojodna

Hey, any idea? :)

@theanarkh theanarkh added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 1, 2022
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 1, 2022
@nodejs-github-bot
Copy link
Collaborator

@ywave620
Copy link
Contributor Author

ywave620 commented Oct 4, 2022

Shall we proceed to get this PR landed?

Copy link
Member

@juanarbol juanarbol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; I'm AFK, unless someone else is faster than me. I will land this one. Thanks for this 💚

@theanarkh theanarkh added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Oct 5, 2022
@ywave620
Copy link
Contributor Author

ywave620 commented Oct 6, 2022

Another 3 failure of running parallel/test-worker-heap-snapshot caused by this bug nodejs/reliability#395

@richardlau richardlau added the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 6, 2022
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 6, 2022
@nodejs-github-bot nodejs-github-bot merged commit 66cedb4 into nodejs:main Oct 6, 2022
@nodejs-github-bot
Copy link
Collaborator

Landed in 66cedb4

danielleadams pushed a commit that referenced this pull request Oct 11, 2022
Any WorkerHeapSnapshotTaker instance should be fully owned
by main thread. Remove buggy access to it from the worker thread.

PR-URL: #44745
Fixes: #44515
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Juan José Arboleda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test-worker-heap-snapshot crash
7 participants