-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[async_wrap] fix fatal error during destroy() calls #9753
Conversation
b5fafdd
to
49dc16e
Compare
49dc16e
to
59c2e01
Compare
@@ -6,15 +6,14 @@ const assert = require('assert'); | |||
const async_wrap = process.binding('async_wrap'); | |||
|
|||
const storage = new Map(); | |||
async_wrap.setupHooks({ init, pre, post, destroy }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this PR remove destroy()
then or...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. it just makes triggering destroy() before exit a pain. and haven't figured out a good way to test it yet.
// List of id's that have been destroyed and need the destroy() cb called. | ||
inline void add_destroy_id(int64_t id); | ||
inline int64_t get_destroy_id(); | ||
inline int64_t destroy_ids_length(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: size_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops. thanks.
if (destroy_ids_current_->length < arraysize(destroy_ids_current_->ids)) | ||
return; | ||
node_destroy_ids_list* old_list = destroy_ids_current_; | ||
destroy_ids_current_ = new node_destroy_ids_list{ .prev = old_list }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could you explicitly initialize .length = 0
here, too? (edit: gcc doesn’t seem to like this anyway?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool. i'll do it the pre c++11 way.
return; | ||
node_destroy_ids_list* old_list = destroy_ids_current_; | ||
destroy_ids_current_ = new node_destroy_ids_list{ .prev = old_list }; | ||
destroy_ids_current_->ids[0] = id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn’t this set id
twice, once in the new instance and once at the end of the prev
one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. fixed.
destroy_ids_current_ = prev_list->prev; | ||
delete prev_list; | ||
} | ||
destroy_ids_length_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably also set destroy_ids_current_->length = 0
;?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yup. fixed.
@@ -266,6 +266,13 @@ struct node_ares_task { | |||
RB_ENTRY(node_ares_task) node; | |||
}; | |||
|
|||
// 4096 bytes on x64 and 4088 on ia32. | |||
struct node_destroy_ids_list { | |||
int64_t ids[510]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
teeny-tiny nit: Could you make this the last member of the struct so that length
, prev
and the first few entries all fit into a single cache line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh. nice. done.
@@ -463,6 +472,12 @@ class Environment { | |||
|
|||
inline int64_t get_async_wrap_uid(); | |||
|
|||
// List of id's that have been destroyed and need the destroy() cb called. | |||
inline void add_destroy_id(int64_t id); | |||
inline int64_t get_destroy_id(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming this get_…
makes this sound a bit like a getter without side effects. How about pop_destroy_id()
or something like that? (And if you like, correspondingly push_destroy_id
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. done.
Fixed things addressed in first review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -181,6 +183,33 @@ static void Initialize(Local<Object> target, | |||
} | |||
|
|||
|
|||
void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) { | |||
Environment* env = Environment::from_destroy_ids_idle_handle(handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a HandleScope and Context::Scope here, the env->async_hooks_destroy_function()
call below leaks a handle now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle. Since that doesn't seem to be the case I'll add it. Though I'll still leave in the HandleScope in the loop so each function call can be cleaned up.
About the first note, should we be using a SealHandleScope in all these locations so we can detect any leaked handles? Currently it's only used around uv_run, but that doesn't seem to catch leaked handles from an asynchronous call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my diff to double check the scenario:
diff --git a/src/async-wrap.cc b/src/async-wrap.cc
index a06e163..fa91e1a 100644
--- a/src/async-wrap.cc
+++ b/src/async-wrap.cc
@@ -185,6 +185,8 @@ void AsyncWrap::Initialize(Local<Object> target,
void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) {
Environment* env = Environment::from_destroy_ids_idle_handle(handle);
+ v8::SealHandleScope seal(env->isolate());
+ v8::Context::Scope context_scope(env->context());
// This doesn't leak a handle.
Local<Function> fn = env->async_hooks_destroy_function();
int64_t current_id;
This never aborts. To double check I added an Object::New(env->isolate())
right after the Context::Scope()
which did cause it to abort. So I don't believe this is leaking a handle. But if I am to hoist the TryCatch
out of the loop it would be necessary to do anyway.
On the note of always using a (update: see comment below about this being incorrect)SealHandleScope
on the return of an async call. Guess that would only be useful if there was a centralized way to add the wrapper around uv_run
every time it's about to call the callback. You know if there's a way to wrap wrap all callbacks from uv_run
in a SealHandleScope
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I tried this and it surprisingly didn't abort:
diff --git a/src/async-wrap.cc b/src/async-wrap.cc
index a06e163..3e57ee8 100644
--- a/src/async-wrap.cc
+++ b/src/async-wrap.cc
@@ -185,6 +185,8 @@ void AsyncWrap::Initialize(Local<Object> target,
void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) {
Environment* env = Environment::from_destroy_ids_idle_handle(handle);
+ v8::SealHandleScope seal(env->isolate());
+ v8::Context::Scope context_scope(env->context());
// This doesn't leak a handle.
Local<Function> fn = env->async_hooks_destroy_function();
int64_t current_id;
@@ -194,11 +196,12 @@ void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) {
if (fn.IsEmpty())
return env->erase_destroy_ids();
+ TryCatch try_catch(env->isolate());
+
while (env->destroy_ids_length() > 0) {
current_id = env->pop_destroy_id();
HandleScope scope(env->isolate());
Local<Value> argv = Number::New(env->isolate(), current_id);
- TryCatch try_catch(env->isolate());
MaybeLocal<Value> ret = fn->Call(
env->context(), Undefined(env->isolate()), 1, &argv);
Which surprises me a bit, but leads me to believe it isn't leaking a handle not wrapping the TryCatch
in a HandleScope
. Guess since it's using RAII it doesn't need the HandleScope
to clean up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget everything I said about SealHandleScope
not catching handles from an async call. My test case was incorrect. It does catch all leaked handles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle.
Oh, you're right. That's an awfully subtle implementation detail though, I'd just create the HandleScope. You'll need the Context::Scope anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Something that subtle could easily lead to future pain. Will add.
On the side, I'm surprised that both Context::Scope
and TryCatch
don't leak a handle. Think the Context::Scope
would still work properly without the HandleScope
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context::Scope doesn't strictly need a HandleScope because it doesn't really do anything with handles, it just changes the active context. (The context itself is a handle, of course.)
Something similar applies to TryCatch, it merely changes the active landing pad for JS exceptions.
current_id = env->pop_destroy_id(); | ||
HandleScope scope(env->isolate()); | ||
Local<Value> argv = Number::New(env->isolate(), current_id); | ||
TryCatch try_catch(env->isolate()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could lift the TryCatch out of the loop, that would be (marginally) more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yup. Will do.
return env->erase_destroy_ids(); | ||
|
||
while (env->destroy_ids_length() > 0) { | ||
current_id = env->pop_destroy_id(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason the declaration is done outside the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope.
Local<Value> argv = Number::New(env->isolate(), current_id); | ||
TryCatch try_catch(env->isolate()); | ||
MaybeLocal<Value> ret = fn->Call( | ||
env->context(), Undefined(env->isolate()), 1, &argv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cache env->context()
in a handle outside the loop, it creates a new handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks. Thought it was a strong persistent conversion, which I believe doesn't create a new handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, never mind.
size_t length; | ||
node_destroy_ids_list* prev; | ||
int64_t ids[510]; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you don't use a std::vector<int64_t>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I judge the implementation complexity between using and not using the STL to be comparable then I usually choose to not use the STL. Though the thought of using std::vector doesn't bother me. Can switch it if that's your preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's simpler than hand-rolling a vector-like class. If you're worried about excessive copying, you could simply vec.reserve(512) at startup.
@@ -181,6 +183,33 @@ static void Initialize(Local<Object> target, | |||
} | |||
|
|||
|
|||
void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) { | |||
Environment* env = Environment::from_destroy_ids_idle_handle(handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle.
Oh, you're right. That's an awfully subtle implementation detail though, I'd just create the HandleScope. You'll need the Context::Scope anyway.
Local<Value> argv = Number::New(env->isolate(), current_id); | ||
TryCatch try_catch(env->isolate()); | ||
MaybeLocal<Value> ret = fn->Call( | ||
env->context(), Undefined(env->isolate()), 1, &argv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, never mind.
size_t length; | ||
node_destroy_ids_list* prev; | ||
int64_t ids[510]; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's simpler than hand-rolling a vector-like class. If you're worried about excessive copying, you could simply vec.reserve(512) at startup.
@trevnorris We have now seen this bug (same assert in execution.cc) on node v4.5.0, possibly because of our use of weak pointers via the 'weak' module (https://github.com/TooTallNate/node-weak ). Do you think your fix will backport well to node v4.x? Also, just to confirm, your change is not removing weak functionality, I think, so weak pointer functionality from the 'weak' module would still work, at least in node v6 -- is that correct? |
@danscales That issue will be with the module
No. It's simply delaying when the callback runs. |
@bnoordhuis comments should be addressed in commit |
96f10b0
to
7fc2efa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some final comments.
// None of the V8 calls done outside the HandleScope leak a handle. If this | ||
// changes in the future then the SealHandleScope wrapping the uv_run() | ||
// will catch this can cause the process to abort. | ||
v8::Context::Scope context_scope(env->context()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you at least add a SealHandleScope? That said, a HandleScop isn't that expensive to create and it's more robust. Debugging a crash or a leak six months from now because something changed would be highly annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misread the following stack:
FATAL ERROR: v8::HandleScope::CreateHandle() Cannot create a handle without a HandleScope
1: node::Abort() [./node_g]
2: 0x21986e3 [./node_g]
3: v8::Utils::ReportApiFailure(char const*, char const*) [./node_g]
4: v8::Utils::ApiCheck(bool, char const*, char const*) [./node_g]
5: v8::internal::HandleScope::Extend(v8::internal::Isolate*) [./node_g]
6: v8::internal::HandleScope::CreateHandle(v8::internal::Isolate*, v8::internal::Object*) [./node_g]
7: v8::internal::HandleScope::GetHandle(v8::internal::Isolate*, v8::internal::Object*) [./node_g]
8: v8::internal::HandleBase::HandleBase(v8::internal::Object*, v8::internal::Isolate*) [./node_g]
9: v8::internal::Handle<v8::internal::JSFunction>::Handle(v8::internal::JSFunction*, v8::internal::Isolate*) [./node_g]
10: v8::internal::Isolate::object_function() [./node_g]
11: v8::Object::New(v8::Isolate*) [./node_g]
12: node::AsyncWrap::DestroyIdsCb(uv_idle_s*) [./node_g]
Which is generated by the following diff when running test-async-wrap-check-providers.js
:
diff --git a/src/async-wrap.cc b/src/async-wrap.cc
index 3887179..0bdde8d 100644
--- a/src/async-wrap.cc
+++ b/src/async-wrap.cc
@@ -190,6 +190,7 @@ void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) {
// will catch this can cause the process to abort.
v8::Context::Scope context_scope(env->context());
Local<Function> fn = env->async_hooks_destroy_function();
+ Local<Object> test = Object::New(env->isolate());
uv_idle_stop(handle);
Now I'm confused as to why the SealHandleScope
would be needed since I get the same stack even if I add the SealHandleScope
before it. Thoughts?
Anyway, I'll add the HandleScope
. Hadn't yet b/c I haven't been able to write a test that would cause the script to crash w/o it. If you have an example of one that would be excellent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a SealHandleScope at the bottom of the stack (node.cc creates one before calling uv_run()) but I figure it's better to create one here explicitly than rely on a detail from elsewhere.
v8::Context::Scope context_scope(env->context()); | ||
Local<Function> fn = env->async_hooks_destroy_function(); | ||
|
||
uv_idle_stop(handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move move this to the top of the function? It looks a little incongruous here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. just habit of placing it after the variable declarations.
if (!ran_init_callback()) | ||
return; | ||
|
||
if (env()->destroy_ids_list()->size() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh. yup.
7fc2efa
to
dd1601e
Compare
@bnoordhuis fixed those three comments. |
The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
This is how it's done everywhere else in core. Make it follow suit. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: nodejs#9753 Fixes: nodejs#8216 Fixes: nodejs#9465 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
dd1601e
to
b49b496
Compare
This'll need to be backported to v7, and possibly v6? @nodejs/lts how far back should land? since it's related to other asyncwrap changes coming though perhaps i'll just backport to v4 to make life easier. |
The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: #9753 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
This is how it's done everywhere else in core. Make it follow suit. PR-URL: #9753 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: #9753 Fixes: #8216 Fixes: #9465 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
This is how it's done everywhere else in core. Make it follow suit. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: nodejs#9753 Fixes: nodejs#8216 Fixes: nodejs#9465 Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
@trevnorris Do you have a suggestion on how to fix node-weak to deal with the same problem? As I mentioned, we get a similar problem to what you fixed in node v6 (and less often in node v4) when we use node-weak extensively. Our application needs weak pointers, but we can get away without actually invoking a javascript callback when the weak object is garbage collected, if needed. So, I tried just disabling the code in weakref.cc/TargetCallback that invokes the javascript callback for weak pointers, but that did not help. So, I'm not sure if there is a much deeper problem in using SetWeak() and so there is no simple fix for weakref.cc. Thanks for any help or suggestions. Seems like it would be good to fix node-weak, since it is actually in the node test suite (test/gc/node_modules/weak) |
@danscales I'd suggest that node-weak also implement a similar implementation to d479826 using a |
@trevnorris if you could manually backport that would be rad! |
@thealphanerd Will do, and hot damn! found a bug with this that's fixed by #10400 :P So will wait for that to land before backporting this. |
@trevnorris #10400 has landed on v6.x |
@trevnorris can you backport this? |
ping @trevnorris |
@MylesBorins I’m confused, it looks like this has already landed on v6.x-staging as 59d8255...f3b0cf5? |
Perhaps this was mislabelled. My audit brought up commits, but you are correct that they are there. |
Checklist
make -j8 test
(UNIX), orvcbuild test nosign
(Windows) passesAffected core subsystem(s)
async_wrap
Description of change
Calling JS during GC is a no-no. So intead create a queue of all ids
that need to have their destroy() callback called and call them later.
Removed checking destroy() in test-async-wrap-uid because destroy() can
be called after the 'exit' callback.
Missing a reliable test to reproduce the issue that caused the
FATAL_ERROR.
Fixes: #8216
Fixes: #9465
R=@bnoordhuis