src: don't call into VM from AsyncWrap destructor #9467

bnoordhuis · 2016-11-04T14:49:14Z

See #8216 and #9465, it's making node crash.

It might be possible to retain the destroy hook by maintaining a per-environment list + a uv_idle_t handle or something like that but it's a lot of work and there might be edge cases so I'm opting for simply removing the hook. Whoever disagrees volunteers to do the hard work. :-)

CI: https://ci.nodejs.org/job/node-test-pull-request/4782/

It is not allowed anymore to call JS code when collecting weakly persistent handles, it hits the assertion below:

    # Fatal error in ../deps/v8/src/execution.cc, line 103
    # Check failed: AllowJavascriptExecution::IsAllowed(isolate).

Remove the call into the VM from the AsyncWrap destructor. This commit breaks the destroy hook but that cannot be helped.

Fixes: nodejs#8216

mscdex · 2016-11-04T15:14:11Z

Shouldn't the other destroy hook-related code be removed also (e.g. setting of the destroy hook in async-wrap.cc and the relevant persistent string in env.h)?

AndreasMadsen · 2016-11-04T16:48:07Z

The destroy hook is a rather important feature in async_wrap. Not just from a performance perspective (the use case in trace), but tools like dprof would simply not be possible without the destroy hook.

Given that async_wrap is an undocumented API I don't think that removing the destroy hook is urgent and would rather consider it a last resort. If somebody uses async_wrap, one should also accept the associated risk. In trace, for example, I say that using it in production is strongly discouraged.

/cc @nodejs/diagnostics

bnoordhuis · 2016-11-04T17:24:59Z

Shouldn't the other destroy hook-related code be removed also (e.g. setting of the destroy hook in async-wrap.cc and the relevant persistent string in env.h)?

I can do that, I'll update the PR. EDIT: https://ci.nodejs.org/job/node-test-pull-request/4787/

Given that async_wrap is an undocumented API I don't think that removing the destroy hook is urgent and would rather consider it a last resort. If somebody uses async_wrap, one should also accept the associated risk.

Yeah, I don't buy that. People have filed bug reports about it twice now and in both cases it was a module somewhere in their dependency chain, not something they were using directly.

AndreasMadsen · 2016-11-04T17:44:35Z

Yeah, I don't buy that. People have filed bug reports about it twice now and in both cases it was a module somewhere in their dependency chain, not something they were using directly.

I welcome a different perspective on the policy regarding undocumented API. But trace is not something one would have deep in the dependency chain. It's something that is used directly; it actually doesn't have an API (beyond require('trace')).

holm · 2016-11-04T17:54:01Z

I reported one of the bugs. We use cls-hooked, which in turn uses async-hook, which then uses AsyncWrap. Maybe I didn't read the readmes very carefully, but as far as I can see there is nothing there warning about it being unstable or not fit for production use.

We use it to track transactions through requests, and it's not in any way optional for us. We haven't observed any issues in production on 6.x.

In my opinion it's not a good idea to have unstable APIs enabled by default, even if they are not documented. Similar to V8 experimental features, I would think a flag should be set, so users are clearly aware they are using features not recommended for production use.

addaleax · 2016-11-04T20:28:14Z

Given that async_wrap is an undocumented API I don't think that removing the destroy hook is urgent and would rather consider it a last resort.

#8216 has been open for quite a while with no sign of movement, and as it stands, the number of people willing to work on the async_hooks parts of the codebase seems to be rather overseeable. So, yeah, it sucks, but right now I don’t see a better way than this.

addaleax

LGTM with a green CI

AndreasMadsen · 2016-11-04T20:39:10Z

Could you maybe explain how this fixes the issue? The modules that use destroy are still going to fail, just for another reason. I don't think the main issue is that destroy doesn't work, it's that people are using undocumented API without being aware of it. I would suggest that we print a warning when setupHooks() is called, similar to deprecated functionality.

edit: I have published a new version of async-hook which prints a warning when used. The documentation already said it, but now indirect users will know it too.

#8216 has been open for quite a while with no sign of movement, and as it stands, the number of people willing to work on the async_hooks parts of the codebase seems to be rather overseeable.

I'm a little sad that @nodejs/diagnostics wasn't cc'ed on this. This makes it difficult for us to function as a working group.

addaleax · 2016-11-04T20:48:33Z

I'm a little sad that @nodejs/diagnostics wasn't cc'ed on this. This makes it difficult for us to function as a working group.

Sorry, yeah. So far the only person I associated with async_hooks are Trevor (and later you); here’s a PR to tell people to @mention the diagnostics team: #9471

Qard · 2016-11-04T21:42:29Z

From an APM perspective, this is probably fine. The destructor is nice for tracking handle lifetimes, but doesn't really matter for transaction tracing. I feel like there definitely should've been more care put into async_wrap in regard to runtime warnings and/or flag gating though. If a feature can be used, it will be used, even when not documented. All the people running into issues using the async/await flag or generators in the early koa days are proof enough of that.

Fishrock123

Does this remove the destroy hook? Will this impact the async_hooks PR?

In either case, removing the hook renders async_wrap/async_hooks virtually useless in many cases and should be considered a last resort.

I would strongly prefer not to merge this until @trevnorris reviews it. He's out atm I think but I'll try to make sure he's on this at least first thing Monday.

bnoordhuis · 2016-11-04T23:36:53Z

Could you maybe explain how this fixes the issue? The modules that use destroy are still going to fail, just for another reason. I don't think the main issue is that destroy doesn't work, it's that people are using undocumented API without being aware of it.

The problem is that it's categorically unsafe to call into the VM during GC, which is what the AsyncWrap destructor does when it calls the destroy hook. It's never been safe but it slipped through review before V8 started enforcing it.

Qard · 2016-11-04T23:47:56Z

Triggering the destroy hook in the next tick using uv_idle_t seems like a reasonable approach to me. As long as the handle itself is not made accessible in any way within the destroy hook, I can't see any edge cases to worry about.

Does that sound reasonable? The queued destroy hook would only be aware of the id to notify the JS side about.
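
A minimal sketch of the queuing idea described above, under stated assumptions: `DestroyQueue` and `FakeHandle` are hypothetical names, not Node.js source, and `Drain()` only stands in for what a `uv_idle_t` callback would do. No libuv or V8 calls are made; the point is just that the destructor records an id and the hook runs later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-environment state; not Node's Environment.
class DestroyQueue {
 public:
  using Hook = std::function<void(std::int64_t id)>;

  explicit DestroyQueue(Hook hook) : hook_(std::move(hook)) {
    assert(hook_ != nullptr);
  }

  // Called from a destructor that may run during GC: it only records the
  // id and never invokes the hook.
  void Enqueue(std::int64_t id) { pending_.push_back(id); }

  // Called later from the event loop (the role an idle callback would play),
  // when calling into JS is allowed again. Returns how many hooks fired.
  std::size_t Drain() {
    std::vector<std::int64_t> ids;
    ids.swap(pending_);  // a hook may enqueue more ids while we iterate
    for (std::int64_t id : ids) hook_(id);
    return ids.size();
  }

  bool empty() const { return pending_.empty(); }

 private:
  Hook hook_;
  std::vector<std::int64_t> pending_;
};

// Hypothetical handle type: the destructor only exposes its id to the queue,
// never the handle itself.
class FakeHandle {
 public:
  FakeHandle(DestroyQueue* queue, std::int64_t id) : queue_(queue), id_(id) {}
  ~FakeHandle() { queue_->Enqueue(id_); }

 private:
  DestroyQueue* queue_;
  std::int64_t id_;
};
```

Because the queued entry is just an integer, nothing in the hook can reach the destroyed handle, which matches the constraint stated above.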

jasnell · 2016-11-07T00:03:40Z

+1 to waiting to land until @trevnorris can have an opportunity to review.

trevnorris · 2016-11-07T18:03:45Z

Back. I'll look into this more today, but the short answer is I've already been working on removing destroy on GC for a different issue. Though at that point it's more like "done" or "complete". Which I'm fine changing the name to.

@bnoordhuis for a short term solution we should be able to use the same uv_idle_t that setImmediate uses. The destructor will check if the handle is weak. If so then place on the list, if not execute the callback immediately. Sound conceptually sane?

@trevnorris

see @trevnorris's comment

bnoordhuis · 2016-11-08T11:29:31Z

I don't think "non-weak == can call into the VM" is a safe assumption. Hanging it off a uv_idle_t seems okay, but what if env->async_hooks_destroy_function() changes between time of collection and time of dispatch?

addaleax · 2016-11-08T12:42:06Z

but what if env->async_hooks_destroy_function() changes between time of collection and time of dispatch?

Is there a meaningful difference from the way things are right now? I mean, my impression is that V8 can basically run GC whenever it wants to, so there would be no real way to tell a delayed execution of the hook from a delayed invocation of the GC?

I don't think "non-weak == can call into the VM" is a safe assumption.

Yeah, I’d just queue up all destroy hooks in the uv_idle_t.

trevnorris · 2016-11-08T23:57:28Z

@bnoordhuis

I don't think "non-weak == can call into the VM" is a safe assumption.

Fair enough. All destroy callbacks can be placed in the uv_idle_t. Makes things more uniform anyway.

Hanging it off a uv_idle_t seems okay, but what if env->async_hooks_destroy_function() changes between time of collection and time of dispatch?

Using some flag magic. Each active set of hooks is assigned an id. If a hook is added/removed then increment the id. Likewise have a flag that indicates whether a destructor ran and depends on the current state of hooks. If that flag is set (a destructor has been called) when a hook is added/removed, then make a clone of the array of hooks. The only pairing needed is the id of the handle calling destroy, and the id of the hook's state. On the next loop, run through the array of hooks and call them for any associated id.

Cost of this if no hooks are active is zero, and not noticeable even if there are active hooks.
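
One way to read the scheme above, as a rough sketch only: `HookRegistry` and all of its members are hypothetical names chosen for illustration, and the snapshot storage (a map keyed by state id) is an assumption, not something specified in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical registry; names and layout are illustrative, not Node.js source.
class HookRegistry {
 public:
  using Hook = std::function<void(std::int64_t handle_id)>;

  // Any change to the hook set bumps the state id. If a destructor already
  // ran under the old state, keep a copy of the old hooks for it.
  void AddHook(Hook hook) {
    assert(hook != nullptr);
    SnapshotIfNeeded();
    hooks_.push_back(std::move(hook));
    ++state_id_;
  }

  void ClearHooks() {
    SnapshotIfNeeded();
    hooks_.clear();
    ++state_id_;
  }

  // Destructor path: record (handle id, hook state id) and nothing else.
  void OnDestructor(std::int64_t handle_id) {
    pending_.push_back({handle_id, state_id_});
    destructor_ran_ = true;
  }

  // Next loop iteration: call the hooks that were active when each handle
  // was destroyed. Hooks must not add or clear hooks while this runs.
  void Flush() {
    std::vector<Pending> pending;
    pending.swap(pending_);
    for (const Pending& p : pending) {
      const std::vector<Hook>& hooks =
          p.state_id == state_id_ ? hooks_ : snapshots_.at(p.state_id);
      for (const Hook& hook : hooks) hook(p.handle_id);
    }
    snapshots_.clear();
    // Entries queued by hooks during this flush still depend on hooks_.
    destructor_ran_ = !pending_.empty();
  }

 private:
  struct Pending {
    std::int64_t handle_id;
    std::uint64_t state_id;
  };

  void SnapshotIfNeeded() {
    if (!destructor_ran_) return;
    snapshots_[state_id_] = hooks_;  // clone the array of hooks
    destructor_ran_ = false;
  }

  std::vector<Hook> hooks_;
  std::uint64_t state_id_ = 0;
  bool destructor_ran_ = false;
  std::vector<Pending> pending_;
  std::unordered_map<std::uint64_t, std::vector<Hook>> snapshots_;
};
```

In this sketch the clone only happens when a destructor ran since the last change to the hook set, so a process that never registers hooks or never destroys a handle between changes pays nothing for snapshots.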

@addaleax

[...] there would be no real way to tell a delayed execution of the hook from a delayed invocation of the GC?

Can you elaborate on this?

addaleax · 2016-11-09T00:29:24Z

Can you elaborate on this?

Mh, basically: If we decided to delay the invocation of the destroy hook until the next event loop iteration, how would the async_hooks consumer (or whatever the right word for that is) be able to tell whether the destroy hook was called later because we artificially delayed it or whether the GC run itself “accidentally” occurred at that later point in time?

bnoordhuis · 2016-11-09T09:49:47Z

Is there a meaningful difference from the way things are right now?

I don't think so but I figured I'd bring it up anyway.

trevnorris · 2016-11-09T11:20:18Z

Actually, @bnoordhuis, just realized I combined two angles of approach. Thing is that GC can no longer be allowed to trigger destroy, because users may have references to the resource. In which case GC won't be able to clean them up, and destroy will never be allowed to fire. So basically it always needs to be triggered manually. Makes the name destroy seem poor, but we can change that (how about just done?).

With this in mind, what measurable circumstance would there be for when the destructor can't call into JS? I'd like to set up a test and begin tracing when it may not be appropriate for the destructor to call destroy.

@addaleax

how would the async_hooks consumer (or whatever the right word for that is) be able to tell whether the destroy hook was called later because we artificially delayed it or whether the GC run itself “accidentally” occurred at that later point in time?

If we need to go the route of delaying destroy calls, then we'll delay all of them. This shouldn't affect the utility of the call much because:

1. The most important reason for destroy was to notify the user when, say, a Map of id's and associated resources could be cleaned up. For this, exact timing is not necessary.
2. If timing is important then we could simply record the uv_hrtime() when the destructor is called and pass that in as the second argument to destroy. It would only trigger if any hooks were active, so no additional cost for those not using it.
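
The timestamp idea can be sketched as follows, with loud assumptions: `TimedDestroyQueue` and `SteadyNowNs()` are hypothetical names, and `std::chrono::steady_clock` is used here only as a stand-in for `uv_hrtime()` so the example does not depend on libuv. The clock is injected so that the recorded time can be checked deterministically.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for uv_hrtime(): a monotonic timestamp in nanoseconds.
inline std::uint64_t SteadyNowNs() {
  using namespace std::chrono;
  return static_cast<std::uint64_t>(
      duration_cast<nanoseconds>(steady_clock::now().time_since_epoch())
          .count());
}

// Hypothetical queue that remembers when each destructor ran.
class TimedDestroyQueue {
 public:
  using Clock = std::function<std::uint64_t()>;
  using Hook =
      std::function<void(std::int64_t id, std::uint64_t destroyed_at_ns)>;

  TimedDestroyQueue(Clock clock, Hook hook)
      : clock_(std::move(clock)), hook_(std::move(hook)) {
    assert(clock_ != nullptr && hook_ != nullptr);
  }

  // Destructor path: capture the id and the time of destruction.
  void Enqueue(std::int64_t id) { pending_.push_back({id, clock_()}); }

  // Delayed dispatch: the hook receives the original destruction time as
  // its second argument, not the time of dispatch.
  void Drain() {
    std::vector<Entry> entries;
    entries.swap(pending_);
    for (const Entry& e : entries) hook_(e.id, e.destroyed_at_ns);
  }

 private:
  struct Entry {
    std::int64_t id;
    std::uint64_t destroyed_at_ns;
  };

  Clock clock_;
  Hook hook_;
  std::vector<Entry> pending_;
};
```

A real caller would pass `SteadyNowNs` (or the actual high-resolution timer) as the clock; a consumer that only wants to clean up its id-to-resource map can ignore the second argument.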

bnoordhuis · 2016-11-15T18:13:19Z

We got another report, #9599. If there isn't any progress on an alternative pull request in the next few days, I'm going to go ahead and land this. I'd really like to see this fixed before the next release.

danscales · 2016-11-16T19:35:24Z

@bnoordhuis Not sure if this is the best place to ask, but is it possible that the changes to V8 that caused this bug (a change in how/when v8 is collecting weakly persistent handles) might also affect the correctness of the node-weak module (https://github.com/TooTallNate/node-weak) for newer versions of v8/node? We are running into segfaults in the garbage collector in our stressful node application that uses node-weak when we run on node 6.9.1, but we didn't have any such issue when running on node v0.12. This is true even if we get rid of any use of node-weak callback functions. We may also be getting such segfaults in node 4.5.0, but if so, they are much, much rarer.

I realize that the gc test code in the node distribution actually uses node-weak, but that is a very simple, unstressful test case.

trevnorris · 2016-11-16T20:13:20Z

@bnoordhuis I didn't consider my last comment as an approved approach. I'd still like to know when the JS callback shouldn't be run when the destructor is manually triggered? After review it seems like we should be able to manually delete classes that are now weak. Because of the safety mechanisms in place, detaching the C++ class won't cause JS to segfault. Thus, we could completely remove weak handles/requests. Thoughts?

EDIT: Couldn't we also attach the call to JS through WeakCallbackInfo::SetSecondPassCallback() for safety?

bnoordhuis · 2016-11-16T21:22:16Z

@danscales Yes, node-weak does the same thing.

@trevnorris Like we discussed in today's meeting, making everything non-weak sounds great.

jasnell · 2016-11-18T16:39:48Z

@trevnorris @bnoordhuis ... I just want to make sure I understand the "making everything non-weak" part of this as it would impact some work that I'm doing: does this mean entirely avoiding the use of MakeWeak() in new AsyncWrap subclasses?

bnoordhuis · 2016-11-18T20:40:04Z

@jasnell Correct.

trevnorris · 2016-11-21T22:24:30Z

@bnoordhuis sorry for the delay. I'm able to remove the weak handles for everything (even classes that don't inherit from AsyncWrap) except for StatWatcher, so I'll implement the fix for this using uv_idle_t. Afterward I'll submit another PR to remove as much weak handle support as I can, based on what I've already worked on.

trevnorris · 2016-11-23T05:44:18Z

Alternate PR at #9753

Add a group of people to the “Who to CC in issues” list as the maintainers of `async_hooks`. Ref: nodejs#9467 (comment)

Add a group of people to the “Who to CC in issues” list as the maintainers of `async_hooks`.

Ref: #9467 (comment)
PR-URL: #9471
Reviewed-By: Gibson Fahnestock <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Sam Roberts <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
Reviewed-By: Josh Gavant <[email protected]>

addaleax · 2016-12-06T00:27:11Z

This should no longer be necessary since #9753 landed, so I’m closing this

bnoordhuis added the async_wrap and async_hooks labels Nov 4, 2016

nodejs-github-bot added the c++ label Nov 4, 2016

cjihrig approved these changes Nov 4, 2016

squash! remove async_hooks_destroy_function

1ffda6b

AndreasMadsen mentioned this pull request Nov 4, 2016

warn about async wrap being an undocumented API Jeff-Lewis/cls-hooked#1

Merged

addaleax approved these changes Nov 4, 2016

addaleax mentioned this pull request Nov 4, 2016

doc: add the diagnostics team to cc for async_wrap #9471

Closed

Fishrock123 previously requested changes Nov 4, 2016

AndreasMadsen mentioned this pull request Nov 16, 2016

Node.js Foundation Core Technical Committee (CTC) Meeting 2016-11-16 nodejs/CTC#33

Closed

AndreasMadsen mentioned this pull request Nov 21, 2016

Compatibility with node ^7.0 AndreasMadsen/trace#31

Closed

addaleax added a commit to addaleax/node that referenced this pull request Nov 26, 2016

doc: add people to cc for async_wrap

d93c0f8

Add a group of people to the “Who to CC in issues” list as the maintainers of `async_hooks`. Ref: nodejs#9467 (comment)

addaleax closed this Dec 6, 2016
