Node.js Collaboration Summit diagnostics discussion minutes #95

watson · 2017-05-04T13:56:56Z

These are my raw notes taken at the Node.js Collaboration Summit in Berlin. Sorry for the mess 😅. I'll update when new stuff comes up. Feel free to add stuff I missed 😃

Day 1: 2017-05-04

Problem: How do we make it easier for APM vendors to instrument node-core and userland modules without having to monkey-patch everything (even after AsyncHooks lands)?

Suggestion: We need to make it easier for module owners to know how to structure and build their modules so they are easier to instrument. We can make a guide on https://nodejs.org/en/docs/guides/ that specifies how module owners should make their modules easy to instrument.

Topic ideas:

How to use the AsyncHooks Embedder API
How to send detailed tracing events to APM vendors using some API

Other notes:

This should be a living document
Discussion on this could be had on a PR that’s opened to the nodejs.org website creating this guide

Problem: Can we standardise the way tracing information is sent from an instrumented module (called module-x below) to an APM agent?

Suggestions:

Maybe we can use the V8 trace events API
- Pros: No need for module-x to know about the APM agents
- Cons:
  - Currently no way to bind the trace event to the current context
  - Might be expensive to cross the barrier into C-land
  - Might not be detailed enough
  - Overhead of running in production?
Alternatively invent a new API in JavaScript land
- This could either be in Node core or a userland module
- If it’s a userland module, then module-x needs to detect if it’s present before it can send events to it. The APM agents need to do the same.
  - What if there’s multiple versions of this module installed?
- Research if other languages have something similar

sam-github · 2017-05-05T12:09:18Z

Hooks should simplify APM code, but there is no way to get context data (fs.open()'s path, HTTP path/method, etc.), so does not remove the need to monkey-patch the API surface for APM's that need such information (hooks alone are sufficient for CLS style code that only needs to track continuations).

The Embedder API now exists - it didn't at time of NINA Austin. It allows thirdparty modules to interact with the hooks to indicate "intent" and provides a uniform solution of the "connection pooling" aka "async queueing" problem (cf. https://docs.google.com/document/d/1tlQ0R6wQFGqCS5KeIw0ddoLbaSYx6aU7vyXOkv-wvlM/edit#). This is great, APMs look forward to this, it decreases the need for continuous adaption of our code bases to the changes in internals of thirdparty modules, because they can maintain their use of the Embedder API themselves.

APM builders present:

danielkhan, dynatrace
thomas watson, opbeat
sam roberts, strong-agent, concurix, appmetrics
jkrems, groupon (which has an internal "APMish" framework)

Promises are problems, interaction with utask queue is hard to track, and we need to patch v8 promises and the bluebird promises (at least)

Q: do/will hooks help with that?

Mongo has an APM API so monkey-patching is not necessary, and they maintain the API themselves. This is a great pattern, perhaps we should more clearly advocate it as something that npm packages that want to support APM reporting can do, perhaps we can even advocate a specific API/API pattern. cf. http://mongodb.github.io/node-mongodb-native/2.0/reference/management/apm/

elasticsearch may get an APM API soon (according to @watson)

Maybe we can get some common push behind an API for this, and put out a blog
with examples showing how to use the Embedder API and also solve the app-specific
info collection problem. The information on the app-specific could perhaps
be emitted as V8 trace events. We were all quite interested in publishing a guide that clearly reflects what APM vendors would like packages to do so that they would be easier to support reporting on by APM builders, and what the intended use of the Embedder API is.

We can could work on this by PRing a guide.

We should start talking about it in the diagnostics meetings.

V8: does it have js APIs? Does it have levels? Is the timing of it acceptable,
will it work us? Can trace events be used?

async hooks:

matthewloring · 2017-05-05T20:43:11Z

I wish I could have attended this! Just to speak quickly to:

Promises are problems, interaction with utask queue is hard to track, and we need to patch v8 promises > and the bluebird promises (at least)

Q: do/will hooks help with that?

I recently worked with the V8 team and @trevnorris to have Promise Hooks (interface, design doc) added to V8 so that async hooks can observe promise lifecycles. I have also prototyped an integration of these promise hooks into async hooks but we decided to wait to add this support until the existing async hooks PR lands as it is already quite complex. So, I guess the answer is hopefully async hooks should help once the initial implementation + promise hook integration follow-on have both landed.

danielkhan · 2017-05-09T13:17:14Z

It was daniel khan, dynatrace :)

I would prefer more of a standard over a blog. It can live under /guides on the Node.js website. I can think of a blog post that describes the problems APM vendors are facing today, though. This would give us something to pass on to customers to baseline their expectations when it comes to Node monitoring. I could sketch something vendor neutral out for you to review, ok?

sam-github · 2017-05-09T15:50:31Z

@danielkhan Sorry Daniel! Updated. I agree, a guide makes more sense (with a blog to point out its existence)

mhdawson · 2017-05-09T21:29:13Z

Guide + intro blog with pointer was what we settled on in the discussions.

joshgav · 2017-05-11T17:22:43Z

As you all were discussing this topic last week (and I sadly couldn't join!) we were busy at Microsoft preparing https://github.com/Microsoft/node-diagnostic-channel for use with a couple of our APM-like products. Take a look at the README, particularly the Why section; I think the diagnostic-channel project may be a good start to addressing the problems discussed here. We're open to developing it further with your help.

Also, I think it would be a small step to integrate diagnostic-channel with trace_events, particularly with the help of @jasongin's JS Tracing API proposal. That would enable us to experiment with both a) sending diagnostic messages to diagnostic-channel directly, and b) sending them to trace_events and then to diagnostic-channel.

mike-kaufman · 2017-12-17T23:42:43Z

Per discussion in Diag WG Meeting on 2017-12-13, we decided to open a seperate issue (#134) to track the latter issue here ("Can we standardise the way tracing information is sent from an instrumented module"), and we decided to close this one.

Plz re-activate If something else in this thread is lost & needs tracked.

AndreasMadsen added the diag-agenda label May 4, 2017

joshgav mentioned this issue Jul 10, 2017

Diag WG Meeting - 2017-07-20 #96

Closed

This was referenced Jul 10, 2017

WG Meeting (insert date) naugtur/wg-meeting-cli#1

Closed

WG Meeting 2017/9/8 naugtur/wg-meeting-cli#2

Closed

This was referenced Oct 26, 2017

Diag WG Meeting - 2017-11-01 #108

Closed

Diag WG Meeting - 2017-11-29 #114

Closed

mike-kaufman mentioned this issue Nov 14, 2017

Diag WG Meeting - 2017-11-15 #113

Closed

This was referenced Dec 13, 2017

Diag WG Meeting - 2017-12-13 #115

Closed

Eliminate Monkey Patching for Diagnostic Instrumentation #134

Closed

mike-kaufman closed this as completed Dec 17, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Node.js Collaboration Summit diagnostics discussion minutes #95

Node.js Collaboration Summit diagnostics discussion minutes #95

watson commented May 4, 2017

sam-github commented May 5, 2017 •

edited

Loading

matthewloring commented May 5, 2017

danielkhan commented May 9, 2017 •

edited

Loading

sam-github commented May 9, 2017

mhdawson commented May 9, 2017

joshgav commented May 11, 2017

mike-kaufman commented Dec 17, 2017

Node.js Collaboration Summit diagnostics discussion minutes #95

Node.js Collaboration Summit diagnostics discussion minutes #95

Comments

watson commented May 4, 2017

Day 1: 2017-05-04

sam-github commented May 5, 2017 • edited Loading

matthewloring commented May 5, 2017

danielkhan commented May 9, 2017 • edited Loading

sam-github commented May 9, 2017

mhdawson commented May 9, 2017

joshgav commented May 11, 2017

mike-kaufman commented Dec 17, 2017

sam-github commented May 5, 2017 •

edited

Loading

danielkhan commented May 9, 2017 •

edited

Loading