Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Executive summary: Introducing a CLS-like API to Node.js core #807

Closed
vdeturckheim opened this issue Jan 24, 2020 · 37 comments
Closed

Executive summary: Introducing a CLS-like API to Node.js core #807

vdeturckheim opened this issue Jan 24, 2020 · 37 comments

Comments

@vdeturckheim
Copy link
Member

vdeturckheim commented Jan 24, 2020

After a discussion with @Qard and @puzpuzpuz, it seems that the two current PRs are taking very different approaches and we did not find a common ground to merge them into one single PR.

The following document has been authored by me and reviewed by @Qard, @puzpuzpuz and @Flarna. It aims at giving full context to the TSC regarding both PRs.

Please let us know if there is any point we should clarify to help here.

Context: 3 PRs and 1 consensus

Context Document: Making async_hooks fast (enough)

On the TSC meeting of 2020-JAN-22, the TSC reached consensus
regarding the need to have an Asynchronous Storage API in core.

Three PRs related to this topic are currently open, out of simplicity, we will refer to them by a name as of:

PR Author Name
#30959 @mcollina & @Qard executionAsyncResource
#31016 @puzpuzpuz AsyncLocal
#26540 @vdeturckheim AsyncContext

The AsyncLocal proposal relies on the executionAsyncResource API.
The AsyncContext proposal aims at working without executionAsyncResource, but should be rebased over executionAsyncResource when it is merged. A userland version of this API is available for testing purpose.

The rest of this document aims at comparing the AsyncLocal and the AsyncContext proposals.
Both of these proposal introduce a CLS-like API to Node.js core.

Naming

Both proposals introduce a new class in the Async Hooks module.
One is named AsyncContext and the other is named AsyncLocal.

Also, the name AsyncStorage has been discussed earlier.

This topic can easily be covered as a consensus on any name can be ported to any proposal.

.NET exposes an AsyncLocal class.

Interfaces

AsyncLocals and AsyncContexts expose different interfaces:

AsyncContexts

const asyncContext = new AsyncContext();
// here context.getStore() will return undefined
asyncContext.run((store) => {
    // store is a new instance of Map for each call to `run`
    // from here asyncContext.getStore() will return the same Map as store
    const item = {};
    store.set('a', item);
    asyncContext.getStore().get('a'); // returns item
    asyncContext.exit(() => {
        // from here asyncContext.getStore() will return undefined
        asyncContext.getStore(); // returns undefined
    });
});

AsyncContext also provide synchronous entrypoints but documentation highlights the risks of using them.

AsyncLocal

const asyncLocal = new AsyncLocal();
const item = {};
asyncLocal.get(); // will return undefined
asyncLocal.set(item); // will populate the store
asyncLocal.get(); // returns item
asyncLocal.remove(); // disable the AsyncLocal
asyncLocal.get(); // will return undefined
asyncLocal.set(item); // will throw an exception

Synchronous vs. Asynchronous API

As the examples show, AsyncLocal exposes a synchronous API and AsyncContext
exposes an asynchronous one.

The synchronous API is unopinionated and is very async/await friendly.

The asynchronous API defines a clear scope regarding which pieces of code will have
access to the store and which ones will not be able to see it. Calling run is an asynchronous operation that executes the callback in a process.netxTick call.
This is intended in order to have no implicit behavior that were a major issue according to the domain post mortem. It is expected that the API will be used to provide domain-like capabilities.

A synchronous API has been added to AsyncContext too:

  • enterSync/exitSync which do not enforce scoping
  • runAndReturn(cb)/exitAndReturn(cb) which run the callback synchronously. The store is only available within the callback,

Eventually, an asynchronous API could be added to AsyncLocal if there is a need for it.

Stopping propagation

AsyncContext exposes a method named exit(callback) that stops propagation of the context through the following asynchronous calls.
Asynchronous operations following the callback cannot access the store.

With AsyncLocal, propagation is stopped by calling set(undefined).

Disabling

An instance of AsyncLocal can be disabled by calling remove. It can't be used anymore after this call. Underlying resources are freed when the call is made, i.e. no strong references for the value remain in AsyncLocal and the internal global async hook is disabled (unless there are more active AsyncLocal exist).

AsyncContext does not provide such method.

Store type

AsyncContext
AsyncContext.prototype.getStore will return:

  • undefined
    • if called outside the callback of run or
    • inside the callback of exit
  • an instance of Map

AsyncLocal
AsyncLocal.prototype.get will return:

  • undefined if AsyncLocal.prototype.set has not been called first
  • any value the user would have given to AsyncLocal.prototype.set

Store mutability

AsyncContext propagates it's built in mutable store which is accessible in whole async tree created.

AsyncLocal uses copy on write semantics resulting in branch of parts of the tree by setting a new value. Only mutation of the value (e.g. changing/setting a Map entry) will not branch off.

Overall philosophy

AsyncLocal is a low-level unopinionated API that aims at being used as a foundation by ecosystem packages.
It will be a standard brick upon which other modules are built.

AsyncContext is a high-level user-friendly API that cans be used out of the box by Node.js users.
It will be an API used directly by most users who have needs for context tracing.

Next steps

After an API (AsyncContext, AsyncLocal or another potential API) is merged, this roadmap might be followed:

  1. Releasing the API in the current version of Node.js (as experimental)
  2. Backporting the API to currently supported versions of Node.js (as experimental)
  3. Defining conditions for the API to get out of experimental
  4. Moving the API to its own core module and alias it from Async Hooks (tentatively for Node.js 14)
  5. Move the API out of experimental (tentatively when Node.js 14 becomes LTS)

This will enable us to iterate over Async Hook and maybe bring breaking changes to it
while still providing an API filling most of Node.js users need in term of tracing through
a stable API.

EDITS: Status afyer the original document

Current status on Feb 4th 2020

executionAsyncResource
PR is still blocked as some resource mismatch in init hooks and executionAsyncResource().
This seems to be on a path to resolve.
Also, reused resources are still exposed which can introduce a memory leak if a destroy hook is not used. One of the main point of executionAsyncResource is to get rid of the need for a destroy hook originally.

AsyncLocal
PR is blocked by the executionAsyncResource issue.

AsyncContext
The PR can be merged and rebased over executionAsyncResource later.

There has been a few iterations regarding synchronous entrypoint.
After advise from @Qard comment and @Flarna comment only methods taking callbacks have been kept:

  • asyncContext.runAndReturn(cb): runs the callback synchronously. The context is entered before running the callback and exited when the callback has run.
  • asyncContext.run(cb): works the same as runAndReturn but asynchronously (within a process.nextTick).

The difference between the two entrypoints concerns error management as the asynchronous method will not throw errrors. Also, exit and exitAndReturn can be used to stop propagation.

Exposing unscoped methods (without callback) would introduce the following behavior:

emitter.on('x', () => { enterSync() });
emitter.on('x', => { /* this code is within context */ });
emitter.on('x', () => { exitSync() });
emitter.on('x', => { /* this code is outside context */ });
@sam-github
Copy link
Contributor

AsyncLocal is a low-level unopinionated API that aims at being used as a foundation by ecosystem packages. It will be a standard brick upon which other modules are built.

AsyncContext is a high-level user-friendly API that cans be used out of the box by Node.js users.
It will be an API used directly by most users who have needs for context tracing.

Given this, AsyncContext should be implementable on top of AsyncLocal, unifying the proposals.

If not, that would suggest to me that AsyncLocal is not an unopinonated building block.

Can either be used to implement domains? Concurrent existence of multiple async tracking implementations has been an implementation burden in the past, I'm not sure what the current state is.

I'm curious, there is a relatively finite set of projects currently wanting something like this, their opinions would be the most valuable, IMHO:

  • web frameworks (for example loopback used to use CLS, IIRC @bajtos), and there is a web-frameworks WG spinning up, I wonder if they have an opinion? @wesleytodd
  • apm vendors: have any of the vendors expressed an opinion, or a commitment to migrate to one of these APIs?
  • the continuation-local-storage project
  • any long stack tracing projects
  • domains users... cc: @misterdjules In case this is still an area of interest to you.

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Jan 24, 2020

Given this, AsyncContext should be implementable on top of AsyncLocal, unifying the proposals.

Yes, that's correct. It's possible to use AsyncLocal to implement AsyncContext on top of it.

Can either be used to implement domains?

That's feasible with AsyncLocal at least, but it would require adding support for onChange callbacks into the API (i.e. extending the API), similar to the one that was build by @Flarna in nodejs/node#27172

@wesleytodd
Copy link
Member

wesleytodd commented Jan 24, 2020

We just had a discussion about this topic in our Web Server Frameworks Team kick off.

Pinging the whole team! @nodejs/web-server-frameworks

@mcollina
Copy link
Member

I would recommend everybody in this thread to read https://docs.google.com/document/d/1g8OrG5lMIUhRn1zbkutgY83MiTSMx-0NHDs8Bf-nXxM/edit?usp=sharing, and possibly include this in the brief on top. It explains the reationale behind executionAsyncResource and I think it's very relevant.

TL;DR the destroy hook is the source of a good part of the overhead, especially in promises-heavy codebases.

@vdeturckheim note that I'm the original author of executionAsyncResource, it just took so long to ship and I had no more time to work on it.


I think a very important factor in making a decision is the overhead introduced by each API. So, I would prefer to have benchmarks implemented, and compare the two.

@vdeturckheim
Copy link
Member Author

vdeturckheim commented Jan 24, 2020

@sam-github that's great questions/points!

I will try to answer from my multiple perspectives as I worked on domain, wrote one of the PR and maintain a commercial tool relying on such features.

Given this, AsyncContext should be implementable on top of AsyncLocal, unifying the proposals.

That's correct, however, I don't believe AsyncLocal is the right level of abstraction here:

  • executionAsyncResource will already provide a low level layer to build APIs upon if needed. Here, I see AsyncLocal as a non opiniated sugar over executionAsyncResource
  • As an APM/RASP vendor having userland modules (and their forks) here is a major risk. For instance, the CLS-hooked breaks the CLS if both are present (the CLS stores are just wiped out). This as actually caused issues with some users of my tool.
    From this, I believe the right abstraction is to provide a user-friendly high-level API so the need for ecosystem modules is reduced. This is the only way to really bring stability to that field.

Can either be used to implement domains? Concurrent existence of multiple async tracking implementations has been an implementation burden in the past, I'm not sure what the current state is.

I think domains should not be re-re-writen except if we want to get rid of Async Hooks. I mostly cleaned out the multiple impacts of domain in the codebase to move it over Async Hooks and I think this is a state we can keep for now.
Also, one of the goal of the Asynchronous API of AsyncContext is to prevent the implicit behavior of domains which raised a lot of issues for users.
I would like user to move to a safe domain-like API that can be built over AsyncContext with some breaking changes (much needed for safety).

apm vendors: have any of the vendors expressed an opinion, or a commitment to migrate to one of these APIs?

As an APM/RASP vendor, I built AsyncContext as the API I wished I had in Node.js. I would not rebuild AsyncContext over AsyncLocal in my agent but use executionAsyncResource for sure.

One of the reasons I want such API in core is to align the ecosystem on context tracking.
We want to be able to go at packages creator and tell them that their packages break the standard API and PR their code to fix it. If we go with a lower level implementation and let the ecosystem build higher level modules, we will not be able to do that. This will prevent me as an APM/RASP vendor from providing a bullet proof tool to my users.

the continuation-local-storage project

@othiym23 is pretty positive regarding the AsyncContext PR and I am sure there will be some alignments we can make.

any long stack tracing projects

IMO, this is not the right abstraction level for long stacktrace API as it will still need to rely on Async Hooks

@vdeturckheim
Copy link
Member Author

@mcollina good points. I have updated the doc and added you as author of the executionAsyncResource PR.

Thanks a lot!

@sam-github
Copy link
Contributor

@mcollina 's point about performance is important, its hard, but a significant perf difference between them might be a deciding factor. Particularly a perf diff in when they are not used. If a low-level API has low perf overhead, but every user of it has to add a bunch of code, bringing the combined performance back to that of the high-level api, its not a win. I'm not saying that's what is or will happen, but some kind of concrete numbers would help here.

@vdeturckheim I find your comments in #807 (comment) pretty compelling, but the top of the write up says not everyone was equally convinced... who isn't? Perhaps they (or you if you are up to it) can present the case that the lower-level API is the right one?

I confess that the whole sync vs async distinction that seems to be a major sticking point goes over my head, I'm not coding with the API myself, so I lack background. Probably I have that in common with most collaborators, including TSC members. Sorry.

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Jan 24, 2020

@mcollina

I think a very important factor in making a decision is the overhead introduced by each API. So, I would prefer to have benchmarks implemented, and compare the two.

I did benchmarking of AsyncLocal vs AsyncContext (the version of it that was based on WeakMap). The result may be seen in nodejs/node#31016

Duplicating the result here:

$ ./node benchmark/async_hooks/async-resource-vs-destroy.js benchmarker=autocannon
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-local" benchmarker="autocannon": 20,277.6
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-local" benchmarker="autocannon": 15,877.6
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-context" benchmarker="autocannon": 16,922.41
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-context" benchmarker="autocannon": 11,841.2
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-resource" benchmarker="autocannon": 23,582.4
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-resource" benchmarker="autocannon": 18,407.2

Note 0: my PR includes benchmark for AsyncLocal. Benchmarks for AsyncContext and cls-hooked are available in a separate gist.

Note 1: AsyncLocal could be made even faster by moving from WeakMap to resource properties, but this would require removing .remove() method, which, as I believe, is very valuable.

@vdeturckheim
Copy link
Member Author

@sam-github I can't speak for @puzpuzpuz but my understanding of our discussion of yesterday eveing is that they would like to use AsyncLocal this to build higher level modules and that they come with a package builder perspective.

I have to confess that my APM/RASP vendor perspective is mostly about stability across the ecosystem which has been a major challenge for us (and made us monkeypatch multiple modules like promise libs or queue libs)

@vdeturckheim
Copy link
Member Author

@puzpuzpuz would you have time to re-run these (and share the script for us to run across multiple arch) with the latest versions of the API?

Also @mcollina , I believe that the performances of AsyncContext will be equivalent to the ones of AsyncLocal when rebased over executionAsyncResource.

@mcollina
Copy link
Member

@vdeturckheim performance and overhead is a critical factor, so my recommendation is:

  1. land executionAsyncResource()
  2. rebase both on top of it, and see which one add the least overhead
  3. evaluate if we are just pushing overhead down to the tracers.

Build an API that make the life of everybody easier and improve the performance of Node.js applications.

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Jan 24, 2020

@sam-github

If a low-level API has low perf overhead, but every user of it has to add a bunch of code, bringing the combined performance back to that of the high-level api, its not a win.

It's easy to use AsyncLocal directly. See this snippet that shows how to build request id tracing into logs with AsyncLocal: https://github.com/nodejs/node/pull/31016/files#diff-9a4649f1c3f167b0da2c95fc38953a1fR604

It's not really low-level, but it's unopinionated. You can build an opinionated API on top of it, but you don't have to.

@vdeturckheim

@puzpuzpuz would you have time to re-run these (and share the script for us to run across multiple arch) with the latest versions of the API?

I don't see any purpose in doing it right now, as the implementation will change a lot once it's migrated to executionAsyncResource. But once that's done, I can certainly re-run benchmarks.

Also @mcollina , I believe that the performances of AsyncContext will be equivalent to the ones of AsyncLocal when rebased over executionAsyncResource.

The overhead of AsyncContext in case of that particular benchmark would be one .nextTick() and one Map allocation per each request. That could make a different in the result, but it has to be measured.

In any case, this issue quickly turns into a debate arena. I'll try to post here as less as possible to avoid overflooding it.

@vdeturckheim
Copy link
Member Author

@mcollina I agree that performance is a critical factor.

For cases where the Map allocation and the nextTick call bring too much overhead, I think executionAsyncResource is the right API to build from. It gives the most control to the end user with a memory-safe base without adding a layer of complexity. In these cases, AsyncLocal and AsyncContext are both already too high level.

@raymondfeng
Copy link

raymondfeng commented Jan 24, 2020

Let me give a difference perspective here from a framework developer's view.

At LoopBack, we gave up on dependencies of lower apis for async context. Instead we create an IoC container that uses a chain of contexts with dependency injection to propagate/receive contextual information for the request/response processing pipeline.

Conceptually, when an http request comes in, we create a RequestContext and use it to instantiate downstream handler instances via DI so that they can receive injections of the current request context. Please note each handler can also bind more data to the RequestContext. Please also note that the RequestContext is a child of Server, which is a child of Application. The context chain allows information to be shared in scopes.

See https://loopback.io/doc/en/lb4/Behind-the-scene.html for more details.

@Flarna
Copy link
Member

Flarna commented Jan 24, 2020

I'm working for an APM tool vendor. I created #27172 as alternative to #26540 after diagnostics summit last year as I was not able to use AsyncContext but agreed that something in this area makes sense in core. That time I verified that it works with our APM tool. I was able to port Domains at least that far that just 3 tests failed. See #27172 for more details is needed.

Main reason why I'm not able to use AsyncContext is the async API. It changes execution flow (sequence of calls) which is not acceptable for a APM tools we create which silently injects into an existing app. There are some other reasons but I don't want to repeat all my comments I left in #26540 here again.

I finally closed #27172 because for two reasons:

  • there are two other PRs so no need for a third one
  • I found meanwhile enough corner cases which pushed me again to use async hooks directly

executionAsyncResource is for sure a good thing for us once it's done. But I think it has a similar problem then async-hooks: internal, undocumented objects are exposed (at least as it is implemented now). Therefore internal changes could result in breaking usercode which is by some means ok for an experimental API but most likely not for a stable one.

Both AsyncLocal and AsyncContext don't have this problem. Also the embedder API (AsyncResource) is fine regarding this.

Therefore whatever variant lands has a much better chance to get out of experimental then async-hooks have. Both will for sure not solve all problems but hopefully trigger that userland moves towards use of AsyncResource which in turn improves all variants.

@Qard
Copy link
Member

Qard commented Jan 25, 2020

I don't intend on executionAsyncResource ever being a "stable" API. It should be consider internals for the higher-level CLS we build on top of it.

Also, agreed that async is problematic. If we do choose AsyncContext over AsyncLocal I would like it to be rebased on top of executionAsyncResource first to eliminate the need for an async run. I'm personally leaning a bit more toward AsyncLocal at the moment as it is already sync and I prefer the more simplistic model of containing an arbitrary value rather than building an object or map that may or may not be necessary. I can imagine many cases where you only want to propagate one thing.

@wesleytodd
Copy link
Member

It sounds to me like the conversation at this point is AsyncLocal vs AsyncContext with executionAsyncResource as a given. So from my view as a perspective consumer, (disregarding the performance unknowns until they are established) there seems to be so little gained from the api of AsyncLocal that I would almost always have a library layer to give an experience more like AsyncContext that having the middle layer is unnecessary.

A few questions I have about AsyncLocal:

  1. Is there an example which would show how you imagine a higher level building on top of AsyncLocal would look?
  2. For the APM vendor and a web framework to use AsyncLocal, would that mean creating two AsyncLocal instances? If the APM vendor could use the AsyncContext which was created by the framework and then use a scoped name in the Map, would that be better than creating multiple AsyncLocal instances?
  3. Is the method for stopping propagation asynclocal.set(undefined)? What if there was partial state you wanted to continue to propagate?
  4. Do I understand correctly that to use this AsyncLocal with something more like an object interface it would look like this?
const obj = asynclocal.get()
obj.myVal = 'MyNewVal'
asynclocal.set(obj)

vs this from AsyncContext

asynccontext.set('myVal', 'MyNewVal')

I still haven't had the chance to dig into the implementation details of the PR's, but I have thought about what an implementation inside of Express might look like. And I think the API offered by AsyncContext makes more sense to me.

I see that there are some contexts there having the extra Map is not necessary, and that the extra process.nextTick is a waste. If the main target uses cases are tracing, APM monitoring and web framework usage, then I am not sure any of these matter as those use cases will almost all want at least one of the features offered by AsyncContext.

To address a few interesting points I find in the above conversation:

having userland modules (and their forks) here is a major risk

This is a double edged sword. It can stop innovation, but constraints can also force even more innovation. In this case I think that the API of AsyncLocal is so minimal I am not sure downstream consumers will understand how to use it, and so might not even pick it up or atleast miss-use it. Unless we see that the additional api is bad for perf or otherwise unfitting, AsyncContext is much more approachable from an end consumer IMO.

Main reason why I'm not able to use AsyncContext is the async API. It changes execution flow (sequence of calls) which is not acceptable for a APM tools we create which silently injects into an existing app.

You are in fact "changing the execution flow" even if you are not in the stack trace. I would much rather have something where I could see "oh, this is where the APM hooks into my code", than not have a trace. Either way, I am not sure this is a valid reason to pick one API over the other. Is there something this changes which I am not aware of for the runtime or your customers?

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Jan 25, 2020

@wesleytodd

Let me try to answer your questions on AsyncLocal. Sorry for a long reply in advance, but you've asked many questions, so it's not possible to answer them in a more compact manner.

  1. Is there an example which would show how you imagine a higher level building on top of AsyncLocal would look?

That depends on the concrete API, obviously. Say, building AsyncContext-like API would be something like the following:

class AsyncContext {
   // constructor and .exit() are omitted...

  run(callback) {
    process.nextTick(() => {
      const store = new Map();
      this.asyncLocal.set(store);
      callback(store);
    });
  }

  getStore() {
    return this.asyncLocal.get();
  }

A quite simple and straightforward implementation, at least for me.

But you don't have to deal with a wrapper (you call it "higher level") API. For instance, I'm going to use AsyncLocal directly in my library (https://github.com/puzpuzpuz/cls-rtracer), which implements request tracing middlewares/plugins for many popular web frameworks. To give you an impression how simple it's going to look like, here is an example of a draft Express middleware built on top of AsyncLocal:

const expressMiddleware = ({
  useHeader = false,
  headerName = 'X-Request-Id'
} = {}) => {
  return (req, res, next) => {
    let requestId;
    if (useHeader) {
      requestId = req.headers[headerName.toLowerCase()];
    }
    requestId = requestId || uuidv1();

    // the whole integration - one-liner 
    asyncLocal.set(requestId);
    next();
  }
}
  1. For the APM vendor and a web framework to use AsyncLocal, would that mean creating two AsyncLocal instances? If the APM vendor could use the AsyncContext which was created by the framework and then use a scoped name in the Map, would that be better than creating multiple AsyncLocal instances?

I wouldn't advice sharing AsyncContext (or AsyncLocal) between web framework and APM. Sharing AsyncContext may lead to the following problems:

  1. Sharing instance of AsyncContext will assume tight coupling between web framework and APM, as there is no global registry in AsyncContext API itself.
  2. Store creation requires .run() call, thus one of participants (web framework or APM) must do that call and another party has to rely on that.
  3. Store itself is a plain Map, so nothing forces key isolation or prevents one of parties from calling map.clear().

These problems are quite critical and I don't see any benefits of AsyncContext here.

  1. Is the method for stopping propagation asynclocal.set(undefined)? What if there was partial state you wanted to continue to propagate?

In that case you can simply store an Object or Map and clean some of its keys.

  1. Do I understand correctly that to use this AsyncLocal with something more like an object interface it would look like this?
const obj = asynclocal.get()
obj.myVal = 'MyNewVal'
asynclocal.set(obj)

vs this from AsyncContext

asynccontext.set('myVal', 'MyNewVal')

AsyncLocal snippet is not correct: you don't need to do asynclocal.set(obj), if obj is already stored in AsyncLocal.

Also your snippets are showing plain object in AsyncLocal vs Map in AsyncContext, which is not quite the same. AsyncLocal allows you to deal with any type of value directly if you only need a single value, without having to pay for and manipulate an extra Map.

@Qard
Copy link
Member

Qard commented Jan 25, 2020

I expect most APM vendors would have their own class for managing context, so they would likely want to store an instance of that. With AsyncContext, there's an extra unnecessary layer with the map--you'd typically have a map with only one thing in it, which is wasteful. With AsyncLocal, you can store your context class instance directly.

const apmContext = new Agent.Context(...)
asyncLocal.set(apmContext)
// later...
const apmContext = asyncLocal.get()

Potentially the AsyncContext form could make it easier to share the context across components of the system, like APMs and web frameworks could interact with the same store. However, I personally don't trust the ecosystem to keep in-sync enough for that to be at all stable. It's very easy to accidentally overwrite someone else's data or have data shape incompatibilities due to modules evolving at different rates. The second you start dealing with interop all that data needs to be considered public API surface, and given that the type of data that would typically need to be propagated via a CLS system would tend to be internal data, those things seem somewhat at-odds to me.

My personal feeling is that contexts should always be fully isolated from each other and not shared, and that if there is a need to communicate between system components it makes more sense to create a separate channel for that, which is what Diagnostics Channel was meant to be.

@vdeturckheim
Copy link
Member Author

vdeturckheim commented Jan 25, 2020

@Qard how would you feel about a solution which would look like:

new AsyncContext<T>(T); // would call Object.create(T.prototype) to get the store

?
I don't like having a wild type here as I see it as a good way to let people shoot themselves in the foot with unpredictable values.

Another point that worries me is the behavior of set in AsyncLocal:

  • Usually, methods named set have the following signature set(key, value) (see CLS and Map).
  • set might throw arbitrarily if someone calls remove. I agree that in theory this should not happen. However, IMO, a web framework will have to share the context/local object for this to work. I have built and express example (cc @wesleytodd) showing how framewroks will probably leverage such API.

Also, get and set will likely become megamorphic in V8 if there are more than 4 different users of it in the same process.

I don't want us to end up writing another port mortem because an API we shipped and advertised was too permissive and had implicit behaviors.

@wesleytodd
Copy link
Member

wesleytodd commented Jan 25, 2020

Sorry for a long reply in advance, but you've asked many questions, so it's not possible to answer them in a more compact manner.

😄 No worries! I started it.

here is an example of a draft Express middleware built on top of AsyncLocal

This is what I anticipated it would look like, but I feel like if Express is creating a an instance it would make more sense for it to look like this:

const _reqId = Symbol('_reqId')
function mw (req, res, next) {
  // ..
  req.app.context.set(_reqId, req.headers['X-Request-Id'])
  next()
}

tight coupling between web framework and APM

Isn't that the point? How can do do an APM which is not tightly coupled and still get useful metrics out of it?

thus one of participants (web framework or APM) must do that call and another party has to rely on that

And the web framework would always do that for you when it gets the http request in the first pace. Is there another case where this would not be true?

Store itself is a plain Map, so nothing forces key isolation or prevents one of parties from calling map.clear()

This is a good point! From an express point of view we would be offering this as a user facing api so if they decided to call .clear that is on them. From the point of view of an APM, if this is a concern for you all you would have to do is not share the instance. That said, as a user I hate when products try to force prevent me from doing something when I as the app developer know it is the right thing to do. Buy you do you 😛

AsyncLocal snippet is not correct: you don't need to do asynclocal.set(obj), if obj is already stored in AsyncLocal.

Right, since it is a reference. That said I try to avoid modifying objects which are not in my current scope as it is confusing to debug. 🤷‍♂

I personally don't trust the ecosystem to keep in-sync

I agree this is the general concern, but I think both api's if used in this way introduce it so it is really up to the implementation and does not make one better than the other here.

@puzpuzpuz
Copy link
Member

@wesleytodd

here is an example of a draft Express middleware built on top of AsyncLocal

This is what I anticipated it would look like, but I feel like if Express is creating a an instance it would make more sense for it to look like this:

const _reqId = Symbol('_reqId')
function mw (req, res, next) {
  // ..
  req.app.context.set(_reqId, req.headers['X-Request-Id'])
  next()
}

In this case, how would you implement integration with a logger? For such integration context has to be available globally, directly or through Express's app.

Anyway, I see your point, which is a better integration with the framework. It makes sense to me. But my snippet is aimed to show how simple a real-world integration with AsyncLocal can be, so such details are not that important.

tight coupling between web framework and APM

Isn't that the point? How can do do an APM which is not tightly coupled and still get useful metrics out of it?

thus one of participants (web framework or APM) must do that call and another party has to rely on that

And the web framework would always do that for you when it gets the http request in the first pace. Is there another case where this would not be true?

Most (if not any) APMs integrate with http and http2 and with some of web frameworks. I don't think APM vendors are going to extend their integration with a particular framework that happened to expose standard CLS API. But that depends on many factors.

Also I don't think frameworks should enable CLS API by default, as it may impact performance of applications that don't use CLS at all. That setting should be enabled by the user and, thus, APMs shouldn't rely on that setting being enabled.

In any case, looks like my 3rd point was convincing enough and both of us agree that both AsyncLocal and AsyncContext shouldn't be shared between web frameworks and APMs.

@vdeturckheim

Usually, methods named set have the following signature set(key, value) (see CLS and Map).

Single value containers usually have .get()/.set() methods. That's a common practice in many languages, so users won't be surprised. Optionally, it should be changed to asyncLocal.value property accessor methods, like it was done in nodejs/node#27172

set might throw arbitrarily if someone calls remove.

As AsyncLocal clears all underlying resources once .remove() is called, users shouldn't continue modifying its value. That's a fail-fast behavior which is good for a core API, so I don't understand your concerns.

The two items above look like an off-topic for this particular GH issue. They're something that has to be discussed as a part of code review, not as a part of API proposals comparison.

Also, get and set will likely become megamorphic in V8 if there are more than 4 different users of it in the same process.

Could you elaborate on the logical chain that led you to this conclusion? Also, if that stands true (I have some doubts here), it automatically applies to .get()/.set() method in AsyncContext's store, which is a Map.

@othiym23
Copy link

When I started working on what became continuation-local-storage – both to solve the needs of the APM vendor for whom I was working at the time and at the behest of the then-leader of the Node project – it was with the goal of having the API become a part of Node core. It was also intended to provide a minimal API surface and to try to avoid some of the mistakes we knew we had made with domains.

Several people raised concerns about introducing another abstraction as costly, in terms of implementation complexity and performance, as domains, and so @trevnorris and a few others argued successfully that this was something that should be kept in userland, and instead a lower-level API with fewer consequences be implemented in core. I still needed to ship, so I, along with @creationix and a few other people, shipped first async-listener and continuation-local-storage. @trevnorris, meanwhile, implemented AsyncResource, and then he and others worked on what became the current async_hooks API.

I mention this context simply to point out that I have always seen value for this functionality in core, while agreeing that end users should not bear the cost for abstractions for which they have no use. Also, I agree with the goal of creating a simple, clean abstraction that can be used for both the purposes of APM and for developers of web applications who want a hygienic way of passing state throughout a request / response cycle. I think executionAsyncResource does expose too much internal state and should probably remain experimental. One of AsyncLocal and AsyncContext therefore makes a lot of sense to me, but I haven't been working with APM or even Node very closely for a while, and so I don't feel qualified to weigh in on which is superior. I defer to @vdeturckheim and @Qard, but also I support their recommendations because I think their goals and approach are fundamentally identical to my own when I was working on this more actively.

@Flarna
Copy link
Member

Flarna commented Jan 26, 2020

Main reason why I'm not able to use AsyncContext is the async API. It changes execution flow (sequence of calls) which is not acceptable for a APM tools we create which silently injects into an existing app.

You are in fact "changing the execution flow" even if you are not in the stack trace. I would much rather have something where I could see "oh, this is where the APM hooks into my code", than not have a trace. Either way, I am not sure this is a valid reason to pick one API over the other. Is there something this changes which I am not aware of for the runtime or your customers?

@wesleytodd It's not about visiblity on call stack. As monkey-patching/wrapping is the major entry point for APMs they are anyway visible on stack. I'm talking about the sequence of function calls. Consider following request handler:

onRequest(req, res, next) {
  const rc1 = prepareSomething()
  const rc2 = doSomethingAPMsWantNewContext(rc1)
  const rc3 = postProcessResult(rc3)
  ...
}

monkey-patched variant of doSomethingAPMsWantNewContext with sync API from AsyncLocal would look like this:

function wrappedDoSomethingAPMsWantNewContext() {
  try {
    asyncLocal.set(myContext)
    return Reflect.apply(doSomethingAPMsWantNewContext, this, arguments)
  } finally {
    asyncLocal.set(undefined)
  }
}

==> doSomethingAPMsWantNewContext() gets same input and returns same result/exception and is called between prepareSomething() and postProcessResult();

monkey-patched variant of ´doSomethingAPMsWantNewContext()with async API fromAsyncContextis problematic as the callback passed toasyncContext.run()is called viaprocess.nextTick(). Therefore it's actually called _after_ postProcessResult()` and exceptions thrown are uncaught.
If you have full control over the code it's not that hard to modify it and adapt it accordingly but our APM tool injects into an existing application and we really shouldn't change sequence of function calls.

Regarding sharing CLS instance between WebFramework and APMs: Please note that there are a lot frameworks besides HTTP like various frameworks for remote procedure calls and messaging used even in combination with HTTP. Besides that simple apps use plain Node core HTTP and no user land framework. And there are FaaS platforms like AWS lambda or applicaitons issueing database requests based on timers or other triggers (not HTTP).
In short if we go for one big, shared CSL instance it would be needed to be part of node core.

I know that thread locals are not exactly the same than Async CLS but they are similar. As far as I know each language supporting thread locals allows to add more independent entries instead of sharing one singelton TLS entry.

@vdeturckheim
Copy link
Member Author

vdeturckheim commented Jan 26, 2020

@Flarna (quick response, probably the first of this thread)
-> As mentionned, I believe AsyncContext can use the sync API (you convinced me of that in April last year) with multiple warnings on doc regarding their safety risks. I actually had several prototypes for such but the stable way to do it for me is to do it on top of executionAsyncResource. Since nothing will be merged before executionAsyncResource, I will rebase over that and add the sync API later this week.

REF in original doc:

Eventually, a synchronous API could be added to AsyncContext when the executionAsyncResource rebase is done. In this case, documentation will clearly state that using run is the prefered method and that synchronous methods have less explicit behaviors.

@Flarna
Copy link
Member

Flarna commented Jan 26, 2020

My comment was more a reply toward @wesleytodd to explain why an async API is not usable in any case.

Wrapping/Extending AsyncLocal to act async is easy.
If sync is added to AsyncContext both are equal in this aspect.

@wesleytodd
Copy link
Member

wesleytodd commented Jan 27, 2020

In short if we go for one big, shared CSL instance it would be needed to be part of node core.

I do not want this to be the solution. I was specifically talking about inside of an http framework like express.

@wesleytodd It's not about visiblity on call stack. As monkey-patching/wrapping is the major entry point for APMs they are anyway visible on stack. I'm talking about the sequence of function calls.

That makes a lot more sense, thanks for clarifying! I can see how this might be problematic for this use case. That said, I strongly believe that designing a core api around monkey patching user code so that your business can function without asking libraries to give you hooks is not the right way forward.

If we can get a sync version in a reasonable way then I am all 👍 for that, and it sounds like we can according to @vdeturckheim. Otherwise we should be focused on what users will want, and the AsyncContext proposal is much closer to this IMO.

@vdeturckheim
Copy link
Member Author

vdeturckheim commented Jan 28, 2020

I think right now AsyncLocal might be impacted by nodejs/node#30959 (comment). I will dig into that later tonight.

The current implementation of AsyncContext seems to work with this case, but the rebase on executionAsyncResource has the issue.

@Flarna for future iterations on AsyncContext, here is the rebase with a synchronous API

I try to warn the user regarding using it as it is very easy to shoot oneself in the foot with it, for instance, with events:

emitter.on('event', () => {
    asyncContext.enterSync();
});
emitter.on('event', () => {
    asyncContext.getStore(); // will return  the store.
});

This is the main reason why I also introduced runAndReturn to provide a scoped synchronous entrypoint.

@vdeturckheim
Copy link
Member Author

vdeturckheim commented Jan 30, 2020

Current status on Jan 30th 2020

  • executionAsyncResource is currently blocked by an issue when resources are reused
  • AsyncContext has not been rebased on executionAsyncResource. But synchronous entrypoints have been added to it (with documentation clarifying what edgy cases then can introduce).

@vdeturckheim
Copy link
Member Author

vdeturckheim commented Feb 4, 2020

@nodejs/tsc I have added a status part to the original message (at the bottom) to make it easier. Let me/us know if you have any questions here!

@Qard, @Flarna and @puzpuzpuz I did it with my understanding of the current discussions in the PRs, please feel free to correct me if I have missed/missunderstood something

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Feb 4, 2020

@vdeturckheim

executionAsyncResource
PR is still blocked as reused resources are exposed.

That's not correct. The only blocker in that PR is resource mismatch between init hooks and executionAsyncResource() in case of HTTPParser and that issue should be resolved soon.

Resource reuse itself doesn't lead to incorrect context propagation, but may lead stored value leak in certain scenarios, which should be fine for an experimental API. Core side of this problem should be addressed separately, in a separate PR.

@Qard please correct me if I'm wrong.

@vdeturckheim
Copy link
Member Author

Thanks @puzpuzpuz , I updated the status.

In my understanding, executionAsyncResource serves two goals:

Currently, merging it while the issue with reused resources is not covered alwyas defeats at least one or the other goal.

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Feb 5, 2020

@vdeturckheim
There is at least one potential context loss issue in AsyncContext PR (see nodejs/node#26540 (comment)), all of which could be resolved by using executionAsyncResource(). So, it looks like AsyncContext PR should start using executionAsyncResource() and should be marked as blocked by #30959 in your status update. WDYT?

Update. I've removed a couple of links, as they're open questions, not issues.

@vdeturckheim
Copy link
Member Author

vdeturckheim commented Feb 5, 2020

@puzpuzpuz thanks for pointing that out! I don't think the case you raise in the PR can happen because. I'll answer directly in the PR.

(edit) the issue mentionned by @puzpuzpuz is actually not present

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Feb 5, 2020

I've closed nodejs/node#31016 in favor of nodejs/node#26540, as the decision is still pending and I believe that having one or another CLS API in core in the nearest future is more important than having this particular one.

@vdeturckheim
Copy link
Member Author

Thanks a lot @puzpuzpuz . I really appreciate this!

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Feb 12, 2020

As nodejs/node#31746 was created recently, I've reopened nodejs/node#31016. So, it makes sense to reiterate through these three PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants