Executive summary: Introducing a CLS-like API to Node.js core #807
Comments
Given this, AsyncContext should be implementable on top of AsyncLocal, unifying the proposals. If not, that would suggest to me that AsyncLocal is not an unopinionated building block. Can either be used to implement domains? Concurrent existence of multiple async tracking implementations has been an implementation burden in the past; I'm not sure what the current state is. I'm curious: there is a relatively finite set of projects currently wanting something like this, and their opinions would be the most valuable, IMHO:
|
Yes, that's correct. It's possible to use AsyncLocal to implement AsyncContext on top of it.
That's feasible with AsyncLocal at least, but it would require adding support for onChange callbacks into the API (i.e. extending the API), similar to the one that was built by @Flarna in nodejs/node#27172 |
We just had a discussion about this topic in our Web Server Frameworks Team kick off. Pinging the whole team! @nodejs/web-server-frameworks |
I would recommend everybody in this thread to read https://docs.google.com/document/d/1g8OrG5lMIUhRn1zbkutgY83MiTSMx-0NHDs8Bf-nXxM/edit?usp=sharing, and possibly include this in the brief on top. It explains the rationale behind TL;DR the destroy hook is the source of a good part of the overhead, especially in promises-heavy codebases. @vdeturckheim note that I'm the original author of I think a very important factor in making a decision is the overhead introduced by each API. So, I would prefer to have benchmarks implemented, and compare the two. |
@sam-github those are great questions/points! I will try to answer from my multiple perspectives, as I worked on domain, wrote one of the PRs, and maintain a commercial tool relying on such features.
That's correct, however, I don't believe AsyncLocal is the right level of abstraction here:
I think domains should not be rewritten yet again unless we want to get rid of Async Hooks. I mostly cleaned out the multiple impacts of domain in the codebase to move it over Async Hooks and I think this is a state we can keep for now.
As an APM/RASP vendor, I built AsyncContext as the API I wished I had in Node.js. I would not rebuild AsyncContext over AsyncLocal in my agent but use executionAsyncResource for sure. One of the reasons I want such API in core is to align the ecosystem on context tracking.
@othiym23 is pretty positive regarding the AsyncContext PR and I am sure there will be some alignments we can make.
IMO, this is not the right abstraction level for long stacktrace API as it will still need to rely on Async Hooks |
@mcollina good points. I have updated the doc and added you as author of the Thanks a lot! |
@mcollina 's point about performance is important. It's hard, but a significant perf difference between them might be a deciding factor, particularly a perf diff when they are not used. If a low-level API has low perf overhead, but every user of it has to add a bunch of code, bringing the combined performance back to that of the high-level API, it's not a win. I'm not saying that's what is or will happen, but some kind of concrete numbers would help here. @vdeturckheim I find your comments in #807 (comment) pretty compelling, but the top of the write up says not everyone was equally convinced... who isn't? Perhaps they (or you if you are up to it) can present the case that the lower-level API is the right one? I confess that the whole sync vs async distinction that seems to be a major sticking point goes over my head, I'm not coding with the API myself, so I lack background. Probably I have that in common with most collaborators, including TSC members. Sorry. |
I did benchmarking of AsyncLocal vs AsyncContext (the version of it that was based on Duplicating the result here:
Note 0: my PR includes benchmark for AsyncLocal. Benchmarks for AsyncContext and Note 1: AsyncLocal could be made even faster by moving from |
@sam-github I can't speak for @puzpuzpuz but my understanding of our discussion of yesterday evening is that they would like to use AsyncLocal to build higher level modules and that they come with a package builder perspective. I have to confess that my APM/RASP vendor perspective is mostly about stability across the ecosystem which has been a major challenge for us (and made us monkeypatch multiple modules like promise libs or queue libs) |
@puzpuzpuz would you have time to re-run these (and share the script for us to run across multiple arch) with the latest versions of the API? Also @mcollina , I believe that the performances of AsyncContext will be equivalent to the ones of AsyncLocal when rebased over |
@vdeturckheim performance and overhead is a critical factor, so my recommendation is:
Build an API that make the life of everybody easier and improve the performance of Node.js applications. |
It's easy to use AsyncLocal directly. See this snippet that shows how to build request id tracing into logs with AsyncLocal: https://github.com/nodejs/node/pull/31016/files#diff-9a4649f1c3f167b0da2c95fc38953a1fR604 It's not really low-level, but it's unopinionated. You can build an opinionated API on top of it, but you don't have to.
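For readers who want to try the idea without opening the PR, here is a minimal sketch of request id tracing in logs. It is not the snippet linked above: it assumes `AsyncLocalStorage` from `async_hooks` (the class found in current Node.js releases) as a stand-in for AsyncLocal, and `handleRequest`, `log` and `lines` are hypothetical names used only for illustration.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

// Stand-in for the proposed AsyncLocal: one value per asynchronous call chain.
const requestIdStorage = new AsyncLocalStorage();
const lines = [];

// Hypothetical logger: prefixes each message with the current request id, if any.
function log(message) {
  const requestId = requestIdStorage.getStore();
  lines.push(`[${requestId === undefined ? '-' : requestId}] ${message}`);
}

// Hypothetical request handler: the id is never passed to log() explicitly.
function handleRequest(requestId) {
  return requestIdStorage.run(requestId, async () => {
    log('start');
    await new Promise((resolve) => setTimeout(resolve, 5));
    log('done'); // the id is still visible after the await
  });
}

const finished = Promise.all([handleRequest('req-1'), handleRequest('req-2')]);
log('outside any request');
```

The logger never receives the id as an argument, which is the property the linked snippet is meant to demonstrate; the two simulated requests keep their own ids even though their callbacks interleave.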
I don't see any purpose in doing it right now, as the implementation will change a lot once it's migrated to
The overhead of AsyncContext in case of that particular benchmark would be one In any case, this issue quickly turns into a debate arena. I'll try to post here as little as possible to avoid flooding it. |
@mcollina I agree that performance is a critical factor. For cases where the |
Let me give a different perspective here from a framework developer's view. At LoopBack, we gave up on dependencies of lower apis for Conceptually, when an http request comes in, we create a See https://loopback.io/doc/en/lb4/Behind-the-scene.html for more details. |
I'm working for an APM tool vendor. I created #27172 as alternative to #26540 after diagnostics summit last year as I was not able to use Main reason why I'm not able to use I finally closed #27172 for two reasons:
Both Therefore whatever variant lands has a much better chance to get out of experimental than async-hooks has. Both will for sure not solve all problems but hopefully trigger that userland moves towards use of |
I don't intend on Also, agreed that async is problematic. If we do choose AsyncContext over AsyncLocal I would like it to be rebased on top of |
It sounds to me like the conversation at this point is A few questions I have about
const obj = asynclocal.get()
obj.myVal = 'MyNewVal'
asynclocal.set(obj) vs this from asynccontext.set('myVal', 'MyNewVal') I still haven't had the chance to dig into the implementation details of the PR's, but I have thought about what an implementation inside of Express might look like. And I think the API offered by I see that there are some contexts there having the extra To address a few interesting points I find in the above conversation:
This is a double edged sword. It can stop innovation, but constraints can also force even more innovation. In this case I think that the API of
You are in fact "changing the execution flow" even if you are not in the stack trace. I would much rather have something where I could see "oh, this is where the APM hooks into my code", than not have a trace. Either way, I am not sure this is a valid reason to pick one API over the other. Is there something this changes which I am not aware of for the runtime or your customers? |
Let me try to answer your questions on AsyncLocal. Sorry for a long reply in advance, but you've asked many questions, so it's not possible to answer them in a more compact manner.
That depends on the concrete API, obviously. Say, building AsyncContext-like API would be something like the following: class AsyncContext {
// constructor and .exit() are omitted...
run(callback) {
process.nextTick(() => {
const store = new Map();
this.asyncLocal.set(store);
callback(store);
});
}
getStore() {
return this.asyncLocal.get();
} A quite simple and straightforward implementation, at least for me. But you don't have to deal with a wrapper (you call it "higher level") API. For instance, I'm going to use AsyncLocal directly in my library (https://github.com/puzpuzpuz/cls-rtracer), which implements request tracing middlewares/plugins for many popular web frameworks. To give you an impression how simple it's going to look like, here is an example of a draft Express middleware built on top of AsyncLocal: const expressMiddleware = ({
useHeader = false,
headerName = 'X-Request-Id'
} = {}) => {
return (req, res, next) => {
let requestId;
if (useHeader) {
requestId = req.headers[headerName.toLowerCase()];
}
requestId = requestId || uuidv1();
// the whole integration - one-liner
asyncLocal.set(requestId);
next();
}
}
I wouldn't advise sharing AsyncContext (or AsyncLocal) between web framework and APM. Sharing AsyncContext may lead to the following problems:
These problems are quite critical and I don't see any benefits of AsyncContext here.
In that case you can simply store an
AsyncLocal snippet is not correct: you don't need to do Also your snippets are showing plain object in AsyncLocal vs |
I expect most APM vendors would have their own class for managing context, so they would likely want to store an instance of that. With AsyncContext, there's an extra unnecessary layer with the map--you'd typically have a map with only one thing in it, which is wasteful. With AsyncLocal, you can store your context class instance directly. const apmContext = new Agent.Context(...)
asyncLocal.set(apmContext)
// later...
const apmContext = asyncLocal.get() Potentially the My personal feeling is that contexts should always be fully isolated from each other and not shared, and that if there is a need to communicate between system components it makes more sense to create a separate channel for that, which is what Diagnostics Channel was meant to be. |
@Qard how would you feel about a solution which would look like: new AsyncContext<T>(T); // would call Object.create(T.prototype) to get the store ? Another point that worries me is the behavior of
Also, I don't want us to end up writing another post mortem because an API we shipped and advertised was too permissive and had implicit behaviors. |
😄 No worries! I started it.
This is what I anticipated it would look like, but I feel like if Express is creating an instance it would make more sense for it to look like this: const _reqId = Symbol('_reqId')
function mw (req, res, next) {
// ..
req.app.context.set(_reqId, req.headers['x-request-id'])
next()
}
Isn't that the point? How can you do an APM which is not tightly coupled and still get useful metrics out of it?
And the web framework would always do that for you when it gets the http request in the first place. Is there another case where this would not be true?
This is a good point! From an express point of view we would be offering this as a user facing api so if they decided to call
Right, since it is a reference. That said I try to avoid modifying objects which are not in my current scope as it is confusing to debug. 🤷♂
I agree this is the general concern, but I think both api's if used in this way introduce it so it is really up to the implementation and does not make one better than the other here. |
In this case, how would you implement integration with a logger? For such integration context has to be available globally, directly or through Express's Anyway, I see your point, which is a better integration with the framework. It makes sense to me. But my snippet is aimed to show how simple a real-world integration with AsyncLocal can be, so such details are not that important.
Most (if not all) APMs integrate with Also I don't think frameworks should enable CLS API by default, as it may impact performance of applications that don't use CLS at all. That setting should be enabled by the user and, thus, APMs shouldn't rely on that setting being enabled. In any case, looks like my 3rd point was convincing enough and both of us agree that both AsyncLocal and AsyncContext shouldn't be shared between web frameworks and APMs.
Single value containers usually have
As AsyncLocal clears all underlying resources once The two items above look like an off-topic for this particular GH issue. They're something that has to be discussed as a part of code review, not as a part of API proposals comparison.
Could you elaborate on the logical chain that led you to this conclusion? Also, if that stands true (I have some doubts here), it automatically applies to |
When I started working on what became Several people raised concerns about introducing another abstraction as costly, in terms of implementation complexity and performance, as domains, and so @trevnorris and a few others argued successfully that this was something that should be kept in userland, and instead a lower-level API with fewer consequences be implemented in core. I still needed to ship, so I, along with @creationix and a few other people, shipped first I mention this context simply to point out that I have always seen value for this functionality in core, while agreeing that end users should not bear the cost for abstractions for which they have no use. Also, I agree with the goal of creating a simple, clean abstraction that can be used for both the purposes of APM and for developers of web applications who want a hygienic way of passing state throughout a request / response cycle. I think |
@wesleytodd It's not about visibility on call stack. As monkey-patching/wrapping is the major entry point for APMs they are anyway visible on stack. I'm talking about the sequence of function calls. Consider following request handler: onRequest(req, res, next) {
const rc1 = prepareSomething()
const rc2 = doSomethingAPMsWantNewContext(rc1)
const rc3 = postProcessResult(rc2)
...
} monkey-patched variant of function wrappedDoSomethingAPMsWantNewContext() {
try {
asyncLocal.set(myContext)
return Reflect.apply(doSomethingAPMsWantNewContext, this, arguments)
} finally {
asyncLocal.set(undefined)
}
} ==> monkey-patched variant of doSomethingAPMsWantNewContext() Regarding sharing a CLS instance between web frameworks and APMs: Please note that there are a lot of frameworks besides HTTP, like various frameworks for remote procedure calls and messaging, used even in combination with HTTP. Besides that, simple apps use plain Node core HTTP and no userland framework. And there are FaaS platforms like AWS Lambda, or applications issuing database requests based on timers or other triggers (not HTTP). I know that thread locals are not exactly the same as async CLS, but they are similar. As far as I know, each language supporting thread locals allows adding more independent entries instead of sharing one singleton TLS entry. |
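To illustrate the point about independent entries, here is a small sketch. It assumes `AsyncLocalStorage` from `async_hooks` (as found in current Node.js releases) as a stand-in for either proposal; `frameworkStorage`, `apmStorage` and the stored objects are hypothetical.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

// Two independent storages, e.g. one owned by a web framework and one by an APM agent.
const frameworkStorage = new AsyncLocalStorage();
const apmStorage = new AsyncLocalStorage();

const observed = frameworkStorage.run({ requestId: 'req-1' }, () =>
  apmStorage.run({ traceId: 'trace-9' }, () => {
    // Leaving the APM context does not touch the framework context.
    const insideExit = apmStorage.exit(() => ({
      framework: frameworkStorage.getStore(),
      apm: apmStorage.getStore(),
    }));
    return {
      framework: frameworkStorage.getStore(),
      apm: apmStorage.getStore(),
      insideExit,
    };
  })
);
```

Each instance behaves like its own thread-local slot: the APM can enter or leave its context at any point (HTTP, RPC, timers) without knowing whether a framework-level context exists.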
@Flarna (quick response, probably the first of this thread) REF in original doc:
|
My comment was more a reply toward @wesleytodd to explain why an async API is not usable in any case. Wrapping/Extending AsyncLocal to act async is easy. |
I do not want this to be the solution. I was specifically talking about inside of an http framework like express.
That makes a lot more sense, thanks for clarifying! I can see how this might be problematic for this use case. That said, I strongly believe that designing a core api around monkey patching user code so that your business can function without asking libraries to give you hooks is not the right way forward. If we can get a sync version in a reasonable way then I am all 👍 for that, and it sounds like we can according to @vdeturckheim. Otherwise we should be focused on what users will want, and the |
I think right now The current implementation of @Flarna for future iterations on AsyncContext, here is the rebase with a synchronous API I try to warn the user regarding using it as it is very easy to shoot oneself in the foot with it, for instance, with events:
This is the main reason why I also introduced |
Current status on Jan 30th 2020
|
@nodejs/tsc I have added a status part to the original message (at the bottom) to make it easier. Let me/us know if you have any questions here! @Qard, @Flarna and @puzpuzpuz I did it with my understanding of the current discussions in the PRs, please feel free to correct me if I have missed/misunderstood something |
That's not correct. The only blocker in that PR is resource mismatch between Resource reuse itself doesn't lead to incorrect context propagation, but may lead to a stored value leak in certain scenarios, which should be fine for an experimental API. Core side of this problem should be addressed separately, in a separate PR. @Qard please correct me if I'm wrong. |
Thanks @puzpuzpuz , I updated the status. In my understanding,
Currently, merging it while the issue with reused resources is not covered always defeats at least one or the other goal. |
@vdeturckheim Update. I've removed a couple of links, as they're open questions, not issues. |
@puzpuzpuz thanks for pointing that out! I don't think the case you raise in the PR can happen because. I'll answer directly in the PR. (edit) the issue mentioned by @puzpuzpuz is actually not present |
I've closed nodejs/node#31016 in favor of nodejs/node#26540, as the decision is still pending and I believe that having one or another CLS API in core in the nearest future is more important than having this particular one. |
Thanks a lot @puzpuzpuz . I really appreciate this! |
As nodejs/node#31746 was created recently, I've reopened nodejs/node#31016. So, it makes sense to reiterate through these three PRs. |
After a discussion with @Qard and @puzpuzpuz, it seems that the two current PRs are taking very different approaches and we did not find a common ground to merge them into one single PR.
The following document has been authored by me and reviewed by @Qard, @puzpuzpuz and @Flarna. It aims at giving full context to the TSC regarding both PRs.
Please let us know if there is any point we should clarify to help here.
Context: 3 PRs and 1 consensus
Context Document: Making async_hooks fast (enough)
On the TSC meeting of 2020-JAN-22, the TSC reached consensus
regarding the need to have an Asynchronous Storage API in core.
Three PRs related to this topic are currently open; for simplicity, we will refer to them by name as follows:
The AsyncLocal proposal relies on the executionAsyncResource API.
The AsyncContext proposal aims at working without executionAsyncResource, but should be rebased over executionAsyncResource when it is merged. A userland version of this API is available for testing purpose.
The rest of this document aims at comparing the AsyncLocal and the AsyncContext proposals.
Both of these proposals introduce a CLS-like API to Node.js core.
Naming
Both proposals introduce a new class in the Async Hooks module.
One is named AsyncContext and the other is named AsyncLocal.
Also, the name AsyncStorage has been discussed earlier.
This topic can easily be covered as a consensus on any name can be ported to any proposal.
.NET exposes an AsyncLocal class.
Interfaces
AsyncLocals and AsyncContexts expose different interfaces:
AsyncContexts
AsyncContext also provides synchronous entrypoints, but the documentation highlights the risks of using them.
AsyncLocal
Synchronous vs. Asynchronous API
As the examples show, AsyncLocal exposes a synchronous API and AsyncContext exposes an asynchronous one.
The synchronous API is unopinionated and is very async/await friendly.
The asynchronous API defines a clear scope regarding which pieces of code will have access to the store and which ones will not be able to see it. Calling run is an asynchronous operation that executes the callback in a process.nextTick call.
This is intentional: it avoids implicit behaviors, which were a major issue according to the domain post mortem. It is expected that the API will be used to provide domain-like capabilities.
A synchronous API has been added to AsyncContext too:
enterSync/exitSync, which do not enforce scoping
runAndReturn(cb)/exitAndReturn(cb), which run the callback synchronously. The store is only available within the callback.
Eventually, an asynchronous API could be added to AsyncLocal if there is a need for it.
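The difference between a scoped synchronous entrypoint and a scoped asynchronous one can be sketched as follows. This is not the AsyncContext code from the PR: it assumes `AsyncLocalStorage` from `async_hooks` (as found in current Node.js releases) as the underlying storage, and `runSync` and `runAsync` are hypothetical helper names.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

const storage = new AsyncLocalStorage();

// Synchronous, scoped entrypoint: the callback runs immediately, its return
// value is handed back, and anything it throws reaches the caller.
function runSync(store, callback) {
  return storage.run(store, callback);
}

// Asynchronous, scoped entrypoint: the callback runs on a later tick, so the
// caller gets no return value and cannot catch what the callback throws.
function runAsync(store, callback) {
  process.nextTick(() => storage.run(store, callback));
}

const order = [];
runAsync(new Map([['name', 'async']]), () => {
  order.push(storage.getStore().get('name'));
});
const returned = runSync(new Map([['name', 'sync']]), () => {
  order.push(storage.getStore().get('name'));
  return 'value from callback';
});
order.push(storage.getStore()); // undefined: this line is outside both callbacks
```

In both cases the store is only visible inside the callback and the asynchronous work started from it; what changes is when the callback runs and where its errors surface.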
Stopping propagation
AsyncContext exposes a method named exit(callback) that stops propagation of the context through the following asynchronous calls.
Asynchronous operations following the callback cannot access the store.
With AsyncLocal, propagation is stopped by calling set(undefined).
Disabling
An instance of AsyncLocal can be disabled by calling remove. It can't be used anymore after this call. Underlying resources are freed when the call is made, i.e. no strong references to the value remain in AsyncLocal and the internal global async hook is disabled (unless other active AsyncLocal instances exist).
AsyncContext does not provide such a method.
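The two ways of stopping propagation, and the idea of disabling an instance, can be tried out with the sketch below. It is an approximation under stated assumptions: it uses `AsyncLocalStorage` from `async_hooks` (as found in current Node.js releases), where `exit(callback)` plays the role of the scoped stop, `enterWith(undefined)` stands in for `set(undefined)`, and `disable()` is only roughly comparable to `remove` (unlike the `remove` described above, a disabled `AsyncLocalStorage` can be used again by calling `run()`).

```javascript
const { AsyncLocalStorage } = require('async_hooks');

const storage = new AsyncLocalStorage();

const result = storage.run(new Map([['user', 'alice']]), () => {
  const before = storage.getStore().get('user');
  // Scoped stop: the store is hidden only inside the callback.
  const insideExit = storage.exit(() => storage.getStore());
  const afterExit = storage.getStore().get('user');
  // Unscoped stop: the rest of this execution no longer sees a store.
  storage.enterWith(undefined);
  const afterUnset = storage.getStore();
  return { before, insideExit, afterExit, afterUnset };
});

// Disabling the instance: getStore() returns undefined from here on.
storage.disable();
const afterDisable = storage.getStore();
```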
Store type
AsyncContext
AsyncContext.prototype.getStore will return:
undefined
run or exit
Map
AsyncLocal
AsyncLocal.prototype.get will return:
undefined if AsyncLocal.prototype.set has not been called first
AsyncLocal.prototype.set
Store mutability
AsyncContext propagates its built-in mutable store, which is accessible in the whole async tree created.
AsyncLocal uses copy-on-write semantics: setting a new value branches off part of the tree. Merely mutating the value (e.g. changing/setting a Map entry) will not branch off.
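The distinction between mutating a store in place and replacing the stored value can be sketched as follows. The sketch assumes `AsyncLocalStorage` from `async_hooks` (as found in current Node.js releases) and uses a nested `run()` to model "setting a new value for one branch"; it is not the copy-on-write implementation from the AsyncLocal PR.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

const storage = new AsyncLocalStorage();
const shared = new Map([['user', 'alice']]);

const seen = storage.run(shared, () => {
  // In-place mutation: visible everywhere this same Map is the store.
  storage.getStore().set('user', 'bob');

  // Replacing the value: only the nested branch sees the new Map.
  const inner = storage.run(new Map([['user', 'carol']]), () =>
    storage.getStore().get('user')
  );

  // Back in the outer branch, the original (mutated) Map is still the store.
  return { inner, outer: storage.getStore().get('user') };
});
```

Here `seen.inner` comes from the replaced value while `seen.outer` and `shared` both reflect the in-place mutation, which is the behavior the two sentences above describe.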
Overall philosophy
AsyncLocal is a low-level unopinionated API that aims at being used as a foundation by ecosystem packages.
It will be a standard brick upon which other modules are built.
AsyncContext is a high-level user-friendly API that can be used out of the box by Node.js users.
It will be an API used directly by most users who have needs for context tracing.
Next steps
After an API (AsyncContext, AsyncLocal or another potential API) is merged, this roadmap might be followed:
This will enable us to iterate over Async Hooks and maybe bring breaking changes to it
while still providing a stable API that fills most Node.js users' needs in terms of tracing.
EDITS: Status after the original document
Current status on Feb 4th 2020
executionAsyncResource
The PR is still blocked because of a resource mismatch in init hooks and executionAsyncResource().
This seems to be on a path to resolution.
Also, reused resources are still exposed, which can introduce a memory leak if a destroy hook is not used. One of the main points of executionAsyncResource was originally to get rid of the need for a destroy hook.
AsyncLocal
The PR is blocked by the executionAsyncResource issue.
AsyncContext
The PR can be merged and rebased over executionAsyncResource later.
There have been a few iterations regarding synchronous entrypoints.
Following advice from @Qard's comment and @Flarna's comment, only methods taking callbacks have been kept:
asyncContext.runAndReturn(cb): runs the callback synchronously. The context is entered before running the callback and exited when the callback has run.
asyncContext.run(cb): works the same as runAndReturn but asynchronously (within a process.nextTick).
The difference between the two entrypoints concerns error management, as the asynchronous method will not throw errors. Also, exit and exitAndReturn can be used to stop propagation.
Exposing unscoped methods (without callback) would introduce the following behavior:
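The snippet from the original document is not reproduced here. As a rough illustration of the kind of behavior an unscoped entrypoint can cause, here is a sketch that assumes `AsyncLocalStorage.enterWith()` from `async_hooks` (an unscoped method in current Node.js releases) rather than the AsyncContext PR code; the event name and the `seen` array are hypothetical.

```javascript
const { AsyncLocalStorage } = require('async_hooks');
const { EventEmitter } = require('events');

const storage = new AsyncLocalStorage();
const emitter = new EventEmitter();
const store = { id: 1 };
const seen = [];

// The first listener enters a context without any callback to scope it.
emitter.on('my-event', () => {
  storage.enterWith(store);
});
// The second listener runs in the same synchronous execution and sees the store.
emitter.on('my-event', () => {
  seen.push(storage.getStore());
});

seen.push(storage.getStore()); // undefined: nothing has been entered yet
emitter.emit('my-event');
seen.push(storage.getStore()); // the store is now visible to the emitting code too
```

Because listeners run synchronously inside `emit()`, a context entered by one listener is visible to unrelated code that runs afterwards, which is why callback-taking methods give a more predictable scope.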