Add execution context service #102039
Conversation
Pinging @elastic/apm-ui (Team:apm)
pgayvallet
left a comment
Overall this is looking good.
I have to admit, async state holding is still a little magical to me, especially with hapi's event-based mechanism (I'm not even sure I understand how node is able to retain the async trace in that situation tbh).
A few concerns / questions:
- How is `AsyncLocalStorage` working regarding garbage collection? My fear is that not being able to properly clear the storage may result in memory leaks. Is that an actual concern? The PR is cleaning up the storage state during the server's `response` event, but
  - are we sure this covers all response EOL scenarios?
  - what about contexts created outside of the scope of a request handler? I'm thinking about task manager, for example. Will the owners of such server-side services have to manually clear the context at the end of an operation?
- Even if I do think we want to enable that by default, the perf impact makes me wonder if we shouldn't still add an option to disable the feature. OTOH that would force us to re-implement the possibility to read the x-opaque-id from the ES client, which was removed in this PR, so this would complicate the code a bit. Just want to be sure we're all (the team, Product and so on) understanding the perf implication of this feature.
// the trimmed value in the server logs is better than nothing.
function enforceMaxLength(header: string): string {
  return header.slice(0, MAX_BAGGAGE_LENGTH);
If the header value is a serialized json object, wouldn't truncation cause an invalid object in the end? I see we're try/catching on the server-side when parsing the header, but I wonder if this is good enough?
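To make the concern concrete, here is a minimal sketch of what happens when a serialized JSON header is cut by `slice`. The limit value, the `parseContextHeader` helper, and the sample payloads are assumptions for illustration only; the actual `MAX_BAGGAGE_LENGTH` and the server-side parsing code are not shown in this thread.

```typescript
// Hypothetical limit, chosen small so the effect is easy to see.
const MAX_BAGGAGE_LENGTH = 32;

function enforceMaxLength(header: string): string {
  return header.slice(0, MAX_BAGGAGE_LENGTH);
}

// Hypothetical server-side parser: swallows the parse error and
// returns undefined when the header is not a valid JSON object.
function parseContextHeader(header: string): Record<string, unknown> | undefined {
  try {
    const parsed: unknown = JSON.parse(header);
    return typeof parsed === 'object' && parsed !== null
      ? (parsed as Record<string, unknown>)
      : undefined;
  } catch {
    return undefined;
  }
}

// Fits within the limit, so it survives intact.
const shortHeader = JSON.stringify({ type: 'tsvb', id: '42' });
// Exceeds the limit, so slicing cuts it in the middle of a string value.
const longHeader = JSON.stringify({ type: 'tsvb', id: '42', description: 'x'.repeat(100) });

const parsedShort = parseContextHeader(enforceMaxLength(shortHeader));
const parsedLong = parseContextHeader(enforceMaxLength(longHeader));
```

In this sketch `parsedShort` is the original object, while `parsedLong` is `undefined`: the try/catch keeps the server from failing, but the whole context is dropped rather than partially preserved.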
test/plugin_functional/test_suites/core_plugins/execution_context.ts
src/core/server/http/http_server.ts
private setupContextExecutionCleanup(executionContext?: InternalExecutionContextSetup) {
  if (!executionContext) return;
  this.server!.events.on('response', function () {
Is `response` covering all the request EOL scenarios? E.g. is this handler called in case of an internal handler error?
Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
@pgayvallet
From the nodejs test, we can see that the It makes To make sure we don't introduce a memory leak, I added a long string to the execution context:
executionContext?.set({
  ...parentContext,
  requested,
  randomString: Math.random().toString().repeat(100_000), // 1.8Mb per a single request!
});
and ran load-testing for 2*6 minutes. Memory consumption on the Monitoring page: But anyway we should add a flag to disable
fair point. I can put the logic for legacy
yeah, as mentioned in the PR title, with nodejs/node#38577 landing to nodejs v14
@joshdover yes, From mshustov#8:
  ...this.config.requestHeadersWhitelist,
]);
scopedHeaders = filterHeaders(
  { ...requestHeaders, ...requestIdHeaders, ...authHeaders },
We still pass 'x-opaque-id' header if executionContext service is disabled
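Going by the header format given in the PR summary, the fallback described here could look roughly like the sketch below. The `ExecutionContextLike` type and the `buildOpaqueId` function are hypothetical names used for illustration, not the PR's actual code.

```typescript
// Hypothetical shape: only the two fields the summary says are forwarded.
interface ExecutionContextLike {
  type: string;
  id: string;
}

// Builds the x-opaque-id value. Without a context (for example when the
// service is disabled) the header carries the requestId only.
function buildOpaqueId(requestId: string, context?: ExecutionContextLike): string {
  if (!context) return requestId;
  return `${requestId};kibana:${context.type}:${context.id}`;
}

const withoutContext = buildOpaqueId('1234-5678-9000');
const withContext = buildOpaqueId('1234-5678-9000', {
  type: 'tsvb',
  id: '5b2de169-2785-441b-ae8c-186a1936b17d',
});
```

Keeping `requestId` as the prefix in both branches means consumers that only understand the older header format keep working.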
export const config: ServiceConfigDescriptor<ExecutionContextConfigType> = {
  path: 'execution_context',
  schema: configSchema,
Note: I didn't pass the config value to the client. I don't see a lot of benefit in making `ExecutionContextContainer` methods no-ops, as they don't add a lot of overhead. Any objections?
I think it's fine to have the client always be 'enabled' regardless of the config value.
import { ServiceConfigDescriptor } from '../internal_types';

const configSchema = schema.object({
  enabled: schema.boolean({ defaultValue: true }),
We can disable it by default based on the outcome of #102706
In the long term, the service should be enabled by default.
pgayvallet
left a comment
Don't see anything else, LGTM.
// the trimmed value in the server logs is better than nothing.
function enforceMaxLength(header: string): string {
  return header.slice(0, MAX_BAGGAGE_LENGTH);
Feels quite complex, so I'd say it's fine keeping it as you did for now. Let's use this initial implementation and see with our usages if the limit is effectively reached for any real usage.
test/plugin_functional/test_suites/core_plugins/execution_context.ts
src/core/server/execution_context/integration_tests/tracing.test.ts
💚 Build Succeeded
* add execution context service on the server-side
* integrate execution context service into http service
* add integration tests for execution context + http server
* update core code
* update integration tests
* update settings docs
* add execution context test plugin
* add a client-side test
* remove requestId from execution context
* add execution context service for the client side
* expose execution context service to plugins
* add execution context service for the server-side
* update http service
* update elasticsearch service
* move integration tests from http to execution_context service
* integrate in es client
* expose to plugins
* refactor functional tests
* remove x-opaque-id from create_cluster tests
* update test plugin package.json
* fix type errors in the test mocks
* fix elasticsearch service tests
* add escaping to support non-ascii symbols in description field
* improve test coverage
* update docs
* remove unnecessary import
* update docs
* Apply suggestions from code review
  Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
* address comments
* remove execution context cleanup
* add option to disable execution_context service on the server side
* put x-opaque-id test back
* put tests back
* add header size limitation to the server side as well
* fix integration tests
* address comments

Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>


Summary
Part of #102626
This PR adds an initial implementation of the `ExecutionContext` service that takes care of propagating runtime meta-information: Kibana client App --> Kibana Server --> Elasticsearch server.
Design
Client-side
Kibana plugins create `context` and pass it through their application logic to inject it into the `http` service call. Kibana Core will serialize the `context` object and inject it as a custom header.
Server-side
There are two cases:
- An incoming request contains a `context` object. In this case, the context object is parsed and stored in AsyncLocalStorage. Whenever a plugin or Kibana Core calls the Elasticsearch server, some meta information from the context (type + id) is attached to the `x-opaque-id` header. If a search operation takes longer than expected, parameters of the incoming request (including `x-opaque-id`) will be logged to the `search slowlogs` file.
- A plugin calls `executionContext.set(context)` to attach the `context` object to the current async "thread". Unlike the logic on the client, the plugin doesn't need to pass the context object through all the layers of the application; nodejs already provides the API to store context through async operations.
Elasticsearch
Receives the `x-opaque-id` header, which starts with `requestId` for BWC with the logic introduced in #71019. It has the following format:
- `x-opaque-id: 1234-5678-9000`. Contains `requestId` only, if `execution context` hasn't been attached.
- `x-opaque-id: 1234-5678-9000;kibana:tsvb:5b2de169-2785-441b-ae8c-186a1936b17d`. Contains requestId + `kibana:executionContext.type:executionContext.id`, if the context has been attached.
Next steps
In the next iteration, I'm going to add support for nested execution contexts. It can be used to compose execution context relationships across different apps:
Performance impact
Usage of AsyncLocalStorage and AsyncHooks is not free. Keeping track of async context does add some overhead.
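For readers less familiar with what "keeping track of async context" involves, here is a minimal sketch using Node's `AsyncLocalStorage`. The `Context` type and the `callElasticsearch` and `main` functions are hypothetical stand-ins, and this is not the service's actual implementation. It only shows that each `run()` scope keeps its own store across awaits and timers, which is the bookkeeping that adds the overhead.

```typescript
import { AsyncLocalStorage } from 'async_hooks';

// Hypothetical context shape used only for this sketch.
interface Context {
  type: string;
  id: string;
}

const storage = new AsyncLocalStorage<Context>();

// Stand-in for a deeply nested call: it receives no context argument
// and reads the store after an async hop (a timer).
async function callElasticsearch(): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 10));
  const ctx = storage.getStore();
  return ctx ? `kibana:${ctx.type}:${ctx.id}` : 'no context';
}

async function main(): Promise<string[]> {
  // Two concurrent "requests", each running inside its own store.
  const [first, second] = await Promise.all([
    storage.run({ type: 'tsvb', id: '1' }, callElasticsearch),
    storage.run({ type: 'lens', id: '2' }, callElasticsearch),
  ]);
  // Outside of any run() scope there is no store.
  const outside = await callElasticsearch();
  return [first, second, outside];
}

main().then((result) => console.log(result));
// prints [ 'kibana:tsvb:1', 'kibana:lens:2', 'no context' ]
```

The two concurrent calls do not see each other's store even though they interleave on the same event loop.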
I ran DemoJourney of https://github.com/elastic/kibana-load-testing with 100 concurrent users and saw the total 95th percentile of response time increase by a few percent. However, response time in a few scenarios increased by 5-30%.
See detailed report
Before: before.tar.gz
After: after.tar.gz
Right now the plan is to keep the logic enabled by default for all users. Before the `v7.15` release we should measure the performance overhead of the final solution in #102706. Based on the final result, we might make the service opt-in.
Also, there is a PR in nodejs v14 that should improve `async_hooks` performance by 3-4 times.
Checklist
For maintainers