-
Notifications
You must be signed in to change notification settings - Fork 164
Instrumentation layers and suppressing duplicates #172
Changes from 2 commits
5641953
0f7e292
273b955
8b81638
d655c9d
9adb92b
27d99cd
ee85a3c
f365dbe
3794ad0
75d5126
bd3eab0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
# Instrumentation Layers and Suppression | ||
|
||
This document describes approach for instrumentation layers, suppressing duplicate layers and unambiguously enriching spans. | ||
|
||
## Motivation | ||
|
||
- Provide clarity for instrumentation layers: e.g. DB calls on top of REST API | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Related to the questions about a definition of layers: there are discussing about actually modelling HTTP request with multiple spans (for e. g. retries or forwards). Should we aim to cover those cases with a suppression proposal? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nope, for now, I assume we have one layer of HTTP spans (if we'll have more for logical + physical, I'd treat them as two different layers) - they can't carry the same information when both are enabled. HTTP instrumentation would need configuration to emit 1) logical or 2) physical or 3) both to avoid redundancy and doubling of costs. I don't see a way to achieve it with suppression |
||
- Give mechanism to suppress instrumentation layers for the same type: e.g. multiple instrumented HTTP clients using each other. | ||
- Give mechanism to enrich specific spans unambiguously: e.g. HTTP server span with routing information | ||
|
||
## Explanation | ||
|
||
### Spec changes proposal | ||
|
||
- Semantic Conventions: Each span MUST follow one convention, specific to the call it describes | ||
lmolkova marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- Trace API: Add `SpanKey` API that gets span following specific convention from the context (e.g. `SpanKey.HTTP_CLIENT.fromContextOrNull(context)`). | ||
lmolkova marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- Semantic Conventions: instrumentation MUST back off if span of same kind and following same contention is already in the context by using `ContextKey` API. | ||
lmolkova marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation | ||
- OTel SDK SHOULD allow suppression strategy configuration: | ||
lmolkova marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- suppress nested by kind (e.g. only one CLIENT allowed) | ||
- suppress nested by kind + convention it follows (only one HTTP CLIENT allowed, but outer DB -> nested HTTP is ok) | ||
|
||
#### SpanKey API | ||
|
||
SpanKey allows to read/write span to context. | ||
Defines known SpanKey, shared between instrumentations (static singletons): HTTP, RPC, DB, Messaging. | ||
|
||
#### Example | ||
|
||
- HTTP Client 1: | ||
- Gets HTTP CLIENT span from context: `SpanKey.HTTP_CLIENT.fromContextOrNull(ctx)` | ||
- No HTTP client span on the context: | ||
- start span | ||
- store span in context: `SpanKey.HTTP_CLIENT.storeInContext(ctx, span)` | ||
- Make `ctx` current | ||
- Http Client 2: | ||
- Gets HTTP CLIENT span from context: `SpanKey.HTTP_CLIENT.fromContextOrNull(Context.current())` | ||
- HTTP client span is already there: do not instrument | ||
|
||
Suppression logic is configurable and encapsulated in `SpanKey` - instrumentation should not depend on configuration, e.g.: | ||
|
||
- suppressing by kind only - context key does not distinguish conventions within kind | ||
- `SpanKey.HTTP_CLIENT.fromContextOrNull` returns CLIENT span | ||
- `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT span context key | ||
- suppressing by kind + convention - context key is per convention and kind | ||
- `SpanKey.HTTP_CLIENT.fromContextOrNull` returns HTTP CLIENT span | ||
- `SpanKey.HTTP_CLIENT.storeInContext` stores span in CLIENT + convention span context key | ||
lmolkova marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## Internal details | ||
|
||
Client libraries frequently use common protocols (HTTP, gRPC, DB drivers) to perform RPC calls, which are usually instrumented by OpenTelemetry. | ||
At the same time, client library is rarely a thin client and may need its own instrumentation to | ||
|
||
- connect traces to application code | ||
- provide extra context: | ||
- duration of composite operations | ||
- overall result of all operation | ||
- any extra library-specific information not available on transport call span | ||
|
||
Both, client library 'logical' and transport 'physical' spans are useful. They also rarely can be combined together because they have 1:many relationship. | ||
|
||
So instrumentations form *layers*, where each layer follows specific convention (or describes certain library). | ||
|
||
*Example*: | ||
|
||
- HTTP SERVER span | ||
- DB CLIENT call - 1 | ||
- HTTP CLIENT call - 1 | ||
- DNS CLIENT | ||
- TLS CLIENT | ||
- HTTP CLIENT call - 2 | ||
|
||
There are two HTTP client spans under DB call, they are children of DB client spans. DB spans follow DB semantics only, HTTP spans similarly only follow HTTP semantics. If there are other layers of instrumentation (TLS) - it happens under HTTP client spans. | ||
|
||
### Duplication problem | ||
|
||
Duplication is a common issue in auto-instrumentation: | ||
|
||
- e.g. HTTP clients frequently are built on top of other HTTP clients, making multiple layers of HTTP spans | ||
- Libraries may decide to add native instrumentation for common protocols like HTTP or gRPC: | ||
- to support legacy correlation protocols | ||
- to make better decisions failures (e.g. 404, 409) | ||
- give better library-specific context | ||
- support users that can't or don't want to use auto-instrumentation | ||
|
||
So what happens in reality without attempts to suppress duplicates: | ||
|
||
- HTTP SERVER span (middleware) | ||
- HTTP SERVER span (servlet) | ||
- Controller INTERNAL span | ||
- HTTP CLIENT call - 1 (Google HTTP client) | ||
- HTTP CLIENT call - 1 (Apache HTTP client) | ||
|
||
#### Proposed solution | ||
|
||
Disallow multiple layers of the same instrumentation, i.e. above picture translates into: | ||
|
||
- HTTP SERVER span (middleware) | ||
- Controller INTERNAL span | ||
- HTTP CLIENT call - 1 (Google HTTP client) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can I ask what data is lost in this process of suppression? Is the intent that all the data that would be part of the spans that are now being surpressed, is that being discard, aggregated into the parent span? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We assume that two different HTTP clients send almost identical spans (if they follow OTel HTTP spec). The duration might be slightly different, but essentially they report the same thing. So we'll keep the outer span and fully disregard the inner span. |
||
|
||
To do so, instrumentation: | ||
|
||
- checks if span with same kind + type is registered on context already | ||
- yes: backs off, never starting a span | ||
- no: starts a span and registers it on the context | ||
|
||
Registration is done by writing a span on the context under the key. For this to work between different instrumentations (native and auto), the API to access spans must be in Trace API. | ||
|
||
Same mechanism can be used by users/instrumentations to enrich spans, e.g. add route info to HTTP server span (current span is ambiguous) | ||
|
||
### Configuration | ||
|
||
Suppression strategy should be configurable: | ||
|
||
- backends don't always support nested CLIENT spans (extra hints needed for Application Map to show outbound connection) | ||
- users may prefer to reduce verbosity and costs by suppressing spans of same kind | ||
|
||
So two strategies should be supported: | ||
lmolkova marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
- suppress all nested of same kind | ||
- suppress all nested of same kind + type (default?) | ||
|
||
### Implementation | ||
|
||
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java | ||
|
||
## Trade-offs and mitigations | ||
|
||
Trace API change is needed to support native library instrumentations - taking dependency on unstable experimental instrumentation API (or common contrib code) is not a good option. Instrumentation API is a good temporary place until we can put it in Trace API, native instrumentation can use reflection to access `SpanKey` in instrumentation API. | ||
|
||
## Prior art and alternatives | ||
|
||
- Terminal context: suppressing anything below | ||
- Exposing spans stack and allowing to walk it accessing span properties | ||
- Suppress all nested spans of same kind | ||
- Make logical calls INTERNAL | ||
|
||
Discussions: | ||
|
||
- https://github.com/open-telemetry/opentelemetry-specification/issues/1767 | ||
- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/903 | ||
- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/465 | ||
- https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1822 | ||
- https://github.com/open-telemetry/opentelemetry-specification/issues/526 | ||
- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/369 | ||
- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/445 | ||
- https://github.com/open-telemetry/opentelemetry-python-contrib/issues/456 | ||
|
||
## Open questions | ||
|
||
- Backends need hint to separate logical CLIENT spans from physical ones | ||
- Good default (suppress by kind or kind + type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you define the term "instrumentation layer" in this document? I'm not sure if it refers to a single span, or to the whole span tree that's created by a single library.