Instrumentation layers and suppressing duplicates #172

lmolkova · 2021-08-26T18:05:59Z

Proposal to create clarity on instrumentation layers interaction and how to suppress duplicate layers (e.g. multiple instrumented HTTP clients layers).

cc @trask

text/trace/0189-instrumentation-layers-suppression.md

lmolkova · 2021-08-27T21:44:36Z

@iNikem I hear your concern to have no suppression at all - I added an open question. I'd still like to understand why we'd need it and how common this need it.

text/trace/0189-instrumentation-layers-suppression.md

Co-authored-by: Johannes Tax <[email protected]>

lmolkova · 2021-09-22T20:42:52Z

Based on the discussions on the instrumentation SIG meeting, I've changed this OTEP to minimize tracing API changes. With the current approach, instrumentation specifies type at span creation time and SDK returns a suppressed span that's very similar to sampled-out, except it never becomes current.

Example: https://github.com/lmolkova/opentelemetry-java/pull/1/files#diff-48014f0437bde243933a6c8902a1f386fab6c78d6777019f24db1951e0070fc4

Open questions:

messaging spans don't work great with it and I suggest working on it in messaging spec (there are more reasons than clear layering and suppression)
adding optional optimization API to let instrumentation check first if span should be started
defaults
naming improvements are always welcome

jmacd · 2021-09-24T00:03:53Z

text/trace/0189-instrumentation-layers-suppression.md

+## Motivation
+
+- Provide clarity for instrumentation layers: e.g. DB calls on top of REST API
+- Give mechanism to suppress instrumentation layers for the same convention: e.g. multiple instrumented HTTP clients using each other.


Why are there cases of multiple instrumentation HTTP clients using each other? That sounds like an odd configuration of clients. I wonder if when combining these clients, one can disable instrumentation of the other?

The DB call on top fo a REST call -- you propose to suppress the REST instrumentation, I take it? (Could the DB be configured with an un-instrumented REST client, instead?)

I wonder if it would be possible to replace InstrumentationType with InstrumentationLibrary.Name. Could you accomplish the same functional behavior, if for all spans with a non-remote parent you knew the InstrumentationLibrary of the nearest not-suppressed ancestor? It might make the proposal both simpler and more general: InstrumentationLibrary is already defined; it means you don't need an expandable enum.

Why are there cases of multiple instrumentation HTTP clients using each other?

One case is manual/native + auto: Client libraries: Manual/programmatic and auto instrumentation duplication problem opentelemetry-specification#1767

Also, HTTP clients test to use each other (python, js, java all have one or another way to suppress HTTP instrumentation)

in some cases, they are pluggable, i.e. higher level don't really know if the lower level is instrumented - this is more popular in SERVER spans

in some cases, I guess higher-level clients instrumentation is not necessary

The DB call on top fo a REST call -- you propose to suppress the REST instrumentation, I take it? (Could the DB be configured with an un-instrumented REST client, instead?)

I suggest that

DB and REST are two layers, i.e. they are both spans (and not DB attributes stamped on HTTP span)

Depending on the configuration we can either

suppress nested HTTP (to reduce noise if users/backends don't want it) - note in this case DB span context will be propagated over HTTP

not suppress HTTP and have full picture

InstrumentationType with InstrumentationLibrary.Name

I'd love that, however InstrumentationLibrary.Name could be GoogleHttpClient or ApacheHttpClient while they both implement HTTP semantic conventions. The alternative could be to set InstrumentationType on the tracer resource in addition to InstrumentationLibrary.Name and version. I'd also put the semantic convention version, it follows, on the resource as well.

InstrumentationType with InstrumentationLibrary.Name

Experimented with this idea: instead of SpanBuilder.setType, I added 'type' parameter to library instrumentation info when tracer is being built:

it makes tracers type-aware, i.e. tracers should not emit spans of different types. Which makes sense, but a new thing.

it's even more subtle than the span creation option. App and lib devs would have a hard time paying attention and will end up with multi-type tracers (i.e. no suppression).

I updated the PR lmolkova/opentelemetry-java#1.

Will bring it up at the Tue Instrumentation SIG meeting.

I also entertained the idea of having a single tracer per type and returning noop tracers if one with the same type is registered - but it depends on the tracer creation time and is not stable.

MovieStoreGuy

I have left a few questions in sections I'd like some more clarification.

My main points come down to, what data would a service potential discard? Can I still get enough information to investigate issues within my service? What is potential reduction in outgoing are we expecting to see?

MovieStoreGuy · 2021-09-29T04:10:31Z

text/trace/0189-instrumentation-layers-suppression.md

+  - During span creation, checks if span should be suppressed (if there is another one for kind + type on the parent `Context`) and if there is, returns a *suppressed* span, which is
+    - non-recording
+    - propagating (carries parent context)
+    - does not become current (i.e. `makeCurrent` call with it is no-op)


Is there ever a case that a user may want this?

Suppressed spans are never reported and are basically duplicates. If we'd make them current, following may happen:

HTTP tracing middleware creates Span-1

HTTP servlet instrumentation creates Span-2 - suppressed. if instrumentation will make it current, it will lead to attributes stamped on the wrong span

users do Span.current().setAttribute - they expect it to be on Span1 and not Span2 (which they never even know was suppressed).

MovieStoreGuy · 2021-09-29T04:21:00Z

text/trace/0189-instrumentation-layers-suppression.md

+  - current is span1
+  - if explicit parent was used (span2), it carries span1 context
+
+Suppression logic is configurable and encapsulated in span builder SDK implementation - specific library instrumentation should not depend on configuration, e.g.:


What should be the implementing libraries / sdks default?

libraries never think about the configuration. I believe suppression by kind + type is a good default since it shows all layers suppressing duplicate instrumentations only.
I don't have too strong an opinion - vendors/users should be able to easily change it.

MovieStoreGuy · 2021-09-29T04:54:18Z

text/trace/0189-instrumentation-layers-suppression.md

+
+- HTTP SERVER span (middleware)
+  - Controller INTERNAL span
+    - HTTP CLIENT call - 1 (Google HTTP client)


Can I ask what data is lost in this process of suppression? Is the intent that all the data that would be part of the spans that are now being surpressed, is that being discard, aggregated into the parent span?

We assume that two different HTTP clients send almost identical spans (if they follow OTel HTTP spec). The duration might be slightly different, but essentially they report the same thing. So we'll keep the outer span and fully disregard the inner span.

MovieStoreGuy · 2021-09-29T04:58:48Z

text/trace/0189-instrumentation-layers-suppression.md

+- backends don't always support nested CLIENT spans (extra hints needed for Application Map to show outbound connection)
+- users may prefer to reduce verbosity and costs by suppressing spans of same kind
+
+So following strategies should be supported:


What I'd really like to know is what information is going to be lost because of this?

For example, during an incident the suppressed data could help provide useful clues into the problematic calls to either downstreams or internal calls that would have made up part of the span.

Yes, e.g. you have DB running over HTTP.

CLIENT-only suppression will cause all inner CLIENT spans to be suppressed. Our conventions make each CLIENT span have enough information to identify which endpoint it calls into. DB spans have endpoint info. None of the HTTP details would be recorded (no status codes, tries, etc), only the overall result will be set on DB span. You'd choose this configuration if you only care about the logical results and don't care much about transport or if you're saving costs at all means. If you do HTTP requests without outer DB client call, they will be recorded as usual.

CLIENT + type suppression would keep DB and nested HTTP spans. If will only suppress duplicate HTTP client spans if multiple instrumentations work together. You'd choose this configuration if you want more details.

I understand your concern about suppression in general, what I hear from users some of them may want to sacrifice a bit of verbosity to save money. And kind + type suppression only suppresses duplicate data for those who can afford it.

MovieStoreGuy · 2021-09-29T05:05:39Z

text/trace/0189-instrumentation-layers-suppression.md

+
+[Instrumentation API in Java implementation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java) with suppression by type.
+
+## Trade-offs and mitigation


I would like to more included in this section, I find it hard to suggest this to internal users without understanding the implications to the data.

what do you want to see in this section? How to choose a suppression strategy setting for a service assuming it will make it to the spec or what are the trade-offs of introducing these API changes?

Wooh, sorry. I forgot I had made a comment here.

Something that tells me the risk / potential loss of data that I am expecting.

pyohannes · 2021-10-05T21:52:43Z

text/trace/0189-instrumentation-layers-suppression.md

+### Spec changes proposal
+
+- Tracing Semantic Conventions: Each span MUST follow a single (besides general and potentially composite), convention, specific to the call it describes. It MUST specify it when creating a span.
+- Tracing Semantic Conventions: Client libraries instrumentation MUST make context current to enable correlation with underlying layers of instrumentation


I assume this doesn't work for manual parenting (without using context). However, I also assume that's not a big deal, as context will be used anyway to parent spans across library boundaries?

It should work just fine with context propagated manually. The suppression should be done through key on the context and how it's propagated is done is irrelevant.

However, I expect the duplication problem to heavily affect magical instrumentation. Apps that do most of the things manually have enough control to prevent it without type hint or suppression.

pyohannes · 2021-10-05T21:54:38Z

text/trace/0189-instrumentation-layers-suppression.md

+
+## Motivation
+
+- Provide clarity for instrumentation layers: e.g. DB calls on top of REST API


Could you define the term "instrumentation layer" in this document? I'm not sure if it refers to a single span, or to the whole span tree that's created by a single library.

pyohannes · 2021-10-05T22:30:56Z

text/trace/0189-instrumentation-layers-suppression.md

+
+### Spec changes proposal
+
+- Tracing Semantic Conventions: Each span MUST follow a single (besides general and potentially composite), convention, specific to the call it describes. It MUST specify it when creating a span.


As part of this, we'd also need to define which semantic conventions are composable and which are not.

Some semantic conventions define it (e.g. Couch DB wants to have HTTP attributes on DB spans). Unless it's already explicitly defined, we say conventions are not composable.
It looks like we should allow multiple types on the API surface if we want to allow that.

pyohannes · 2021-10-05T22:49:44Z

text/trace/0189-instrumentation-layers-suppression.md

+
+## Motivation
+
+- Provide clarity for instrumentation layers: e.g. DB calls on top of REST API


Related to the questions about a definition of layers: there are discussing about actually modelling HTTP request with multiple spans (for e. g. retries or forwards). Should we aim to cover those cases with a suppression proposal?

nope, for now, I assume we have one layer of HTTP spans (if we'll have more for logical + physical, I'd treat them as two different layers) - they can't carry the same information when both are enabled.

HTTP instrumentation would need configuration to emit 1) logical or 2) physical or 3) both to avoid redundancy and doubling of costs. I don't see a way to achieve it with suppression

tedsuo · 2023-02-06T17:21:05Z

@trask @lmolkova is this something the HTTP semconv working group would like to pick up again?

trask · 2023-02-10T00:46:49Z

@trask @lmolkova is this something the HTTP semconv working group would like to pick up again?

the scope of the current incarnation of the HTTP semconv working group is "just" stability. we discussed this and believe it can be addressed post-stability.

Fixes #3172 (Built on top of #4088) ## Changes - Explains kinds without assuming presence of parent/children - Adds links as another correlation mechanism - Normalizes nested client spans even further - database, messaging, RPC, and LLM semantic conventions require CLIENT kind for logical client operation. - Does not touch INTERNAL kind yet - #4179 * [x] Related issues #3172, open-telemetry/semantic-conventions#674, open-telemetry/oteps#172, open-telemetry/semantic-conventions#1315 * ~~[ ] Related [OTEP(s)](https://github.com/open-telemetry/oteps) #~~ * ~~[ ] Links to the prototypes (when adding or changing features)~~ * [x] [`CHANGELOG.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/CHANGELOG.md) file updated for non-trivial changes * ~~[ ] [`spec-compliance-matrix.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md) updated if necessary~~ --------- Co-authored-by: Tigran Najaryan <[email protected]> Co-authored-by: Yuri Shkuro <[email protected]> Co-authored-by: Trask Stalnaker <[email protected]>

Fixes open-telemetry#3172 (Built on top of open-telemetry#4088) ## Changes - Explains kinds without assuming presence of parent/children - Adds links as another correlation mechanism - Normalizes nested client spans even further - database, messaging, RPC, and LLM semantic conventions require CLIENT kind for logical client operation. - Does not touch INTERNAL kind yet - open-telemetry#4179 * [x] Related issues open-telemetry#3172, open-telemetry/semantic-conventions#674, open-telemetry/oteps#172, open-telemetry/semantic-conventions#1315 * ~~[ ] Related [OTEP(s)](https://github.com/open-telemetry/oteps) #~~ * ~~[ ] Links to the prototypes (when adding or changing features)~~ * [x] [`CHANGELOG.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/CHANGELOG.md) file updated for non-trivial changes * ~~[ ] [`spec-compliance-matrix.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md) updated if necessary~~ --------- Co-authored-by: Tigran Najaryan <[email protected]> Co-authored-by: Yuri Shkuro <[email protected]> Co-authored-by: Trask Stalnaker <[email protected]>

Instrumentation layers and suppressing duplicates

5641953

lmolkova requested review from a team August 26, 2021 18:06

yurishkuro reviewed Aug 26, 2021

View reviewed changes

text/trace/0189-instrumentation-layers-suppression.md Outdated Show resolved Hide resolved

text/trace/0189-instrumentation-layers-suppression.md Outdated Show resolved Hide resolved

text/trace/0189-instrumentation-layers-suppression.md Outdated Show resolved Hide resolved

review

0f7e292

iNikem reviewed Aug 27, 2021

View reviewed changes

lmolkova added 2 commits August 27, 2021 14:41

review and lint

273b955

trailing spaces

8b81638

type -> convention

d655c9d

trask reviewed Aug 27, 2021

View reviewed changes

text/trace/0189-instrumentation-layers-suppression.md Outdated Show resolved Hide resolved

text/trace/0189-instrumentation-layers-suppression.md Outdated Show resolved Hide resolved

lmolkova added 2 commits August 30, 2021 12:18

Add suppress none and exists API

9adb92b

conventions can include other conventions - allow that

27d99cd

pyohannes reviewed Sep 8, 2021

View reviewed changes

text/trace/0189-instrumentation-layers-suppression.md Outdated Show resolved Hide resolved

pyohannes reviewed Sep 8, 2021

View reviewed changes

text/trace/0189-instrumentation-layers-suppression.md Outdated Show resolved Hide resolved

pyohannes reviewed Sep 8, 2021

View reviewed changes

text/trace/0189-instrumentation-layers-suppression.md Outdated Show resolved Hide resolved

lmolkova and others added 3 commits September 7, 2021 15:21

Update text/trace/0189-instrumentation-layers-suppression.md

ee85a3c

Co-authored-by: Johannes Tax <[email protected]>

Add InstrumenttionType and builder.setType(type) instead of SpanKey

f365dbe

merge

3794ad0

lmolkova added 2 commits September 22, 2021 13:43

lint

75d5126

minor

bd3eab0

lmolkova mentioned this pull request Sep 23, 2021

Sampling decison is too late to gain much performance open-telemetry/opentelemetry-specification#620

Open

jmacd reviewed Sep 24, 2021

View reviewed changes

MovieStoreGuy reviewed Sep 29, 2021

View reviewed changes

pyohannes reviewed Oct 5, 2021

View reviewed changes

lmolkova mentioned this pull request Oct 23, 2021

Fix OpenTelemetry context propagation and duplication issues Azure/azure-sdk-for-java#25012

Merged

Aneurysm9 mentioned this pull request Feb 23, 2022

OTLP*HTTP Exporter: Custom HTTP client open-telemetry/opentelemetry-go#2632

Open

pyohannes mentioned this pull request Nov 3, 2022

Refactor messaging attributes and per-message attributes in batching scenarios open-telemetry/opentelemetry-specification#2763

Closed

trask mentioned this pull request Jan 19, 2023

add SuppressTracing flag open-telemetry/opentelemetry-specification#3103

Closed

tedsuo added triaged area:sem-conv labels Feb 6, 2023

lmolkova mentioned this pull request Mar 6, 2023

Equivalent of "logLevel" for spans open-telemetry/opentelemetry-specification#3205

Open

trask mentioned this pull request Mar 6, 2023

Should HTTP semantic conventions say anything about nested spans? open-telemetry/opentelemetry-specification#3281

Closed

lmolkova closed this May 11, 2023

lmolkova mentioned this pull request Jun 7, 2023

Is a "generic solution" for not duplicating spans still under construction? Is there any reference open-telemetry/opentelemetry.io#2827

Open

lmolkova mentioned this pull request Jul 6, 2023

Allow suppressing nested activities in native instrumentations without dependency on OTel SDK open-telemetry/opentelemetry-dotnet#4641

Closed

lmolkova mentioned this pull request Sep 5, 2023

Use HTTP instead of _OTHER in HTTP span names (if method is unknown) open-telemetry/semantic-conventions#270

Merged

3 tasks

lmolkova mentioned this pull request Jan 30, 2024

DB, messaging, gRPC (over HTTP): clarify nested client spans open-telemetry/semantic-conventions#674

Open

pyohannes mentioned this pull request Feb 8, 2024

Turn off spans from a scope while retaining downstream spans and without breaking traces open-telemetry/opentelemetry-specification#3867

Closed

lmolkova mentioned this pull request Aug 6, 2024

Refactor description of span kind open-telemetry/opentelemetry-specification#4178

Merged

2 tasks

lmolkova mentioned this pull request Aug 21, 2024

Support dynamic enrichment and customization for telemetry collected by instrumentation libraries open-telemetry/sig-developer-experience#9

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instrumentation layers and suppressing duplicates #172

Instrumentation layers and suppressing duplicates #172

lmolkova commented Aug 26, 2021

lmolkova commented Aug 27, 2021

lmolkova commented Sep 22, 2021 •

edited

Loading

jmacd Sep 24, 2021

lmolkova Sep 24, 2021 •

edited

Loading

lmolkova Sep 25, 2021 •

edited

Loading

MovieStoreGuy left a comment

MovieStoreGuy Sep 29, 2021

lmolkova Oct 5, 2021 •

edited

Loading

MovieStoreGuy Sep 29, 2021

lmolkova Oct 5, 2021

MovieStoreGuy Sep 29, 2021

lmolkova Oct 5, 2021

MovieStoreGuy Sep 29, 2021

lmolkova Oct 5, 2021

MovieStoreGuy Sep 29, 2021

lmolkova Oct 5, 2021

MovieStoreGuy Nov 2, 2021

pyohannes Oct 5, 2021

lmolkova Oct 5, 2021

pyohannes Oct 5, 2021

pyohannes Oct 5, 2021

lmolkova Oct 5, 2021

pyohannes Oct 5, 2021

lmolkova Oct 5, 2021

tedsuo commented Feb 6, 2023

trask commented Feb 10, 2023


		[Instrumentation API in Java implementation](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/SpanKey.java) with suppression by type.

		## Trade-offs and mitigation


		## Motivation

		- Provide clarity for instrumentation layers: e.g. DB calls on top of REST API


		### Spec changes proposal

		- Tracing Semantic Conventions: Each span MUST follow a single (besides general and potentially composite), convention, specific to the call it describes. It MUST specify it when creating a span.

Instrumentation layers and suppressing duplicates #172

Instrumentation layers and suppressing duplicates #172

Conversation

lmolkova commented Aug 26, 2021

lmolkova commented Aug 27, 2021

lmolkova commented Sep 22, 2021 • edited Loading

Choose a reason for hiding this comment

lmolkova Sep 24, 2021 • edited Loading

Choose a reason for hiding this comment

lmolkova Sep 25, 2021 • edited Loading

Choose a reason for hiding this comment

MovieStoreGuy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova Oct 5, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tedsuo commented Feb 6, 2023

trask commented Feb 10, 2023

lmolkova commented Sep 22, 2021 •

edited

Loading

lmolkova Sep 24, 2021 •

edited

Loading

lmolkova Sep 25, 2021 •

edited

Loading

lmolkova Oct 5, 2021 •

edited

Loading