-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default Tracer propagating SpanContext by default or not #208
Comments
Suppose an application has a dependency on an OT web framework or RPC integration, but doesn't load the SDK. Don't we want to support context propagation by default in that case? On the other hand, if there's no mechanism for including the span context in outgoing calls then we shouldn't propagate the context in-process. It might be helpful to frame the problem this way: libraries depend on OT either to:
If an application depends on any library like (2) -- or has an explicit dependency on the API -- it and all other libraries it depends on should get a Does this sound right to you? Zero-config inter-process context prop seems like an important use case to me, but if it turns out that the current defaults mean people are wasting cycles propagating spans internally that are never exported or passed down the call chain then that's a good argument for changing them. |
related to #183 |
It would make things a lot simpler if it was just a no-op tracer and require the SDK or other impl if context propagation is required. As I say in #183 it is very unclear what is expected of this default tracer. And I think it leads to potentially more confusion and even bad data if the default tracer is propagating span context without creating new spans. So the DefaultTracer either needs to be no-op or it needs to be the regular tracer found in the SDK. The only other option I can think of is the default tracer acts the same as the SDK tracer for span start and finish but anything else like recording events, adding attributes, etc would be no-ops. Hm, or, the default tracer could be the same as the SDK tracer only when called with a remote parent. That way when the remote context comes in to |
Hi all, trying to resolve this ticket for the v0.2 release. For now, I suggest we resolve this by saying the the default will be a No-op for the time being. Some reasoning below. Property of least surpriseA noop implementation is the most intuitive result, when no SDK is present. It is not possible to correctly intuit any other behavior, one must read the docs. If one has not installed an SDK or taken other action, what are the chances that one has read the docs? Extra headers added to your requests out of nowhere is surprising behavior indeed. Unexplored edge casesWe have not fully explored all the ways that this could turn sour. Here are some examples:
Some of the above is hand wavy. My intention is to show that enabling this behavior is not a simple choice. In a world where we will be asking many, many libraries to natively integrate our API, avoiding surprising (and potentially dangerous) behavior should be a priority. I suggest that even though there are some advantages to propagation-by-default, for now they are outweighed by the risks and unknowns. |
Regarding your first point ("Untrustworthy observations"): There was discussion in Python (open-telemetry/opentelemetry-python#159) to solve this by generating new Span IDs even in the API-only propagation. Then instead of One IMHO particularly important point from that disussion: The "untrustworthy observation" may also occur if the SDK is installed and configured but the spans for A and C are sampled while the one for B is not. |
+1 to no-op. |
+1 to no-op for now. |
Enabling API everywhere is a long term goal. For now API enablement is quite involved process. Mostly people will enable it together with SDK. So API-only behavior can also be seen as API with the "disabled" SDK behavior. Where propagation is expected from my opinion. Also with the move to separation of context propagation into a separate package in v0.3 that is discussed in oteps - automatic context propagation even in absence of SDK may again be a reality. So I'd recommend to keep propagation behavior in API-only mode and wait for end user feedback. |
Some comments on "untrustworthy observations" only:
FWIW my reading of the spec is that we always have to generate new span IDs, proxies excepted. @SergeyKanzhelev is actually the author of that part of the spec: w3c/trace-context#108. He may be able to give us the word of god here.
It's technically possible, but I don't think this is likely to happen in practice. The client that creates the first span makes the sampling decision, and downstream clients can either respect that decision or choose not to sample. This is the difference between the |
Well said. I agree that we should be careful about making this library too magical and stick to conservative defaults. And that there are a lot of edge cases still to consider. To make sense of these examples we have to distinguish in- and inter-process context propagation. AFAICT, in every language, the API package itself doesn't change outbound headers. For example:
The question here isn't whether we're going to add extra headers to requests, but whether applications need to depend on the SDK (in addition to whichever plugins they need) to propagate the trace context. |
@c24t I agree, I do not see how distributed context propagation can occur without installing plugins... so we are talking about a scenario where the are plugins already installed. Eventually native instrumentation become may become popular, and all software libraries could automatically provide context/header propagation. Until that happens, this scenario concerning automated context propagation – without knowledge or action on the part of the application developer – is pretty much an edge case. I don't think there is an obvious way to enable API-only behavior. More realistically, at this time, I suggest we focus on providing the following for our users:
This combo – a zero-config auto-installation – simplifies the process of organizational adoption. @SergeyKanzhelev I do understand the intent - developers are bad at configuring their systems, and distributed tracing systems benefit from every individual service being properly configured. So it would be great if we could enable context propagation without requiring the application developer to take action. However, for the reasons stated above, I do not believe that an API-only approach would effectively solve this problem any time soon. And since any default behavior creates edge cases which have not been addressed, it seems like this idea needs more time to incubate. Up till now, we've taken a conservative approach to adding new features which have not been validated - I suggest we continue to think about this feature (preferably via an RFC), and for now ship with a no-op implementation. |
Interestingly that the spec is currently already saying this while the decision that was made in
As for the arguments - we all saying the same words and listing the same assumptions about the state of the world. However we getting to the different conclusion somehow. I believe that this project with upcoming improvements like context propagation separation, pre-installed OT collector exporter, introducing logging support is going to the direction when the mere fact of presence of OT API must initiate propagation of context. So earlier we will declare that - easier it will be to explain to end users. This is what every partner in W3C discussions is looking for. End users control less components in their infrastructure and no matter whether they can enable monitoring or not - they want to have an end-to-end visibility. So ideally, the context propagation is a feature of a platform itself. In other words, the feature of an API-only deployments. Anyway, I'd suggest we change the wording in specs to the W3C in API-only. And get the end-user feedback. Thoughts? |
@SergeyKanzhelev that spec only says what to do when crossing process boundaries. Does context propagate within the process when no SDK is enabled? And if it does, how? How here referring to what is the current active context supposed to be when only the API is loaded and a |
@tsloughter it's a good question that definitely need a clarification, even for the SDK-only case. I'd say whatever works with the propagators must read/write contexts all the time. Spans needs to be associated with the context. However for inner spans - tracer may return the same span all the time or reuse active. |
Closing this issue and opening a new one, as propagation went away from the Tracer component. |
Besides the base
Tracer
interface, the Java implementation offers a default, no-opDefaultTracer
(https://github.com/open-telemetry/opentelemetry-java/blob/master/api/src/main/java/io/opentelemetry/trace/DefaultTracer.java) that propagatesSpanContext
.This has the advantage of providing context propagation by default, but adds a little bit of implementation-specific part on the API side (as actual in-process propagation needs to be done, which in turn requires additional dependencies, etc).
Consider having this no-op
Tracer
as a pure no-op instance, without any context propagation at all.(A possible alternative would be offering the no-op + context propagation option in the SDK, where actual implementation code can take place)
cc @bogdandrutu @iredelmeier
The text was updated successfully, but these errors were encountered: