Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

W3C TraceState underspecified #759

Closed
dyladan opened this issue Aug 5, 2020 · 18 comments · Fixed by #905
Closed

W3C TraceState underspecified #759

dyladan opened this issue Aug 5, 2020 · 18 comments · Fixed by #905
Assignees
Labels
area:api Cross language API specification issue priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:context Related to the specification/context directory spec:trace Related to the specification/trace directory

Comments

@dyladan
Copy link
Member

dyladan commented Aug 5, 2020

I was looking at the JS implementation of the tracing api as we move towards GA, and the tracestate is underspecified. There currently exists no direction on what API methods to interact with tracestate should exist, if any. The JS implementation currently as get(key): string | undefined, set(key, value): void, unset(key): void, and serialize(): string methods. There are a few problems with this:

  1. tracing system should not need access to arbitrary keys. only the otel key should be readable/writeable
  2. APIs for accessing keys are not specified. This has lead each SIG to implement their own API. None of these APIs are currently in use as far as I am aware.

As a solution, I recommend we specify that there is no public API to modify tracestate, and that the default sdk behavior is to propagate it unmodified for now. This leaves us open to define behavior in the future if we desire, even after GA release. If we do not specify this now, then it will be impossible to specify later without a breaking change in some SIGs.

@dyladan dyladan added the spec:trace Related to the specification/trace directory label Aug 5, 2020
@carlosalberto carlosalberto added the spec:context Related to the specification/context directory label Aug 5, 2020
@Oberon00 Oberon00 added the area:api Cross language API specification issue label Aug 6, 2020
@Oberon00
Copy link
Member

Oberon00 commented Aug 6, 2020

I still think SpanContext is a part of the tracing API that is not very well designed. I also don't think it fits very well with the post-OTEP66 Context. Copying myself from #525 (comment):

I feel like with the OTEP 66 Context API there is an opportunity to elegantly rectify this by removing the whole SpanContext from the API and hiding it away in the context. See

(Note that #510 is still open required-for GA, P2)

As for the TraceState part of the SpanContext: With SpanContext being required to be a final/sealed class (open-telemetry/oteps#58) and in the API spec, TraceState is the only place to store custom information with the SpanContext, which for example, the Dynatrace propagator and also the Dynatrace exporter requires (the propagator reads and writes routing information for the Span which the exporter uses). Note that this would not even be solvable otherwise with a fully custom SDK implementation.

@andrewhsu andrewhsu added the release:after-ga Not required before GA release, and not going to work on before GA label Aug 7, 2020
@arminru
Copy link
Member

arminru commented Aug 11, 2020

@andrewhsu are you sure it's right to classify it as release:after-ga since it's already there in some form but underspecified?

@dyladan mentioned in the issue description that it would make sense to explicitly specify it as not part of the API, which should hopefully be simple and uncontroversial:

As a solution, I recommend we specify that there is no public API to modify tracestate, and that the default sdk behavior is to propagate it unmodified for now. This leaves us open to define behavior in the future if we desire, even after GA release. If we do not specify this now, then it will be impossible to specify later without a breaking change in some SIGs.

@Oberon00
Copy link
Member

simple and uncontroversial

I just mentioned in my above comment that at least in Dynatrace's usage of the SDK, we do need some way to attach custom information to the SpanContext that is accessed in the propagators and the exporter. Whether this custom information is stored in the TraceState (the custom information is indeed stored and intentionally propagated as a W3C tracestate header entry) or only for manual use by the propagator is a secondary concern, but some way to add custom information to the SpanContext is required.

I would agree that the TraceState should not be accessible to instrumentations, but the current design of OpenTelemetry does not really allow this, with SpanContext being final/sealed and created in the SDK.

If the SpanContext was non-final and had an overridable propagate(newSpanId) method, then extractors (or the SDK in case of root spans) could create it and injectors/extractors could cast to their custom derived classes (cf. also open-telemetry/oteps#58).

If #510 (Consider using Context as the unique way to specify parenthood) was implemented, then we could remove the (TraceState from) SpanContext and store custom information in the Context, but only if also the exporter could get access to the full Context in which a Span was created, which may also be difficult to implement.

@Oberon00
Copy link
Member

I also think this is definitely required-for-ga though.

@arminru arminru added release:required-for-ga Must be resolved before GA release, or nice to have before GA and removed release:after-ga Not required before GA release, and not going to work on before GA labels Aug 11, 2020
@arminru
Copy link
Member

arminru commented Aug 11, 2020

From the spec meeting: moving it to GA to specify this

@dyladan
Copy link
Member Author

dyladan commented Aug 11, 2020

per spec SIG meeting: Before GA we need to do one of the following:

  1. Specify a public API for TraceState
  2. Specify that there is no public API to TraceState. This allows it to be specified as an additive, non-breaking change later.

@Oberon00
Copy link
Member

@open-telemetry/technical-committee Can this get a priority? I'd vote for P1 since this affects an API in SpanContext, a class that is very central in the Tracing API.

@iNikem
Copy link
Contributor

iNikem commented Aug 19, 2020

Trying to keep scope as minimal as possible, do I understand the problem correctly?

  1. There is a need to attach arbitrary vendor-specific information to SpanContext which should be accessible by propagators and exporters.
  2. There is an opinion that TraceContext is underspecified.

If above statements are true, then following are my answers to that.

We currently specify that TraceState follows W3C specification. That W3C specification says:

The main purpose of the tracestate HTTP header is to provide additional vendor-specific trace identification information across different distributed tracing systems

So TraceState is the correct place for "arbitrary vendor-specific information" and thus should be used to satisfy the need from point 1. above.

W3C specifies mutating operations on TraceState. It is also natural to assume that there should be an API to read key-values pairs from TraceContext.

I think that by delegating to W3C specification OpenTelemetry does sufficient work of specifying TraceContext API and thus point 2. from above is invalid.

Regarding the concern raised in the original comment that

tracing system should not need access to arbitrary keys. only the otel key should be readable/writeable

W3C specifies that mutating operations are allowed to operate on arbitrary keys. Indeed, by their nature (they are vendor specific), Otel cannot limit access to any keys TraceContext.

Taking all that into account, I think there is no need for any further actions regarding TraceContext API. Related concerns about whether this API should be available in Trace API or only to Trace SDK, and whether SpanContext should be part of the public API or not, can be addressed separately.

@Oberon00
Copy link
Member

Taking all that into account, I think there is no need for any further actions regarding TraceContext API.

I do think we should at least list the operations. We do not need to specify them in detail, but at least we need to make clear what is needed. AFAIK, some languages allow only accessing TraceState as a raw string. They would then at least need to provide utility functions.

Also, TraceState is immuatable in OpenTelemetry, by SpanContext being immutable. This is not a contradiction (you can manipulate a SpanContextBuilder before creating a (child) SpanContext), but might warrant some additional API design hints.

@iNikem
Copy link
Contributor

iNikem commented Aug 20, 2020

I do think we should at least list the operations

You don't think that linking to that list in W3C specification is enough?

@Oberon00
Copy link
Member

No, I don't think that's enough, as is evident by the diverging APIs in different languages. We don't have to describe each operations in detail (link to W3C should be enough) but we should list them at least.

Also I wonder if we should apply all limitations (like dropping old keys if it gets too long) only at process boundaries or already in-process. I would think only at process-boundaries, as TraceState may also be used by non-W3C propagators which may different limitations / encodings of the TraceState or special keys therein.

@iNikem
Copy link
Contributor

iNikem commented Aug 20, 2020

Ok, fair enough.

Then the solution to this issue should be an amendment to the spec, which copies a list of operations on TraceContext from W3C spec and notices that all validation/truncation requirements of W3C spec MAY happen as late as right before putting TraceContext on the wire.

@Oberon00 @dyladan @open-telemetry/specs-trace-approvers agreed?

@dyladan
Copy link
Member Author

dyladan commented Aug 21, 2020

Seems fine to me. The only thing that still needs to be specified is if our API to tracestate should allow you to specify your key. set(value), get() (key configured on the TracerProvider) or set(key, value), get(key) (key decided at call time).

@iNikem
Copy link
Contributor

iNikem commented Aug 23, 2020

As vendors can add arbitrary keys, Otel cannot decide on them.

@Oberon00
Copy link
Member

I'm more and more leaning towards the larger but more future-proof change of saying that:

  1. TraceState is not a part of SpanContext.
  2. TraceState MUST be stored and retrieved by the W3C propagator in the Context, it SHOULD be stored as a single (unparsed) string.
  3. The tracestate string MUST be accessible using a function getTraceState(Context) -> string or parsed form that is available in the same package as the W3C tracestate propagator. Whatever package that is (it might be in the API IIRC, cf. Default global propagators #428), it MUST NOT depend on the SDK (it will typically depend on API, propagator base and maybe some propagator utility package).
  4. The full parent Context of a span (or at least the parent SpanContext + the TraceState in some form) MUST be available to SpanProcessors, Exporters and Samplers
  5. To ensure that this all works out, we implement Consider using Context as the unique way to specify parenthood #510, i.e. a plain SpanContext is no longer possible as parent. Depending on the implementation of point 4, it might still be possible to use a full Span object as parent.

@iNikem
Copy link
Contributor

iNikem commented Aug 23, 2020

I must admit I don't see a need for such significantly more elaborate proposal. But, @Oberon00, if you have a clear vision of such new proposal, are you willing to reassign this issue to yourself and submit a PR? I was assigned this to ensure it is moving forward. Seems that you have the needed motivation to see it through.

@Oberon00
Copy link
Member

Oberon00 commented Aug 25, 2020

@iNikem

I must admit I don't see a need for such significantly more elaborate proposal

I think for just solving this P1 issue, the simpler proposal would also work fine. But the more elaborate proposal would additionally solve the flexibility bottleneck that is SpanContext (as discussed in #759 (comment)). It will be significantly more work however 😞

I think I will try to get at least #510 resolved to not completely close the door for this in the future and in parallel we should go forward with your suggestion #759 (comment) of just specifying an API for SpanContext.

If I/someone else still proposes a PR in time, we can easily adjust that API to operate on Context instead of SpanContext as a small (yet breaking!) spec change together with the other (larger) changes I proposed.

@mtwo
Copy link
Member

mtwo commented Aug 31, 2020

During the Monday maintainers meeting, @iNikem committed to providing a PR for the simple proposal this week. @Oberon00 does that work for you?

We need to solve this by the end of the week to hit our Tracing spec RC milestones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:context Related to the specification/context directory spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants