Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distinguish starting an active span and creating an inactive span #485

Closed

Conversation

tsloughter
Copy link
Member

Includes a note about async callbacks which are a common use case
for creating an inactive span.

Closes #469

@Oberon00
Copy link
Member

Oberon00 commented Feb 22, 2020

Are you suggesting to introduce the possibility of creating unstarted Spans? I'm strongly against that tbh, I think it makes many things more complicated.

@@ -122,24 +122,29 @@ mechanism, for instance the `ServiceLoader` class in Java.

The `Tracer` MUST provide functions to:

- Create a new `Span`
- Start a new active `Span`
- Create a new inactive `Span`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"inactive" reads very confusing. I would separate "starting a span" and "making a span active" (even though both can be done by a single function).


The `Tracer` SHOULD provide methods to:

- Get the currently active `Span`
- Make a given `Span` as active

The `Tracer` MUST internally leverage the `Context` in order to get and set the
current `Span` state and how `Span`s are passed across process boundaries.
currently active `Span` and how `Span`s are passed across process boundaries. A
`Span` that is created, as opposed to started, is not tracked in the `Context`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created, as opposed to started

-1, this sounds like a whole new dimension of behavior. If you want to discuss this, please create an OTEP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't. The created span still has a start timestamp.

@yurishkuro
Copy link
Member

I would recommend to do away with a duality of "creating" vs. "starting" a span. They are the same thing in all prior art, and I would prefer we use a single verb "starting a span" to avoid confusion.

@tsloughter
Copy link
Member Author

@yurishkuro currently we have started spans and active spans (which are started). I'm attempting to make a clear distinction between functions to create inactive spans (but have start timestamps) and starting active spans.

Right now the API only says you can start an inactive span.

@tsloughter
Copy link
Member Author

@Oberon00 no, created spans still have start timestamps, they simply aren't made active. This is the behaviour currently defined in the API. The change is to specify that start_span function creates a span and makes it active. Which I believe is the default behaviour in actual implementations? So it differs between how the API is written and how implementations I've seen work.

@tsloughter
Copy link
Member Author

To maybe make it more clear why I'm using start and create.

The existing APIs and SDKs use start to create an active span. In Go Start results in the span being set in the context https://github.com/open-telemetry/opentelemetry-go/blob/master/sdk/trace/tracer.go#L66

The API specifies it should not be set active in the context. I wanted to keep the naming the same as what is done today (Start) while adding a new function that acts like the API specifies that "start" is supposed to work (but doesn't in implementations as far as I can tell) so added create.

@yurishkuro
Copy link
Member

yurishkuro commented Feb 22, 2020

I don't think it's a good idea to be using 3 words (start, create, active) to describe 2 orthogonal behaviors (starting a span and making it active). In OT-Java we had to roll-back "making span active" from start_span function, because it was causing issues with try/catch.

Have we considered NOT having tracer make the span active at all? It's merely a convenience, which I personally don't believe is worth it, and it works on shielding the developer from understanding that there is such a thing as context propagation, which has nothing to do with managing spans. For example, in Go:

ctx := tracer.StartSpan(ctx, ...)

is just a convenience over more clear separation of:

span := tracer.StartSpan(...)
ctx := otel.ActiveSpan(ctx, span)

One important benefit of the latter is that starting a span is an implementation-specific action, thus it requires a tracer. Activating the span is NOT implementation specific, all tracers will share the propagation mechanism. The one-liner can always be implemented as a helper function:

ctx := otel.StartSpan(ctx, tracer, ...)

@tsloughter
Copy link
Member Author

I think it would be very confusing to force devs to start having to manage the context explicitly.

Also, the API does specify that it is not made active, so what you are suggesting is the API spec currently.

If the Tracer is responsible for tracking the active span in the context then I think it needs to handle it when the user starts a span unless they specify otherwise, and based on existing implementations of both OpenTelemetry and OpenTracing/Census this is clearly what the user expects.

If the Tracer were not responsible for this then I could see an argument that some other module handles the context, like in your example where the Tracer returns the span and then otel is used to activate it, unrelated to the Tracer.

But that is a much larger change. Today the API spec is confusing to me because it both has the Tracer as responsible for the span being activated and doesn't offer a way to start+activate -- although the implementations do just that for Start.

I think we need both and would suggest, since the use of start and create clearly causes confusion over whether create sets a start time:

  • start_active: Tracer operation to start a new span and update the active span it is tracking.
  • start: Tracer operation to start a span but not touch the tracked active span.

@tsloughter
Copy link
Member Author

I didn't want to touch it in this PR to keep it as small and focused as possible but a related confusion I have is that the spec specifies:

When an active Span is made inactive, the previously-active Span SHOULD be made active.

I agree with this API but it doesn't seem to match that End is now moved to the Span, and I see many implementations do use an End function on the Span.

If the Tracer is responsible for tracking the active Span and for making the previously-active Span active again when deactivating the current active span it seems clear that End should be a Tracer operation.

@tsloughter
Copy link
Member Author

tsloughter commented Feb 23, 2020

otep-66 states:

StartSpan(context, options) -> context When a span is started, a new context is returned, with the new span set as the current span.

It calls this "not final", so maybe there was other conversation on why this shouldn't be the case? otep-66 also uses Tracer::EndSpan to end and deactivate the current span, which appears to only partially be defined in the API spec (it says the tracer should set the previously active span as active again when deactivating a span, but I don't know how that can work if ending a span is an operation done directly on a Span instead of through Tracer).

specification/api-tracing.md Outdated Show resolved Hide resolved

The `Tracer` SHOULD provide methods to:

- Get the currently active `Span`
- Make a given `Span` as active
- Make a given `Span` active
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If providing a method to start an inactive span is a MUST, this should also be one, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good question. I suppose so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too! So please add it to the MUST section then.

specification/api-tracing.md Outdated Show resolved Hide resolved
specification/api-tracing.md Outdated Show resolved Hide resolved
specification/api-tracing.md Outdated Show resolved Hide resolved
specification/api-tracing.md Outdated Show resolved Hide resolved
@Oberon00
Copy link
Member

A fundamental question: After OTEP 66, is there even such a thing as an inactive span now? See https://github.com/open-telemetry/oteps/blob/49316bc20167a0a6e2214bbf5806e0e7d763b2d0/text/0066-separate-context-propagation.md#observability-api

StartSpan(context, options) -> context When a span is started, a new context is returned, with the new span set as the current span.

It seems the description of StartSpan in the spec is simply wrong now.

@tsloughter
Copy link
Member Author

Even with otep66 there has to be a concept of an inactive span, it is just to useful and in use. But I do think that the default action of a tracer's startspan should make the span active, like the otep has.

@tsloughter
Copy link
Member Author

This might be better discussed tomorrow in the morning meeting. Then I can rework the PR based on the outcome there and discussion can pick up again on the PR.

@bogdandrutu
Copy link
Member

@open-telemetry/specs-approvers please review this. We need to make progress.

Copy link
Member

@arminru arminru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the PR title so that it matches the content (starting an active vs. inactive span) since you removed the separate create without starting option in the meantime.

@@ -120,31 +120,45 @@ mechanism, for instance the `ServiceLoader` class in Java.

### Tracer operations

The currently active `Span` is the one that is tracked in the current `Context`
by the `Tracer`. An inactive `Span` is not currently tracked in any `Context`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An inactive span can still be tracked in a non-current context, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it would make it active (unless you track via some private mechanism, which would be out of scope here)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in Java (and most likely in Python) you can have a Span in a Context that is not the active one:

// ctx contains span1 but ctx itself is not active.
Context ctx = TracingContextUtils.withSpan(span1, Context.current());

// *Now* it is.
try (Scope scope = ContextUtils.withScopedContext(ctx)) {
}

Because of this I suggest changing the second sentence to: "A Span is considered inactive when it is not tracked in the currently active Context." or something along that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider that span to be active for that context, but the context itself is not active.

We could have a different term for it though, "current span" to replace how I've been using "active span". So a span can be "current" to a context without being active because the context is not active.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest, for the sake of moving on with this PR, to not change the wording between active/current, and use whatever is used in the given section (we could do a follow up this in another PR). Likewise, let's use active/current only when the Span is associated with the current Context (else, you'd say its associated with a given Context).

@@ -120,31 +120,45 @@ mechanism, for instance the `ServiceLoader` class in Java.

### Tracer operations

The currently active `Span` is the one that is tracked in the current `Context`
by the `Tracer`. An inactive `Span` is not currently tracked in any `Context`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it would make it active (unless you track via some private mechanism, which would be out of scope here)

The `Tracer` MUST provide functions to:

- Create a new `Span`
- Start a new active `Span`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a requirement on the API, but, for example, in OpenTracing we explicitly removed such ability, in favor of starting inactive span and them manually activating it via Scope. What are the implications of this change?

NB: this kind of change sounds to me like it should go through OTEP with some code examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC there was agreement to clearly separate these two operations ("start active Span" and "start Span"), but "start active Span" MUST be optional (in Java, as you mentioned, we can't implement this one correctly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with separate operations as decided in the SIG call, but I'd suggest changing this to:

  1. Start a new Span
  2. Start a new Span as the current instance (optional operation).

2 is optional as, for example, Go already handles this explicitly in 1) depending on any specified Context, and also because in Java we won't implement it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlosalberto is this what you are thinking:

The `Tracer` MUST provide functions to:

- Start a new active `Span`

The `Tracer` SHOULD provide methods to:

- Get the currently active `Span`
- Start a new inactive `Span`
- Make a given `Span` active

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or wait, you wanted the start inactive span to be the MUST?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, switched it in the PR.


The `Tracer` SHOULD provide methods to:

- Get the currently active `Span`
- Make a given `Span` as active
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks odd that making span active is in MUST section, but getting active span is in SHOULD. I would think they should go in the same category.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #527 might actually hit this one, so I'd suggest we fix that based on that one ;)

current `Span` state and how `Span`s are passed across process boundaries.
currently active `Span` and how `Span`s are passed across process boundaries. A
`Span` that is started but inactive is not tracked in the `Context` by the
`Tracer`, but it still MUST have a start timestamp set at the time of creation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the point about the timestamp bundled into the paragraph about context?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feeling here, I think we don't need to mention timestamp. Having two separated operations for start-span and start-span-as-current should make this clear enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was thinking that "inactive" may give the impression to someone that it could not have a timestamp. But since it is still called start and this is defined elsewhere it can be removed.

specification/api-tracing.md Outdated Show resolved Hide resolved
selected, or the current active `Span` is invalid.
SHOULD create each new `Span` as a child of its active `Span` unless an explicit
parent is provided or the option to create a span without a parent is selected,
or the current active `Span` is invalid. Last the `Tracer` would check if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"or the current active Span is invalid" doesn't seem to belong here (can be in #determining-the-parent-span-from-a-context).

SHOULD create each new `Span` as a child of its active `Span` unless an explicit
parent is provided or the option to create a span without a parent is selected,
or the current active `Span` is invalid. Last the `Tracer` would check if
`Context` has an extracted `SpanContext`. See [Determining the Parent Span from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last the Tracer would check if Context has an extracted SpanContext

I don't follow why this is here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The paragraph is going through the ways a span's parent is set when a span is created. If that part is removed I think the whole thing should be removed and simply link to "Determining the Parent Span".

specification/api-tracing.md Outdated Show resolved Hide resolved
as a separate operation.

The API MUST accept the following parameters:
The API functions for starting a `Span` MUST accept the following parameters:

- The span name. This is a required parameter.
- The parent `Span` or a `Context` containing a parent `Span` or `SpanContext`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this bullet is confusing. I think it's mutually exclusive to accept parent Span OR an indicator to create a new root.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is outside the changes of this PR. Though I agree there doesn't appear to be a reason to have to explicitly state if a span is a root span.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will also be covered/clarified/improved by #510 so I'd suggest we work on this part there, so we can move on with this PR.

Else, track this clarification in its own issue ;)

SHOULD create each new `Span` as a child of its active `Span` unless an
explicit parent is provided or the option to create a span without a parent is
selected, or the current active `Span` is invalid.
SHOULD create each new `Span` as a child of its active `Span` unless an explicit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a child of THE active" Span? Otherwise, it sounds like each Tracer can have its own active Span, which just adds excessive (and not needed) complexity.

@jmacd
Copy link
Contributor

jmacd commented Mar 18, 2020

As stated in #516, I see "starting an inactive span" as the Tracer's responsibility, and "making a span active" the context library's responsibility.

@tsloughter
Copy link
Member Author

I guess I go the other way since I think Span should go away and the Tracer take the context to do everything, like in open-telemetry/oteps#68 :)

- Get the currently active `Span`
- Make a given `Span` as active
- Make a given `Span` active
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is still some vagueness here. I have users at Uber who are absolutely opposed to thread-local based context propagation, and instead insist that in their services all context propagation is done explicitly, similar to Go. That means that while they may still use the same Context implementation, the three SHOULD operations here are against the explicitly passed Context object, rather than building an assumption that a thread-local mechanism is being used and understood by the tracer. In other words, the "Get the currently active Span" is equivalent to this function in OpenTracing Go:

func SpanFromContext(ctx context.Context) Span {
	val := ctx.Value(activeSpanKey)
	if sp, ok := val.(Span); ok {
		return sp
	}
	return nil
}

Which opens up another trail of questions: why does this need to be a functionality of the Tracer? Tracer is an API that can be implemented differently. Accessing spans in the Context is not related to specific tracer implementation, it's a common behavior at the API level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider it part of the Tracer functionality because the Tracer is what knows where its data is stored in the context.

In your example from OpenTracing that would be activeSpanKey.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In OpenTracing it's not part of Tracer interface, it's a shared static function. If it was part of Tracer, then every tracer implementation would have to implement it identically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming Tracers from multiple Tracer Providers in a single application are meant to be able to read traces from the same contexts, yes they'd have to use the same key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, wouldn't we want tracers from different providers to be able to act separate and not clash?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observe this will probably go away with #527 - Maybe hold on till that one is resolved?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really change this PR much since it is mainly about starting the spans and #527 isn't. Having re-read it I'm not opposed to #527 anymore either, I had thought it also moved starting a span to "utils".

The change in this PR is adding an additional function for starting a span, the "make span active" already exists in the spec and is not changed by this PR. So I think this should be considered separate.

And don't think the function to make a span active should hold up this PR since it already exists in the spec and is not being changed here.

@tsloughter
Copy link
Member Author

@carlosalberto carlosalberto added spec:trace Related to the specification/trace directory area:api Cross language API specification issue labels Jun 12, 2020
@reyang reyang added the release:required-for-ga Must be resolved before GA release, or nice to have before GA label Jul 10, 2020
@tsloughter
Copy link
Member Author

So what is going on with this PR?

Right now the spec says the API call must only start inactive spans and I don't think that is what implementations are doing -- instead they are providing both like in this PR.

I'm going to update it to resolve the conflicts in a bit, hopefully it can get merged after that?

@tsloughter
Copy link
Member Author

Nevermind. It now says:

Span creation MUST NOT set the newly created Span as the currently
active Span by default, but this functionality MAY be offered additionally
as a separate operation.

I think it should be the other way around (as in start_span set it active and have an alternative function start_inactive_span) but this should be good enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clarifying definition of the concepts of "started" vs "active" needed
8 participants