Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial tracer API #11

Merged
merged 13 commits into from
Jun 20, 2019
Merged

Add initial tracer API #11

merged 13 commits into from
Jun 20, 2019

Conversation

c24t
Copy link
Member

@c24t c24t commented Jun 4, 2019

This PR adds the tracing API package and proposes some conventions for the package structure, documentation, and style of the project. As it is now, the API only describes the required classes for using the tracer to record spans. Propagation formats, resources, and the context API are all out of scope for this PR.

The API here borrows from:

The OT java API is our current best source of truth. Ideally we can use this PR to prove the java API and push the discussion here up to the spec.

@c24t c24t added discussion Issue or PR that needs/is extended discussion. api Affects the API package. labels Jun 4, 2019

"""
The OpenTelemetry tracing API describes the classes used to generate
distributed traces.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These docs are WIP, I just wanted a good example module docstring here to test the generated html docs.

opentelemetry/api/trace/__init__.py Show resolved Hide resolved
opentelemetry/api/trace/__init__.py Show resolved Hide resolved
opentelemetry/api/trace/__init__.py Show resolved Hide resolved
@c24t c24t mentioned this pull request Jun 4, 2019

parent = deserialize_span(serialized_span)
# Explicit parent span assignment
with tracer.span("child", parent=parent) as child:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the implicit one and explicit one work together in a mixed mode?
For example, I have a function A which is using implicit in-proc propagation, another function B which is using explicit propagation, if I need to call A from B, what should I do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think functions that want to support both styles (e.g. in libraries) will need to add an optional parameter for the parent Span that defaults to the current one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going down this path, I would imagine either all the libraries end up having parent span as the optional parameter for all the exposed functions, or most libraries will only support implicit context and makes it super difficult for the application developer to use explicit context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting both implicit and explicit context like this is going to make for an ugly implementation. The spec should probably describe what happens for each, and I think there's good reason to separate the span constructors for each. We may need to solve this in the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So honestly adding support for explicit parent at the core API shouldn't look that bad - for libraries (i.e. Django instrumentation), I think it's fine to make it super difficult for the user to override it, or even do not expose it at all IHMO.

do_work(span=child)

Applications should generally use a single global tracer, and use either
implicit or explicit context propagation consistently throughout.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about libraries? The application could use explicit propagation consistently, while calling into a library which expects implicit context propagation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we definitely should think about and document best practices for what libs should do. For that question, see https://github.com/open-telemetry/opentelemetry-python/pull/11/files#r290703075. Another question is what tracer they should use (e.g.: Use global tracer? Pass tracer at every call? Provide way to set library-wide tracer? Provide way to set a library-wide callback function that it calls to get the tracer?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best-practices aside, users should have the final thing to say - if they, for some reason, need to 'mix' these modes (as you guys call it), that should be supported IHMO.

(As for the other questions, I'd say using a global Tracer would be a good thing to have)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The application could use explicit propagation consistently, while calling into a library which expects implicit context propagation.

This is a problem since it would mean that passing an explicit context should change the implicit one. This would be unexpected in the application, but required if it's calling into a library that uses implicit context.

Provide way to set a library-wide callback function that it calls to get the tracer?

I thought implementations would provide a global tracer, do you want the callback so integrations can change this?

parent: This span's parent.

Raises:
ValueError: if ``name`` is null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we only check if name is null or we expect more validations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally advocate for using a generic name here, instead of failing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it sounds like implementations should provide a safe default here. In general, do you expect implementations to be more or less permissive than the API? I.e. should implementations be required to throw errors we list in the API? Should they be allowed to throw errors we don't list?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, do you expect implementations to be more or less permissive than the API? I.e. should implementations be required to throw errors we list in the API?

Yes, I bet almost all implementations will have more restriction than the API would allow.

Should they be allowed to throw errors we don't list?

Probably not. One design principle we may want to follow is "the monitoring SDK will try to keep silent instead of changing the application behavior". In this spirit I guess we will have exporters doing some downcast/trimming based on their specific need, and not throw exceptions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case: do we want to allow implementations to throw here on null/blank name by including it in the API or are we ok with forbidding them from throwing by removing it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet almost all implementations will have more restriction than the API would allow

This suggests that implementations will throw errors we don't list in the API, or am I missing something?

FWIW the java API says to raise a NPE here. It might be worth taking this conversation there (or to the spec repo).

Using a safe default span name sounds good to me as long as we get the same behavior in all clients, or decide that this is "unspecified behavior" and that different clients can do this differently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm in favor of option 1.

I'm also in favor one. That being said, this can also be mentioned/discussed in the Specification ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also prefer option 1. Option 3 (generating a random span name) is no good IMHO, because span names should be low-cardinality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added open-telemetry/opentelemetry-specification#141 to track this in specs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like we should list any error implementations could throw in the api method docstring. Safe default names sound good to me, but let's clarify this in the spec first. I added #18 to track this.

"""
raise NotImplementedError

def create_span(self, name: str, parent: Span=None) -> Span:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to create a root span which has no parent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. In fact, I think its strange that create_span(..., parent=None) should have a non-None parent. One suggestion: create_span should just never affect or use (as parent) the active span. But instead, enhance the context manager returned by span to return the created span at __enter__.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of a few options here: use a sentinel default value instead of null and take parent=None to mean that we should create a root span, add a separate method create_root_span, or change the way we create spans and match java's builder pattern more closely.

create_span should just never affect or use (as parent) the active span

That's pretty surprising, I'd expect creating a child of the current active span to be the most common use case.

But instead, enhance the context manager returned by span to return the created span at __enter__

That's what I'd expect the context manager to do now.

@Oberon00
Copy link
Member

Oberon00 commented Jun 5, 2019

After our meeting today, I'm thinking that maybe we should start out with only explicit context propagation/parent span specification. The implicit propagation (which requires thinking about a definition of an active trace context or span per thread/coroutine/"scope"/??? and whether the tracer should be responsible for it or some separate component that is one layer below or next to the tracer) is probably one of the most difficult APIs to get right.

is equivalent to::

span = tracer.create_span(name, parent=tracer.get_current_span())
with tracer.use_span(span):
Copy link
Member

@reyang reyang Jun 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I've missed something, I was thinking a different approach:

Implicit
with tracer.span(name):
    do_work()
Explicit
span = tracer.span(name)
span.start()
do_work()
span.finish()

And in the span class we will have __enter__ and __exit__ (or __aenter__ and __aexit__).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice interface, but what do you do about the fact that we need to modify the context if span is used as a context manager and not otherwise?

raise NotImplementedError

@contextmanager
def use_span(self, span: Span) -> Iterator[None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For languages which don't have the with or using concept, it might be good to have a dedicated API. For Python, with span (Span.__enter__ and Span.__exit__) could be a good replacement for use_span?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree if we can find a nice way to handle modifying the context.

@c24t
Copy link
Member Author

c24t commented Jun 7, 2019

After our meeting today, I'm thinking that maybe we should start out with only explicit context propagation/parent span specification.

One argument for biting the bullet and figuring out implicit propagation now is that it will affect the API for explicit propagation too. Another is that even in the "explicit propagation" case I'm assuming we don't handle a context object in this API; we just make sure not to touch the implicit context.

Another is that hashing out the tracing/context interaction is one of the reasons for this PR. :D

The implicit propagation (which requires thinking about a definition of an active trace context or span per thread/coroutine/"scope"/??? and whether the tracer should be responsible for it or some separate component that is one layer below or next to the tracer) is probably one of the most difficult APIs to get right.

My working assumption is that our context will be a ContextVar or a thin wrapper around it like @reyang's context package in OC. PEP 555 describes these as context-local (as compared to thread-local) in that they're scope is a single call chain that might include async calls. One reason we need explicit context is to work with non-standard execution models like tornado's, and there's reason to believe that frameworks will start to standardize on contextvars now that the language provides one obvious way to do it. Tornado recently switched to asyncio and deprecated their own context module for this reason.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our discussion during the meeting, let's get this merged first. We can make further improves on namespace, package layout, improve span creation and global tracer registration.

@c24t c24t merged commit d1cea92 into open-telemetry:master Jun 20, 2019
@c24t c24t deleted the tracer-api-init branch June 20, 2019 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Affects the API package. discussion Issue or PR that needs/is extended discussion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants