-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement tracing support #163
Comments
#160 is the first attempt at implementing the new API. The main thing I don't like about it is that it's designed around having various
In my opinion it would be better to create a
For "duration" spans:
For "counter" spans:
This is just an idea we should play with. The design is inspired by statsd and zipkin. |
Thanks for your feedback @1st1 Regarding your concerns about the implementation proposed by #160 I partially agree on both of them but I have my reasons starting with such proposal. This first approximation had as a goal implement a generic system that would allow the user to implement on top of it its own logic or implementation, so it would allow the implementation of interfaces that would provide the glue for ZipKin, OpenTracing, AWS Xray, Statsd, and other tools either for tracing or for monitoring. So basically the interface presented wanted to solve both cases: tracing plus monitoring. It's true that having such open interface it might end up allowing the users using bad practices, and yes that the creation of a new duration span with the pair Your proposal, that goes more aligned with the initial conversations, narrows and makes explicit the implementation, so losing freedom but at the same point having the following advantages:
I agree with the concept of Span in terms of an operation that takes some time - as it is already described in the tracing world. But I disagree with the other cases that are not timed operations, such as the counters which IMO they are more related to monitoring. Lets take as an example the following snippet: async def coro(loop):
with uvloop.tracing("coro_span") as span:
connection = await redis.connect("loclahost")
value = await connection.get("foo")
span.finish()
print(span) The
So basically the Spans are a hierarchy of the duration events with their metadata. Having something like this, the goal of tracing gets narrowed and it should mean that future wrappers - such us OpenTracing - must be able to implement something enough generic to allow changes on the metadata related to each Span, allowing them to make a pure translation to their span idiom. Even I guess we might add as metadata some metrics that can be gathered within an ongoing span, such as the counters that we were speaking, for example, the next snippet shows what could implement the callback that is in charge of feed some data from the network def feed(self, data):
if span:
span.rx += len(data) So later the span will incorporate this data automatically. But I'm still thinking that we would need a way to provide pure monitoring KPIs with an interface that is not coupled to the tracing interface, allowing the owners of the services that run applications on top of uvloop to gather the operational metrics such as RX, TX, callbacks, tasks, polling time and so on, that is part of the monitoring field rather than tracing. Regarding the I finally the implementation of the tracing moves into that direction, meaning that uvloop is gonna provide its own span implementation. As soon as there is a POC of this implementation I would try to make some tests and check how much friction imposes its wrapping for systems such as Zipkin, OpenTracing, AWS Xray and so on. |
I've been digging a bit about how easily will be serializing the Take as an example the following code that tries to emulate this serialization for the Jaeger library [1] from jaeger import tracer
@contextmanager
def jaeger_trace(loop):
uvloop_span = loop.trace():
yield
uvloop_span.finish()
root = tracer.start_span(
uvloop_span.name,
tags=uvloop_span.tags
start_time=uvloop_span.start_time
)
for uvloop_child in uvloop_span.children:
child = tracer.start_span(
uvloop_child.name,
child_of=root,
tags=uvloop_child.tags,
start_time=uvloop_child.start_time
)
child.finish(finish_time=uvloop_child.stop_time)
root.finish(finish_time=uvloop_span.stop_time) Basically it leaves to uvloop to make all of the work and once it is done the wrapper builds programatically the Jaeger spans using as a source the uvloop spans. In an ideal world all of the libraries that iplement the OpenTracing standard [2] must be able to serialize traces using a piece of code quite similar with the above one. But some libraries that claim to be OpenTracing compatibles do not have an exact implementation, perhaps Data Dog has a Span object that supports the finish time as a parameter of its constructor while the So summary, providing uvloop its own system to create and handle spans might create some resistence to be adopted by some tracers. On the other hand, having a more generic implemenentation as the callback one provided by #160, libraries should have enough freedom to implement their own span system. Worth mentioning that tracers system place none core fields - id, duration, - under a generic field called tag,meta or other. So if finally uvloop implements its own span system, I would keep the core fields as minimum is possible copying the ones implemented by OpenTracing and putting all of the extra under a namespace such as tags, meta u other. [1] https://github.com/jaegertracing/jaeger-client-python |
I agree, that will be a problem. The idea here is that we need a mechanism to setup a custom Span constructor per trace. Here's my current idea of the API. It's quite close to Jaeger and few other tracing libraries. ########################################################################
# Classes that a tracing library would need to provide to
# support uvloop:
########################################################################
class Span(abc.ABC):
@abc.abstract_method
def get_parent_span(self) -> Span:
"""Return the parent span."""
class TimerSpan(Span):
pass
class CounterSpan(Span):
pass
class Tracer(abc.ABC):
@abc.abstract_method
def start_timer_span(self, name: str) -> TimerSpan:
pass
@abc.abstract_method
def end_timer_span(self, span: TimerSpan):
pass
@abc.abstract_method
def start_counter_span(self, name: str) -> CounterSpan:
pass
@abc.abstract_method
def end_counter_span(self, span: CounterSpan):
pass
@abc.abstract_method
def increment_counter_span(self, span: CounterSpan, inc: int):
pass
########################################################################
# APIs that uvloop will provide:
########################################################################
class TracedContext(abc.ABC):
"""An opaque object that encapsulates information about tracing."""
def _get_tracer(self) -> Tracer:
"""Return the Tracer that was used to start the context."""
# Private uvloop API
def _get_current_span(loop) -> Optional[Span]:
"""Return the current span."""
# Private uvloop API
def _is_tracing(loop) -> bool:
"""Return True if tracing is enabled in the current context."""
# Private uvloop API
def uvloop.start_tracing(loop, tracer: Tracer) -> TracedContext:
"""Start a new TracedContext and set it as current."""
def uvloop.end_tracing(loop, trace_ctx: TracedContext):
"""Stop tracing for the passed TracedContext.""" The actual tracing implementation is deliberately hidden from uvloop users, allowing us to use Open questions:
|
I like zipkin approach to spans since API is minimalistic and very powerful, here I created tracer for asyncio with aiohttp instrumentation (https://github.com/aio-libs/aiozipkin). I have implementation of spans/tracers/context etc. Quick question, zipkin/opentracing and co are all about distributed tracing (as described in dapper paper), it helps to find all tasks spawned by particular request on several distributed machines. Given that what purpose of uvloop/asyncio tracing subsystem? Is this only for debugging only local process or distributed also? Distributed case has some implication, trace or not to trace often decided on separate server per request, if we infer from incoming request that it sampled, subsystem should trace all child tasks and attach it to current span. |
Nice. What do you think about the approach I outlined in #163 (comment)? Would it work for aiozipkin? If no, then how would you change it?
I'd aim for both.
We'll be using |
Regarding your proposal @1st1, TL;DR Ok!. The proposal circumvents the issue of the friction to allow the wrappers being integrated with not so much pain, indeed this proposal goes more into the direction of the callbacks but narrowing the use case making the API super opinionated. On the other hand, having this proposal you know that who will dictate the performance degradation will be the tracer, so the crosstalk between Cython code and Python code will be there by design. None sampled spans end up as real Python objects [1] About the three different spans IMO the metrics/stats must be provided by uvloop in another interface perhaps: ascnc def coro():
with uvloop.stats() as stats:
with uvloop.start_tracing(tracer) as span:
span.set_tag("rx", stats.rx) Regarding your quesitons:
I dont get this question. In any case seeing how other traces implement its counterpart
IMO whatever part of the underlying code that is executed by uvloop that can be isolated and it is interpretable as a span should implement what is needed to be traced, creating the spans as a children of the root span. So the easy example is the DNS resolution, so having ascnc def coro():
with uvloop.start_tracing(tracer) as span:
hosts = await loop.getaddrinfo(..) The Then the no easy part, the Task one. If we have a code such as: async def coro():
with uvloop.start_tracing(tracer) as span:
task = loop.create_task(bar())
await task Im inclined to think that the I would say that this one of the critical parts that would need some answers as soon as possible, having also the option of just leave aside this problematic without implementing them as children spans, so the user losing some granualirity in the trace.
Im wondering why someone that is interested on instrumentalize a piece of code gathering the trace and its spans would be interested on discard some specific tasks. I have the feeling that there is no an easy way to make this and the implementation and inteface can end up much more complicated for just do something that might be used by nobody. And last but not least, did you considere to put the [1] https://github.com/aio-libs/aiozipkin/blob/master/aiozipkin/span.py#L81 |
Hi @jettify gad to see you here
The goal is give more granularity to the user when the user traces an application that runs on top of Asyncio, IMO the annalogy is other frameworks that they already implement tracing system [1]. Having Asyncio implementing tracing under the hood we must be able to enrich current solutions, such as the Aiohttp middleware for AWS Xray [2] giving to the user a more fine granularity for free. The question is why not in Asyncio ? [1] https://github.com/opentracing-contrib |
As as far as As for #163 (comment) not sure what the purpose of TimerSpan vs CounterSpan vs TaskSpan. Say I want to trace subtask tree creation, make zipkin span per task with appropriated child parent relation (this way we can see sequential vs parallel task execution as result find places where we can increase parallelism). How mentioned API fits for this? |
Could we start implementing something like: with tracer.start_span('TestSpan') as span:
tracer_ctx = uvloop.stat_tracing(tracer.uvloop)
.....
uvloop.stop_stracing(tracer_ctx) The
|
Implements a new tracing support for uvloop adding the pair methods `start_tracing` and `stop_tracing` that allows the user to start or stop the tracing. The user must provide a valid `uvloop.tracing.Tracer` implementation that would be used internally by uvloop to create spans, each span must also meet the `uvloop.tracing.Span` class contract. This POC only implements the creation of spans during underlying calls to the `getaddrinfo` function. This POC relates to the conversation in MagicStack#163.
Regarding #171, the pending discussions that it raises are:
Support for other spans within uvloop. But before doing this, we would need a kind of agreement of behind what circumstances the spans are created. A good example is the
That it would mean that when there is no tracing enabled it will create a noop span, the benefits of this global wrapping make the code cleaner and easier to maintain but the drawbacks are quite clear. Also worth mentioning that going to a pattern that uses a global context would mean that we will be creating a span when indeed in some failure scenarios we connected to nothing and without yielding the loop. |
any feedback on this @1st1? I've been a bit out lately and I don't think that I will have time till the second week of August. But any discussion that helps us to reach some conclusions will be worth it. Regarding the last summary that I've written down related with the #171 PR I left two open questions which I have some thoughts.
The tracer provider already provides a known API and the user is used to use it, so IMO we must allow the user to keep using the tracer interface that is used to use it and not forcing it to use a new interface. Therefore, the
I'm inclined to think that the span has to be created only when the loop is yielded. For example, for the |
This is a meta issue to discuss how new asyncio tracing API should look like. We will implement it first in uvloop, and later, if necessary, port it to asyncio.
Requirements
Base on the new
contextvars
(PEP 567) API, so that async tasks and callbacks they schedule are traced consistently.Traces should be hierarchical, i.e. it should be possible to see what and how exactly one Task did, what Tasks that it spawned did, etc.
Low performance overhead. Ideally we should be able to run tracing in production.
Low-level and detailed: we want to be able to measure DNS requests, read/write callbacks performance, throughput, number of callbacks/tasks created etc.
Flexibility: traces should be in easy to work with format, so that it's possible to submit them to existing systems like zipkin.
Extensibility: the new API should be easy to maintain on both ends: in the event loop and in applications that use it. This means that adding a new kind of trace should not require people to write any glue code.
[TBD]
The text was updated successfully, but these errors were encountered: