Standard(s) for in-process propagation #23

yurishkuro · 2016-12-04T01:10:37Z

OpenTracing 1.x leaves the question of in-process context propagation entirely in the hands of the applications themselves, which substantially reduces the usefulness of the standard since different frameworks instrumented with OT do not have a way to exchange context.

Standards for in-process context propagation inevitably have to be specific to individual languages. This is more of a placeholder issue. It is also somewhat similar to #22 (possible part of it), but can be done independently.

wu-sheng · 2016-12-04T04:17:44Z

As far as I know, in-process propagation that supports different instrumented frameworks seems very hard.
And different platforms have different characteristics.

In Java, keeping the same Context in a ThreadLocal and providing an inter-thread propagation spec may be an effective way.

Keep me in the loop. :)

codefromthecrypt · 2016-12-04T04:35:29Z

Last July, @Xylus and @eirslett did a spike of a couple of days on context. Assets from those meetings are here: https://drive.google.com/drive/u/1/folders/0B0tSnQT3uGdAfndwS3RKVnMtM0ZKMWU4R1dicTNEXy1NSlBJaTlJQ05abGVneTlucW5HdFk

Eirik later spiked an effort in that direction: DistributedTracing/continuation-local-storage-jvm#1

Ironically, last December, if I recall correctly, "distributed context propagation" was the first name of the OpenTracing group, so it's not too surprising that it comes up here. Since then, I think Uber did some work towards it, naming the group https://github.com/openctx

More recently, Google has been changing its own context to decouple storage from it: grpc/grpc-java#2461. Other tools, like Log4J2, have also added hooks to interoperate with other context storage engines.

As mentioned in another issue, there is a layering concern, for example whether tracing is the right place to define context at all (contexts are often multi-purpose and sit below tracing). That said, whatever hooks exist, tracing is at least a primary consumer, so it's interesting to see a tracking issue arise.

yurishkuro · 2016-12-04T05:29:25Z

@adriancole what I find peculiar is that several similar concepts of context storage/provider I've seen (including Jaeger's) pretend that they are just an interface, but in fact they cannot work with anything but a thread-local based implementation. For example, TChannel (which is a form of RPC framework) can be configured with a TracingContext, as a way of abstracting the actual mechanism of in-process propagation, but the mere fact that a context object is not passed to RPC methods but instead extracted via that abstraction essentially means the abstraction must be implemented via thread-locals.

NB: the TracingContext in Jaeger and TChannel is to a Span what gRPC Storage is to a Context. I.e. the same separation of how something is propagated vs. what is being propagated. They are just smaller in scope and only propagate a single Span.
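A minimal sketch of that point, with hypothetical names (`CurrentSpanProvider`, `NamedSpan`; these are not the actual Jaeger or TChannel types): because `getCurrentSpan()` takes no argument, the only thing an implementation can key on is the calling thread, so a span set on one thread is invisible on another.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a span; it only carries an operation name.
final class NamedSpan {
    final String operation;
    NamedSpan(String operation) { this.operation = operation; }
}

// Hypothetical provider abstraction: it describes *how* the current span is
// found, separately from *what* a span is. No context object is passed in,
// so an implementation has nothing to key on except the calling thread.
interface CurrentSpanProvider {
    NamedSpan getCurrentSpan();
    void setCurrentSpan(NamedSpan span);
}

// The implementation this interface shape effectively forces: per-thread state.
final class ThreadLocalSpanProvider implements CurrentSpanProvider {
    private final ThreadLocal<NamedSpan> current = new ThreadLocal<>();

    @Override public NamedSpan getCurrentSpan() { return current.get(); }
    @Override public void setCurrentSpan(NamedSpan span) { current.set(span); }
}

final class ProviderDemo {
    // Reads the provider from a different thread than the one that set it.
    static NamedSpan readOnAnotherThread(CurrentSpanProvider provider) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<NamedSpan> read = provider::getCurrentSpan;
            return pool.submit(read).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Swapping in another implementation of the interface only helps if it can identify "the current request" without being told, which in practice means some form of thread-local.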

codefromthecrypt · 2016-12-04T05:43:11Z

> @adriancole what I find peculiar is that several similar concepts of context storage/provider I've seen (including Jaeger's) pretend that they are just an interface, but in fact they cannot work with anything but a thread-local based implementation.

Interesting point for JVM tech. Yeah, different types of thread locals, wrappers of thread locals, or thread-local references to something else all have thread locals in there somewhere. Maybe there's some subtle (or native) option I'm unaware of. @raphw @tmontgomery you know magic (or at least where the bodies are often buried): do you know of anything available on the JVM that can be used to store incidental trace state which isn't directly or indirectly a thread local?

raphw · 2016-12-05T10:39:34Z

As a matter of fact, I have written a small thread-local-ish utility for this exact purpose. The problem with thread locals is that they only allow access from within a Thread: it is not possible to set an object from outside a Thread, even if that Thread is known.

Also, I have started migrating away from the above DetachedThreadLocal and now use the WeakConcurrentMap instead. The latter lets you define a key object that represents a state. This state is automatically garbage collected once the key is eligible for collection. Propagation works by sharing such keys, and the context is removed once it has become obsolete.

All in all, the problem always boils down to managing some form of global state, which is always tedious.
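WeakConcurrentMap's own API isn't reproduced here. As a stand-in, the sketch below uses the JDK's `WeakHashMap` behind `Collections.synchronizedMap` (the class name `KeyedContextStore` is hypothetical) to show the idea described above: state is attached to a key object rather than to a Thread, any thread holding the key can set or read it, and the entry becomes eligible for removal once the key is no longer reachable.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical context store keyed by an arbitrary object (e.g. a request)
// instead of by Thread. Keys are weakly referenced, so an entry can be
// dropped by the GC once nothing else holds its key.
final class KeyedContextStore<V> {
    private final Map<Object, V> contexts =
            Collections.synchronizedMap(new WeakHashMap<Object, V>());

    // Unlike ThreadLocal.set, any thread that holds the key can attach state.
    void attach(Object key, V context) {
        contexts.put(key, context);
    }

    V lookup(Object key) {
        return contexts.get(key);
    }

    // Explicit removal once the context has become obsolete.
    void detach(Object key) {
        contexts.remove(key);
    }
}
```

Two caveats of this stand-in: `WeakHashMap` compares keys with `equals()`, so key objects should keep identity equality, and `synchronizedMap` serializes every access, which a purpose-built concurrent weak map avoids.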

wu-sheng · 2016-12-05T14:21:08Z

@adriancole what is incidental trace state? Why should it be stored in something like a thread local?

wu-sheng · 2016-12-05T14:31:54Z

To all: thread locals can't solve every problem; async processing models bring more challenges.

For example, Disruptor is a very useful in-memory async library. If an application uses it, any context based on thread locals will certainly fail.

In my experience, this is very important.

codefromthecrypt · 2016-12-05T14:44:05Z

> @adriancole what is incidental trace state? Why should be stored in something like threadlocal?

Sorry, poor choice of words. Most users aren't aware of trace state, as it is implicitly passed around on their behalf (of course, some are). The usual implicit mechanism in Java is a thread local, though it is problematic in async libraries. For example, you usually have to manually re-attach the state when some accident of thread scheduling loses it. Here is an example from Brave + RxJava:

    public Action0 onSchedule(final Action0 action) {
        final ServerSpanThreadBinder binder = brave.serverSpanThreadBinder();
        final ServerSpan span = binder.getCurrentServerSpan();
        return new Action0() {
            @Override
            public void call() {
                binder.setCurrentSpan(span);
                action.call();
            }
        };
    }

https://github.com/smoketurner/dropwizard-zipkin/blob/master/zipkin-rx/src/main/java/com/smoketurner/dropwizard/zipkin/rx/BraveRxJavaSchedulersHook.java

One of the reasons I asked about alternatives to thread locals was to figure out what current practices exist, as that helps scope what might end up in a spec or library here or elsewhere. Make sense?
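The Brave hook captures the span when the action is scheduled and re-attaches it when the action runs. The same capture-and-restore pattern can be applied to any `Executor`; the sketch below is hypothetical (`ContextPropagatingExecutor` is not part of Brave or RxJava) and uses a plain `ThreadLocal<String>` in place of a real span holder. It also restores the worker thread's previous value afterwards, so pooled threads don't leak state between tasks.

```java
import java.util.concurrent.Executor;

// Hypothetical wrapper: tasks run with the value the submitting thread had
// at submission time, instead of whatever the worker thread happens to hold.
final class ContextPropagatingExecutor implements Executor {
    private final Executor delegate;
    private final ThreadLocal<String> current;

    ContextPropagatingExecutor(Executor delegate, ThreadLocal<String> current) {
        this.delegate = delegate;
        this.current = current;
    }

    @Override
    public void execute(final Runnable task) {
        // Capture on the submitting thread.
        final String captured = current.get();
        delegate.execute(() -> {
            // Remember what the worker had, then re-attach the captured value.
            String previous = current.get();
            current.set(captured);
            try {
                task.run();
            } finally {
                // Restore so the pooled thread doesn't keep our value.
                if (previous == null) {
                    current.remove();
                } else {
                    current.set(previous);
                }
            }
        });
    }
}
```

Every hand-off point (executors, schedulers, ring buffers like Disruptor's) needs a wrapper like this, which is why thread-local propagation breaks as soon as one of them is missed.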

wu-sheng · 2016-12-05T14:58:32Z

@adriancole, yes, that is my concern.

wu-sheng · 2016-12-05T15:03:23Z

But I think merely replacing the thread local with something similar is not explicit enough for someone who is not a tracer developer.

Providing some APIs, like carrier, inject and extract for RPC, seems the better choice to me. They are easy to understand.

cwe1ss · 2016-12-06T13:07:13Z

In terms of the specification, the more interesting question for me is the contract of this API:

- Should it be possible to store just one span in the context?
  - This would probably be, e.g., the RPC server span, and all lower-level components would be child spans of this one span without any nesting.
- Should it be implemented as a stack?
  - How do you know on POP that you got "your" span (in case a child component failed to close/pop its span, or called pop too often)?
  - How does a component know that it is the outermost component and that it should close any open spans? Example: the RPC server creates a "root span" and adds it to the stack; a lower-level component (e.g. a DB access library) opens a child span, but the code then fails with an exception. The request exception handler now needs to know that the DB span must be closed.
- Should it be implemented as a list or a tree to allow for more reference types?

It might be easier to decide which language-specific storage construct might be the right one, when we know which features we want it to support.
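As one possible answer to the stack questions, here is a hypothetical per-thread stack (`SpanStack` and `popThrough` are made-up names). Popping verifies identity, and the outer component (the request exception handler in the example) unwinds whatever a lower-level component left open, getting those spans back so it can finish them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical per-thread span stack; spans are plain Objects compared by identity.
final class SpanStack {
    private static final ThreadLocal<Deque<Object>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    private SpanStack() {}

    static void push(Object span) {
        STACK.get().push(span);
    }

    // The innermost span, or null if the stack is empty.
    static Object current() {
        return STACK.get().peek();
    }

    // Pops `span` and everything pushed after it. Returns the spans that were
    // still open above it (innermost first) so the caller can finish them,
    // e.g. a DB span whose owner threw before popping. Fails if `span` is not
    // on this thread's stack, instead of silently popping someone else's span.
    static List<Object> popThrough(Object span) {
        Deque<Object> stack = STACK.get();
        boolean found = false;
        for (Object s : stack) {
            if (s == span) {
                found = true;
                break;
            }
        }
        if (!found) {
            throw new IllegalStateException("span is not on this thread's stack");
        }
        List<Object> leftovers = new ArrayList<>();
        Object top;
        while ((top = stack.pop()) != span) {
            leftovers.add(top);
        }
        return leftovers;
    }
}
```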

sjoerdtalsma · 2016-12-06T19:27:55Z

I am creating a library called 'context-propagation', meant purely to propagate any (thread-local) context implementation from a parent thread to any background thread in a stack-like fashion, using the AutoCloseable try-with-resources mechanism to allow verified stack unrolling (and automatic skipping if a pop is missed).
It revolves around a Context and ContextManager interface (each with only two methods) and a utility to create a snapshot from all known ContextManager implementations at once.
If you're interested, please take a look here and let me know what you think:
https://github.com/talsma-ict/context-propagation

(We needed it for security propagation, but are now looking into adding OpenTracing and would like to reuse the mechanism that is currently in place and serves us well.)
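To make the described shape concrete, here is a sketch with hypothetical two-method interfaces (`Ctx`, `CtxManager`, `CtxSnapshot` are illustrative names, not copied from the context-propagation library): a thread-local manager whose contexts restore their parent on `close()`, plus a snapshot that captures every manager's active value on the parent thread and reactivates them all at once elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-method interfaces modeled on the description above.
interface Ctx<T> extends AutoCloseable {
    T getValue();
    @Override void close(); // no checked exception, so try-with-resources stays clean
}

interface CtxManager<T> {
    Ctx<T> initializeNewContext(T value);
    Ctx<T> getActiveContext();
}

// Thread-local manager: each new context remembers the previously active one,
// and close() reinstalls it, giving stack-like unrolling.
final class ThreadLocalCtxManager<T> implements CtxManager<T> {
    private final ThreadLocal<Ctx<T>> active = new ThreadLocal<>();

    @Override
    public Ctx<T> initializeNewContext(final T value) {
        final Ctx<T> parent = active.get();
        Ctx<T> ctx = new Ctx<T>() {
            @Override public T getValue() { return value; }
            @Override public void close() { active.set(parent); }
        };
        active.set(ctx);
        return ctx;
    }

    @Override
    public Ctx<T> getActiveContext() { return active.get(); }
}

// Captures the active value of every known manager (on the parent thread) so
// they can all be reactivated together (e.g. on a background thread).
final class CtxSnapshot {
    private final List<CtxManager<Object>> managers;
    private final List<Object> values = new ArrayList<>();

    CtxSnapshot(List<CtxManager<Object>> managers) {
        this.managers = managers;
        for (CtxManager<Object> manager : managers) {
            Ctx<Object> ctx = manager.getActiveContext();
            values.add(ctx == null ? null : ctx.getValue());
        }
    }

    // Reactivates all captured values; closing the result unwinds them in reverse order.
    Ctx<Void> reactivate() {
        final List<Ctx<Object>> opened = new ArrayList<>();
        for (int i = 0; i < managers.size(); i++) {
            opened.add(managers.get(i).initializeNewContext(values.get(i)));
        }
        return new Ctx<Void>() {
            @Override public Void getValue() { return null; }
            @Override public void close() {
                for (int i = opened.size() - 1; i >= 0; i--) {
                    opened.get(i).close();
                }
            }
        };
    }
}
```

The snapshot is what makes the approach tracer-agnostic: security context and span context travel through the same hand-off without either knowing about the other.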

yurishkuro · 2016-12-06T19:56:16Z

@cwe1ss I think the only requirement is "I need to get the current span". The fact that Java implementation often uses stack is a side-effect of thread-local implementation, since you're using the same API to retrieve different spans at different times, and stack is the only thing that makes sense for that.

cwe1ss · 2016-12-06T21:48:50Z

@yurishkuro there's also the logical requirement to "set the current span". Do you think this is implicit - meaning that every span that is created is automatically "the current span" or do you see this as an explicit feature/step?

In other words, do you think that the context should be part of the tracer interface in which a call to StartSpan would automatically store the span in the context and something new like tracer.CurrentSpan would return it from the context? Or would you prefer a new interface like traceContext that must be called explicitly by the user after a span has been created - e.g. traceContext.SetCurrentSpan and traceContext.GetCurrentSpan?

The former would make for a nice user-friendly API and the latter seems to be closer to what we have now in jaeger-context and my C# contrib library. But both have advantages and disadvantages that need further discussions of course.

yurishkuro · 2016-12-06T22:24:01Z

Sorry, you're right, it needs get and set for the current span.

bhs · 2016-12-07T05:34:31Z

@sjoerdtalsma thanks for sharing the context propagation lib... nice to see something that's already in use and not pre-coupled to a distributed tracing system per se.

@yurishkuro @cwe1ss I am prototyping a few approaches to make this pluggable... I would like OT to support in-process context managers without getting into the business of doing all of the work (since there are so many approaches and each has its advantages/disadvantages).

@cwe1ss as far as the stack concept is concerned: in Dapper we basically had each in-process Context object keep a reference to its parent Context; when finish() was called (or its moral equivalent... the naming was different), the Context would re-install its parent Context in the thread-local storage. So there was essentially a linked list of Context objects within the process.

sjoerdtalsma · 2016-12-07T07:06:30Z

@bensigelman For what it's worth, I rely heavily on java.util.ServiceLoader to make my library pluggable.
The abstract threadlocal implementation has the same 'linked stack' structure, except it tries to correct out-of-order closing as well.

Details on the out-of-order closing: If the closed context isn't 'current', current isn't touched.
Any parent that is already closed is 'skipped' in the search for the restored 'current'.
I am aware this should never happen in properly nested try-with-resources Autocloseable contexts.
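Both this 'linked stack' and the Dapper scheme described earlier can be sketched as a thread-local pointer to the current context, where each context keeps a reference to its parent. The class below is hypothetical (`TolerantContext` is an illustrative name) and adds the out-of-order rules: closing a context that isn't current leaves current untouched, and when current closes, already-closed parents are skipped.

```java
// Hypothetical thread-local "linked stack": each context points at its parent,
// and closing the current context reinstalls the nearest parent still open.
final class TolerantContext implements AutoCloseable {
    private static final ThreadLocal<TolerantContext> CURRENT = new ThreadLocal<>();

    final String name;
    private final TolerantContext parent;
    private boolean closed;

    private TolerantContext(String name, TolerantContext parent) {
        this.name = name;
        this.parent = parent;
    }

    static TolerantContext open(String name) {
        TolerantContext ctx = new TolerantContext(name, CURRENT.get());
        CURRENT.set(ctx);
        return ctx;
    }

    static TolerantContext current() {
        return CURRENT.get();
    }

    @Override
    public void close() {
        if (closed) {
            return; // closing twice is a no-op
        }
        closed = true;
        if (CURRENT.get() != this) {
            return; // out-of-order close: "current" is not touched
        }
        TolerantContext restored = parent;
        while (restored != null && restored.closed) {
            restored = restored.parent; // skip parents that were already closed
        }
        if (restored == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(restored);
        }
    }
}
```

With properly nested try-with-resources blocks the skipping loop never runs; it only matters when a close is missed or happens out of order.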

Later today I'll be committing a 'java-globaltracer' project to github and planning to offer it to opentracing-contrib. There will be three main classes with only a single public class GlobalTracer (currently with 3 methods) following this design: http://bit.ly/2gkrRuo

wu-sheng · 2016-12-08T06:58:23Z

@sjoerdtalsma, can you post a link to the java-globaltracer project? I suggested something similar earlier in opentracing/opentracing-java#63: using ServiceLoader to get a tracer may be a good way. However, some tracers need arguments to initialize (like the Jaeger Tracer, as @yurishkuro pointed out), so they are not suitable for this mechanism.

But I like this type of java tracer factory/manager very much.
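A hypothetical sketch of the lookup being discussed (`MiniTracer` and `GlobalMiniTracer` are made-up names standing in for `io.opentracing.Tracer` and the proposed GlobalTracer; this is not the java-globaltracer code): an explicit registration wins, otherwise `java.util.ServiceLoader` is asked for an implementation, and a no-op tracer is the last resort.

```java
import java.util.Iterator;
import java.util.ServiceLoader;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical minimal tracer interface.
interface MiniTracer {
    String name();
}

// Fallback when nothing is registered and no provider is found.
final class NoopMiniTracer implements MiniTracer {
    @Override public String name() { return "noop"; }
}

// Hypothetical global holder.
final class GlobalMiniTracer {
    private static final AtomicReference<MiniTracer> DELEGATE = new AtomicReference<>();

    private GlobalMiniTracer() {}

    // Explicit registration, for tracers that need constructor arguments.
    static void register(MiniTracer tracer) {
        DELEGATE.set(tracer);
    }

    static MiniTracer get() {
        MiniTracer tracer = DELEGATE.get();
        if (tracer == null) {
            // Only the first lookup result wins if several threads race here.
            DELEGATE.compareAndSet(null, lookup());
            tracer = DELEGATE.get();
        }
        return tracer;
    }

    // Picks the first provider listed in META-INF/services/<interface name>.
    private static MiniTracer lookup() {
        Iterator<MiniTracer> it = ServiceLoader.load(MiniTracer.class).iterator();
        return it.hasNext() ? it.next() : new NoopMiniTracer();
    }
}
```

On the classpath, ServiceLoader instantiates providers through a public no-argument constructor, which is exactly why tracers that need configuration fit better with an explicit `register()` call.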

sjoerdtalsma · 2016-12-08T08:04:27Z

@wu-sheng @adriancole , I pushed an initial attempt (with a working unit test using the MockTracer) to github yesterday evening:
https://github.com/talsma-ict/java-globaltracer

I would love to donate it to the 'contrib' git repository and have it published in Maven central under 'io.opentracing.contrib' but only after a thorough review process. Since I'm relatively new to the opentracing community I have no experience on the best way to go forward from here...

wu-sheng · 2016-12-08T09:01:00Z

@sjoerdtalsma @adriancole , maybe we should create a new issue to discuss this?

I reviewed the project at a high level; let's leave the detailed discussion for later.

But first, the JDK level should be fixed. Using Optional<T> requires JDK 1.8+, but many applications are still based on JDK 1.6+. I think you should implement your design on JDK 1.6; the code may be less elegant, but it will be easier to adopt.

codefromthecrypt · 2016-12-08T09:38:55Z

> @sjoerdtalsma @adriancole , maybe we should create a new issue to discuss this?

One of the reasons we discussed thread locals was Yuri's question of whether we can assume thread locals are used or not. As far as I can tell, we cannot make that assumption. If that steers the API in a particular way, then the examples were fruitful. I agree it is probably helpful to keep the weeds and investigations in separate threads, as doing so here might be overwhelming. Probably best in the actual repo of the code in question?

sjoerdtalsma · 2016-12-08T09:44:47Z

@wu-sheng @adriancole I welcome feedback on the design and implementation choices via GitHub issues:
https://github.com/talsma-ict/java-globaltracer
I'm also willing to transfer this repository to https://github.com/opentracing-contrib if you think this contribution is worthwhile.

wu-sheng · 2016-12-08T10:03:15Z

@sjoerdtalsma , transfer to https://github.com/opentracing-contrib, LGTM. @yurishkuro @adriancole @bensigelman , what's your opinion?

Maybe accept it, discuss on the repo, and let's see what is going on later?

bhs · 2016-12-08T19:39:25Z

@sjoerdtalsma my preference would be to:

1. create (empty, ASF 2.0) repos on opentracing-contrib for the pieces you'd like to move over,
2. do a PR for the initial contents where we discuss the finer points of the code itself (and I would def rather not try to do that all here in this thread!), and
3. merge those PRs as soon as they're ready, etc.

wu-sheng · 2016-12-09T01:38:38Z

@bensigelman , can you add me to the https://github.com/opentracing-contrib org. I am interested in @sjoerdtalsma's project.

wu-sheng · 2016-12-09T01:49:13Z

@sjoerdtalsma, let me know after you create the repo and open a PR.

sjoerdtalsma · 2016-12-09T08:37:23Z

@bensigelman I agree on all three of your points.
re. 1: I cannot create this repo in opentracing-contrib, so it would be great if anyone with the power to do so could create a new ASF2.0 repo called 'java-globaltracer' or something similar. I'll take care of step 2 from there.

wu-sheng · 2016-12-09T09:43:08Z

@bensigelman , maybe you should create a repo for @sjoerdtalsma

objectiser · 2017-03-09T10:03:22Z

@bhs If transferring a Span from one thread to another, by creating a SpanClosure which is activated in the destination thread, how is the Span disassociated from the first thread (in situations where this is desirable)?

bhs · 2017-03-11T23:37:40Z

@wu-sheng

> What is the difference between Snapshot and SpanClosure?

Definitely similar in spirit... The SpanClosure actually has an interface to implement, though, since it makes it clear that a SpanClosure/Snapshot may only be activated/deactivated once.
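To illustrate the once-only contract, here is a hypothetical closure (`OnceClosure` is not the prototype's SpanClosure, and a String stands in for the Span) that may be activated exactly once and deactivated exactly once, throwing on any repeat. Because it is a separate object from the span, it can be created on one thread and activated on another.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical once-only closure; a String stands in for a Span.
final class OnceClosure implements AutoCloseable {
    private static final ThreadLocal<String> ACTIVE = new ThreadLocal<>();

    private final String span;
    private final AtomicBoolean activated = new AtomicBoolean(false);
    private final AtomicBoolean deactivated = new AtomicBoolean(false);
    private String previous; // what was active on the activating thread before us

    OnceClosure(String span) {
        this.span = span; // may be created on one thread and activated on another
    }

    static String active() {
        return ACTIVE.get();
    }

    OnceClosure activate() {
        if (!activated.compareAndSet(false, true)) {
            throw new IllegalStateException("already activated: " + span);
        }
        previous = ACTIVE.get();
        ACTIVE.set(span);
        return this;
    }

    // Deactivation; must run on the thread that called activate().
    @Override
    public void close() {
        if (!activated.get()) {
            throw new IllegalStateException("never activated: " + span);
        }
        if (!deactivated.compareAndSet(false, true)) {
            throw new IllegalStateException("already deactivated: " + span);
        }
        if (previous == null) {
            ACTIVE.remove();
        } else {
            ACTIVE.set(previous);
        }
    }
}
```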

bhs · 2017-03-11T23:37:44Z

@objectiser

> If transferring a Span from one thread to another, by creating a SpanClosure which is activated in the destination thread, how is the Span disassociated from the first thread (in situations where this is desirable)?

Yes, that's a good point. I had addressed that in a previous version of this (that I didn't send out) where everything was reference-counted and finish() was implicit (when the refcount got to zero).

I've added a new commit that includes a new method, SpanBuilder.startAndActivate(), that returns a SpanClosure. I also made SpanClosure autocloseable just to see how that felt syntactically.

The async code looks like this:

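        // `tracer` and `otExecutor` come from the prototype branch; otExecutor presumably
        // propagates the active Span into the tasks it runs.
        final List<Future<?>> futures = new ArrayList<Future<?>>();
        // Children add to this list from pool threads, so it must be thread-safe.
        final List<Future<?>> subfutures =
                Collections.synchronizedList(new ArrayList<Future<?>>());

        // Create a parent SpanClosure for all of the async activity.
        try (SpanManager.SpanClosure parentSpanClosure = tracer.buildSpan("parent").startAndActivate()) {

            // Create 10 async children.
            for (int i = 0; i < 10; i++) {
                final int j = i;
                futures.add(otExecutor.submit(new Runnable() {
                    @Override
                    public void run() {
                        // START child body

                        try (SpanManager.SpanClosure childSpanClosure =
                                     tracer.buildSpan("child_" + j).startAndActivate()) {
                            Thread.sleep(1000);
                            childSpanClosure.span().log("awoke");
                            Runnable r = new Runnable() {
                                @Override
                                public void run() {
                                    Span active = tracer.activeSpanManager().active();
                                    active.log("awoke again");
                                    // Create a grandchild for each child.
                                    Span grandchild = tracer.buildSpan("grandchild_" + j).start();
                                    grandchild.finish();
                                    active.finish();
                                }
                            };
                            subfutures.add(otExecutor.submit(r));
                        } catch (Exception e) { /* ignored for brevity */ }

                        // END child body
                    }
                }));
            }
        } catch (Exception e) { /* ignored for brevity */ }

        try {
            // Wait for every child first; each child registers its grandchild
            // future before finishing, so subfutures is complete afterwards.
            for (Future<?> f : futures) {
                f.get();
            }
            for (Future<?> f : subfutures) {
                f.get();
            }
        } catch (Exception e) { /* ignored for brevity */ }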

It's worth noting that existing synchronous start()/finish() code can work just as it does today.

Stepping back a bit...

This PR moves things in the direction of formalizing the difference between Span as an API to essentially record data about a span and SpanClosure and SpanManager as APIs to manage the relationship between specific spans and specific execution contexts / threads / etc. I think that's a good thing / direction, but just wanted to point it out.

sjoerdtalsma · 2017-03-13T06:57:01Z

One question; are we 'okay' now with automatically/implicitly becoming child of the active span ?
I remember rather strong opinions on either side of that decision when discussing the java-spanmanager PR.

edit: I am personally okay with either choice as long as it is documented very clearly

objectiser · 2017-03-13T10:05:38Z

@bhs There may still be an issue if the parent scope is being managed by two separate event handlers, if the event handler creating the SpanClosure cannot store it in some specific context that is available to the end event handler.

bhs · 2017-03-13T22:37:56Z

@sjoerdtalsma

> One question; are we 'okay' now with automatically/implicitly becoming child of the active span ?
> I remember rather strong opinions on either side of that decision when discussing the java-spanmanager PR.

I'm definitely ok with that. We did so in Dapper and – while it has some drawbacks – it's a way better default than not-becoming the child :)

Two thoughts, though:

1. If (and only if) the programmer explicitly provides references, those would replace (not supplement) any automatic references, and
2. We can introduce a new Reference type, maybe INFERRED_CHILD_OF or similar; this will allow tracing systems / UIs / etc. to distinguish these relationships.
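Both points can be expressed as one small resolution rule. The builder below is hypothetical (`RefType`, `Ref` and `RefResolvingBuilder` are illustrative names, and INFERRED_CHILD_OF is only the proposal above, not an existing OpenTracing constant): explicit references are used as given, and only when there are none does the active span become an inferred parent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical reference types; INFERRED_CHILD_OF is only a proposal.
enum RefType { CHILD_OF, FOLLOWS_FROM, INFERRED_CHILD_OF }

final class Ref {
    final RefType type;
    final String target; // a span name stands in for a SpanContext

    Ref(RefType type, String target) {
        this.type = type;
        this.target = target;
    }
}

final class RefResolvingBuilder {
    private final List<Ref> explicit = new ArrayList<>();
    private final String activeSpan; // null when no span is active

    RefResolvingBuilder(String activeSpan) {
        this.activeSpan = activeSpan;
    }

    RefResolvingBuilder addReference(RefType type, String target) {
        explicit.add(new Ref(type, target));
        return this;
    }

    // Explicit references replace (rather than supplement) the automatic one;
    // the inferred parent is tagged so UIs can tell the two apart.
    List<Ref> resolveReferences() {
        if (!explicit.isEmpty()) {
            return Collections.unmodifiableList(new ArrayList<>(explicit));
        }
        if (activeSpan == null) {
            return Collections.emptyList();
        }
        return Collections.singletonList(new Ref(RefType.INFERRED_CHILD_OF, activeSpan));
    }
}
```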

@objectiser

> There may still be an issue if the parent scope is being managed by two separate event handlers, if the event handler creating the SpanClosure cannot store it in some specific context that is available to the end event handler.

IMO, if different executors (or whatever) start and finish a Span, they would not share a single SpanClosure activation/deactivation. Either there would be zero or two of those pairs. Note that in my proposal it's still possible to start a Span without meddling with SpanManager at all.

bhs · 2017-03-19T04:53:40Z

A little later than I'd hoped, but here is a sincere attempt to address this issue for OT-Java: opentracing/opentracing-java#111

Please add comments there... I'm making special note of it because more people are watching this issue.

beberlei · 2017-06-13T11:22:37Z

The Java implementation scares me, since this kind of complexity is absolutely not needed in a request-scoped language such as PHP. Would it make more sense to agree that each language without explicit in-process context propagation (of the kind Go has) should provide a solution that is idiomatic for that language?

felixfbecker · 2017-06-13T11:26:59Z

Since I hear this a lot: not every PHP project is request-scoped. PHP can be used just like any other language to build long-running service processes that, e.g., serve RPC calls over TCP, even with an event loop like in NodeJS. Take https://github.com/felixfbecker/php-language-server as an example.

yurishkuro · 2017-06-13T14:40:20Z

@beberlei yes, the propagation mechanism is specific to each language, although it is preferable to keep similar concepts as much as possible, such as "active span".

@felixfbecker if you look at a proposed mechanism for Python, it is actually a pluggable approach depending on which async framework in Python is used, because they require different ways of propagating the context via event loops. I assume the same will be true in PHP. For request-scoped PHP the implementation of active span source could be just a global variable.

felixfbecker · 2017-06-13T15:03:01Z

For request-scoped PHP that is not using async functions with callbacks (like async curl or async DB queries) and no event loops, it can be a global variable.

A runtime also doesn't necessarily need to have a concept of "active span" imo. In Node's event loop continuation-local-storage is super slow to propagate an active span, and it is so much easier to e.g. just pass the span around as the last parameter to functions.

dkuebric · 2017-06-13T15:11:14Z

@felixfbecker without an active span getter, it will be more cumbersome for users to combine their own instrumentation points with those built in to standard libraries (eg. web frameworks, db clients, etc), and to compose those with each other.

tedsuo · 2017-06-13T18:58:04Z

@yurishkuro has the right nuance here. In-process propagation is language dependent, but for languages that share paradigms we should try to avoid unnecessary inconsistency between API implementations. For example, I would not want the ruby implementation to stray too far from the python one.

@felixfbecker I think Javascript is a special case, where you do really want an ActiveSpan concept but the currently available mechanisms are insufficient/unofficial enough that it would be concerning to bake support for them into the official API. But if JS were to sort out continuation local storage in an effective manner, we would immediately want it for the reasons @dkuebric pointed out.

PHP concerns me because I'm ignorant, and don't understand the various possible execution contexts and how they are exposed to the PHP runtime, and to what degree the tracing you need to do is actually mixed up with foreign function calls to C code.

yurishkuro added this to the OpenTracing 2.0 milestone Dec 4, 2016

yurishkuro mentioned this issue Dec 6, 2016: Can we define a shared thread-local storage format for Spans? opentracing/opentracing-java#24 (Closed)

pavolloffay mentioned this issue Mar 16, 2017: Multiple jax-rs instrumentations opentracing-contrib/meta#15 (Open)

objectiser mentioned this issue Mar 17, 2017: Transferring managed span stack from one thread to another opentracing-contrib/java-spanmanager#12 (Open)

bhs mentioned this issue Mar 19, 2017: WIP: Add the notion of an "active" Span and io.opentracing.Scheduler opentracing/opentracing-java#111 (Closed)

felixbarny mentioned this issue Mar 19, 2017: Support for Vert.x stagemonitor/stagemonitor#253 (Open)

pavolloffay added a commit to pavolloffay/opentracing-java that referenced this issue Mar 29, 2017: in-process propagation resolves opentracing/specification#23 (ff879fb)

pavolloffay added a commit to pavolloffay/opentracing-java that referenced this issue Mar 30, 2017: in-process propagation resolves opentracing/specification#23 (8cf3178)

pavolloffay added a commit to pavolloffay/opentracing-java that referenced this issue Mar 30, 2017: in-process propagation resolves opentracing/specification#23 (d10ea16)

tedsuo mentioned this issue Apr 4, 2017: Direct import of global tracer opentracing/opentracing-javascript#71 (Closed)

cwe1ss mentioned this issue Apr 26, 2017: In-process propagation - Logic for activating/deactivating spans opentracing/opentracing-csharp#35 (Closed)

beberlei mentioned this issue Jun 15, 2017: Port ActiveSpanSource implementation from opentracing-python. tideways/opentracing-php#3 (Open)

jcchavezs mentioned this issue Jun 15, 2017: Adds activeSpanSource jcchavezs/opentracing-php#22 (Closed)

jcchavezs mentioned this issue Jul 20, 2017: In process propagation opentracing/opentracing-php#2 (Closed)

natemurthy mentioned this issue Jul 23, 2017: Context Propagation in Java grpc/grpc.github.io#443 (Open)

ocharles mentioned this issue Sep 30, 2017: What's the story with threads? ocharles/haskell-opentracing#3 (Open)

jcchavezs mentioned this issue Nov 16, 2017: Implement in-process context propagation opentracing/opentracing-php#43 (Closed)

jacktuck mentioned this issue Jun 7, 2018: 💥 add opentracing support to Remit jpwilliams/remit#84 (Merged)

pauldraper mentioned this issue Aug 17, 2018: Scope Manager should use continuation passing #126 (Open)
