Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow reactor instrumentation to pick up spans from reactor context #4159

Merged
merged 9 commits into from
Oct 12, 2021

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Sep 17, 2021

I'm trying to instrument my async library APIs that return e.g. Mono and never subscribe/block. Internally they do HTTP requests and logs. I'm looking for a way for HTTP auto-instrumentation and loggers MDC to pick up context from my library instrumentation.

Let's say that's the library call I'm trying to instrument:

public Mono<String> call() {
    return Mono.defer( () -> {
        checkCurrenSpan("internal", true);

        Request request = new Request.Builder()
                .url("http://example.com")
                .build();

        httpClient.newCall(request).execute();
        return Mono.just("done");
    });
}

And let's say here's how I trace it

 private <T> Mono<T> traceMono(Mono<T> publisher, String spanName) {
        // this hack forces 'publisher' to run in the onNext callback of `TracingSubscriber`
        // created for this publisher and with current() span that refer to span created here
        // without it it runs in the parent scope
        return DUMMY
                .flatMap(i -> publisher)
                .doOnEach((signal) -> {
                    if (signal.isOnError()) {
                        recordException(signal.getContextView(), signal.getThrowable());
                        endSpan(StatusCode.ERROR, signal.getContextView());
                    } else if (signal.isOnComplete()){
                        endSpan(StatusCode.UNSET, signal.getContextView());
                    }
                })
                .contextWrite(ctx -> startSpan(ctx, spanName));
}

private reactor.util.context.Context startSpan(reactor.util.context.Context subscriberContext, String name) {
        io.opentelemetry.context.Context parent =
                subscriberContext.getOrDefault("otel-context-key", io.opentelemetry.context.Context.current());

        Span span = tracer.spanBuilder(name).setParent(parent).startSpan();

        System.out.printf("Starting span '%s', Parent span id %s, started id - %s\n",
                name,
                Span.fromContext(parent).getSpanContext().getSpanId(),
                span.getSpanContext().getSpanId());

        return subscriberContext.put("otel-context-key",  parent.with(span));
}

If I do this with agent enabled, I will have my HTTP spans unrelated to traceMono span, since I never made traceMono current. But there is no good way (?) to make traceMono span current without leaking context.

LoggingSpanExporter - 'HTTP GET' : 73202ef91912cd88593276ccaa38b806 5aeffa52ef554cea CLIENT 
LoggingSpanExporter - 'traceMono' : 34b491175e64ec59f19c64e81be82283 b341875781ae7a78 INTERNAL

Proposal

Let reactor instrumentation capture trace context from current or from explicit reactor context. Libraries/user code can explicitly pass context to instrumentation and it will propagate to upstream calls.

This way I don't ever need to make traceMono span current myself and instrumentation will make it current in all callbacks.

Here are more examples to play with https://gist.github.com/lmolkova/156a9e6453f5d3f8242689cfe6a6c07a.

*This change only adds context APIs, but no convenience tracing methods yet

@iNikem
Copy link
Contributor

iNikem commented Sep 18, 2021

Our current reactive framework instrumentations (usually) try to execute all callbacks in the context that was active during subscribe. Do you propose to replace that or augment that with the ability to specify span context explicitly?

@lmolkova
Copy link
Contributor Author

Our current reactive framework instrumentations (usually) try to execute all callbacks in the context that was active during subscribe. Do you propose to replace that or augment that with the ability to specify span context explicitly?

I'm proposing to support both - full change is in this pr and adds explicit context support.

@trask
Copy link
Member

trask commented Sep 19, 2021

I see this problem not as changing when context is captured (I think it should still be on subscribe), but rather allowing downstream publishers to influence the otel context of upstream publishers.

E.g. here's a more complex example that I would love to see create the trace parent -> readItem -> httpGet somehow:

public class Example {

    private static final Tracer tracer = GlobalOpenTelemetry.getTracer("test");

    private static final Mono<String> READ_ITEM = readItem();

    public static void main(String[] args) {
        Span span = tracer.spanBuilder("parent").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            String item = READ_ITEM.block();
            System.out.println(item);
        }
    }

    static Mono<String> readItem() {
        Mono<String> readItem = httpGet("https://mydatabase").map(v -> "item: " + v);
        return traceMono(readItem, "readItem");
    }

    static Mono<String> httpGet(String url) {
        Mono<String> httpGet = Mono.just("http response from " + url);
        return traceMono(httpGet, "httpGet");
    }

    static <T> Mono<T> traceMono(Mono<T> publisher, String spanName) {
        ???
    }
}

@lmolkova

This comment has been minimized.

@lmolkova
Copy link
Contributor Author

So as a summary:

  1. there is a problem of not being able to make context current in reactor - this PR is a solution
  2. another problem of current not being available when the supplier is called - this PR does not address it

While I'd like to solve p2 as well, and will look for the solution - p1 is still very valuable for the reactor instrumentation. I hope we can start with it and see how p2 can be fixed for the implicit case.

In our libraries, I'll still use explicit context with reactor context whenever possible.

@lmolkova

This comment has been minimized.

@lmolkova
Copy link
Contributor Author

So, can we discuss it at one of the instrumentation meetings? Can someone more familiar with the reactor help to find a more elegant approach?

And just to clarify: this PR, even though it does not solve current context issue for the most important case, still feels like an improvement (we can propagate spans through reactor without making them current to at least onNext callback).

@trask
Copy link
Member

trask commented Sep 20, 2021

So, can we discuss it at one of the instrumentation meetings?

definitely, I just added it to the agenda for Thu, and will try to understand things a bit better before then

Can someone more familiar with the reactor help to find a more elegant approach?

cc: @HaloFour 👀

@HaloFour
Copy link
Contributor

I've been thinking about a general API for tracing Reactor that didn't entirely rely on the instrumentation/operator being installed. It's something I'd be interested in adding to the instrumentation library, but it would work a bit differently than what is proposed here.

Beyond that I'm not sure I understand the purpose of this PR. The TracingOperator is supposed to capture the current Context at the point of assembly, so the following should work fine and not leak:

private Mono<?> traceMono(Mono<?> publisher) {
    Span span =  tracer.spanBuilder("traceMono").startSpan();

    try (Scope scope = Context.current().with(span)) {
        return publisher
            .doOnError(t -> span.recordException(t))
            .doFinally(t -> span.end());
    }
}

Apart from that I'm concerned that the proposed PR captures the Span separately from the Context and then tries to tie them back together. If we were to put anything into the Reactor context it feels like it should be the full OTel Context.

As for creating the span on subscribe, which is certainly necessary for cold and reusable publishers, that is something probably better served with a separate tracing operator that can be applied with transform. Apart from working better with the fluent Reactor APIs it would also make it very easy to trace each subscription.

@anuraaga
Copy link
Contributor

As for creating the span on subscribe, which is certainly necessary for cold

Unfortunately I don't think there is a way for code to know if a Mono is going to be subscribed in the future or not, so we couldn't ever use an approach in auto instrumentation that doesn't trace on subscription. While manual code could probably do it, it feels hairy to have both approaches interacting with each other.

One issue we often have is when a framework does not provide a filter-like API for interception. For example, tracing seems best implemented in doOnSubscribe

https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Mono.html#doOnSubscribe-java.util.function.Consumer-

The javadoc describes perfectly what we want to do, add behavior when the Mono is surprised. But without something like delegate.next(), we cannot make the context current, which would otherwise be picked up by our instrumentation to then allow propagation throughout the reactive stream.

Reactive frameworks could be nice to us and provide such a callback which I think could help here. But I suspect their general stance would be, use the reactive context. So in this sense allowing reading from either Context.current() or the reactor context in our instrumentation does seem like a reasonable feature. I can't think of an alternative for @lmolkova's use case, which is a valid one. My main concern is whether it's always correct to prioritize the reactor context, or is there a case where the threadlocal is still the correct parent despite there being something in the reactor context.

Also, agree with @HaloFour that we want to expect the OTel Context object in the reactor context, not just a Span. And rather than expose a String, we should expose static methods for 1) create new reactor context with an OTel context and 2) read OTel context from reactor context.

@HaloFour
Copy link
Contributor

One issue we often have is when a framework does not provide a filter-like API for interception. For example, tracing seems best implemented in doOnSubscribe.

I'd propose an API utilize Mono#transform/Flux#transform + Operators#lift which allows you to basically do what TracingOperator does and decorate the subscriber. That gives easy access to all of the callbacks necessary to start the span on subscription, end that span on complete/error/cancel, propagate context onto the subscriber/publisher thread, and access the Reactor context which is otherwise tricky. For the consumer it could look as simple as:

Mono<T> mono = ...;
Mono<T> traced = mono.transform(withSpan("MySpan"));

I started to work on a PoC of this when I was toying with multiple subscriptions via @WithSpan.

My main concern is whether it's always correct to prioritize the reactor context, or is there a case where the threadlocal is still the correct parent despite there being something in the reactor context.

This is definitely my biggest concern here as well. It's quite likely that the Context captured on assembly is the better option but that could be difficult to discern. I've not fully thought it through but perhaps the Reactor operator could put a cookie into the OTel context that is captured in the Reactor context. The operator could compare the cookies within the current/captured OTel context with that in the Reactor context to determine if the former is based on the latter and would be a better context to use. But that probably doesn't solve every case.

@HaloFour
Copy link
Contributor

@trask

here's a more complex example

Here's a real quick&dirty implementation of the TracingOperator API I suggested that can do this: https://gist.github.com/HaloFour/6cac6653120da96fc1e8e1effca7b5b5

The method TracingOperator#figureOutParentContext is where the operator needs to figure out the correct context to use and I don't know that there is a cut&dry way to do that. The precedence of which Context to use might change depending on the style, especially OnAssembly vs. Current. And it might be challenging to discern when to favor one of those Contexts vs. one stored in the Reactor subscriber context. Perhaps an API could offer a way to hint which Context to use.

@lmolkova

This comment has been minimized.

@lmolkova
Copy link
Contributor Author

lmolkova commented Sep 22, 2021

Another discussion point on exposing APIs on TracingOperator:

A little copying is better than a little dependency.

  1. It's useful and looks great, but using it requires dependency on reactor instrumentation library. We, in Azure SDKs, would never depend on it, however beautiful it is (because of dependency hell), and would rather create spans ourselves and set them in the context. I guess other libraries would have the same concern
  2. End users would probably rarely discover it unless it's in an annotations package (that'd have to depend on reactor - bummer)

I think the main (and huge!) value comes if it's an example and guidance on how to instrument reactor - may be a blog post or md file in this repo? I promise to steal from it a lot.

@HaloFour
Copy link
Contributor

@lmolkova

A little copying is better than a little dependency.

I understand the sentiment but I do think that copied code will not be trivial. It also seemingly doesn't make sense to have a proposal here for behavior in the Reactor instrumentation if it is a goal to not use the Reactor instrumentation. Something needs to extract that explicit Reactor context into the current Context for the rest of the SDK and instrumentation to work with it.

@lmolkova
Copy link
Contributor Author

@lmolkova

A little copying is better than a little dependency.

I understand the sentiment but I do think that copied code will not be trivial. It also seemingly doesn't make sense to have a proposal here for behavior in the Reactor instrumentation if it is a goal to not use the Reactor instrumentation. Something needs to extract that explicit Reactor context into the current Context for the rest of the SDK and instrumentation to work with it.

contextWrite seems to be this minimal sufficient API that allows me to pass context, so as long as I can rely on key not changing, I'm happy.
I'm saying there is no strong need to expose any API (except enabling instrumentation) as long as one in the agent works. And guidance is to how to write my code so it works properly with this auto-instrumentation

@HaloFour
Copy link
Contributor

HaloFour commented Sep 23, 2021

@lmolkova

`I'm saying there is no strong need to expose any API (except enabling instrumentation) as long as one in the agent works. And guidance is to how to write my code so it works properly with this auto-instrumentation

Ah, I get you now. It's not that you're avoiding the instrumentation, it's that you don't want a dependency on the library APIs. That does make more sense. I assume that would also preclude any helper methods that would avoid leaking a String key to the consumer.

I am still quite concerned about how this feature would interact with other instrumentation around Reactor or libraries that utilize Reactor. It seems like it would be really easy for the OTel Context stored in the Reactor Context to not be the correct context.

@lmolkova
Copy link
Contributor Author

lmolkova commented Sep 23, 2021

It seems like it would be really easy for the OTel Context stored in the Reactor Context to not be the correct context.

Do you have any examples of it? I'd like to play and find some reasonable solutions. My point here is that we know for sure current trace context does not work (see my example above). Or try running the same example with @WithSpan annotation - it does not work well.

It sounds like reactor context works at least better than current() and all confusion comes from the fact that we have two different contexts.

So I hope to find some reasonable solution here.
The one I have in mind is to mostly rely on reactor context: anything that instruments reactor calls has to follow some rules: e.g.

  • when starting spans and resolving parent context, give preference to reactor context over current()
  • start span and always store it in the reactor context behind standard known key

It involves some manual steps, but people who create spans are probably willing to put a bit extra effort to make their instrumentation work (and current() doesn't).

I'm new to the reactor world (which I'm working on), so I'm looking for some examples here, based on the community expertise, where this solution does not work (or works worse than current()) and am happy to adjust it.

@HaloFour
Copy link
Contributor

Do you have any examples of it?

I've not tried to put together a proof of concept, but if your traced Reactor chain used flatMap to invoke other instrumented code that would result in child spans which would not be reflected in that Reactor context. Any subsequent instrumentation in that reactive flow would pick up that Reactor context and not the correct context containing those child spans.

It sounds like reactor context works at least better than current() and all confusion comes from the fact that we have two different contexts.

I disagree that the Reactor context would be better.

Or try running the same example with @WithSpan annotation - it does not work well.

Ultimately I think the problem is not one of explicitness, it's that the TracingOperator/TracingSubscriber provided by auto-instrumentation is not handling certain use cases. I think it'd be more ideal to identify these cases and see how the operator can be modified so that they work "out of the box", that way you wouldn't have to write any code to wrangle the context.

Comment on lines 93 to 110
public static reactor.util.context.Context storeInContext(
reactor.util.context.Context context, Context traceContext) {
return context.put(TRACE_CONTEXT_KEY, traceContext);
}

/**
* Gets Trace {@link io.opentelemetry.context.Context} from Reactor {@link
* reactor.util.context.Context}.
*
* @param context Reactor's context to get trace context from.
* @param defaultTraceContext Default value to be returned if no trace context is found on Reactor
* context.
* @return Trace context or default value.
*/
public static Context fromContextOrDefault(
reactor.util.context.Context context, Context defaultTraceContext) {
return context.getOrDefault(TRACE_CONTEXT_KEY, defaultTraceContext);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think of:

  • storeInContext() -> storeOpenTelemetryContext()
  • fromContextOrDefault() -> getOpenTelemetryContext()

also, I think (future) renaming of the class might help:

  • TracingOperator -> ReactorTracing
  • TracingOperatorBuilder -> ReactorTracingBuilder

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to rename all of TracingOperator to ReactorTracing, it sort of matches the naming we have started applying to other instrumentation (e.g. GrpcTracing/GrpcTracingBuilder), and it's probably nice to have a single entry point for the instrumentation instead of two.

other than that looks great to me.

@lmolkova
Copy link
Contributor Author

lmolkova commented Oct 7, 2021

I was thinking to rename all of TracingOperator to ReactorTracing, it sort of matches the naming we have started applying to other instrumentation (e.g. GrpcTracing/GrpcTracingBuilder), and it's probably nice to have a single entry point for the instrumentation instead of two.

other than that looks great to me.

Yeah, I can do it. I thought that as we'll add more tracing stuff (start/end span convenience) having configuration (register/reset hooks) on the same class may get noisy, so I thought about separating tracing.

* @param context Reactor's context to store trace context in.
* @param traceContext Trace context to be stored.
*/
public static reactor.util.context.Context storeOpenTelemetryContext(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of exposing store/get, can we expose runInScope (runWithContext)? It seems to be the better API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but it implies using a hack with dummy mono/flux even when it's not needed (not a big deal). But I think we ultimately don't need runWithContext - having something like tracePublisher(Publisher, SpanBuilder, AttributeExtractors?, whatnot) will eliminate the need for it and provide convenience.

For those who want low-level control (e.g. if I want to start/end a span conditionally or in some funky way), get/set context is the minimal needed functionality

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, if I need to inject headers or enrich span, I'd prefer explicit context from reactor (i.e. get method) over Context.current(). Since current can change in any user callback, I never know what it refers to. There are no guarantees with explicit reactor context either, so it's rather a preference.

@iNikem
Copy link
Contributor

iNikem commented Oct 11, 2021

For existing customers of the javaagent who monitors their Reactor-Netty call, will anything change after this PR? I mean, will the produced telemetry change?

@anuraaga
Copy link
Contributor

anuraaga commented Oct 11, 2021

@iNikem As far as I know, the only existing instrumentation affected by this change is @WithSpan.

@lmolkova Do you think it's worth having a test including @WithSpan and a reactor-netty call to confirm the propagation works between them? Seems like it should, but is a good "real world" test case probably. We'd probably add to https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/reactor-netty/reactor-netty-1.0/javaagent/src/test/groovy/io/opentelemetry/javaagent/instrumentation/reactornetty/v1_0/ReactorNettyConnectionSpanTest.groovy#L22 (can rename it to be more generic I think)

@trask
Copy link
Member

trask commented Oct 11, 2021

@HaloFour if you have time, would be great to get your review on this also

@lmolkova
Copy link
Contributor Author

For existing customers of the javaagent who monitors their Reactor-Netty call, will anything change after this PR? I mean, will the produced telemetry change?

Confirming what @anuraaga said - WithSpan will start adding trace context to reactor context to support some cases where pure current() propagation doesn't work.

@lmolkova
Copy link
Contributor Author

Do you think it's worth having a test including @WithSpan and a reactor-netty call to confirm the propagation works between them?

agreed, trying to add one and stuck with something unrelated

@lmolkova
Copy link
Contributor Author

lmolkova commented Oct 11, 2021

@lmolkova Do you think it's worth having a test including @WithSpan and a reactor-netty call to confirm the propagation works between them? Seems like it should, but is a good "real world" test case probably. We'd probably add to https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/reactor-netty/reactor-netty-1.0/javaagent/src/test/groovy/io/opentelemetry/javaagent/instrumentation/reactornetty/v1_0/ReactorNettyConnectionSpanTest.groovy#L22 (can rename it to be more generic I think)

@anuraaga You sent me to the rabbit hole :) Now that I'm back, I added ReactorNettyWithSpanTest to overcome some test instrumentation issues ( thanks @trask for sharing it) and since test HTTP server is fully synchronous, it needs a bit of hacking.

import static io.opentelemetry.api.trace.SpanKind.INTERNAL
import static io.opentelemetry.api.trace.SpanKind.SERVER

class ReactorNettyWithSpanTest extends InstrumentationSpecification implements AgentTestTrait {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@HaloFour
Copy link
Contributor

@trask

Looks good to me! The approach with the "dummy" publisher and flatMap is interesting and a clever way to force the OTel context.

@trask trask merged commit 1dfae9f into open-telemetry:main Oct 12, 2021
RashmiRam pushed a commit to RashmiRam/opentelemetry-auto-instr-java that referenced this pull request May 23, 2022
…pen-telemetry#4159)

* Allow reactor instrumentation to pick up spans from reactor context in addition to current tracing context

* And fix nested WithSpan with defer

* up

* Review comments

* move ReactorTracing to Operation and rename to ContextPropagationOperator

* fix build

* Add reactor-netty client test nested under WithSpan

* Add link to the issue in comments

* clean up
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants