Propagation of Tracing Context to AWS SDK SqsAsyncClient #262

sondemar · 2024-07-04T19:01:24Z

Hi, I am trying to propagate the tracing context with the Micrometer Observation API while using the AWS SDK SqsAsyncClient, which is built on Netty's event loop model.

Even though I have registered ObservationThreadLocalAccessor and ObservationAwareSpanThreadLocalAccessor with the ContextRegistry and instrumented the executors (I described the SqsAsyncClient configuration details in this discussion):

executor.setTaskDecorator(new ContextPropagatingTaskDecorator());

I am still unable to access the currently open Observation scope, because the executors are invoked from a Netty EventLoop thread.
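To make the failure mode concrete, here is a simplified, JDK-only model (not the actual Spring or Micrometer code). It assumes, as far as I understand how a context-propagating task decorator works, that ThreadLocal values are captured on the thread that calls execute() and restored around run(). CURRENT_SCOPE and CaptureAtSubmitDemo are hypothetical stand-ins for the ThreadLocal behind ObservationThreadLocalAccessor. When the callback is handed to the executor from the caller's thread, the scope survives; when a separate "event loop" thread hands it over, the decorator captures an empty value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Hypothetical, JDK-only model. CURRENT_SCOPE stands in for the ThreadLocal
// that ObservationThreadLocalAccessor reads and writes.
final class CaptureAtSubmitDemo {
    static final ThreadLocal<String> CURRENT_SCOPE = new ThreadLocal<>();

    // Mimics a context-propagating decorator: the value is read when the task is
    // decorated (on the thread calling execute) and restored around run().
    static final UnaryOperator<Runnable> DECORATOR = task -> {
        String captured = CURRENT_SCOPE.get();
        return () -> {
            String previous = CURRENT_SCOPE.get();
            CURRENT_SCOPE.set(captured);
            try {
                task.run();
            } finally {
                if (previous == null) {
                    CURRENT_SCOPE.remove();
                } else {
                    CURRENT_SCOPE.set(previous);
                }
            }
        };
    };

    // Returns the scope value visible inside the completion callback.
    static String seenInCallback(boolean submittedFromEventLoop) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();  // stands in for a Netty event loop
        ExecutorService completion = Executors.newSingleThreadExecutor(); // the decorated completion executor
        CompletableFuture<String> seen = new CompletableFuture<>();
        Runnable callback = () -> seen.complete(CURRENT_SCOPE.get());
        CURRENT_SCOPE.set("observation-scope"); // the caller has an open scope
        try {
            if (submittedFromEventLoop) {
                // The response is handled on the event loop, which hands the callback over.
                eventLoop.execute(() -> completion.execute(DECORATOR.apply(callback)));
            } else {
                // The callback is handed over directly from the caller's thread.
                completion.execute(DECORATOR.apply(callback));
            }
            return seen.orTimeout(5, TimeUnit.SECONDS).join();
        } finally {
            CURRENT_SCOPE.remove();
            eventLoop.shutdown();
            completion.shutdown();
        }
    }
}
```

In this model, seenInCallback(false) returns "observation-scope" and seenInCallback(true) returns null: the decorator itself works, but it can only capture what the submitting thread has.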

I am looking for any possible workaround until the issue is resolved.

The only solution I can think of involves using an implementation of ContextAccessor and a custom ObservationContextHolder:

executor.setTaskDecorator(runnable -> factory.captureAll(ObservationContextHolder.storedValues()).wrap(runnable));

where ObservationContextHolder properly stores values for the keys ObservationThreadLocalAccessor.KEY and ObservationAwareSpanThreadLocalAccessor.KEY.

However, this solution is not thread-safe.
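A minimal sketch of why a single shared holder mixes up concurrent requests, assuming the holder is one global slot. SharedHolderDemo and HOLDER are hypothetical names, not the actual ObservationContextHolder. Even with an atomic slot, both requests store their scope before either response arrives, so both callbacks read the last value written; carrying the value per request inside each callback avoids the mix-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model: HOLDER stands in for a global, shared context holder.
final class SharedHolderDemo {
    static final AtomicReference<String> HOLDER = new AtomicReference<>();

    // Two requests store "their" scope in the shared slot; their callbacks run later,
    // after both requests were sent, the way responses arrive on an event loop.
    static List<String> interleavedRequests() {
        List<Runnable> pendingCallbacks = new ArrayList<>();
        List<String> seen = new ArrayList<>();
        for (String request : List.of("scope-A", "scope-B")) {
            HOLDER.set(request);
            pendingCallbacks.add(() -> seen.add(HOLDER.get())); // reads the shared slot later
        }
        pendingCallbacks.forEach(Runnable::run);
        return seen;
    }

    // Same flow, but each callback carries the value captured for its own request.
    static List<String> perRequestCapture() {
        List<Runnable> pendingCallbacks = new ArrayList<>();
        List<String> seen = new ArrayList<>();
        for (String request : List.of("scope-A", "scope-B")) {
            pendingCallbacks.add(() -> seen.add(request)); // value captured per request
        }
        pendingCallbacks.forEach(Runnable::run);
        return seen;
    }
}
```

Here interleavedRequests() returns [scope-B, scope-B], while perRequestCapture() returns [scope-A, scope-B].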

Could you please advise on a proper workaround or solution?


jonatan-ivanov · 2024-07-04T20:14:16Z

For the sake of completeness, there are also awspring/spring-cloud-aws#646 and netty/netty#8546. What you can do as a user is add a 👍🏼 to the issue description and a comment saying that you need this. (I saw your comments on the spring-cloud-aws issue. 👍🏼)

Reactor Netty is instrumented; as far as I know, it has support at the event loop/network level. Maybe that's something you can enable and reuse?

Also, as far as I understand, if you want to instrument SQS, you might be working at the wrong level (the network/HTTP level). If you instrument the event loop and add tracing information to the HTTP request that you send to AWS, I don't think that information will be propagated to the receiving client. I think what you should do instead is add tracing information to the SQS message headers (which are sent to AWS over HTTP); those headers are delivered to the client together with the SQS message. So instead of instrumenting the low-level HTTP client, you should (possibly wrap and) instrument the SQS client itself.
Does this make sense?
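The header-based approach can be sketched as follows. This is a hand-rolled illustration only: SQS message attributes are modeled as a plain Map<String, String>, and the W3C traceparent header is written and parsed by hand, whereas in practice a tracing library's propagator (for example, Micrometer Tracing's Propagator) would inject and extract these values. SqsHeaderPropagation and TraceContext are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: message attributes are modeled as a Map<String, String>.
final class SqsHeaderPropagation {
    // Header name from the W3C Trace Context specification.
    static final String TRACEPARENT = "traceparent";

    record TraceContext(String traceId, String spanId, boolean sampled) {}

    // Producer side: copy the attributes and add the current trace context.
    static Map<String, String> inject(TraceContext ctx, Map<String, String> attributes) {
        Map<String, String> out = new HashMap<>(attributes);
        out.put(TRACEPARENT, "00-" + ctx.traceId() + "-" + ctx.spanId() + "-" + (ctx.sampled() ? "01" : "00"));
        return out;
    }

    // Consumer side: read the trace context back when the message is received.
    static Optional<TraceContext> extract(Map<String, String> attributes) {
        String value = attributes.get(TRACEPARENT);
        if (value == null) {
            return Optional.empty();
        }
        String[] parts = value.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16 || parts[3].length() != 2) {
            return Optional.empty();
        }
        try {
            boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) != 0; // "sampled" flag bit
            return Optional.of(new TraceContext(parts[1], parts[2], sampled));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

The consumer would then start its own span as a child of the extracted context, which links the receive side to the send side independently of which HTTP connection or thread carried the request.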

Don't get me wrong, instrumenting Netty could also be useful, but I think that's step two, and you might not even need it in the end.

sondemar · 2024-07-05T06:15:28Z

> For the sake of completeness, there are also awspring/spring-cloud-aws#646 and netty/netty#8546. What you can do as a user is add a 👍🏼 to the issue description and a comment saying that you need this. (I saw your comments on the spring-cloud-aws issue. 👍🏼)

I am already involved in awspring/spring-cloud-aws#646 with this PR.

> Reactor Netty is instrumented; as far as I know, it has support at the event loop/network level. Maybe that's something you can enable and reuse?

In the discussion, I pointed out that the AWS SDK uses the Reactor API under the hood and only exposes SqsAsyncClient, whose API is based on asynchronous CompletableFutures.

> If you instrument the event loop and add tracing information to the HTTP request that you send to AWS, I don't think that information will be propagated to the receiving client. I think what you should do instead is add tracing information to the SQS message headers (which are sent to AWS over HTTP); those headers are delivered to the client together with the SQS message. So instead of instrumenting the low-level HTTP client, you should (possibly wrap and) instrument the SQS client itself.
> Does this make sense?

I am already propagating tracing information to SQS through message headers (using a custom version of SenderContext), but the key point is that I would also like to keep observability (tracing, logging, and metrics) across the entire Spring Cloud AWS API invocation.

github-actions · 2024-07-13T01:49:40Z

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

chemicL · 2024-07-15T08:46:10Z

Hey, @sondemar! I've been thinking a bit about your case, and here are a few thoughts:

It sounds like you assume the underlying AWS implementation always uses the Executor wrapper you provide for all the CompletableFuture/CompletionStage processing. I don’t know whether that’s common.

Also, I don't understand the ObservationContextHolder workaround: how would this holder be properly populated in a situation where the ContextPropagatingTaskDecorator cannot do its job? Can you explain?

As far as I understand, the usage is that at some point, within an Observation's scope, you use the client. Then the client delegates to Netty, and the ThreadLocal values might be gone. They might also not be gone if the processing reuses the same thread (!), or, even worse, if Netty uses the current thread to do some other processing in the meantime. Then, when Netty delivers a signal and you execute the CompletableFuture/CompletionStage chain with the intended Executor wrapper, you expect some ThreadLocals to be in the context. That can only work if there is a carrier for them, like the Context in Reactor from which the ThreadLocals get restored.
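The thread-reuse hazard mentioned above can be shown with a single-threaded executor standing in for an event loop. This is a hypothetical, JDK-only model (ThreadReuseDemo and CURRENT_SCOPE are illustrative names): if a task sets a ThreadLocal and does not clean it up, the next unrelated task on the same thread sees the stale value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model: a single-threaded executor stands in for a Netty event loop.
final class ThreadReuseDemo {
    static final ThreadLocal<String> CURRENT_SCOPE = new ThreadLocal<>();

    // Returns the value an unrelated task sees after request A's task ran on the same thread.
    static String seenByUnrelatedTask(boolean cleanUp) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            eventLoop.execute(() -> {
                CURRENT_SCOPE.set("scope-of-request-A"); // restored for request A
                try {
                    // ... process request A ...
                } finally {
                    if (cleanUp) {
                        CURRENT_SCOPE.remove();
                    }
                }
            });
            CompletableFuture<String> seen = new CompletableFuture<>();
            // Work for another request runs next on the same thread without setting a scope.
            eventLoop.execute(() -> seen.complete(CURRENT_SCOPE.get()));
            return seen.orTimeout(5, TimeUnit.SECONDS).join();
        } finally {
            eventLoop.shutdown();
        }
    }
}
```

Without cleanup, seenByUnrelatedTask(false) returns "scope-of-request-A"; with cleanup, seenByUnrelatedTask(true) returns null. Any restore-based workaround therefore needs a matching removal on the same thread.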

In Reactor-Netty there is a way to restore ThreadLocals for every single task that Netty executes, but it's not recommended for performance reasons. It could work for debugging purposes, though. This is possible because Reactor-Netty attaches the Subscriber's Context to the Channel when working in a continuation-style manner, so it can be restored later for the user's chain.

sondemar · 2024-07-26T17:23:09Z

Hi @chemicL,

Thank you for your comprehensive answer!

Regarding your thoughts and questions:

> It sounds like you assume the underlying AWS implementation always uses the Executor wrapper you provide for all the CompletableFuture/CompletionStage processing. I don’t know whether that’s common.

I believe it's not common, but it is possible, as mentioned in the documentation. This executor seems to be invoked from threads belonging to the EventLoopGroup.

> Also, I don't understand the ObservationContextHolder workaround: how would this holder be properly populated in a situation where the ContextPropagatingTaskDecorator cannot do its job? Can you explain?

I wanted to incorporate the concept of the Reactor context view, as mentioned in this article. In my case, ObservationContextHolder would maintain a pseudo-context managed by a dedicated ContextAccessor, inspired by ReactorContextAccessor.

> As far as I understand, the usage is that at some point, within an Observation's scope, you use the client. Then the client delegates to Netty, and the ThreadLocal values might be gone.

This is exactly the issue: I lose the Observation scope stored in the ThreadLocal at this point.

> In Reactor-Netty there is a way to restore ThreadLocals for every single task that Netty executes, but it's not recommended for performance reasons. It could work for debugging purposes, though. This is possible because Reactor-Netty attaches the Subscriber's Context to the Channel when working in a continuation-style manner, so it can be restored later for the user's chain.

I believe this refers to ReactorContextAccessor, which I mentioned above.

Unfortunately, I do not see a straightforward solution. The last idea that comes to mind is to create a ContextSnapshot and attach it to ExecutionAttributes. This can then be read with a dedicated ExecutionInterceptor and used to populate the ThreadLocal.
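A rough, JDK-only model of that last idea. Attributes and Interceptor are hypothetical stand-ins for the AWS SDK's ExecutionAttributes and ExecutionInterceptor (the real SDK API differs, and which thread runs each hook is an assumption here), and a plain String stands in for a ContextSnapshot. Because the snapshot is stored per request rather than in a global holder, concurrent calls do not overwrite each other, but the restored value must still be removed afterwards so it does not leak into other work on the I/O thread.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SnapshotInterceptorDemo {
    static final ThreadLocal<String> CURRENT_SCOPE = new ThreadLocal<>();

    // Hypothetical stand-in for per-request ExecutionAttributes.
    static final class Attributes {
        final Map<String, String> values = new ConcurrentHashMap<>();
    }

    // Hypothetical stand-in for an ExecutionInterceptor with two hooks.
    interface Interceptor {
        void beforeExecution(Attributes attributes); // assumed to run on the caller's thread
        void afterExecution(Attributes attributes);  // assumed to run later, on an I/O thread
    }

    static final class SnapshotInterceptor implements Interceptor {
        @Override
        public void beforeExecution(Attributes attributes) {
            String scope = CURRENT_SCOPE.get();
            if (scope != null) {
                attributes.values.put("snapshot", scope); // capture while the scope is still open
            }
        }

        @Override
        public void afterExecution(Attributes attributes) {
            String scope = attributes.values.get("snapshot");
            if (scope != null) {
                CURRENT_SCOPE.set(scope); // restore on whichever thread runs this hook
            }
        }
    }

    static final Interceptor NO_OP = new Interceptor() {
        @Override public void beforeExecution(Attributes attributes) { }
        @Override public void afterExecution(Attributes attributes) { }
    };

    // Toy async call: one Attributes instance per request, response handled on ioThread.
    static CompletableFuture<String> call(ExecutorService ioThread, Interceptor interceptor) {
        Attributes attributes = new Attributes();
        interceptor.beforeExecution(attributes);
        CompletableFuture<String> result = new CompletableFuture<>();
        ioThread.execute(() -> {
            try {
                interceptor.afterExecution(attributes);
                result.complete(CURRENT_SCOPE.get()); // what code after the hook would see
            } finally {
                CURRENT_SCOPE.remove(); // do not leak into other work on the I/O thread
            }
        });
        return result;
    }

    static String scopeSeenAfterResponse(boolean useSnapshotInterceptor) {
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        CURRENT_SCOPE.set("observation-scope"); // the caller has an open scope
        try {
            Interceptor interceptor = useSnapshotInterceptor ? new SnapshotInterceptor() : NO_OP;
            return call(ioThread, interceptor).orTimeout(5, TimeUnit.SECONDS).join();
        } finally {
            CURRENT_SCOPE.remove();
            ioThread.shutdown();
        }
    }
}
```

In the model, scopeSeenAfterResponse(true) returns "observation-scope" and scopeSeenAfterResponse(false) returns null. Whether the real SDK runs the relevant hook on the same thread as the user's completion code is the open question this sketch does not answer.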

jonatan-ivanov added the waiting for feedback and question labels Jul 4, 2024

sondemar mentioned this issue Jul 5, 2024 in awspring/spring-cloud-aws#1164, "Add support for Micrometer's Observation API for the SQS pipeline" (open, 9 tasks)

github-actions bot added the feedback-reminder label Jul 13, 2024

github-actions bot removed the feedback-reminder label Jul 16, 2024

shakuzen added the feedback-provided label Jul 17, 2024
