Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix vertx otel sdk reload in devmode #28792

Merged
merged 1 commit into from
Dec 5, 2022

Conversation

brunobat
Copy link
Contributor

@brunobat brunobat commented Oct 24, 2022

Part of #26444
Currently OTel will not live reload, working only on first start.

This PR includes:

  • Centralise instrumentation related classes in the Instrumentation processor and recorder.
  • Create an OpenTelemetryWapper that will detect when the global OTel SDK has changed in development mode.

Caveat:

  • On REST requests, the first instrumentation call after reload doesn't send span because OTEL is reloaded after the return of the call. The REST call starts with the old OTel SDK object in the vertx instrumentation and ends with the new one.

@quarkus-bot

This comment was marked as resolved.

@brunobat brunobat changed the title fix vertx otel sdk reload in devmode Fix vertx otel sdk reload in devmode Oct 24, 2022
@brunobat brunobat force-pushed the otel-autoconfigure-3 branch 4 times, most recently from c72884c to dfd650b Compare October 28, 2022 10:13
@brunobat brunobat marked this pull request as ready for review October 28, 2022 10:15
@brunobat brunobat mentioned this pull request Oct 28, 2022
5 tasks
@quarkus-bot

This comment has been minimized.

Copy link
Member

@radcortez radcortez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to avoid the check on each tracer method because this would only be useful for DEV mode, but we still are doing the check for runtime with no gain.

My recommendation is to produce a CDI Bean of VertxTracer, so we have updated Instrumenters with the correct OpenTelemetry instance. Then in the OpenTelemetryVertxTracingFactory, we just have a delegate to the CDI VertTracer and keep a reference to the one we pass to the factory, so we can update it by replacing the new CDI Bean on reload.

To make the span also works on the reload request, I think we can recreate the start of the span on VertxHttpHotReplacementSetup before the root handler dispatches the current request. We have the instance of VertxTracer, so we can call the receiveRequest method, and we should have everything available for it.

The tricky part is that the current request holds the state of the Trace object (created initially by the request that triggered the reload). In this case, we would need to access the request internal state directly to replace the Trace object. There are no API's for that, but I think that some bytecode transformations for this particular case (and only for DEV mode) are not going to hurt.

@stuartwdouglas
Copy link
Member

I think we need to avoid the check on each tracer method because this would only be useful for DEV mode, but we still are doing the check for runtime with no gain.

My recommendation is to produce a CDI Bean of VertxTracer, so we have updated Instrumenters with the correct OpenTelemetry instance. Then in the OpenTelemetryVertxTracingFactory, we just have a delegate to the CDI VertTracer and keep a reference to the one we pass to the factory, so we can update it by replacing the new CDI Bean on reload.

The current check is basically just a volatile read. The CDI approach you are talking about will almost certainly be way more expensive per request.

@brunobat
Copy link
Contributor Author

brunobat commented Nov 22, 2022

I think we need to avoid the check on each tracer method because this would only be useful for DEV mode, but we still are doing the check for runtime with no gain.
My recommendation is to produce a CDI Bean of VertxTracer, so we have updated Instrumenters with the correct OpenTelemetry instance. Then in the OpenTelemetryVertxTracingFactory, we just have a delegate to the CDI VertTracer and keep a reference to the one we pass to the factory, so we can update it by replacing the new CDI Bean on reload.

The current check is basically just a volatile read. The CDI approach you are talking about will almost certainly be way more expensive per request.

I would prefer to avoid anything CDI related, if possible and keep the current strategy... Maybe simplify the detection to setup instrumenters into helper method.
The resetIfChanged() in prod mode is a constant anyway... I bet this will get optimised.

@radcortez
Copy link
Member

But you can completely avoid that check. And it is not only about the check. Since you need to add it to every tracing method, we must remember to always add that check to all implementations moving forward. Also, I believe instruments shouldn't have to deal with code that is only relevant for Dev mode.

The issue is that since Vert.x is always the same instance from restart to restart, the VertxTracer which contains the original instrumenters has to be updated. I think it is easier and more straightforward to just replace the current VertxTracer with a new one originating from the restart operation. You avoid changing the instrumenters and additional wrappers for the OpenTelemetry instance. This could be done in a way only to be applied to Dev mode, and regular Runtime would keep its current implementation.

As for CDI, the issue is that we should remove calls to GlobalOpenTelemetry.get, which is no longer recommended. Yes a different issue, but somehow related to how we are starting things.

@brunobat
Copy link
Contributor Author

brunobat commented Nov 22, 2022

@stuartwdouglas , in the end, @radcortez proposal about creating delegates on the the Tracer Factory makes more sense because it will not require future changes in the instrumenters themselves.
I avoided CDI altogether and I'm using a simplified factory for production code.
Roberto helped with the code and he has also a solution for the first request after reload, but given its complexity, it will be done in a future PR.

@brunobat
Copy link
Contributor Author

brunobat commented Dec 5, 2022

@gsmet any prediction on when this can be merged?

@radcortez
Copy link
Member

I'll merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenTelemetry instrumentation stops working for resteasy-reactive resources and JDBC after dev mode reload
5 participants