Skip to content

Conversation

@achen2289
Copy link
Contributor

@achen2289 achen2289 commented Oct 20, 2022

Test plan -

  • Set Open Telemetry tracer and ensured query state changes are tracked as well as query completes execution successfully.
  • Ensured spans of same query are grouped under same parent span / traceId and any prior context, if provided, is properly propagated
  • Ensured baggage propagation is supported as well as set as tags of new spans
  • Tested trace exporting (export to Datadog agent)
  • Configured without open telemetry as well as with SimpleTracer / NoopTracer to ensure backwards compatibility
  • Tested edge cases with system config not matching plugin config
== RELEASE NOTES ==

General Changes
* Introduced a new plugin type for tracing. The legacy tracer module can be replaced by the new plugin for loading customized tracing infrastructure.

Open Telemetry Changes
* Introduced a new Open Telemetry tracer implementation. Users are able to enable the tracer by installing the ``presto-open-telemetry`` plugin and updating the application configuration (``config.properties``). Open Telemetry tracer can take in propagated context (only B3 specification currently supported) and baggage (W3C specification) headers, if provided, and inject into new traces / spans. Traces can be exported to any specified backend with the ``OTEL_EXPORTER_OTLP_ENDPOINT`` environment variable.

@achen2289 achen2289 requested a review from a team as a code owner October 20, 2022 22:17
@achen2289 achen2289 requested a review from presto-oss October 20, 2022 22:17
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 20, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: achen2289 / name: Alex Chen (21888a73662611aaee0fd34fc6ee20aa48646f34, f1099bde939e1aa1cc1568d93faa9f33a8285ace, 1eab0578bfbad5c047744cf68c9ad5e4fdd828d5, 9c47baad2aaeeca5a41fe418ccfce1400932cfc2)

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 8 times, most recently from 119f1c7 to 49d6041 Compare October 26, 2022 00:16
@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch from f070e75 to 3fd2443 Compare October 26, 2022 16:52
@cliandy
Copy link
Contributor

cliandy commented Oct 27, 2022

@highker could we get a first pass on this?

Copy link
Contributor

@tanjialiang tanjialiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the first commit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to take care of any unfinished Span?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understanding the closing logic of opentelemetry here? Thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking into this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanjialiang Addressed by ending any unended spans at the time of ending trace.

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 2 times, most recently from a2a2097 to e932e63 Compare October 28, 2022 04:30
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high-level comments only

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 9 times, most recently from 5173a23 to 35a6292 Compare November 3, 2022 17:42
@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 2 times, most recently from 936cf9e to ee19b16 Compare November 23, 2022 02:28
@achen2289
Copy link
Contributor Author

Hey @highker, I want to try to get my changes pushed through by the end of next week if possible. If there's anything in this PR that still needs further discussion or you still want revised, I'm happy to remove it from this PR and address it as a new one for the future if that works.

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash all commits into one.

I have one major comment in the end. Please check that one first.

Comment on lines 55 to 57
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove these headers to presto-open-telemetry. Check my other comment

Comment on lines 1233 to 1237
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the header instead of session property to control the flags?

Comment on lines 219 to 224
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's refactor the code a bit so we don't have to use session property or the header explicitly. Check my other comment

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to presto-open-telemetry module. Check my other comment

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

@Nullable
private TracerProvider tracerProvider;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check non null

Comment on lines 66 to 71
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly return instead of creating openTelemetry

Comment on lines 16 to 25
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Major] this is my major comment.

Similar to many other "handle" classes in our codebase, let's create a handle interface in com.facebook.presto.spi.tracing:

public interface TracerHandle
{
}

Change the signature of getNewTracer to

Tracer getNewTracer(TracerHandle handle);

Create a new method of TracerProvider:

/**
 * The method returns a function to take a set of HTTP headers and generate the tracer handle
 */
Function<Set<String>, TracerHandle> getHandleGenerator();

In your OpenTelemetry implementation, you should have

public class OpenTelemetryTracerHandle
        implements TracerHandle
{
    private final String traceToken;
    private final String contextPropagator;
    private final String propagatedContext;
    private final String baggage;

    ... 
}

In your OpenTelemetryTracerProvider, you should have

@Override
public Tracer getNewTracer(TracerHandle handle)
{
    OpenTelemetryTracerHandle tracerHandle = (TracerHandle) handle;
    return new OpenTelemetryTracer(tracerHandle.getXXX, ...);
}

In your getHandleGenerator implementation, move all the logics you have in presto-main here so we can hide all those headers and specialized extraction methods.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add javadoc

Comment on lines 18 to 26
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add javadoc

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 6 times, most recently from d171ea9 to 1e44c8f Compare November 29, 2022 19:02
@achen2289
Copy link
Contributor Author

@highker Trace handle implementation done

@highker highker self-requested a review November 30, 2022 04:32
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome change! All nits. Otherwise, good to go

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's a bit anti pattern; but in order to keep the logic so we don't pollute the traceToken below. Check my comment below.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new branch here: else if (trimEmptyToNull(servletRequest.getHeader(PRESTO_TRACE_TOKEN)) != null) or something like that. And inside the body, we can either create a SimpleTracerProvider and get the token or directly assign traceToken from PRESTO_TRACE_TOKEN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated trace token to replicate similar logic as before

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use immutableMap builder

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's not critical but good to follow:

we don't usually name a function or var with word "Map"; instead, we use getRequestHeaders. Same for the variable headerMap --> headers. And everything else in the PR.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: requestHeaders

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else; it's redundant

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else; it's redundant

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove else; it's redundant

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a generic error; check my other comment

Comment on lines 92 to 87
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two options:

  • you can create your own error code class within the module with a brand new base code; check any other implementation of ErrorCodeSupplier; or
  • you can just throw runtime error or put generic code to be a catch-all to save the effort, which is not recommended but fine

@achen2289
Copy link
Contributor Author

@highker Addressed nits, and made one further change for open telemetry baggage propagation. Put it in new commit so it's easier to see and will squash if it looks good.

@highker highker self-requested a review November 30, 2022 22:57
@highker
Copy link

highker commented Nov 30, 2022

Thanks. Please squash all commits into one.

@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch from 11edf44 to 19a4184 Compare November 30, 2022 23:07
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more nits; otherwise LGTM

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ImmutableMap<String, String> -> Map<String, String>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Optional.of(tunnelTraceId)

it's not null already

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ImmutableMap<String, String> -> Map<String, String>

Comment on lines 30 to 34
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a base so we don't collide with other error codes. Find a unique one closest to an existing one. I assume it's starting with 0x05... something
Screen Shot 2022-11-30 at 3 56 58 PM

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not expert in open telemetry but are we sure the logic is correct by modifying static variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should be fine. If we have a different context propagator, we need to rebuild the open telemetry instance for all tracers that may exist.

@highker highker self-assigned this Dec 1, 2022
@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch 3 times, most recently from d227668 to 6ad17a0 Compare December 1, 2022 04:22
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last nit

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete this class from this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

support simple tracer in new open telemetry implementation

bump okhttp3 version to 4.5.0

use system val if client val not set

add support for context extraction from b3 propagation and overall interface

revert back to okhttp3 version 3.9.0 for this PR

implement W3C baggage header extraction

tracer handle implementation, remove context propagator from system property, general refactor

address nits and open telemetry baggage updates
@achen2289 achen2289 force-pushed the achen/integrate-and-implement-open-telemetry-tracing branch from 6ad17a0 to 54c6530 Compare December 1, 2022 05:32
@highker highker changed the title achen/integrate and implement open telemetry tracing integrate and implement open telemetry tracing Dec 1, 2022
@highker highker merged commit 2c1d402 into prestodb:master Dec 1, 2022
@achen2289
Copy link
Contributor Author

@highker Thanks for the helpful reviews and merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants