Switch Jaeger remote sampler to use grpc lite #4043

Merged

pavolloffay
Member

@pavolloffay pavolloffay commented Jan 4, 2022

Resolves #4014

Notable changes:

  • added some methods to CodedInputStream to support additional data types
  • Jaeger remote sampler supports only grpc lite via OkHttp
  • switched tests from in-process grpc to armeria
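
For readers unfamiliar with what a "grpc lite" client has to do by hand: on the wire, each gRPC message is prefixed with a 1-byte compressed flag and a 4-byte big-endian length. The sketch below illustrates only that framing. The class and method names (GrpcFraming, frame, unframe) are hypothetical and are not taken from this PR, and compression is deliberately not handled.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the PR: illustrates gRPC length-prefixed framing.
final class GrpcFraming {
  // 1 byte compressed flag + 4 bytes big-endian message length.
  private static final int HEADER_LENGTH = 5;

  private GrpcFraming() {}

  // Wraps a serialized protobuf message in an uncompressed gRPC frame.
  static byte[] frame(byte[] message) {
    // ByteBuffer is big-endian by default, which matches the gRPC wire format.
    ByteBuffer buffer = ByteBuffer.allocate(HEADER_LENGTH + message.length);
    buffer.put((byte) 0); // 0 = not compressed
    buffer.putInt(message.length);
    buffer.put(message);
    return buffer.array();
  }

  // Extracts the message bytes from a single uncompressed gRPC frame.
  static byte[] unframe(byte[] framed) {
    if (framed.length < HEADER_LENGTH) {
      throw new IllegalArgumentException("frame is shorter than the 5-byte prefix");
    }
    ByteBuffer buffer = ByteBuffer.wrap(framed);
    if (buffer.get() != 0) {
      throw new IllegalArgumentException("compressed frames are not handled in this sketch");
    }
    int length = buffer.getInt();
    if (length < 0 || length != buffer.remaining()) {
      throw new IllegalArgumentException("declared length does not match payload size");
    }
    byte[] message = new byte[length];
    buffer.get(message);
    return message;
  }
}
```

With this framing, a 3-byte message becomes an 8-byte frame whose first byte is 0 and whose next four bytes encode the length 3.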

Contributor

@anuraaga anuraaga left a comment

Thanks a lot for this, looks great! Hope you could learn something about gRPC and enjoy it while writing the PR. If not, then good job anyways :)

@@ -3,6 +3,8 @@ plugins {
id("otel.publish-conventions")

id("otel.animalsniffer-conventions")

id("com.squareup.wire")
}

Contributor

I don't think we need as many as the OTLP tests but can we add one test suite with the grpc dependencies used?

https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/otlp/trace/build.gradle.kts#L31

It's ok to just copy the test code instead of abstracting it.

Member Author

I have added support only for the "grpc lite" approach via OkHttp.

Signed-off-by: Pavol Loffay <[email protected]>
@pavolloffay pavolloffay force-pushed the jaeger-remote-sampler-grpc-lite branch from 7aa6250 to 4442800 Compare January 5, 2022 10:31
@codecov

codecov bot commented Jan 5, 2022

Codecov Report

Merging #4043 (d5a38b5) into main (96b7895) will increase coverage by 0.01%.
The diff coverage is 81.94%.

@@             Coverage Diff              @@
##               main    #4043      +/-   ##
============================================
+ Coverage     90.12%   90.13%   +0.01%     
- Complexity     4374     4486     +112     
============================================
  Files           518      530      +12     
  Lines         13303    13788     +485     
  Branches       1276     1321      +45     
============================================
+ Hits          11989    12428     +439     
- Misses          909      926      +17     
- Partials        405      434      +29     

Impacted Files | Coverage Δ
...xporter/otlp/internal/grpc/OkHttpGrpcExporter.java | 81.52% <ø> (ø)
...er/otlp/internal/okhttp/OkHttpExporterBuilder.java | 85.41% <0.00%> (-1.63%) ⬇️
...autoconfigure/spi/AutoConfigurationCustomizer.java | 0.00% <0.00%> (ø)
...nfigure/AutoConfiguredOpenTelemetrySdkBuilder.java | 89.00% <33.33%> (-0.19%) ⬇️
.../metrics/internal/descriptor/MetricDescriptor.java | 84.21% <40.00%> (-15.79%) ⬇️
...metry/exporter/otlp/internal/CodedInputStream.java | 57.89% <50.00%> (+14.71%) ⬆️
...er/sampler/MarshallerRemoteSamplerServiceGrpc.java | 71.42% <71.42%> (ø)
...r/sampler/SamplingStrategyResponseUnMarshaler.java | 71.96% <71.96%> (ø)
...ace/jaeger/sampler/JaegerRemoteSamplerBuilder.java | 91.17% <78.57%> (-8.83%) ⬇️
...race/jaeger/sampler/DefaultGrpcServiceBuilder.java | 79.66% <79.66%> (ø)
... and 66 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pavolloffay
Member Author

@anuraaga the coverage is low, but it seems like the "grpc lite" classes in otlp:common are not covered either.

Signed-off-by: Pavol Loffay <[email protected]>
pavolloffay and others added 4 commits January 10, 2022 17:48
Signed-off-by: Pavol Loffay <[email protected]>
Contributor

@anuraaga anuraaga left a comment

Thanks @pavolloffay - I have added back code from my commit. I had to revert the change of the method signature to void. There is some more cleanup that can be done but that can happen later

@pavolloffay
Member Author

Do not merge yet, the tests are flaky.

@pavolloffay
Member Author

I am going to look at fixing them.

Regarding the return type of the service, it would be appropriate to propagate the result of the call to the caller.

Signed-off-by: Pavol Loffay <[email protected]>
@pavolloffay
Member Author

@anuraaga I could not reproduce the failing test (unimplemented_error_server_response) locally.

while ./gradlew --no-build-cache :sdk-extensions:jaeger-remote-sampler:cleanTest :sdk-extensions:jaeger-remote-sampler:check; do :; done

Now it seems that it is passing on CI.

Signed-off-by: Pavol Loffay <[email protected]>
@pavolloffay
Member Author

pavolloffay commented Jan 11, 2022

The second to last CI job failed on the flaky test. @anuraaga any ideas why the logs are not being captured?

JaegerRemoteSamplerTest > unimplemented_error_server_response() FAILED
    org.opentest4j.AssertionFailedError: Contain the string <Server responded with UNIMPLEMENTED.> ==> None of the 7 captured log events matched the filter predicate
        at app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:39)
        at app//org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
        at app//io.github.netmikey.logunit.api.LogCapturer.lambda$assertContains$3(LogCapturer.java:174)
        at [email protected]/java.util.Optional.orElseGet(Optional.java:364)
        at app//io.github.netmikey.logunit.api.LogCapturer.assertContains(LogCapturer.java:173)
        at app//io.github.netmikey.logunit.api.LogCapturer.assertContains(LogCapturer.java:121)
        at app//io.opentelemetry.sdk.extension.trace.jaeger.sampler.JaegerRemoteSamplerTest.unimplemented_error_server_response(JaegerRemoteSamplerTest.java:351)

@anuraaga
Contributor

anuraaga commented Jan 12, 2022

@pavolloffay I added another commit 0bf83a1 to work around an issue with logunit. We've generally been able to avoid problems by using private static final Logger in classes (previous issues were when we had private final Logger). Not sure what's going on this time, but I worked around it anyway.
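
As a minimal sketch of the logger convention described in this comment, the class below holds its logger in a private static final field so that one instance is strongly referenced for the lifetime of the class. It assumes java.util.logging; the class name StaticLoggerExample and the method onUnimplemented are hypothetical and not taken from the PR. Only the log message is borrowed from the failing assertion quoted earlier in this thread.

```java
import java.util.logging.Logger;

// Hypothetical class, not part of the PR: shows the "private static final Logger" convention.
final class StaticLoggerExample {
  // One logger per class, created once and strongly referenced by the static field.
  // A handler attached to the logger of this name keeps seeing the records logged here.
  private static final Logger logger = Logger.getLogger(StaticLoggerExample.class.getName());

  // Logs the message that the flaky test in this thread asserts on.
  void onUnimplemented() {
    logger.warning("Server responded with UNIMPLEMENTED.");
  }
}
```

Whether the instance-field variant is what actually caused the flakiness here is not established in this thread; the sketch only shows the pattern being recommended.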

@anuraaga
Contributor

Hmm had confidence in my workaround but somehow it's not working, need to look more

Member

@jack-berg jack-berg left a comment

Couple of additional comments. Looking pretty good though 👍

requireNonNull(retryPolicy, "retryPolicy");
this.retryPolicy = retryPolicy;
return this;
}
Member

The RetryPolicy implementation is experimental and is part of the OTLP spec. I think you should remove the methods which are not exposed in JaegerRemoteSamplerBuilder from io.opentelemetry.sdk.extension.trace.jaeger.sampler.GrpcServiceBuilder:

  • addRetryPolicy
  • addHeader
  • setCompression
  • setTimeout

Also consider removing setChannel(ManagedChannel) if you don't anticipate people using it.

Contributor

I think in a soon followup we can add all except addRetryPolicy to the builder. I don't think it hurts too much keeping them.

setChannel we hope is not used but since it was public let's stick with having it for now and deprecating all of our setChannel methods in one PR

@anuraaga
Contributor

anuraaga commented Jan 17, 2022

@pavolloffay FYI I added some commits, please don't lose them :)

For the record, I will be on the hook if the build is flaky after merging this.

I am hopeful about d5a38b5. My rough hypothesis is that, due to HTTP/2 details, it depends on the state of the connection whether an empty body is sent and the trailers include the status, or the headers include the status, there is no body, and the body read fails, which previously skipped our status handling logic. I had this idea because I saw "could not consume server response." being logged even though none of our test cases are meant to bail on that line.
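
The hypothesis above can be sketched as follows. This is not the code from d5a38b5; it only illustrates the idea of still looking for grpc-status when reading the body fails, checking the trailers first and then the response headers. The names GrpcStatusResolver, BodyReader and resolveStatus are hypothetical, and the maps are assumed to use lower-case header names, as HTTP/2 requires.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical helper, not part of the PR: resolves grpc-status from trailers or headers.
final class GrpcStatusResolver {
  static final String GRPC_STATUS = "grpc-status";

  // Stand-in for reading the HTTP response body; may fail when no body was sent.
  interface BodyReader {
    byte[] read() throws IOException;
  }

  private GrpcStatusResolver() {}

  // Returns the grpc-status value, or null if neither trailers nor headers carry one.
  static String resolveStatus(
      Map<String, String> headers, Map<String, String> trailers, BodyReader body) {
    try {
      // Consume the body first; trailers are only available after the body is read.
      body.read();
    } catch (IOException e) {
      // Do not bail out here: the status may already be present in the response headers.
    }
    String status = trailers.get(GRPC_STATUS);
    if (status == null) {
      status = headers.get(GRPC_STATUS);
    }
    return status;
  }
}
```

In both shapes of response described in the comment, a server answering UNIMPLEMENTED would yield the status value "12", which is the gRPC code for UNIMPLEMENTED.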

@anuraaga anuraaga merged commit 5d521f5 into open-telemetry:main Jan 19, 2022
@anuraaga
Contributor

Thanks @pavolloffay for helping with this tedious PR!

@pavolloffay
Member Author

It was fun to look at the proto and gRPC! Thanks for the guidance as well :P

Successfully merging this pull request may close these issues.

Switch Jaeger remote sampler to "grpc" lite mechanism