Skip to content

Conversation

@codefromthecrypt
Copy link
Contributor

In many places, the trace context of callbacks was accidentally set to
the client span, not the invocation context. I noticed a hack trying to
work around this. This code fixes all the problems around context. It
also removes some sporadic logging, which was only applied to a few
hooks.

Finally, this adds Brave tests which would have caught the problems
earlier. Notably, there is still more work to do as this will not help
with duplicate instrumentation, which is normal when reactor-netty is
the WebClient's HTTP connector.

In many places, the trace context of callbacks was accidentally set to
the client span, not the invocation context. I noticed a hack trying to
work around this. This code fixes all the problems around context. It
also removes some sporadic logging, which was only applied to a few
hooks.

Finally, this adds Brave tests which would have caught the problems
earlier. Notably, there is still more work to do as this will not help
with duplicate instrumentation, which is normal when reactor-netty is
the WebClient's HTTP connector.
MonoWebClientTrace trace = new MonoWebClientTrace(next, wrapper.buildRequest(),
this, span);
// TODO: investigate why this commit leaks a scope:
// 8f5bcdabd7af23df443e771432eb85597f3b3076
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixed one part but not the others.. the proper fix is this PR

@codecov
Copy link

codecov bot commented Feb 14, 2020

Codecov Report

Merging #1562 into 2.2.x will increase coverage by 0.01%.
The diff coverage is 83.92%.

Impacted file tree graph

@@             Coverage Diff             @@
##             2.2.x    #1562      +/-   ##
===========================================
+ Coverage       60%   60.01%   +0.01%     
- Complexity     864      865       +1     
===========================================
  Files          159      159              
  Lines         4720     4674      -46     
  Branches       536      524      -12     
===========================================
- Hits          2832     2805      -27     
+ Misses        1637     1626      -11     
+ Partials       251      243       -8
Impacted Files Coverage Δ Complexity Δ
...nt/web/client/TraceWebClientBeanPostProcessor.java 82.53% <83.92%> (+5.21%) 11 <0> (ø) ⬇️
.../sender/ZipkinRestTemplateSenderConfiguration.java 79.36% <0%> (+3.17%) 4% <0%> (ø) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20ca616...6b2541b. Read the comment docs.

@codefromthecrypt
Copy link
Contributor Author

@simonbasle @robotmrv I noticed in #1126 we started handling both onNext and onComplete, to finish a span. The code that handles onComplete when onNext isn't called is dead code. At least in our tests it isn't invoked.

I would like to remove the onComplete without onNext edge case as it isn't tested and also it is very confusing. Is there any reason why we should be so defensive?

@codefromthecrypt
Copy link
Contributor Author

build fail is likely the ever annoying hook registration leak thing, best I can guess. Not repeatable locally of course.. I'll try to use the exact commands circle uses..

@codefromthecrypt
Copy link
Contributor Author

I used the same test command as circleci locally and passed..

@codefromthecrypt
Copy link
Contributor Author

always when a flake it passes when build with no cache..

@simonbasle
Copy link
Contributor

@simonbasle @robotmrv I noticed in #1126 we started handling both onNext and onComplete, to finish a span. The code that handles onComplete when onNext isn't called is dead code. At least in our tests it isn't invoked.

I would like to remove the onComplete without onNext edge case as it isn't tested and also it is very confusing. Is there any reason why we should be so defensive?

When you look at the Flux or Mono level, not the Subscriber level, pipelines which end results appear empty are not that uncommon. Any reactive method that returns a Mono<Void> for instance...

That said, Sleuth works by wrapping Subscribers, which are the steps in one activation of the pipeline. The cases where such a step is empty are probably fewer. For instance, the Subscriber generated by ignoreElements(), despite producing a Flux that appears empty to the end consumer, still sees onNext() calls from its source. It just doesn't propagate them. If there is another operator after that, its Subscriber will only see an onComplete().

Another way of triggering this code path is by having an empty source:

Flux.empty()
    .map(i -> i)
    .filter(i -> true)
    .flatMap(i -> Mono.error(new RuntimeException("won't be thrown because no onNext"));

Hope this helps improving the tests/understanding when onComplete might matter...

It is probably true that the vast majority of Subscribers will not onComplete() without an onNext(), but when one considers the whole sequence (ie. the Flux or Mono) this cannot be considered an edge case. All Mon

@codefromthecrypt codefromthecrypt merged commit 51c9158 into 2.2.x Feb 24, 2020
@codefromthecrypt codefromthecrypt deleted the webclient-propagation branch February 24, 2020 06:34
@codefromthecrypt
Copy link
Contributor Author

@marcingrzejszczak PS since these tests are junit 4 we may need the junit-vintage-engine when you update to 5?

@codefromthecrypt
Copy link
Contributor Author

Thanks @simonbasle moved your comment here so it isn't lost #1570

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants