Fixes WebClient propagation errors and adds Brave Tests #1562

codefromthecrypt · 2020-02-14T19:50:28Z

In many places, the trace context of callbacks was accidentally set to
the client span, not the invocation context. I noticed a hack trying to
work around this. This code fixes all the problems around context. It
also removes some sporadic logging, which was only applied to a few
hooks.

Finally, this adds Brave tests which would have caught the problems
earlier. Notably, there is still more work to do as this will not help
with duplicate instrumentation, which is normal when reactor-netty is
the WebClient's HTTP connector.

In many places, the trace context of callbacks was accidentally set to the client span, not the invocation context. I noticed a hack trying to work around this. This code fixes all the problems around context. It also removes some sporadic logging, which was only applied to a few hooks. Finally, this adds Brave tests which would have caught the problems earlier. Notably, there is still more work to do as this will not help with duplicate instrumentation, which is normal when reactor-netty is the WebClient's HTTP connector.

codefromthecrypt · 2020-02-14T19:51:34Z

.../org/springframework/cloud/sleuth/instrument/web/client/TraceWebClientBeanPostProcessor.java

-		MonoWebClientTrace trace = new MonoWebClientTrace(next, wrapper.buildRequest(),
-				this, span);
-		// TODO: investigate why this commit leaks a scope:
-		// 8f5bcdabd7af23df443e771432eb85597f3b3076


this fixed one part but not the others.. the proper fix is this PR

codecov · 2020-02-14T20:02:49Z

Codecov Report

Merging #1562 into 2.2.x will increase coverage by 0.01%.
The diff coverage is 83.92%.

@@             Coverage Diff             @@
##             2.2.x    #1562      +/-   ##
===========================================
+ Coverage       60%   60.01%   +0.01%     
- Complexity     864      865       +1     
===========================================
  Files          159      159              
  Lines         4720     4674      -46     
  Branches       536      524      -12     
===========================================
- Hits          2832     2805      -27     
+ Misses        1637     1626      -11     
+ Partials       251      243       -8

Impacted Files	Coverage Δ	Complexity Δ
...nt/web/client/TraceWebClientBeanPostProcessor.java	`82.53% <83.92%> (+5.21%)`	`11 <0> (ø)`	⬇️
.../sender/ZipkinRestTemplateSenderConfiguration.java	`79.36% <0%> (+3.17%)`	`4% <0%> (ø)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20ca616...6b2541b. Read the comment docs.

codefromthecrypt · 2020-02-14T20:12:49Z

@simonbasle @robotmrv I noticed in #1126 we started handling both onNext and onComplete, to finish a span. The code that handles onComplete when onNext isn't called is dead code. At least in our tests it isn't invoked.

I would like to remove the onComplete without onNext edge case as it isn't tested and also it is very confusing. Is there any reason why we should be so defensive?

codefromthecrypt · 2020-02-14T21:49:47Z

build fail is likely the ever annoying hook registration leak thing, best I can guess. Not repeatable locally of course.. I'll try to use the exact commands circle uses..

codefromthecrypt · 2020-02-14T21:55:48Z

I used the same test command as circleci locally and passed..

codefromthecrypt · 2020-02-14T22:30:15Z

always when a flake it passes when build with no cache..

simonbasle · 2020-02-15T09:45:06Z

@simonbasle @robotmrv I noticed in #1126 we started handling both onNext and onComplete, to finish a span. The code that handles onComplete when onNext isn't called is dead code. At least in our tests it isn't invoked.

I would like to remove the onComplete without onNext edge case as it isn't tested and also it is very confusing. Is there any reason why we should be so defensive?

When you look at the Flux or Mono level, not the Subscriber level, pipelines which end results appear empty are not that uncommon. Any reactive method that returns a Mono<Void> for instance...

That said, Sleuth works by wrapping Subscribers, which are the steps in one activation of the pipeline. The cases where such a step is empty are probably fewer. For instance, the Subscriber generated by ignoreElements(), despite producing a Flux that appears empty to the end consumer, still sees onNext() calls from its source. It just doesn't propagate them. If there is another operator after that, its Subscriber will only see an onComplete().

Another way of triggering this code path is by having an empty source:

Flux.empty()
    .map(i -> i)
    .filter(i -> true)
    .flatMap(i -> Mono.error(new RuntimeException("won't be thrown because no onNext"));

Hope this helps improving the tests/understanding when onComplete might matter...

It is probably true that the vast majority of Subscribers will not onComplete() without an onNext(), but when one considers the whole sequence (ie. the Flux or Mono) this cannot be considered an edge case. All Mon

codefromthecrypt · 2020-02-24T06:36:46Z

@marcingrzejszczak PS since these tests are junit 4 we may need the junit-vintage-engine when you update to 5?

codefromthecrypt · 2020-02-24T06:38:09Z

Thanks @simonbasle moved your comment here so it isn't lost #1570

codefromthecrypt commented Feb 14, 2020

View reviewed changes

spring-projects-issues added the waiting-for-triage label Feb 14, 2020

format

be05066

subscription can cancel, and then it wouldn't hit the other hooks

0511e4f

defensively uses hook resetting

6b2541b

codefromthecrypt merged commit 51c9158 into 2.2.x Feb 24, 2020

codefromthecrypt deleted the webclient-propagation branch February 24, 2020 06:34

codefromthecrypt mentioned this pull request Feb 24, 2020

Backfill test about reactor edge case #1570

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fixes WebClient propagation errors and adds Brave Tests #1562

Fixes WebClient propagation errors and adds Brave Tests #1562

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

codefromthecrypt Feb 14, 2020

Uh oh!

codecov bot commented Feb 14, 2020 •

edited

Loading

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

simonbasle commented Feb 15, 2020

Uh oh!

codefromthecrypt commented Feb 24, 2020

Uh oh!

codefromthecrypt commented Feb 24, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fixes WebClient propagation errors and adds Brave Tests #1562

Fixes WebClient propagation errors and adds Brave Tests #1562

Uh oh!

Conversation

codefromthecrypt commented Feb 14, 2020

Uh oh!

codefromthecrypt Feb 14, 2020

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Feb 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

codefromthecrypt commented Feb 14, 2020

Uh oh!

simonbasle commented Feb 15, 2020

Uh oh!

codefromthecrypt commented Feb 24, 2020

Uh oh!

codefromthecrypt commented Feb 24, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov bot commented Feb 14, 2020 •

edited

Loading