
fix(tracing): use globalErrorHandler when flushing fails #1622

Merged

Conversation

@johanneswuerbach (Contributor)

Fixes #1617

Short description of the changes

Use the global error handler (#1514) when span flushing fails in the BatchSpanProcessor, instead of causing an unhandled rejection.
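
A minimal sketch of the idea, assuming the `globalErrorHandler` export from `@opentelemetry/core` introduced in #1514 (`safeFlush` is a hypothetical helper for illustration, not the PR's actual diff):

```ts
import { globalErrorHandler } from '@opentelemetry/core';

// Fire-and-forget flush: report failures through the global error
// handler instead of letting them escape as unhandled rejections.
function safeFlush(flush: () => Promise<void>): void {
  flush().catch(e => globalErrorHandler(e));
}
```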

@vmarchaud (Member) left a review.

@codecov bot commented on Oct 24, 2020

Codecov Report

Merging #1622 into master will increase coverage by 0.00%.
The diff coverage is 90.00%.

@@           Coverage Diff           @@
##           master    #1622   +/-   ##
=======================================
  Coverage   91.21%   91.22%           
=======================================
  Files         165      165           
  Lines        5064     5069    +5     
  Branches     1038     1039    +1     
=======================================
+ Hits         4619     4624    +5     
  Misses        445      445           
| Impacted Files | Coverage Δ |
| --- | --- |
| ...telemetry-tracing/src/export/BatchSpanProcessor.ts | 92.18% <83.33%> (+0.25%) ⬆️ |
| ...elemetry-tracing/src/export/SimpleSpanProcessor.ts | 85.18% <100.00%> (+1.85%) ⬆️ |

@johanneswuerbach force-pushed the tracing-unhandled-rejection branch from 741ab49 to 6364bc6 on October 24, 2020 at 19:31
@johanneswuerbach (Contributor, Author)

@vmarchaud something like this? Your link points to the shutdown method, but the change in BatchSpanProcessor only affects the export. I also wrapped the export in the SimpleSpanProcessor; should I also wrap shutdown in both processors?

@johanneswuerbach force-pushed the tracing-unhandled-rejection branch from 6364bc6 to c52d478 on October 24, 2020 at 19:33
@vmarchaud (Member)

@johanneswuerbach Yeah, sorry, I linked to the wrong line :/
I don't think we need to wrap shutdown right now; at least no one has complained about this behavior. We'll see in the future.

I'm good with the PR now, even though I would like us to report the actual error that the exporter hit; for now this is fine.

@vmarchaud (Member)

@open-telemetry/javascript-approvers I agree with @johanneswuerbach that this is important to fix and release ASAP. It's common best practice to exit the process when there is an unhandledRejection, so most apps will crash any time an exporter fails :/
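
For context, the kind of Node.js pattern being described (a sketch, not part of this PR): many apps treat unhandled rejections as fatal, so a rejecting exporter promise takes the whole process down.

```ts
// Many apps install a handler like this (and Node.js 15+ crashes on
// unhandled rejections by default), so a floating rejected promise
// from an exporter becomes a process exit.
process.on('unhandledRejection', reason => {
  console.error('Unhandled rejection:', reason);
  process.exit(1);
});
```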

@obecny (Member) left a comment:

I think we are changing the API of BatchSpanProcessor here in a way that might not cover all use cases. If a method returns a promise, it should handle both cases, resolve and reject. Forcing everyone to use the global handler is not necessarily what they want, not to mention that the result from export is gone. Using the global handler should be something the user can opt into, instead of dropping "reject" from the promise.

If someone has already built a mechanism to retry when the result is FAILED_RETRYABLE, that whole logic will be gone. I'm against changing the "natural" API of the promise returned by this method, because the method then becomes useless for other cases (for example, building auto-retry on a FAILED_RETRYABLE result).
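
To illustrate the concern, hypothetical consumer code of the kind this comment worries about (the retry loop and its structural type are assumptions, not actual SDK code):

```ts
// If forceFlush() no longer rejects, this retry loop silently stops
// seeing failures and the whole mechanism becomes dead code.
async function flushWithRetry(
  processor: { forceFlush(): Promise<void> },
  maxAttempts = 3
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await processor.forceFlush();
      return;
    } catch (e) {
      if (attempt === maxAttempts) throw e; // give up after the last try
    }
  }
}
```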

@johanneswuerbach (Contributor, Author) commented on Oct 26, 2020

@obecny the problem in this case is that, outside of forceFlush, no method actually exposes that state, so I doubt anybody has built a (working) retry mechanism at this layer.

One call site of this method is https://github.com/open-telemetry/opentelemetry-js/blob/master/packages/opentelemetry-tracing/src/export/BatchSpanProcessor.ts#L93, which is called from onEnd, which returns nothing (https://github.com/open-telemetry/opentelemetry-js/blob/master/packages/opentelemetry-tracing/src/export/BatchSpanProcessor.ts#L60). Another call site is https://github.com/open-telemetry/opentelemetry-js/blob/master/packages/opentelemetry-tracing/src/export/BatchSpanProcessor.ts#L122, where the promise rejection is ignored entirely and the result is not returned either.

While I'm not sure what the desired future is, there is #1569, which suggests retry should be implemented at the exporter layer and not here.

@obecny (Member) commented on Oct 26, 2020

> @obecny the problem in this case is that, outside of forceFlush, no method actually exposes that state, so I doubt anybody has built a (working) retry mechanism at this layer. […]

Then why not do something like this at line 93:

```ts
this._flush().catch(e => {
  globalErrorHandler(e);
});
```

and then line 122 should do the same as line 93.

_flush is also used in shutdown; after your change, shutdown will never raise an error.
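
A sketch of the separation being asked for here (class and method names are assumptions for illustration, not the actual source): the timer-driven flush swallows errors via the global handler, while shutdown keeps its normal rejection.

```ts
import { globalErrorHandler } from '@opentelemetry/core';

class BatchProcessorSketch {
  // Stand-in for the real export logic.
  private _flush(): Promise<void> {
    return Promise.resolve();
  }

  // Fire-and-forget path (e.g. the flush timer): route failures to the
  // global error handler so nothing rejects unhandled.
  private _onTimeout(): void {
    this._flush().catch(e => globalErrorHandler(e));
  }

  // User-facing path: the returned promise still rejects on failure.
  shutdown(): Promise<void> {
    return this._flush();
  }
}
```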

@johanneswuerbach (Contributor, Author)

@vmarchaud / @dyladan would that also be okay? We generally see tracing as best-effort, so we don't care about errors, but I'm happy to catch those errors in our apps.

@dyladan (Member) commented on Oct 28, 2020

> @vmarchaud / @dyladan would that also be okay? We generally see tracing as best-effort, so we don't care about errors, but I'm happy to catch those errors in our apps.

I don't think @obecny is suggesting you wrap the error in your app, but rather that you wrap it in shutdown, where _flush is called.

I think the way the PR has it now is fine. The global error handler changed error behavior in a lot of places already and this is just one more.

> If someone has already built a mechanism to retry when the result is FAILED_RETRYABLE, that whole logic will be gone.

@obecny note that the spec for span processors does not return the result type https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#interface-definition

Also see in the spec https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#exportbatch:

> Any retry logic that is required by the exporter is the responsibility of the exporter.
>
> Returns: ExportResult
>
> ExportResult is one of:
>
> • Success - The batch has been successfully exported. For protocol exporters this typically means that the data is sent over the wire and delivered to the destination server.
> • Failure - exporting failed. The batch must be dropped. For example, this can happen when the batch contains bad data and cannot be serialized.

The RETRYABLE vs NON_RETRYABLE failure has been removed from the spec completely and should be removed in another PR.
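
A generic sketch of what "retry is the responsibility of the exporter" can look like, using self-defined types rather than the SDK's actual ExportResult interfaces:

```ts
type ExportOutcome = 'success' | 'failure';

// Retry inside the exporter; on final failure the batch is simply
// dropped, as the spec requires.
async function exportWithRetry(
  send: () => Promise<void>,
  maxAttempts = 3
): Promise<ExportOutcome> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await send();
      return 'success';
    } catch {
      // brief linear backoff before the next attempt
      await new Promise(resolve => setTimeout(resolve, 100 * attempt));
    }
  }
  return 'failure';
}
```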

@obecny (Member) commented on Oct 28, 2020

> @vmarchaud / @dyladan would that also be okay? We generally see tracing as best-effort, so we don't care about errors, but I'm happy to catch those errors in our apps.
>
> I don't think @obecny is suggesting you wrap the error in your app, but rather that you wrap it in shutdown, where _flush is called.

shutdown is called by the end user. Because shutdown returns a promise, the user should expect it to be either resolved or rejected with information about the error. If we change the logic so that shutdown is never rejected, we are changing an API that someone might already be using differently, or might want to take advantage of (resolve/reject) when building their own solution; we don't know yet. So I'm against changing this behaviour just to silence the error.

It might be out of scope for this PR, but maybe we should have a discussion about how exactly we want to handle such cases, and then also decide whether we want these API changes or not. I don't want to decide that in this particular case, which is why I'm suggesting to resolve this in a way that keeps the original shutdown API behaving the same, but uses the global error handler for the unhandled cases (line 93).

As a user, I would be really surprised if I suddenly stopped seeing an error from a method I just called that wasn't successful.

@dyladan (Member) commented on Oct 28, 2020

> […] I'm suggesting to resolve this in a way that keeps the original shutdown API behaving the same, but uses the global error handler for the unhandled cases (line 93).

👍 seems reasonable to me.

@johanneswuerbach (Contributor, Author)

Updated the PR, let me know if that looks better :-)

@obecny (Member) left a comment:

lgtm, thx for changes

@obecny added the "enhancement" (New feature or request) label on Oct 30, 2020
@dyladan merged commit b523dab into open-telemetry:master on Oct 30, 2020
@johanneswuerbach deleted the tracing-unhandled-rejection branch on October 30, 2020 at 15:11
@johanneswuerbach (Contributor, Author)

Would it be possible to get this into a v0.12.1 patch release (happy to cherry-pick), or do we need to wait for v0.13.0?

pichlermarc pushed a commit to dynatrace-oss-contrib/opentelemetry-js that referenced this pull request Dec 15, 2023