Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Global Error Handler to SDK #696

Closed
MitchellDumovic opened this issue Jul 9, 2020 · 14 comments
Closed

Add Global Error Handler to SDK #696

MitchellDumovic opened this issue Jul 9, 2020 · 14 comments
Assignees
Labels
area:sdk Related to the SDK priority:p2 Medium priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA

Comments

@MitchellDumovic
Copy link
Contributor

Users should be able to configure custom error handlers for errors all throughout their OpenTelemetry projects.

There is already demand for this feature - see an identical issue for the Go SDK here along with the corresponding implementation.

@carlosalberto carlosalberto added spec:trace Related to the specification/trace directory area:sdk Related to the SDK area:error-reporting Related to error reporting labels Jul 9, 2020
@arminru
Copy link
Member

arminru commented Jul 10, 2020

@carlosalberto This is not about instrumentations reporting application errors but about OpenTelemetry itself reporting usage and internal errors to instrumentations/users, something that's easy to mix up in this domain. We had a separate label error-handling for that on #141 and #153 but it was deleted apparently. Should we bring it back?

@arminru arminru removed area:error-reporting Related to error reporting spec:trace Related to the specification/trace directory labels Jul 10, 2020
@carlosalberto carlosalberto added release:after-ga Not required before GA release, and not going to work on before GA release:required-for-ga Must be resolved before GA release, or nice to have before GA and removed release:after-ga Not required before GA release, and not going to work on before GA labels Jul 10, 2020
@tedsuo
Copy link
Contributor

tedsuo commented Jul 14, 2020

At the spec meeting this morning, there was consensus that this would be an excellent feature, and we would like to require it for GA. The request was to simply define an error reporting delegate pattern, which defaults to logging to standard out, or the equivalent default in every language.

There will be other details we may want to standardize on, but error handling is different enough between languages that we would like to start with this simple specification, and add details later as it becomes clear what the value would be.

@MitchellDumovic MitchellDumovic changed the title Add Global Error Handler to Tracing SDK Add Global Error Handler to SDK Jul 16, 2020
@carlosalberto carlosalberto added the priority:p2 Medium priority level label Jul 24, 2020
@MitchellDumovic
Copy link
Contributor Author

Discussed this issue again at the spec SIG this morning. We agreed that this would be valuable, but want to be sure that

(1) We aren't reinventing a logging library where one already exists (use whatever is idiomatic in the language)
(2) We only leverage this error handler on errors that it is not possible for us to recover from

I will draft a change to the spec to reflect these points and will update my open PR in Java to leverage the existing logging library.

@MrAlias
Copy link
Contributor

MrAlias commented Jul 28, 2020

One thing to make sure is clear is that in the Go implementation there is a global implementation of the error handler so the instrumenter can use the error handling prior to an SDK being registered. Similar to a global metric or trace handler or propagator.

Also, to capture the point from the spec meeting this morning: the error handling design should be customizable by the user. If the instrumentor, or even the operator wants to write a more advanced error handler they can do so and register it. This means the user can build out as complex or as simple of an error handler as they want and by default the default global handler just handles error in the simplest way while remaining idiomatic to the language. Additionally, the spec should include some language about how only non-recoverable errors should be sent to the handler. If there is defined fallback actions (like an empty tracer name being replaced with a default) the action should be preformed and no error sent.

@MitchellDumovic
Copy link
Contributor Author

@MrAlias It seems like you're suggesting that the error handler shouldn't be global at the SDK level, but at the API level, is that correct? I'm asking because in my original PR for Java I had added the error handler to the API and there was some pushback.

cc @jkwatson @carlosalberto

@Oberon00
Copy link
Member

The title says SDK and that would seem logical to me too. All other configuration is also in the SDK.

@carlosalberto
Copy link
Contributor

I'd be up for putting it in the SDK too (I don't imagine unrecoverable scenarios with the API being the only component). Any opinion on this specific part @MrAlias ?

@jkwatson
Copy link
Contributor

I don't think we should add to the API surface by adding this to the API specification. SDK: maybe

@MrAlias
Copy link
Contributor

MrAlias commented Jul 30, 2020

... the error handler shouldn't be global at the SDK level, but at the API level, is that correct?

Allowing instrumentation authors access to the global error handler seems needed. They as well might need to send unrecoverable errors to the handler. This means that the error handler will need to be something that can be register in the global scope, similar to a generic tracer, meter, or propagator interface. Additionally, a service operator needs the ability to customize the handling of errors. Meaning they will need to register their implementation of the error handler in the global scope. Because of these uses cases it seems to follow that the error handler needs to be decoupled from the SDK (the instrumentation author will not have an instance of the SDK) and be visible at the global scope.

I agree having the error handler be a part of the API seems incorrect and we shouldn't do that. It will pollute the API with implementation details not pertaining to telemetry. Also, it seems to follow that it should not be specified as part of the SDK. Instead it seems like the error handler should be included in the language in a language specific way. This means it would be up to the SIG to defined it at an appropriate hierarchical level, not muddy the API or SDK, and serve all the functional use cases for an instrumentation author or service operator.

@carlosalberto
Copy link
Contributor

Also, it seems to follow that it should not be specified as part of the SDK. Instead it seems like the error handler should be included in the language in a language specific way. This means it would be up to the SIG to defined it at an appropriate hierarchical level, not muddy the API or SDK, and serve all the functional use cases for an instrumentation author or service operator.

Sounds great to me.

@Oberon00
Copy link
Member

Oberon00 commented Jul 31, 2020

While that does sound reasonable, I'm not sure what the conclusion is from that? Specify nothing? Specify only very vaguely that language implementations should allow users to be notified of internal errors? EDIT: Are you suggesting a separate artifact for the error handler that the API and SDK depend on? Maybe also the propagation layer? If it's that generic, you can as well use a logger framework, those also allow installing custom handlers.

@carlosalberto
Copy link
Contributor

While that does sound reasonable, I'm not sure what the conclusion is from that? Specify nothing? Specify only very vaguely that language implementations should allow users to be notified of internal errors?

Good point on not being clear. My understanding is that we should add a section on a recommended global handler, what is well integrated with the language. If there's one already, great, make the SDK and other components report to it; else, define one (like the Go one).

@tigrannajaryan
Copy link
Member

Is this solved by #757 or we need something more?

@MitchellDumovic
Copy link
Contributor Author

I believe that this should be resolved by that PR, thanks for pointing that out! Will close this issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK priority:p2 Medium priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA
Projects
None yet
Development

No branches or pull requests

8 participants