-
Notifications
You must be signed in to change notification settings - Fork 845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add customizable global error handler #1419
Add customizable global error handler #1419
Conversation
|
I don't think this belongs in the API. The API is reserved for instrumentation concerns. To me, this is very clearly an operator concern, and should only be a part of the SDK. |
I see the point on why @jkwatson doesn't want it here - in the end, we probably don't event want to have errors reported when the API alone is used. The fact we are consuming such error handler from SDK-level components seem to add weight to that argument. Personally I'd be fine with adding this very same concept to either Also, I'm wondering whether we need the factory for the error. I imagine users setting such handler through code directly (when configuring the SDK itself, such as the java-instrumentation code would do), but I'm not totally sure we need a factory here. |
I'm very happy to have an error handling mechanism baked into the SDK, definitely. |
f9a9a4a
to
07ff0d6
Compare
Codecov Report
@@ Coverage Diff @@
## master #1419 +/- ##
============================================
- Coverage 92.28% 92.22% -0.06%
- Complexity 898 904 +6
============================================
Files 114 116 +2
Lines 3241 3256 +15
Branches 264 264
============================================
+ Hits 2991 3003 +12
- Misses 170 171 +1
- Partials 80 82 +2
Continue to review full report at Codecov.
|
I agree that it makes sense for this to be in the SDK - I've updated this PR to do so, and also took @carlosalberto's suggestion to use a code setter instead of a factory. |
@@ -0,0 +1,51 @@ | |||
/* | |||
* Copyright 2019, OpenTelemetry Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: 2020
@@ -0,0 +1,27 @@ | |||
/* | |||
* Copyright 2019, OpenTelemetry Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here the year
@@ -0,0 +1,35 @@ | |||
/* | |||
* Copyright 2019, OpenTelemetry Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
* Default implementation of {@link ErrorHandler}. By default, logs the full string of the error | ||
* with a level of "Warning". | ||
* | ||
* @since 0.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @since 0.1.0 | |
* @since 0.7.0 |
* | ||
* @see OpenTelemetry | ||
*/ | ||
@ThreadSafe | ||
public final class OpenTelemetrySdk { | ||
private static final Object mutex = new Object(); | ||
|
||
@Nullable private static volatile OpenTelemetrySdk instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than introducing a whole instance and double-checked locking in the getInstance, why not just have the errorHandler be static, since it's global anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the handler being static (we don't need anything fancy here, at least for now).
} | ||
|
||
@Override | ||
public void handle(OpenTelemetryException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious do we have a story about controlling the default behavior based on the type of exception? A level flag to `OpenTelemetryException?
The context is I am looking at null / arg checks which we should remove to abide by the error handling specs, that the SDK should not throw errors due to incorrect usage by a user. I think for these logging by default would be problematic since if a simple method like .setAttribute
was logging it could easily be enough to bring down the app not so unlike an actual exception. But naturally for less common / sporadic errors like export error it would generally be ok to log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is basically an API for error logging, I'm starting to wonder if adding this additional error handler is overkill. Couldn't users just specify a custom logger if they want special behavior, outside the normal java.util.logging behavior?
That is to say, if we stipulate the all SDK logging will be in the io.opentelemetry.sdk
namespace, can't we just use normal logging facilities to handle the global error handler use-case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context is I am looking at null / arg checks which we should remove to abide by the error handling specs, that the SDK should not throw errors due to incorrect usage by a user.
I personally think we shouldn't log wrong parameters, if that's what you are asking. Then again, other implementations log such errors (the Python one does, for example). But we should probably discuss that in the Specification call tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkwatson What if the user wants to do something with the error, other than logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the logging Handler
implementation can do whatever it wants, right? It's not required to do any actual logging, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what if the user wants to do something other than simple logging? Probably we should look for cases where simple logging is not the principal use case (to go one way or another).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by 'user', do you mean the SDK author generating the "errors" or, the user of the SDK who is doing something with them? A java.util.logging.Handler, I think, can do anything it wants...it doesn't have to be "simple logging".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I have mixed feelings about using the Java's logging facility. In theory having a handler is a straightforward, super simple thing to do, albeit limited, while the logging path includes dealing with an entire LogRecord object (and what to expect to handle there? Throwable, message? parameters?), where the user has to override three abstract methods for Handler
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not convinced it was the right way to go, but rather than re-invent an internal logging API, I'd rather use something off the shelf. I can also see arguments for not wanting to generate an entire stack trace just to report some sort of SDK behavior...in which case we're getting dangerously close to implementing a logging API. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jumping in with my two-cents here…
Taking a step back, I think the the high-level goal here is that we want it to be easy for users of the SDK to get visibility into any errors that occur during setup or at runtime that prevent the expected tracing or metrics information from being exported.
I think that three things that are very important for achieving this high-level goal are:
(1) Making it easy for the SDK authors to report errors in a standardized way.
(2) Making it easy for the SDK user (not the SDK author) to have control over where errors get logged and which types of errors get logged
(3) Having a sane default so that important errors do not get squashed when someone is trying to set up OpenTelemetry.
I think that both the solution outlined in this PR and using Java’s built-in logging facility technically achieve all of these goals, but I do agree with @carlosalberto that Java’s built-in logging facility potentially adds friction that makes the whole process clunkier to work with. A user has to implement three abstract methods, it’s harder to transmit information about where an error came from and switch on that error, and it’s potentially less obvious to the SDK author that calling logger.warning
will be correctly handled and propagated to the right place.
Having said that, I do agree with the notion that we shouldn’t be trying to implement our own logging API from scratch. If we feel like we need a lot of the bells and whistles that the Java logging library provides for our use cases, then I think that we should just go ahead and use that. If we only need a small subset of them, then I think that the overhead of building and maintaining them is worth the increased clarity and ease-of-use that this solution brings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My two cents here...
There is something wrong with OpenTelemetry if an "observability" library goes back to error handling and intensive logging to debug it, why do we have this library then? Why do we ask users to use this library if we then tell them to "handle" our errors.
I know that there is a known problem of how to monitor your monitoring tools, but I feel that we should try to do a smarter thing than this, using metrics, using tracing, and only when something is fatal and we cannot do anything to recover inform the user.
I agree that we should have great defaults so users shouldn't have to do anything in general. Some users may want control but if the defaults are good enough, maybe not. Putting on the user hat, if I had the ability to, I would want Unexpected, recoverable / transient errors logged as warning If otel provided a flag that I could enable in tests for the extra behavior there, then I'd just use it. If there was a error handler, I'd implement my own flag using it. If it was a jul handler, it would take me some more time to understand how to use jul, but I assume it would work too. None of these options feel bad to me. And if there's no way to have a flag for unit tests, I don't think I'd feel significantly bothered though there probably will be points where I wonder why a tag is missing and the debug to find a null value takes longer than it could have. |
Interesting points @bogdandrutu To me, it feels like an (absolutely beautiful) utopia to try to monitor OpenTelemetry with itself (If I got it right from your comment) - what will you do if you are stuck exporting data, for example? Where can you report such errors as traces/metrics? Also, there's the case of local errors that could easily be diagnosed, without need to do fancy work. As the description of this PR mentions, Go already such error handler. Would you be against such approach? Asking also as the idea is to have something like this in the Specification itself ;) Finally, as we already do logging for many errors, what do you think about going for @jkwatson's idea about 'formalizing' the usage of Java logging facility to make this possible? Would you be fine with that? |
Ping @bogdandrutu ;) |
After discussion at the spec SIG this week and in the Otel spec issue, it looks like the best approach here is to just leverage the From what I understand, we actually don't need to make any code changes for this (besides actually adding additional log messages where they are missing now). However, I'd like to add documentation with examples of how the logger can be customized for those unfamiliar with With this, OpenTelemetry authors can continue to use Does this all sound like a reasonable way to move forward? @carlosalberto @bogdandrutu @anuraaga @jkwatson |
I took a stab at adding some documentation for the java logger here - comments are appreciated! |
With #1501 merged, can we close this for now? If we run into a case that logging really can't handle, we can re-open this idea. |
This is a POC solution to open-telemetry/opentelemetry-specification#696
Much of the inspiration for this PR came from the respective implementation in Go here
Adds an error handler delegate to
OpenTelemetry
, which can handle errors through a call toOpenTelemetry.handleError
.The delegate accepts errors in the form of exceptions of type
OpenTelemetryException
, which most often are constructed at the time of an error by wrapping an exception with some additional information about where it occurred. By default, the delegate logs a warning using the native Java logger. A custom delegate can be registered by providing aErrorHandlerFactory
class, in a similar way that a customTraceProvider
can be registered.This PR also updates several locations (primarily in various exporters) to handle an error using the global handler. Some locations I have added net-new reporting, and others I have just replaced the
log.warn
message that previously existed.