-
Notifications
You must be signed in to change notification settings - Fork 331
JIRA: SAMZA-1733 Populating ListGauge metric using DiagnosticsAppender for exceptions #543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…pshot reporter. Switched producer-shutdown logic to ensure metric-flush at shutdown-time
…lt policy to retain last N
Added exception metric (as string-guage), to be emitted using the snapshot reporter. Switched producer-shutdown logic to ensure metric-flush at shutdown-time Adding ListGauge and integration with SamzaContainerMetric Typifying ListGauge to ListGauge<T>, adding an eviction policy, default policy to retain last N
…mza container metric) using DiagnosticsAppender
vjagadish1989
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be helpful to wrap up the original ListGauge PR first so that this is simpler to compare and review.
|
|
||
| def startDiagnostics { | ||
| // TODO: where should this reside, MetricConfig? Log4jSystemConfig? or a new separate config? | ||
| val DIAGNOSTICS_APPENDER_ENABLE = "diagnostics.appender.enable" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/diagnostics.appender.enable/samza.diagnostics.enabled
| return timestamp; | ||
| } | ||
|
|
||
| public void setTimestamp(long timestamp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably unnecessary to expose setters
Removing ValueInfo class and moving TimestampedValue to samza-api util Adding TimestampedValue.java in core
…ic into listgauge
This PR introduces a ListGauge type, A subsequent PR: #543 shows how this issued in conjunction with a diagnostics appender for error-tracking and dumpting to kafka via SnapshotReporter. Author: Ray Matharu <[email protected]> Reviewers: Jagadish <[email protected]>, Cameron Lee <[email protected]> Closes #541 from rayman7718/listgauge
This PR introduces a ListGauge type, A subsequent PR: apache#543 shows how this issued in conjunction with a diagnostics appender for error-tracking and dumpting to kafka via SnapshotReporter. Author: Ray Matharu <[email protected]> Reviewers: Jagadish <[email protected]>, Cameron Lee <[email protected]> Closes apache#541 from rayman7718/listgauge
nickpan47
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fist pass, have a few questions. Will discuss face-to-face tomorrow.
| } | ||
|
|
||
| def startDiagnostics { | ||
| // TODO: where should this reside, MetricConfig? Log4jSystemConfig? or a new separate config? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: is this a per job configuration? If yes, it would probably be JobConfig and should be job.diagnostics.enabled.
| */ | ||
| public class DiagnosticsAppender extends AppenderSkeleton { | ||
|
|
||
| private final Logger logger = LoggerFactory.getLogger(this.getClass().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been using the code convention like: logger = LoggerFactory.getLog(DiagnosticsAppender.class.getName()).
| // if an event with a non-null throwable is received => exception event | ||
| if (loggingEvent.getThrowableInformation() != null) { | ||
| DiagnosticsExceptionEvent diagnosticsExceptionEvent = | ||
| new DiagnosticsExceptionEvent(loggingEvent.timeStamp, loggingEvent.getThrowableInformation()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, how do we get the context info regarding to the exception? Through the stack trace? How do we add the logic component context like sub-query, task instance, container, and thread context?
| public class DiagnosticsAppender extends AppenderSkeleton { | ||
|
|
||
| private final Logger logger = LoggerFactory.getLogger(this.getClass().getName()); | ||
| private final ListGauge<DiagnosticsExceptionEvent> samzaContainerExceptionMetric; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: how is this metric serialized and sent to diagnostic topic? If yes, shouldn't DiagnosticsExceptionEvent be serializable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing StreamAppender implementation uses LoggingEventJsonSerde to serialize the loggingEvent. Would be better to use the same if it satisfies all the requirement here(or add fields ContainerContext and operatorContext if we need it).
| */ | ||
| package org.apache.samza.diagnostics; | ||
|
|
||
| import org.apache.log4j.spi.ThrowableInformation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better not to depend on log4j specific objects in your model. If we choose to update to log4j2 in samza or switch to some other logging library(logback2), then this has to be changed(along with Appender implementation).
| // TODO: where should this reside, MetricConfig? Log4jSystemConfig? or a new separate config? | ||
| val DIAGNOSTICS_APPENDER_ENABLE = "samza.diagnostics.enabled" | ||
| if (containerContext.config.getBoolean(DIAGNOSTICS_APPENDER_ENABLE, false)) { | ||
| import org.apache.log4j.Logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Better to move the imports to the top.
| */ | ||
| public class DiagnosticsAppender extends AppenderSkeleton { | ||
|
|
||
| private final Logger logger = LoggerFactory.getLogger(this.getClass().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good practice to make the logger variable static and share it across all instances of the class.
| * When used inconjunction with {@link org.apache.samza.metrics.reporter.MetricsSnapshotReporter} provides a | ||
| * stream of diagnostics-related events. | ||
| */ | ||
| public class DiagnosticsAppender extends AppenderSkeleton { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. Do we need to have separate implementation for samza jobs using log4j2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same implementation can be adopted, as of log4j2.4.
| public class DiagnosticsAppender extends AppenderSkeleton { | ||
|
|
||
| private final Logger logger = LoggerFactory.getLogger(this.getClass().getName()); | ||
| private final ListGauge<DiagnosticsExceptionEvent> samzaContainerExceptionMetric; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing StreamAppender implementation uses LoggingEventJsonSerde to serialize the loggingEvent. Would be better to use the same if it satisfies all the requirement here(or add fields ContainerContext and operatorContext if we need it).
|
|
||
| info("Starting Diagnostics Appender.") | ||
| val diagnosticsAppender = new DiagnosticsAppender(this.metrics) | ||
| rootLogger.addAppender(diagnosticsAppender) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In standalone, there're plans to run more than one StreamProcessor in a jvm(which translates to more than one SamzaContainer thread in jvm).
So it will be better to add diagnosticsAppender to rootLogger if it is not already added.
|
Made the following changes:
|
nickpan47
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rayman7718 thanks for the update! I have a few comments regarding the usage of log4j in Samza code base. Please check. Thanks!
| import org.apache.samza.util._ | ||
| import org.apache.samza.{SamzaContainerStatus, SamzaException} | ||
| import org.apache.samza.diagnostics.DiagnosticsAppender; | ||
| import org.apache.log4j.Logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not using log4j directly in samza-core. We always use slf4j libraries in all our source code, except for the samza-log4j module.
| if (containerContext.config.getDiagnosticsEnabled) { | ||
| val rootLogger = Logger.getRootLogger | ||
|
|
||
| if (rootLogger.getAppender(classOf[DiagnosticsAppender].getName) == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that in log4j2, the API to get appender dynamically also changed: https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/config/Configuration.html#getAppender(java.lang.String). Please check w/ @PawasChhokra to see what's the best way to make sure the code here is log4j version agnostic.
| if (containerContext.config.getDiagnosticsEnabled) { | ||
| val rootLogger = Logger.getRootLogger | ||
|
|
||
| if (rootLogger.getAppender(classOf[DiagnosticsAppender].getName) == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this condition be (rootLogger.getAppender(classOf[DiagnosticsAppender].getName) != null)?
| import java.util.Collections | ||
| import java.util.HashMap | ||
| import java.util.Map | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: whitespace change should be avoided.
|
|
||
| def getAsMap(): Map[String, Map[String, Object]] = Collections.unmodifiableMap(immutableMetrics) | ||
|
|
||
| def this() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a code search and couldn't find any reference to this default constructor. What's the reason to add it?
|
|
||
| package org.apache.samza.metrics.reporter | ||
|
|
||
| import java.util |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be redundant, if we remove the prefix to HashMap in line 59.
|
nickpan47
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rayman7718 thanks a lot for the updates. Still have a few minor comments, but looks good overall.
|
|
||
| def getDiagnosticsEnabled = { getBoolean(JobConfig.JOB_DIAGNOSTICS_ENABLED, false) } | ||
|
|
||
| def getDiagnosticsAppenderClass = { getOption(JobConfig.DIAGNOSTICS_APPENDER_CLASS) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it would make sense to provide a default class name here as well, for log4j 1.2.x (since in open source, we still depend on log4j 1.2.x).
| def startDiagnostics { | ||
| if (containerContext.config.getDiagnosticsEnabled) { | ||
| info("Starting diagnostics.") | ||
| val diagnosticsAppender = Class.forName(containerContext.config.getDiagnosticsAppenderClass.get). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to capture the class loading exceptions and log a line here, then throw them as ConfigException(msg, e).
| this.setName(SimpleDiagnosticsAppender.class.getName()); | ||
|
|
||
| // ensure appender is attached only once per JVM (regardless of #containers) | ||
| if (org.apache.log4j.Logger.getRootLogger().getAppender(SimpleDiagnosticsAppender.class.getName()) == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realize: is this code section safe for multi-threaded environment? Ideally, we should make it threadsafe. If not, we better make 100% sure that the loading of this appender only happens in one single thread and make the usage documentation here very clear.
…nder instantiation, adding default value in jobConfig
|
nickpan47
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes! Feel free to check after fix the minor comments.
| def getDiagnosticsEnabled = { getBoolean(JobConfig.JOB_DIAGNOSTICS_ENABLED, false) } | ||
|
|
||
| def getDiagnosticsAppenderClass = { | ||
| getOrDefault(JobConfig.DIAGNOSTICS_APPENDER_CLASS, "org.apache.samza.logging.log4j.SimpleDiagnosticsAppender") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: define a static constant for the default value in the class.
| catch { | ||
| case e@(_: ClassNotFoundException | _: InstantiationException | _: InvocationTargetException) => { | ||
| error("Failed to instantiate diagnostic appender", e) | ||
| throw new ConfigException("Failed to instantiate diagnostic appender class specified in config", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add the configured DiagnosticsAppenderClass name in the log and exception would make it easier for debugging.
This PR shows how the ListGauge can be used to emit exceptions using a DiagnosticsAppender.