[GOBBLIN-2193] Fail Azkaban job on Temporal Job Failure #4096
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
gobblin-temporal/src/test/java/org/apache/gobblin/temporal/yarn/YarnServiceTest.java:148
- The testHandleJobFailureEvent method should also test the scenario where the job state is successful.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             master    #4096      +/-   ##
============================================
+ Coverage     45.38%   51.49%    +6.10%
- Complexity     3192     7566     +4374
============================================
  Files           696     1388      +692
  Lines         26628    52198    +25570
  Branches       2655     5733     +3078
============================================
+ Hits          12085    26878    +14793
- Misses        13542    23026     +9484
- Partials       1001     2294     +1293

View full report in Codecov by Sentry.
if (this.jobSummaryEvent.getJobState() != null && !this.jobSummaryEvent.getJobState().getState().isSuccess()) {
  this.amrmClientAsync.unregisterApplicationMaster(FinalApplicationStatus.FAILED, this.jobSummaryEvent.getIssuesSummary(), null);
} else {
  this.amrmClientAsync.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, StringUtils.defaultString(this.jobSummaryEvent.getIssuesSummary()), null);
this.jobSummaryEvent.getIssuesSummary() wouldn't be null, right? getIssuesSummary() returns an empty string by default. If so, it is fine to use StringUtils.defaultString, but we should either use it for both statuses or not use it at all.
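One way to act on this comment is to normalize the diagnostics string once, before branching, so that both statuses get identical null handling. The following is only a sketch under stated assumptions: `FinalStatus`, `Decision`, and `decide` are hypothetical stand-ins for the YARN and Gobblin types in the diff, and `Objects.toString(value, "")` stands in for `StringUtils.defaultString` so the example has no external dependencies.

```java
import java.util.Objects;

public class UnregisterSketch {
  // Hypothetical stand-in for YARN's FinalApplicationStatus.
  public enum FinalStatus { SUCCEEDED, FAILED }

  // Hypothetical holder for the arguments that would be passed to
  // unregisterApplicationMaster(status, diagnostics, null).
  public static final class Decision {
    public final FinalStatus status;
    public final String diagnostics;

    Decision(FinalStatus status, String diagnostics) {
      this.status = status;
      this.diagnostics = diagnostics;
    }
  }

  /**
   * @param jobSucceeded null when no job state is available (treated as
   *                     success, mirroring the else branch in the diff)
   * @param issuesSummary may be null
   */
  public static Decision decide(Boolean jobSucceeded, String issuesSummary) {
    // Null-safe conversion applied once, so both branches behave the same.
    String diagnostics = Objects.toString(issuesSummary, "");
    FinalStatus status = (jobSucceeded != null && !jobSucceeded)
        ? FinalStatus.FAILED
        : FinalStatus.SUCCEEDED;
    return new Decision(status, diagnostics);
  }
}
```

With this shape, the question of whether getIssuesSummary() can return null no longer affects either branch differently.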
TextStringBuilder sb = new TextStringBuilder();
try {
  List<Issue> issues = this.getIssueRepository().getAll();
  if (issues.size() == 0) {
Please use issues.isEmpty() instead of issues.size() == 0.
LOGGER.error("Gobblin Yarn application failed for the following reason: " + applicationReport.getDiagnostics());
applicationFailed = true;
LOGGER.error("Gobblin Yarn application failed because of the following issues: " + applicationReport.getDiagnostics());
} else if (StringUtils.isNotBlank(applicationReport.getDiagnostics())) {
I think this should be removed. Diagnostics are not useful for success cases: they are mostly task failures that have already been retried.
It will be useful when no work units were generated. In that case the job always succeeds, so having the diagnostics makes the cause directly visible.
synchronized (this.applicationDone) {
  while (!this.applicationCompleted) {
    try {
      this.applicationDone.wait();
It might be simpler and cleaner to use a CountDownLatch instead of explicit synchronization with synchronized, wait(), and notify().
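A minimal sketch of what the CountDownLatch variant could look like. The class and method names (`CompletionLatchSketch`, `markCompleted`, `awaitCompletion`) are hypothetical and are not taken from GobblinYarnAppLauncher; only `java.util.concurrent.CountDownLatch` itself is a real API here.

```java
import java.util.concurrent.CountDownLatch;

public class CompletionLatchSketch {
  // Released exactly once, when the application reaches a terminal state.
  private final CountDownLatch applicationDone = new CountDownLatch(1);
  private volatile boolean applicationFailed = false;

  /** Called by the (hypothetical) monitoring thread on a terminal report. */
  public void markCompleted(boolean failed) {
    // Record the outcome before releasing waiters.
    this.applicationFailed = failed;
    this.applicationDone.countDown();
  }

  /** Blocks until completion and returns true if the application failed. */
  public boolean awaitCompletion() throws InterruptedException {
    // No loop or monitor needed: await() returns only after countDown(),
    // and returns immediately if completion was already signalled.
    this.applicationDone.await();
    return this.applicationFailed;
  }
}
```

Compared with wait()/notify(), the latch needs no guard loop against spurious wakeups and cannot miss a signal that arrives before the waiter starts waiting.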
    throw new RuntimeException("Gobblin Yarn application failed");
  }
} catch (InterruptedException ie) {
  LOGGER.error("Interrupted while waiting for the Gobblin Yarn application to finish", ie);
Should this throw an exception instead of only logging?
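If the answer is yes, a common pattern is to restore the thread's interrupt flag and then rethrow, so that the caller fails instead of continuing silently. This is a sketch only: `InterruptHandlingSketch` and `awaitOrFail` are hypothetical names, the field names merely echo the diff, and wrapping in RuntimeException is an assumption about what the launcher would want.

```java
public class InterruptHandlingSketch {
  private final Object applicationDone = new Object();
  private boolean applicationCompleted = false;

  /** Signals completion to any waiting thread. */
  public void markCompleted() {
    synchronized (this.applicationDone) {
      this.applicationCompleted = true;
      this.applicationDone.notifyAll();
    }
  }

  /** Waits for completion; fails loudly if the waiting thread is interrupted. */
  public void awaitOrFail() {
    synchronized (this.applicationDone) {
      while (!this.applicationCompleted) {
        try {
          this.applicationDone.wait();
        } catch (InterruptedException ie) {
          // wait() clears the interrupt status; restore it for callers.
          Thread.currentThread().interrupt();
          throw new RuntimeException(
              "Interrupted while waiting for the Gobblin Yarn application to finish", ie);
        }
      }
    }
  }
}
```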
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Currently, when the Temporal job running on Yarn fails, we don't propagate the error back to the Azkaban job that launches the Yarn application.
This change bubbles the issues encountered when the job fails up to the GobblinYarnAppLauncher run by the Azkaban job, which logs the issues summary and then fails with a RuntimeException.
Tests
Tested manually
Commits