-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build fails to resume if controller crashes before queue is saved #408
Build fails to resume if controller crashes before queue is saved #408
Conversation
try (var tailLog = new TailLog(rjr, "p", 1)) { | ||
iar.createAgent(rjr, InboundAgentRule.Options.newBuilder().name("J").label("mib").color(PrefixedOutputStream.Color.YELLOW).webSocket().build()); | ||
rjr.runRemotely(ExecutorStepDynamicContextRJRTest::setupJobAndStart); | ||
rjr.stopJenkinsForcibly(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same test passes with
rjr.stopJenkins();
@@ -67,7 +67,7 @@ | |||
<properties> | |||
<changelist>999999-SNAPSHOT</changelist> | |||
<!-- TODO Until in plugin-pom --> | |||
<jenkins-test-harness.version>2182.v0138ccb_c0b_cb_</jenkins-test-harness.version> | |||
<jenkins-test-harness.version>2357.vf2a_982b_b_910f</jenkins-test-harness.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need jenkinsci/jenkins-test-harness#876 for RealJenkinsRule#stopJenkinsForcibly
...est/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepDynamicContextRJRTest.java
Outdated
Show resolved
Hide resolved
...est/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepDynamicContextRJRTest.java
Outdated
Show resolved
Hide resolved
...est/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepDynamicContextRJRTest.java
Outdated
Show resolved
Hide resolved
The given test sometimes succeeds... |
The case where it succeeds is when the stop happens before entering the second node step (still in |
src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Jesse Glick <[email protected]>
(For reference, this is a corner case which would previously trigger |
if (flowNode == null) { | ||
LOGGER.fine(() -> "No FlowNode found for node block " + getContext() + "; can't recover" ); | ||
} else { | ||
var action = flowNode.getAction(QueueItemActionImpl.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To remind myself: this would have been added right away in
Lines 132 to 139 in 6ae8123
public boolean start() throws Exception { | |
final PlaceholderTask task = new PlaceholderTask(getContext(), step.getLabel()); | |
Queue.WaitingItem waitingItem = Queue.getInstance().schedule2(task, 0).getCreateItem(); | |
if (waitingItem == null) { | |
// There can be no duplicates. But could be refused if a QueueDecisionHandler rejects it for some odd reason. | |
throw new IllegalStateException("failed to schedule task"); | |
} | |
getContext().get(FlowNode.class).addAction(new QueueItemActionImpl(waitingItem.getId())); |
src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java
Show resolved
Hide resolved
src/test/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecutionRJRTest.java
Show resolved
Hide resolved
src/test/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecutionRJRTest.java
Outdated
Show resolved
Hide resolved
src/test/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecutionRJRTest.java
Outdated
Show resolved
Hide resolved
await("Waiting for agent J to reconnect").atMost(Duration.ofSeconds(30)).until(() -> r.jenkins.getComputer("J").isOnline(), is(true)); | ||
var actions = await().until(() -> b.getActions(InputAction.class), allOf(iterableWithSize(1), hasItem(new InputActionWithId("Branch1")))); | ||
proceed(actions, "Branch1", p.getName() + "#" + b.number); | ||
// This is quicker than waitForMessage that can wait for up to 10 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 maybe we should start developing reusable Matcher
s in JTH so you could write for example
await().until(() -> b, hasMessage("Complete branch 2 ?"));
which would be more flexible & composable than the current assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, would be much nicer than the current assertions available on JenkinsRule
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#408 (comment) seems like a real bug. Otherwise looks good.
Co-authored-by: Jesse Glick <[email protected]>
Co-authored-by: Jesse Glick <[email protected]>
Co-authored-by: Jesse Glick <[email protected]>
This covers a case where a queue item for a node was filed, then the controller shut down without going through clean up.
Because of that, the queue wasn't saved, and when the controller restarts, the queue item is gone.
This modifies the behaviour of
ExecutorStepExecution#onResume
to detect this condition and re-schedule a new queue item if needed.Testing done
See provided unit test.
Submitter checklist