Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified PlaceholderTask #354

Merged
merged 4 commits into from
Jan 30, 2024
Merged

Simplified PlaceholderTask #354

merged 4 commits into from
Jan 30, 2024

Conversation

jglick
Copy link
Member

@jglick jglick commented Jan 26, 2024

Working on the same test as in #353, I found a (locally unreproducible) case where it appears that Queue.cancel from PlaceholderTask.Callback.finished triggered CancelledItemListener. PlaceholderTask.stopping was designed to prevent exactly this; the fact that it did not suggests that there are multiple PlaceholderTasks involved. As in #344, the root design flaw is that there is mutable state in PlaceholderTask yet it is difficult to ensure Java object identity here. So here I am moving the stopping field to transient global state (RunningTasks). I am also restoring cookie to its original purpose of an actual environment variable value used for nothing more than killing off stray background processes at the end of the block; and using StepContext as the primary key whenever we need to look up state. I did consider making PlaceholderTask actually immutable but this would require changing the storage of the label and cookie fields, which is more difficult for two reasons: we need to actually read these values from builds running before the upgrade; and we would need to find some other spot to persist them (such as fields in ExecutorStepExecution perhaps). I think these fields are not real problems anyway, since they are only mutated once, in PlaceholderExecutable.run, and before that point there does not seem to be a risk of there being multiple copies of the PlaceholderTask: the known problems involve deserialization from both the queue and from ExecutorStepDynamicContext.task, but ExecutorStepDynamicContext is not created until after cookie and label have been written.

@jglick jglick requested a review from a team as a code owner January 26, 2024 21:42
@jglick jglick added the bug label Jan 26, 2024
@jglick jglick requested a review from dwnusbaum January 26, 2024 21:42
// PlaceholderTask#getAssignedLabel is set to the Node name when execution starts
// Thus we're guaranteeing the execution began and the Node is now unknown.
// Theoretically it's safe to simply fail earlier when rehydrating any EphemeralNode... but we're being extra safe.
if (placeholder.getCookie() != null && Jenkins.get().getNode(placeholder.getAssignedLabel().getName()) == null ) {
if (placeholder.hasStarted() && Jenkins.get().getNode(placeholder.getAssignedLabel().getName()) == null ) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just better encapsulation)

@@ -97,8 +97,6 @@ void resume(StepContext context) throws Exception {
if (item == null) {
throw new IllegalStateException("queue refused " + task);
}
// Try to avoid having distinct a instance of PlaceholderTask here compared to any previously-scheduled task.
task = (ExecutorStepExecution.PlaceholderTask)item.task;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting this bit of #344 which should now be unnecessary: any instances in memory ought to be interchangeable at this point.

@@ -875,7 +855,7 @@ private String computeEnclosingLabel(FlowNode executorStepNode, List<FlowNode> h
}

@Override public String toString() {
return "ExecutorStepExecution.PlaceholderTask{runId=" + runId + ",label=" + label + ",context=" + context + ",cookie=" + cookie + ",auth=" + auth + '}';
return "ExecutorStepExecution.PlaceholderTask{label=" + label + ",context=" + context + '}';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reducing noise and redundancy. CpsStepContext.toString already encodes the Run#externalizableId; and cookie is now rarely interesting. auth was never interesting unless you installed authorized-project, now deprecated.

if (cookie == null) {
private static void finish(StepContext context, @CheckForNull final String cookie) {
RunningTask runningTask = RunningTasks.remove(context);
final AsynchronousExecution execution = runningTask.execution;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(hide whitespace)

@@ -1189,8 +1153,6 @@ public String getAbsoluteUrl() {
return "PlaceholderExecutable:" + PlaceholderTask.this;
}

private static final long serialVersionUID = 1L;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear why this was here, when the PlaceholderExecutable is not Serializable.

@jglick jglick marked this pull request as draft January 27, 2024 01:29
@jglick

This comment was marked as resolved.

… task completed), and `RunningTasks` add/remove timing was off
@jglick jglick marked this pull request as ready for review January 29, 2024 23:47
@jglick jglick requested a review from dwnusbaum January 29, 2024 23:47
@dwnusbaum
Copy link
Member

If d1a2aa2 fixes test failures in PCT, would it make sense to add a test (or multiple tests) here for whatever it is fixing?

@jglick
Copy link
Member Author

jglick commented Jan 30, 2024

Probably yes. The clear problem was the willContinue implementation, which caused a test using mock-slave to keep an ephemeral agent online after its node block ended, and thus to be reused rather than a new agent being created. This seems to have been the cause of more subtle issues in other tests as well. I think I can find a more focused way to reproduce that. (Maybe a test using OnceRetentionStrategy needs to be introduced? I do not particularly want to add a test dep on mock-slave however.)

While working on that I noticed that Callback.finished may not have been running all of its logic consistently, since finish had been remove the RunningTask entry (which led to a broader reworking of the timing of this entry’s addition and removal). I am not really sure how critical a problem that was: it did not appear to result in any PCT failures.

import org.jvnet.hudson.test.TestExtension;

/**
* Like {@link ExecutorStepTest} but not using {@link Parameterized} which appears incompatible with {@link TestExtension}.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not notice in #323 that PipelineOnlyTaskDispatcher was not actually getting loaded. (queueTaskOwnerCorrectWhenRestarting passed with or without it.)

var p = r.createProject(WorkflowJob.class, "p");
p.setDefinition(new CpsFlowDefinition("node('test') {}", true));
r.assertLogContains("Running on test-1", r.buildAndAssertSuccess(p));
r.assertLogContains("Running on test-2", r.buildAndAssertSuccess(p));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without d1a2aa2, fails

java.lang.AssertionError: 

Expected: a string containing "Running on test-2"
     but: was "Started
[Pipeline] Start of Pipeline
[Pipeline] node
Running on test-1 in …/agents/test-1/workspace/p
[Pipeline] {
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS
"
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
	at org.jvnet.hudson.test.JenkinsRule.assertLogContains(JenkinsRule.java:1518)
	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStep2Test.lambda$cloud$2(ExecutorStep2Test.java:121)

@jglick jglick merged commit 63864b7 into jenkinsci:master Jan 30, 2024
14 checks passed
@jglick jglick deleted the PlaceholderTask branch January 30, 2024 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants