[JENKINS-53958] allow pipeline jobs to run when built-in is offline #9203

Merged (1 commit) on May 8, 2024

@mawinter69 (Contributor) commented Apr 27, 2024

When a pipeline starts building, it creates a OneOffExecutor that takes care of the pipeline execution on the built-in node. The executor has some logic to prevent running things when an agent has gone offline in the time between assigning the task to the executor and the executor actually starting to run the task. But this logic wrongly terminates the executor for the pipeline job, and attempts to restart the task fail because the task is no longer in the queue.
This change tries to avoid this by ignoring the online state of the built-in node, as it is never really offline (there is no channel that can be closed). One can take it temporarily offline, but this should not prevent pipelines that do not explicitly use the built-in node from starting.
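
To make the idea concrete, here is a minimal sketch of the start-up check described above. It is not the actual Jenkins code: `ExecutorStartCheck`, `Computer`, and `mayStart` are hypothetical names invented for illustration, and the real executor logic is more involved.

```java
/** Hypothetical, simplified model of the check; these are not the real Jenkins classes. */
public class ExecutorStartCheck {

    /** Minimal stand-in for the computer (node) that owns an executor. */
    public static final class Computer {
        final String name;
        final boolean builtIn;
        final boolean online;

        public Computer(String name, boolean builtIn, boolean online) {
            this.name = name;
            this.builtIn = builtIn;
            this.online = online;
        }
    }

    /**
     * Decides whether an executor may begin the task it was assigned.
     * With ignoreBuiltInOffline == false (assumed old behaviour), any offline owner aborts the start.
     * With ignoreBuiltInOffline == true (assumed new behaviour), the online state of the
     * built-in node is ignored because it has no channel that could be closed.
     */
    public static boolean mayStart(Computer owner, boolean ignoreBuiltInOffline) {
        if (ignoreBuiltInOffline && owner.builtIn) {
            return true;
        }
        return owner.online;
    }

    public static void main(String[] args) {
        Computer builtInOffline = new Computer("built-in", true, false);
        Computer agentOffline = new Computer("agent-2", false, false);

        System.out.println("old, built-in offline: " + mayStart(builtInOffline, false));
        System.out.println("new, built-in offline: " + mayStart(builtInOffline, true));
        System.out.println("new, agent offline:    " + mayStart(agentOffline, true));
    }
}
```

In this sketch an offline agent still blocks the start in both modes; only the built-in node is treated specially.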

See JENKINS-53958.

Testing done

Manual testing:

Scenario 1: pipeline not using built-in

  1. Take the built-in node offline
  2. Create a second agent
  3. Create a pipeline job that will run something on the second agent
  4. Run the pipeline -> pipeline run succeeds

Scenario 2: pipeline explicitly using built-in

  1. Take the built-in node offline
  2. Create a second agent
  3. Create a pipeline job that will run something on the built-in
  4. Run the pipeline -> pipeline run waits for the built-in node
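
The expected outcome of the two scenarios can be modelled with a small sketch. It assumes the pipeline itself always starts (the behaviour this PR aims for) and that a step requesting a node only proceeds once an online node with the requested label exists. `ScenarioSketch`, `Node`, and `outcome` are hypothetical names, not Jenkins APIs.

```java
import java.util.List;

/** Hypothetical model of the two manual test scenarios; not Jenkins code. */
public class ScenarioSketch {

    /** A node identified by a single label, with an online flag. */
    public static final class Node {
        final String label;
        final boolean online;

        public Node(String label, boolean online) {
            this.label = label;
            this.online = online;
        }
    }

    /** Returns true if some online node carries the requested label. */
    public static boolean nodeStepCanStart(String requestedLabel, List<Node> nodes) {
        for (Node node : nodes) {
            if (node.online && node.label.equals(requestedLabel)) {
                return true;
            }
        }
        return false;
    }

    /** Outcome of a pipeline run whose only step requests the given label. */
    public static String outcome(String requestedLabel, List<Node> nodes) {
        // The pipeline itself is assumed to start regardless of the built-in node's state.
        return nodeStepCanStart(requestedLabel, nodes) ? "succeeds" : "waits";
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("built-in", false), new Node("agent-2", true));
        System.out.println("Scenario 1: " + outcome("agent-2", nodes));
        System.out.println("Scenario 2: " + outcome("built-in", nodes));
    }
}
```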

Proposed changelog entries

  • Allow pipeline jobs to run when built-in is offline.

Proposed upgrade guidelines

N/A

@NotMyFault NotMyFault requested a review from a team April 29, 2024 08:20
@NotMyFault NotMyFault added the rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted label Apr 29, 2024
@NotMyFault NotMyFault requested a review from a team May 1, 2024 07:58
@StefanSpieker
Contributor

In general, I like the idea, but in the past the current behavior also helped us when the controller disk space ran out and the built-in node was taken offline automatically. We do not run jobs on the built-in node, but since it is needed for starting jobs, it prevented further jobs from running.
We have better monitoring today, so we haven't had this issue for quite some time, but it might be worth considering.

@mawinter69
Contributor Author

I think that when a node is offline, it should only affect explicit usage of the node, i.e. when it is matched by a label expression in a node step or due to a job restriction.
The situation as it is now is bad: pipeline jobs just fail as if they were never triggered, and there is only a message in the logs from which it is not immediately clear what happened. Better to have the job fail with disk problems, I would say.
Maybe one should check before starting the flyweight task whether the node is online and, if not, leave the task in the queue. But that is a bigger change and requires thorough testing.
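
As a rough illustration of that alternative (which is not part of this PR), the sketch below keeps tasks queued while the built-in node is offline and only dispatches them once it is back online. `KeepInQueueSketch` and its methods are hypothetical and do not correspond to the Jenkins queue API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: leave tasks in the queue while the built-in node is offline. */
public class KeepInQueueSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private boolean builtInOnline;

    public KeepInQueueSketch(boolean builtInOnline) {
        this.builtInOnline = builtInOnline;
    }

    public void setBuiltInOnline(boolean online) {
        this.builtInOnline = online;
    }

    /** Adds a task to the end of the queue. */
    public void submit(String task) {
        queue.addLast(task);
    }

    /** Starts queued tasks only while the node is online; otherwise they stay queued. */
    public List<String> dispatch() {
        List<String> started = new ArrayList<>();
        while (builtInOnline && !queue.isEmpty()) {
            started.add(queue.removeFirst());
        }
        return started;
    }

    public int queued() {
        return queue.size();
    }

    public static void main(String[] args) {
        KeepInQueueSketch sketch = new KeepInQueueSketch(false);
        sketch.submit("pipeline-1");
        System.out.println("started while offline: " + sketch.dispatch());
        sketch.setBuiltInOnline(true);
        System.out.println("started once online:   " + sketch.dispatch());
    }
}
```

With this approach a job would wait visibly in the queue instead of vanishing, at the cost of touching the dispatch path.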

Member

@timja timja left a comment

/label ready-for-merge


This PR is now ready for merge. After ~24 hours, we will merge it if there's no negative feedback.

Thanks!

@comment-ops-bot comment-ops-bot bot added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label May 6, 2024
@Luckybangar

This comment was marked as off-topic.

@timja timja merged commit fa0464c into jenkinsci:master May 8, 2024
16 checks passed
@alexsch01

I just updated to a version of Jenkins that now has this behavior. Ughh.

Is there a config option to revert this?

@mawinter69 mawinter69 deleted the JENKINS-53958 branch October 2, 2024 15:24
@MarkEWaite
Contributor

> I just updated to a version of Jenkins that now has this behavior. Ughh.
>
> Is there a config option to revert this?

As far as I know there is not a configuration option to revert this change. Can you provide a detailed description of the problem you are seeing as a result of this change so that it can be evaluated?

The Jenkins issue tracker is the best place to provide that detailed description.

@alexsch01

@MarkEWaite
I used the Disable in built node button to prevent all schedules from running.
Now that doesn't work.
Is there another way of achieving that?

@daniel-beck
Member

This is not the place to have that discussion. Please file an issue in Jira or open a topic in the forum.

@alexsch01

My apologies, I filed an issue: https://issues.jenkins.io/browse/JENKINS-73866
