Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow-up fixes to Nodes refactoring #9053

Merged
merged 1 commit into from
Mar 24, 2024
Merged

Conversation

jglick
Copy link
Member

@jglick jglick commented Mar 19, 2024

Follows up #8979. I found a test regression in a CloudBees CI plugin triggered by this change, after removing a defensive null check not present in #8979. After some investigation I found that #8979 caused MasterComputer to be created a bit later during Jenkins startup: rather than inside Nodes.load as before

jenkins.model.Jenkins$MasterComputer.<init>(Jenkins.java:5346)
hudson.model.Hudson$MasterComputer.<init>(Hudson.java:330)
jenkins.model.Jenkins.createComputer(Jenkins.java:3367)
hudson.model.AbstractCIBase.lambda$createNewComputerForNode$0(AbstractCIBase.java:173)
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
hudson.model.AbstractCIBase.createNewComputerForNode(AbstractCIBase.java:171)
hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:153)
hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:252)
hudson.model.Queue._withLock(Queue.java:1410)
hudson.model.Queue.withLock(Queue.java:1284)
hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)
jenkins.model.Jenkins.updateComputerList(Jenkins.java:1705)
jenkins.model.Nodes$6.run(Nodes.java:351)
hudson.model.Queue._withLock(Queue.java:1410)
hudson.model.Queue.withLock(Queue.java:1284)
jenkins.model.Nodes.load(Nodes.java:346)
jenkins.model.Jenkins$12.run(Jenkins.java:3499)

it was being created during an explicit call to Jenkins.updateComputerList in the startup sequence

jenkins.model.Jenkins$MasterComputer.<init>(Jenkins.java:5366)
hudson.model.Hudson$MasterComputer.<init>(Hudson.java:330)
jenkins.model.Jenkins.createComputer(Jenkins.java:3387)
hudson.model.AbstractCIBase.lambda$createNewComputerForNode$0(AbstractCIBase.java:173)
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
hudson.model.AbstractCIBase.createNewComputerForNode(AbstractCIBase.java:171)
hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:153)
hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:252)
hudson.model.Queue._withLock(Queue.java:1409)
hudson.model.Queue.withLock(Queue.java:1283)
hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)
jenkins.model.Jenkins.updateComputerList(Jenkins.java:1710)
jenkins.model.Jenkins.<init>(Jenkins.java:1029)

The reason was an attempted optimization to skip loading when the nodes directory did not exist, which would be true if no agent had ever been created (the case in the plugin test), even though listFiles was already being checked for null; the optimization wound up also skipping the early call to Jenkins.updateComputerList, which was unintentional.

That would probably not have mattered were it not for two other factors: an NPE in Queue.createFlyWeightTaskRunnable which failed to check¹ for a null return value from Jenkins.toComputer() and thus broke resumed Pipeline build handling; and a mistake in the proprietary plugin which wound up attempting to load running Pipeline builds earlier than they normally would have been loaded by FlowExecutionList.resume.

I also noticed that the new ScheduleMaintenanceAfterSavingNode calls Queue.scheduleMaintenance even when Jenkins.save (not Slave.save) is called, which might be wasteful, though it does not seem to have contributed to the test failure. Perhaps it is correct, in case Jenkins.numExecutors is adjusted?

Testing done

Removal of early exit in Nodes.load fixes a functional test in a CloudBees CI plugin. Extra null check also did. Fixing timing in the plugin also did.

¹Note that SpotBugs does not find this even after deleting

<Class name="hudson.model.Queue"/>
<Class name="hudson.model.Queue$JobOffer"/>

Proposed changelog entries

  • N/A, no known user impact

Before the changes are marked as ready-for-merge:

Maintainer checklist

@jglick jglick requested review from Vlatombe and julieheard March 19, 2024 18:56
Copy link
Contributor

@MarkEWaite MarkEWaite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is now ready for merge. We will merge it after approximately 24 hours if there is no negative feedback.

/label ready-for-merge

@comment-ops-bot comment-ops-bot bot added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Mar 23, 2024
@MarkEWaite MarkEWaite added the skip-changelog Should not be shown in the changelog label Mar 23, 2024
@MarkEWaite MarkEWaite merged commit c3e3a4a into jenkinsci:master Mar 24, 2024
16 checks passed
@jglick jglick deleted the Nodes.load branch March 25, 2024 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback skip-changelog Should not be shown in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants