Follow-up fixes to `Nodes` refactoring #9053

jglick · 2024-03-19T18:56:08Z

Follows up #8979. I found a test regression in a CloudBees CI plugin triggered by this change, after removing a defensive null check not present in #8979. After some investigation I found that #8979 caused MasterComputer to be created a bit later during Jenkins startup: rather than inside Nodes.load as before

jenkins.model.Jenkins$MasterComputer.<init>(Jenkins.java:5346)
hudson.model.Hudson$MasterComputer.<init>(Hudson.java:330)
jenkins.model.Jenkins.createComputer(Jenkins.java:3367)
hudson.model.AbstractCIBase.lambda$createNewComputerForNode$0(AbstractCIBase.java:173)
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
hudson.model.AbstractCIBase.createNewComputerForNode(AbstractCIBase.java:171)
hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:153)
hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:252)
hudson.model.Queue._withLock(Queue.java:1410)
hudson.model.Queue.withLock(Queue.java:1284)
hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)
jenkins.model.Jenkins.updateComputerList(Jenkins.java:1705)
jenkins.model.Nodes$6.run(Nodes.java:351)
hudson.model.Queue._withLock(Queue.java:1410)
hudson.model.Queue.withLock(Queue.java:1284)
jenkins.model.Nodes.load(Nodes.java:346)
jenkins.model.Jenkins$12.run(Jenkins.java:3499)

it was being created during an explicit call to Jenkins.updateComputerList in the startup sequence

jenkins.model.Jenkins$MasterComputer.<init>(Jenkins.java:5366)
hudson.model.Hudson$MasterComputer.<init>(Hudson.java:330)
jenkins.model.Jenkins.createComputer(Jenkins.java:3387)
hudson.model.AbstractCIBase.lambda$createNewComputerForNode$0(AbstractCIBase.java:173)
java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
hudson.model.AbstractCIBase.createNewComputerForNode(AbstractCIBase.java:171)
hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:153)
hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:252)
hudson.model.Queue._withLock(Queue.java:1409)
hudson.model.Queue.withLock(Queue.java:1283)
hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)
jenkins.model.Jenkins.updateComputerList(Jenkins.java:1710)
jenkins.model.Jenkins.<init>(Jenkins.java:1029)

The reason was an attempted optimization to skip loading when the nodes directory did not exist, which would be true if no agent had ever been created (the case in the plugin test), even though listFiles was already being checked for null; the optimization wound up also skipping the early call to Jenkins.updateComputerList, which was unintentional.

That would probably not have mattered were it not for two other factors: an NPE in Queue.createFlyWeightTaskRunnable which failed to check¹ for a null return value from Jenkins.toComputer() and thus broke resumed Pipeline build handling; and a mistake in the proprietary plugin which wound up attempting to load running Pipeline builds earlier than they normally would have been loaded by FlowExecutionList.resume.

I also noticed that the new ScheduleMaintenanceAfterSavingNode calls Queue.scheduleMaintenance even when Jenkins.save (not Slave.save) is called, which might be wasteful, though it does not seem to have contributed to the test failure. Perhaps it is correct, in case Jenkins.numExecutors is adjusted?

Testing done

Removal of early exit in Nodes.load fixes a functional test in a CloudBees CI plugin. Extra null check also did. Fixing timing in the plugin also did.

¹Note that SpotBugs does not find this even after deleting

jenkins/core/src/spotbugs/excludesFilter.xml

Lines 273 to 274 in ec5c072

    
           <Class name="hudson.model.Queue"/> 
        
           <Class name="hudson.model.Queue$JobOffer"/>

Proposed changelog entries

N/A, no known user impact

Before the changes are marked as ready-for-merge:

Maintainer checklist

Give feedback

There are at least two (2) approvals for the pull request and no outstanding requests for change.
Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
Proper changelog labels are set so that the changelog can be generated automatically.
If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).
Options

MarkEWaite

This PR is now ready for merge. We will merge it after approximately 24 hours if there is no negative feedback.

/label ready-for-merge

Follow-up fixes to Nodes refactoring

bab1454

jglick requested review from Vlatombe and julieheard March 19, 2024 18:56

Vlatombe approved these changes Mar 20, 2024

View reviewed changes

MarkEWaite approved these changes Mar 23, 2024

View reviewed changes

comment-ops-bot bot added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Mar 23, 2024

MarkEWaite added the skip-changelog Should not be shown in the changelog label Mar 23, 2024

MarkEWaite merged commit c3e3a4a into jenkinsci:master Mar 24, 2024
16 checks passed

jglick deleted the Nodes.load branch March 25, 2024 11:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-up fixes to `Nodes` refactoring #9053

Follow-up fixes to `Nodes` refactoring #9053

jglick commented Mar 19, 2024

Maintainer checklist

MarkEWaite left a comment

	<Class name="hudson.model.Queue"/>
	<Class name="hudson.model.Queue$JobOffer"/>

Follow-up fixes to Nodes refactoring #9053

Follow-up fixes to Nodes refactoring #9053

Conversation

jglick commented Mar 19, 2024

Testing done

Proposed changelog entries

Maintainer checklist

MarkEWaite left a comment

Choose a reason for hiding this comment

Follow-up fixes to `Nodes` refactoring #9053

Follow-up fixes to `Nodes` refactoring #9053