[JENKINS-23152] More robust handling of build number collisions #7523

Merged 1 commit into jenkinsci:master on Dec 19, 2022

@jglick jglick commented Dec 14, 2022

See JENKINS-23152, originally defended against in #1379. If nextBuildNumber is stale, or the in-memory build number list fails to match what is on disk in some other way, you can get errors like:

SEVERE	hudson.model.Executor#run: Executor #-1 for Built-In Node: Unexpected executor death
java.lang.IllegalStateException: JENKINS-23152: …/jobs/prj/builds/123 already existed; will not overwrite with prj #123
	at hudson.model.RunMap.put(RunMap.java:193)
	at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:184)
Caused: java.lang.LinkageError: JENKINS-23152: …/jobs/prj/builds/123 already existed; will not overwrite with prj #123
	at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:192)
	at jenkins.model.ParameterizedJobMixIn$ParameterizedJob.createExecutable(ParameterizedJobMixIn.java:505)
	at jenkins.model.ParameterizedJobMixIn$ParameterizedJob.createExecutable(ParameterizedJobMixIn.java:323)
	at hudson.model.Executor$1.call(Executor.java:370)
	at hudson.model.Executor$1.call(Executor.java:352)
	at hudson.model.Queue._withLock(Queue.java:1456)
	at hudson.model.Queue.withLock(Queue.java:1312)
	at hudson.model.Executor.run(Executor.java:352)

which prevent a new build from being run at all. This patch at least retries with the newly updated nextBuildNumber until it finds an open slot.
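
As an illustration only (not the merged code), the retry can be modeled with a tiny stand-in class. `RetryingJob`, its fields, and `assignBuildNumber` are hypothetical names, and the sketch assumes each build owns a directory named after its number, as the path in the log above suggests:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a job; this is not Jenkins API. */
final class RetryingJob {
    private final Path buildsDir;
    private int nextBuildNumber;

    RetryingJob(Path buildsDir, int nextBuildNumber) {
        this.buildsDir = buildsDir;
        this.nextBuildNumber = nextBuildNumber;
    }

    /** Hands out the current number and advances the counter. */
    private int assignBuildNumber() {
        return nextBuildNumber++;
    }

    /** Creates the directory for a new build, retrying past numbers already on disk. */
    Path newBuild() throws IOException {
        int number = assignBuildNumber();
        Path rootDir = buildsDir.resolve(Integer.toString(number));
        if (Files.isDirectory(rootDir)) {
            // Collision: the counter has already advanced, so simply try again.
            return newBuild();
        }
        return Files.createDirectory(rootDir);
    }

    public static void main(String[] args) throws IOException {
        Path builds = Files.createTempDirectory("builds");
        Files.createDirectory(builds.resolve("1"));
        Files.createDirectory(builds.resolve("2"));
        RetryingJob job = new RetryingJob(builds, 1); // stale counter
        System.out.println(job.newBuild().getFileName()); // prints 3
    }
}
```

Each failed attempt consumes one number, so the counter ends up past the highest colliding build.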

#2439 also added some robustness during onLoad, which cannot be used reliably here since that presumes maxNumberOnDisk is freshly computed from actually inspecting the builds directory.
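
For context, "freshly computed from actually inspecting the builds directory" could look roughly like the following. This is a sketch under assumptions, not the Jenkins implementation: `BuildsDirScan` is a hypothetical class, and it assumes build directories have purely numeric names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper; this is not Jenkins API. */
final class BuildsDirScan {
    /** Returns the highest all-digit directory name under buildsDir, or 0 if there is none. */
    static int maxNumberOnDisk(Path buildsDir) throws IOException {
        try (Stream<Path> entries = Files.list(buildsDir)) {
            return entries
                    .filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    // At most 9 digits, so Integer.parseInt cannot overflow.
                    .filter(name -> name.matches("[0-9]{1,9}"))
                    .mapToInt(Integer::parseInt)
                    .max()
                    .orElse(0);
        }
    }
}
```

A value computed this way is only trustworthy right after the scan, which is why it cannot be relied on at the time a new build is created.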

Testing done

Manually decrementing nextBuildNumber on disk. Not sure this is worth an automated test.
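
The manual step could be scripted along these lines. This is a sketch under assumptions: `StaleCounter` is a hypothetical helper, and the counter file is assumed to hold a single decimal integer as text. The path to the file is left to the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for reproducing a stale counter; this is not Jenkins API. */
final class StaleCounter {
    /** Rewrites the counter file with its value minus one and returns the new value. */
    static int decrement(Path counterFile) throws IOException {
        String text = Files.readString(counterFile, StandardCharsets.UTF_8).trim();
        int stale = Integer.parseInt(text) - 1;
        Files.writeString(counterFile, stale + "\n", StandardCharsets.UTF_8);
        return stale;
    }
}
```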

Proposed changelog entries

  • Robustness improvement regarding build number collisions.

Maintainer checklist

Before the changes are marked as ready-for-merge:

  • There are at least two (2) approvals for the pull request and no outstanding requests for change.
  • Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
  • Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
  • Proper changelog labels are set so that the changelog can be generated automatically.
  • If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
  • If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

var rootDir = lastBuild.getRootDir().toPath();
if (Files.isDirectory(rootDir)) {
    LOGGER.warning(() -> "JENKINS-23152: " + rootDir + " already existed; will not overwrite with " + lastBuild + " but will create a fresh build #" + asJob().getNextBuildNumber());
    return newBuild();

Yes, this is (almost) tail recursion and could be rewritten as a loop. OTOH, if there are actually thousands of builds on disk newer than the purported nextBuildNumber, a StackOverflowError is arguably no worse than the previous behavior, and arguably preferable to an endless loop. Could be tweaked into a bounded loop, etc.
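
A bounded-loop variant might look like the sketch below. It is not part of this PR: `BoundedRetry`, `firstFreeNumber`, and the `maxAttempts` parameter are hypothetical, and the sketch only searches for a free number rather than creating a build.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical bounded alternative to the recursive retry; this is not Jenkins API. */
final class BoundedRetry {
    /**
     * Returns the first number at or after start that has no directory under buildsDir,
     * giving up with an IllegalStateException after maxAttempts candidates.
     */
    static int firstFreeNumber(Path buildsDir, int start, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int candidate = start + attempt;
            if (!Files.isDirectory(buildsDir.resolve(Integer.toString(candidate)))) {
                return candidate;
            }
        }
        throw new IllegalStateException("No free build number in " + buildsDir
                + " after " + maxAttempts + " attempts starting at " + start);
    }
}
```

The bound trades a possible StackOverflowError or endless loop for an explicit failure; any particular limit would be a judgment call.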

    throw new LinkageError(e.getMessage(), e);
} catch (ReflectiveOperationException e) {
    throw new LinkageError("A new build could not be created in " + asJob().getFullName() + ": " + e, e);
} catch (IllegalStateException e) {

Only tangentially related, but note the inappropriate use of LinkageError in the stack trace quoted in the PR description. This seems to be because #2125 added a multicatch which #5483 translated incorrectly.

@NotMyFault added the rfe label Dec 14, 2022
@NotMyFault

/label ready-for-merge


This PR is now ready for merge. We will merge it after ~24 hours if there is no negative feedback.
Please see the merge process documentation for more information about the merge process.
Thanks!

@comment-ops-bot added the ready-for-merge label Dec 17, 2022
@MarkEWaite MarkEWaite merged commit 9254655 into jenkinsci:master Dec 19, 2022
@jglick jglick deleted the newBuild branch December 21, 2022 18:58