[JENKINS-73835] Do not allow builds to be deleted while they are still running and ensure build discarders run after builds are fully complete #9810

dwnusbaum · 2024-10-01T19:44:33Z

See JENKINS-73835 and jenkinsci/workflow-job-plugin#470. I noticed this while investigating #9790, but it is a distinct issue.

This PR makes Run.delete throw an exception if it is called on a build which has not yet completed. It also adjusts some logic related to LogRotator, which is the main programmatic caller of Run.delete, to avoid some race conditions and related issues.

Testing done

See new automated tests.

Proposed changelog entries

Do not allow builds to be deleted while they are still building
Ensure build discarders only process builds which have fully completed

Proposed upgrade guidelines

N/A

Submitter checklist

The Jira issue, if it exists, is well-described.
The changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developers, depending on the change) and are in the imperative mood (see examples). Fill in the Proposed upgrade guidelines section only if there are breaking changes or changes that may require extra steps from users during upgrade.
There is automated testing or an explanation as to why this change has no tests.
New public classes, fields, and methods are annotated with @Restricted or have @since TODO Javadocs, as appropriate.
New deprecations are annotated with @Deprecated(since = "TODO") or @Deprecated(forRemoval = true, since = "TODO"), if applicable.
New or substantially changed JavaScript is not defined inline and does not call eval to ease future introduction of Content Security Policy (CSP) directives (see documentation).
For dependency updates, there are links to external changelogs and, if possible, full differentials.
For new APIs and extension points, there is a link to at least one consumer.

Desired reviewers

Before the changes are marked as ready-for-merge:

Maintainer checklist

There are at least two (2) approvals for the pull request and no outstanding requests for change.
Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
Proper changelog labels are set so that the changelog can be generated automatically.
If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

dwnusbaum · 2024-10-01T19:46:42Z

core/src/main/java/hudson/model/Run.java

-
-            try {
-                getParent().logRotate();
-            } catch (Exception e) {
-                LOGGER.log(Level.SEVERE, "Failed to rotate log", e);
-            }


I moved this into GlobalBuildDiscarderListener so that isLogUpdated() == false when log rotation runs. It still runs synchronously; it just runs a little bit later now, down in onEndBuilding. This allows us to check isLogUpdated() elsewhere to avoid race conditions with Pipeline builds.
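The effect of running log rotation later can be pictured with a small stand-alone model. This is only a sketch, not Jenkins code: `ModelRun`, `FinalizedListener`, and `finish` are hypothetical stand-ins for `Run`, a finalization listener, and the end-of-build sequence. The state names and the rule that `isLogUpdated` stops being true once the build reaches `COMPLETED` are taken from the discussion in this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the end-of-build ordering; not the real hudson.model.Run. */
class RotationOrderSketch {
    enum State { NOT_STARTED, BUILDING, POST_PRODUCTION, COMPLETED }

    /** Stand-in for a listener that is notified once a build is finalized. */
    interface FinalizedListener { void onFinalized(ModelRun run); }

    static class ModelRun {
        State state = State.NOT_STARTED;
        final List<FinalizedListener> listeners = new ArrayList<>();

        /** Assumed rule: the log is still being updated until COMPLETED is reached. */
        boolean isLogUpdated() { return state.compareTo(State.COMPLETED) < 0; }

        /** Walks the states in order; listeners fire only after COMPLETED is set. */
        void finish() {
            state = State.BUILDING;
            state = State.POST_PRODUCTION;
            // Before this PR, log rotation was triggered at roughly this point.
            state = State.COMPLETED;
            for (FinalizedListener l : listeners) {
                l.onFinalized(this);
            }
        }
    }

    /** Returns what isLogUpdated() reported at the moment the listener ran. */
    static boolean logUpdatedDuringRotation() {
        ModelRun run = new ModelRun();
        boolean[] observed = new boolean[1];
        run.listeners.add(r -> observed[0] = r.isLogUpdated());
        run.finish();
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println("isLogUpdated during rotation: " + logUpdatedDuringRotation());
    }
}
```

Because the listener in this model is called after the state change, any check on `isLogUpdated()` made during rotation sees a build that is no longer in progress.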

dwnusbaum · 2024-10-01T19:49:12Z

core/src/main/java/hudson/tasks/LogRotator.java

@@ -250,7 +250,7 @@ private boolean shouldKeepRun(Run r, Run lsb, Run lstb) {
            LOGGER.log(FINER, "{0} is not to be removed or purged of artifacts because it’s the last stable build", r);
            return true;
        }
-        if (r.isBuilding()) {
+        if (r.isLogUpdated()) {


This is to avoid race conditions involving Pipeline builds. Previously it was possible for log rotation to delete builds which had not yet fully completed, which meant that a build.xml file could be written back out into the build directory after log rotation had deleted the build. I will try to demonstrate this downstream in workflow-job once I have an incremental build here.
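The on-disk symptom described above (a build.xml reappearing after deletion) can be reproduced with plain files. The following is a minimal sketch under stated assumptions, not Jenkins code: `save` and `deleteBuildDir` are hypothetical stand-ins for persisting a build record and for log rotation removing a build directory, and the directory layout is invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical model of a build directory; not Jenkins code. */
class BuildDirRaceSketch {
    /** Stand-in for persisting the build record. */
    static void save(Path buildDir) throws IOException {
        Files.createDirectories(buildDir);
        Files.writeString(buildDir.resolve("build.xml"), "<build/>");
    }

    /** Stand-in for log rotation deleting a build directory. */
    static void deleteBuildDir(Path buildDir) throws IOException {
        Files.deleteIfExists(buildDir.resolve("build.xml"));
        Files.deleteIfExists(buildDir);
    }

    /** Returns true if build.xml exists on disk at the end of the scenario. */
    static boolean buildXmlSurvives(boolean deleteBeforeFinalSave) {
        try {
            Path buildDir = Files.createTempDirectory("jobs").resolve("1");
            save(buildDir);                 // record written while the build is in progress
            if (deleteBeforeFinalSave) {
                deleteBuildDir(buildDir);   // rotation removes a build that has not finished
                save(buildDir);             // completion writes build.xml back out
            } else {
                save(buildDir);             // completion happens first
                deleteBuildDir(buildDir);   // rotation runs afterwards
            }
            return Files.exists(buildDir.resolve("build.xml"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deleted before final save, build.xml present: " + buildXmlSurvives(true));
        System.out.println("deleted after final save, build.xml present: " + buildXmlSurvives(false));
    }
}
```

In this model the deleted build only stays deleted when rotation runs after the final save, which is the ordering the isLogUpdated check is meant to enforce.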

Are you confident that this is safe for freestyle builds? I.e., that !logUpdated → !building or, equivalently, that building → logUpdated? The lifecycles for AbstractBuild and WorkflowRun are pretty different and the Javadoc has always been vague.

I am pretty sure it is safe, but if you prefer we can switch to isBuilding() || isLogUpdated() like the check for the UI in Jelly here (or maybe we should simplify that to only check isLogUpdated as well). See the non-Pipeline implementations here and the state enum here.

For !logUpdated → !building, !isLogUpdated is only true in State.COMPLETED, which ensures !isBuilding since COMPLETED is after POST_PRODUCTION.

The contrapositive, building → logUpdated, should be OK as well, since if isBuilding then we must be in State.NOT_STARTED or State.BUILDING, both of which are not COMPLETED, so isLogUpdated will be true.

For Pipeline things are simpler since WorkflowRun.isLogUpdated delegates to WorkflowRun.isBuilding.

Also just as a data point, if isLogUpdated could not be used on its own for non-pipelines, I think we would have seen it cause issues in tests due to the usage in JenkinsRule.waitForCompletion.

(And as far as I know the states for a Run can only move forward, although looking at the code I guess a subclass that calls some of the protected methods incorrectly could cause random state changes.)

as far as I know the states for a Run can only move forward

That looks right.
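The implication argument in this thread can be checked exhaustively over the four states. This is a sketch, not the real `hudson.model.Run`: the two predicates are written from the description above (building means the state is before `POST_PRODUCTION`, log updated means the state is before `COMPLETED`), and the class and method names are hypothetical.

```java
/** Hypothetical mirror of the state checks discussed above; not the real hudson.model.Run. */
class RunStateSketch {
    enum State { NOT_STARTED, BUILDING, POST_PRODUCTION, COMPLETED }

    /** Assumed rule: building while the state is before POST_PRODUCTION. */
    static boolean isBuilding(State s) { return s.compareTo(State.POST_PRODUCTION) < 0; }

    /** Assumed rule: log still updated while the state is before COMPLETED. */
    static boolean isLogUpdated(State s) { return s.compareTo(State.COMPLETED) < 0; }

    /** True if building implies logUpdated in every state. */
    static boolean buildingImpliesLogUpdated() {
        for (State s : State.values()) {
            if (isBuilding(s) && !isLogUpdated(s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + ": building=" + isBuilding(s) + ", logUpdated=" + isLogUpdated(s));
        }
        System.out.println("building implies logUpdated: " + buildingImpliesLogUpdated());
    }
}
```

Under these assumptions the only state where the two checks disagree is `POST_PRODUCTION`, where the build is no longer building but its log is still considered updated, so checking `isLogUpdated` alone is the stricter test.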

like the check for the UI in Jelly

Not sure what you mean.

jenkins/core/src/main/resources/hudson/model/Run/delete.jelly

Line 30 in 2082381

<j:if test="${!it.building and !it.keepLog}">

is only checking building and ought to be switched to only check logUpdated.

🤦 I misread keepLog as logUpdated. I'll update that.

I fixed that in f676be5.

dwnusbaum · 2024-10-01T19:55:42Z

core/src/main/java/jenkins/model/GlobalBuildDiscarderListener.java

+        } catch (Exception e) {
+            LOGGER.log(Level.WARNING, e, () -> "Failed to rotate log for " + run);
+        }
+        // Avoid calling Job.logRotate twice in case JobGlobalBuildDiscarderStrategy is configured globally.


This is a bit confusing, but essentially the old behavior was that Run.run called Job.logRotate while the build was in POST_PRODUCTION state, but also ever since #4368 if the configuration of GlobalBuildDiscarderConfiguration included JobGlobalBuildDiscarderStrategy (which it does by default), we also called Job.logRotate here while the build was in COMPLETED state. For backwards compatibility I think we need to ensure it gets called at least once regardless of the global configuration, and it's preferable to call it here in onFinalized so that we can check Run.isLogUpdated in LogRotator.shouldKeepRun and Run.delete to avoid race conditions with Pipeline builds. We avoid the redundant call by filtering out the relevant strategy when processing the globally-configured discarders.

Also, the old behavior is why I am not concerned about removing this call in workflow-job and also why I am not worried about making the above call asynchronous for the sake of Pipeline builds. BackgroundGlobalBuildDiscarder.processJob has been (redundantly) calling logRotate synchronously for years now in default configurations, so it seems that the asynchronous behavior in jenkinsci/workflow-job-plugin@63fdbe8 is no longer critical.
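The "call it once, then skip the matching strategy" idea can be sketched as follows. Everything here is a hypothetical model rather than the actual listener: `Strategy`, `JobStrategy`, and `GlobalStrategy` stand in for the globally configured discarder strategies, and the recorded strings stand in for the real calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the finalization listener; not the real GlobalBuildDiscarderListener. */
class DiscarderListenerSketch {
    /** Stand-in for a globally configured discarder strategy. */
    interface Strategy { void apply(List<String> calls); }

    /** Stand-in for the strategy that only delegates to the job's own log rotation. */
    static class JobStrategy implements Strategy {
        public void apply(List<String> calls) { calls.add("job.logRotate"); }
    }

    /** Stand-in for any other globally configured strategy. */
    static class GlobalStrategy implements Strategy {
        public void apply(List<String> calls) { calls.add("global.discard"); }
    }

    /** Rotates once unconditionally, then runs the remaining global strategies. */
    static List<String> onFinalized(List<Strategy> configured) {
        List<String> calls = new ArrayList<>();
        calls.add("job.logRotate"); // always called, regardless of the global configuration
        for (Strategy s : configured) {
            if (s instanceof JobStrategy) {
                continue; // already handled above; avoids a second, redundant call
            }
            s.apply(calls);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(onFinalized(List.of(new JobStrategy(), new GlobalStrategy())));
        System.out.println(onFinalized(List.of()));
    }
}
```

The unconditional first call is what preserves the old guarantee that job-level rotation happens even when the global configuration does not include the job strategy.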

(jenkinsci/workflow-job-plugin#70 for better linking)

Hi, after some digging into the behavior of globalBuildDiscarder I ended here.

As a Jenkins admin, I expected the specific (or simple) global discarder to take precedence over any project build discarder, especially if I remove the global project build discarder. This is also what I inferred from reading the descriptions here and here (even though it's not explicitly mentioned there and does not contradict the actual behavior). I also found multiple posts on SO and forums claiming that the global specific job discarder takes precedence, which is not the case. This is now obvious from the change at line 51 here, but it was already the behavior before, as I tested with 2.462.2.

What makes it even more confusing is the way the discarders are merged and applied: the most aggressive policy wins. The tests confirm that this is expected behavior. However, from an admin perspective, I'd see applying the least aggressive discarder as the safer approach (e.g. for compliance reasons).

I'm curious, however, to get your input on the topic. Also let me know if I should raise this somewhere else.

For context, this investigation started when the security engineer told me that he can erase his traces by overriding the build discarder in a job. Example: normal run -> evil run (set buildDiscarder to 1) -> normal run. Then we can't see what happened in the evil run anymore. We also have a compliance policy that requires us to keep production pipeline history for up to a year, which we would like to enforce.

I think the current behavior is as designed; build discarders are mainly meant to trim disk space, so it would be normal to have a blanket policy that builds over a month old are not kept, while a particular job that runs every five minutes for some automation is configured to discard all but the last build since history is trash. If your interest is in auditing, you would better use one of the many plugins which stream events or even whole build logs from Jenkins to external systems, which could be configured to only accept “create/append”-type operations and reject attempts to delete anything even if the Jenkins controller were to be compromised somehow. After all, even without any discarders, your white-hat engineer could simply delete the job after one build, along with all of its builds. You could also make the entire controller view-only to regular developers, using various “as code” systems.
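The "most aggressive policy wins" observation from this thread follows from the fact that discarders only ever remove builds. A minimal sketch, assuming every discarder is simply "keep the N newest builds" (a simplification; the class and method names are hypothetical and this is not how Jenkins represents discarders):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of several discarders applied to one job; not Jenkins code. */
class DiscarderMergeSketch {
    /** Returns the build numbers left after each "keep N newest" discarder has run in turn. */
    static Set<Integer> survivors(int newestBuild, List<Integer> keepCounts) {
        TreeSet<Integer> builds = new TreeSet<>();
        for (int n = 1; n <= newestBuild; n++) {
            builds.add(n);
        }
        for (int keep : keepCounts) {
            while (builds.size() > keep) {
                builds.pollFirst(); // drop the oldest remaining build
            }
        }
        return builds;
    }

    public static void main(String[] args) {
        // A global policy keeping 5 builds combined with a job policy keeping 1.
        System.out.println(survivors(10, List.of(5, 1)));
    }
}
```

In this model the order of the discarders does not matter: the smallest keep count decides what remains, which matches the scenario of a job-level discarder set to 1 overriding a more generous global policy.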

test/src/test/java/hudson/model/RunTest.java

dwnusbaum · 2024-10-01T20:29:19Z

test/src/test/java/hudson/tasks/LogRotatorTest.java

@@ -96,6 +101,17 @@ public void successVsFailureWithRemoveLastBuild() throws Exception {
        assertEquals(2, numberOf(project.getLastFailedBuild()));
    }

+    @Test
+    public void ableToDeleteCurrentBuild() throws Exception {


Hmm, this was to check that the switch to isLogUpdated in LogRotator.shouldKeepRun did not break anything, since with the switch the old call in Run.run would not have been able to delete the build. Because of the redundant call via GlobalBuildDiscarderListener, though, it worked anyway; the only difference was a log warning from the now-deleted code in Run.run, which we can't assert against any more. So maybe this test is no longer checking anything useful.

I guess, in principle, someone could want to keep zero builds if they were using publishers (GitHub Checks, Slack, etc.) to track failures?

dwnusbaum · 2024-10-02T17:16:23Z

test/src/test/java/hudson/cli/DeleteBuildsCommandTest.java

@@ -139,24 +136,18 @@ public class DeleteBuildsCommandTest {
        assertThat(result.stdout(), containsString("Deleted 0 builds"));
    }

-    @Test public void deleteBuildsShouldSuccessEvenTheBuildIsRunning() throws Exception {


I did not notice this test before. It was added in #2310 along with all of the other tests for delete-builds.

I still think the previous behavior was undesirable - deleting builds that are still running and reachable in memory does not make sense, can lead to tons of errors in the Jenkins logs, and any builds deleted while running are very likely to come back after a Jenkins restart given that build completion will cause build.xml to be rewritten.

I can see the argument though that Run.delete should work like Job.delete and cancel the build if it is running, waiting up to 15 seconds and only throwing an exception if the build still hasn't completed at that point. @jglick suggested this approach as well, but I was hoping to avoid adding complexity here unless we really think that behavior is necessary. Any opinions on this?

No strong opinion. I think the proposed error behavior is intuitive enough. You can explicitly abort the build first if that is what you intended.
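The error behavior agreed on here, and the "abort first if that is what you intended" workaround, can be sketched with a small model. This is not the real `Run.delete`: `ModelRun`, `abort`, and `deleteAbortingIfNeeded` are hypothetical, and only the exception type and message format are copied from the stack trace reported later in this conversation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical model of the delete guard; not the real hudson.model.Run. */
class DeleteGuardSketch {
    static class ModelRun {
        final String name;
        boolean running = true;
        boolean deleted = false;

        ModelRun(String name) { this.name = name; }

        /** Stand-in for aborting a build and waiting for it to finish. */
        void abort() { running = false; }

        /** Refuses to delete while the build is still running. */
        void delete() throws IOException {
            if (running) {
                throw new IOException("Unable to delete " + name + " because it is still running");
            }
            deleted = true;
        }
    }

    /** Tries to delete; on refusal, aborts and retries. Returns the refusal message, or null. */
    static String deleteAbortingIfNeeded(ModelRun run) {
        String refusal = null;
        try {
            run.delete();
        } catch (IOException e) {
            refusal = e.getMessage();
            run.abort(); // the caller makes the intent to stop the build explicit
            try {
                run.delete();
            } catch (IOException again) {
                throw new UncheckedIOException(again);
            }
        }
        return refusal;
    }

    public static void main(String[] args) {
        ModelRun run = new ModelRun("p #1");
        System.out.println("first attempt refused with: " + deleteAbortingIfNeeded(run));
        System.out.println("deleted after abort: " + run.deleted);
    }
}
```

Keeping the abort in the caller, as in this sketch, avoids building a cancel-and-wait timeout into the delete operation itself, which is the extra complexity the thread decided against.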

dwnusbaum · 2024-10-11T18:39:14Z

/label ready-for-merge

This PR is now ready for merge. After ~24 hours, we will merge it if there's no negative feedback.

Thanks!

basil · 2024-10-15T16:38:00Z

[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 7.344 s <<< FAILURE! -- in org.jenkins.plugins.lockableresources.LockStepTest
[ERROR] org.jenkins.plugins.lockableresources.LockStepTest.deleteRunningBuildNewBuildClearsLock -- Time elapsed: 7.330 s <<< ERROR!
java.io.IOException: Unable to delete p #1 because it is still running
	at hudson.model.Run.delete(Run.java:1555)
	at org.jenkins.plugins.lockableresources.LockStepTest.deleteRunningBuildNewBuildClearsLock(LockStepTest.java:725)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:658)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:840)

dwnusbaum · 2024-10-15T16:43:05Z

I'll take a look at LockStepTest.deleteRunningBuildNewBuildClearsLock. Perhaps the test can just be deleted now.

dwnusbaum · 2024-10-15T17:07:18Z

jenkinsci/lockable-resources-plugin#716 deletes the test.

[JENKINS-73835] Do not allow builds to be deleted while they are still running and ensure build discarders run after builds are fully complete (jenkinsci#9810)
* [JENKINS-73835] Do not allow builds to be deleted while they are still running
* [JENKINS-73835] Avoid redundant calls to Job.logRotate when builds complete and always call Job.logRotate after build finalization
* [JENKINS-73835] Add issue reference to RunTest.buildsMayNotBeDeletedWhileRunning
* [JENKINS-73835] Adjust DeleteBuildsCommandTest.deleteBuildsShouldSuccessEvenTheBuildIsRunning to match new behavior
* [JENKINS-73835] Run/delete.jelly should check Run.isLogUpdated, not Run.isBuilding

[JENKINS-73835] Do not allow builds to be deleted while they are still running and ensure build discarders run after builds are fully complete (jenkinsci#9810)
* [JENKINS-73835] Do not allow builds to be deleted while they are still running
* [JENKINS-73835] Avoid redundant calls to Job.logRotate when builds complete and always call Job.logRotate after build finalization
* [JENKINS-73835] Add issue reference to RunTest.buildsMayNotBeDeletedWhileRunning
* [JENKINS-73835] Adjust DeleteBuildsCommandTest.deleteBuildsShouldSuccessEvenTheBuildIsRunning to match new behavior
* [JENKINS-73835] Run/delete.jelly should check Run.isLogUpdated, not Run.isBuilding
(cherry picked from commit d34b17e)

basil · 2024-10-15T20:43:41Z

Ever since this PR was merged, hudson.cli.DeleteBuildsCommandTest#deleteBuildsShouldFailIfTheBuildIsRunning has been consistently failing on Windows CI builds with:

java.nio.file.DirectoryNotEmptyException: C:\Jenkins\agent\workspace\Core_jenkins_master\test\target\j h14545459714006339336\jobs\aProject\builds\1
	at java.base/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:272)
	at java.base/sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:110)
	at java.base/java.nio.file.Files.deleteIfExists(Files.java:1191)
	at org.jvnet.hudson.test.TemporaryDirectoryAllocator.delete(TemporaryDirectoryAllocator.java:146)
	at org.jvnet.hudson.test.TemporaryDirectoryAllocator.delete(TemporaryDirectoryAllocator.java:136)
	at org.jvnet.hudson.test.TemporaryDirectoryAllocator.delete(TemporaryDirectoryAllocator.java:136)
	at org.jvnet.hudson.test.TemporaryDirectoryAllocator.delete(TemporaryDirectoryAllocator.java:136)
	at org.jvnet.hudson.test.TemporaryDirectoryAllocator.delete(TemporaryDirectoryAllocator.java:136)
	at org.jvnet.hudson.test.TemporaryDirectoryAllocator.dispose(TemporaryDirectoryAllocator.java:104)
	at org.jvnet.hudson.test.TestEnvironment.dispose(TestEnvironment.java:84)
	at org.jvnet.hudson.test.JenkinsRule.after(JenkinsRule.java:538)
	at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:676)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:840)
	Suppressed: java.io.IOException: These files still exist : log
		at org.jvnet.hudson.test.TemporaryDirectoryAllocator.delete(TemporaryDirectoryAllocator.java:150)
		... 10 more

First failure was https://ci.jenkins.io/job/Core/job/jenkins/job/master/6632/testReport/junit/hudson.cli/DeleteBuildsCommandTest/windows_jdk17___Windows___JDK_17___Build___Test___deleteBuildsShouldFailIfTheBuildIsRunning/

dwnusbaum · 2024-10-15T21:14:19Z

Yes, looks like the test needs to forcibly stop the build. I will file a PR to fix it in a bit.

dwnusbaum · 2024-10-15T21:21:03Z

I filed #9876 to stabilize that test.

jglick · 2024-10-28T18:04:58Z

test/src/test/java/hudson/tasks/LogRotatorTest.java

+        logRotator.setRemoveLastBuild(true);
+        p.setBuildDiscarder(logRotator);
+        j.buildAndAssertStatus(Result.SUCCESS, p);
+        assertNull(p.getBuildByNumber(1));


Failure in https://github.com/jenkinsci/jenkins/pull/9921/checks?check_run_id=32166988156 (#9921) looks like a flake? Missing await?

I don't know what we would await for, because log rotation runs synchronously during build completion. Unless this is a test-specific timing issue due to JenkinsRule.buildAndAssertSuccess just waiting for QueueTaskFuture.get instead of !Run.isLogUpdated, then this test probably just needs to be skipped on Windows.

From some testing, j.buildAndAssertStatus(Result.SUCCESS, p) is guaranteed to block until GlobalBuildDiscarderListener.onFinalized has completed. Perhaps the BuildWatcher background thread tried to copy the logs at an inopportune time and caused log rotation to fail. Seems best to delete the BuildWatcher here in case it made any of the other tests flaky on Windows and then probably skip this test explicitly on Windows as well.

Also the AccessDeniedException seems strange. Either way, I filed #9923.

dwnusbaum added 2 commits September 30, 2024 12:27

[JENKINS-73835] Do not allow builds to be deleted while they are still running (cf30fcc)

[JENKINS-73835] Avoid redundant calls to Job.logRotate when builds complete and always call Job.logRotate after build finalization (eec23fd)

dwnusbaum added the bug label (For changelog: Minor bug. Will be listed after features) Oct 1, 2024

dwnusbaum commented Oct 1, 2024

[JENKINS-73835] Add issue reference to RunTest.buildsMayNotBeDeletedWhileRunning (97fb6aa)

dwnusbaum mentioned this pull request Oct 1, 2024

[JENKINS-73824][JENKINS-73835] Remove redundant log rotation after changes in core and add regression tests related to deleting Pipeline jobs and builds jenkinsci/workflow-job-plugin#470

Merged

6 tasks

dwnusbaum marked this pull request as ready for review October 1, 2024 21:28

[JENKINS-73835] Adjust DeleteBuildsCommandTest.deleteBuildsShouldSuccessEvenTheBuildIsRunning to match new behavior (9a6a90a)

dwnusbaum commented Oct 2, 2024

jglick approved these changes Oct 2, 2024

[JENKINS-73835] Run/delete.jelly should check Run.isLogUpdated, not Run.isBuilding (f676be5)

jglick approved these changes Oct 7, 2024

Merge branch 'master' into JENKINS-73835

9aaa7ba

scherler approved these changes Oct 8, 2024

comment-ops-bot bot added the ready-for-merge label (The PR is ready to go, and it will be merged soon if there is no negative feedback) Oct 11, 2024

MarkEWaite merged commit d34b17e into jenkinsci:master Oct 12, 2024
16 checks passed

dwnusbaum deleted the JENKINS-73835 branch October 14, 2024 16:03

dwnusbaum mentioned this pull request Oct 15, 2024

[JENKINS-73835] Delete LockStepTest.deleteRunningBuildNewBuildClearsLock now that running builds may not be deleted jenkinsci/lockable-resources-plugin#716

Merged

15 tasks

dwnusbaum mentioned this pull request Oct 15, 2024

Stop build in DeleteBuildsCommandTest.deleteBuildsShouldFailIfTheBuildIsRunning so test can be cleaned up on Windows consistently #9876

Merged

dwnusbaum mentioned this pull request Oct 16, 2024

[JENKINS-73835] Adjust NetworkTest.errorCleaningArtifacts test to record more loggers for compatibility with newer cores jenkinsci/artifact-manager-s3-plugin#551

Merged

6 tasks

jglick reviewed Oct 28, 2024

dwnusbaum mentioned this pull request Oct 28, 2024

Skip LogRotatorTest#ableToDeleteCurrentBuild on Windows #9923

Merged

dwnusbaum Oct 1, 2024

dwnusbaum Oct 1, 2024

jglick Oct 2, 2024

dwnusbaum Oct 2, 2024 (edited)

dwnusbaum Oct 2, 2024

jglick Oct 2, 2024 (edited)

dwnusbaum Oct 2, 2024

dwnusbaum Oct 2, 2024

dwnusbaum Oct 1, 2024 (edited)

jglick Oct 2, 2024

IppX Feb 3, 2025

jglick Feb 10, 2025

dwnusbaum Oct 1, 2024 (edited)

jglick Oct 2, 2024

dwnusbaum Oct 2, 2024

jglick Oct 2, 2024

dwnusbaum commented Oct 11, 2024

basil commented Oct 15, 2024

dwnusbaum commented Oct 15, 2024

dwnusbaum commented Oct 15, 2024

basil commented Oct 15, 2024

dwnusbaum commented Oct 15, 2024

dwnusbaum commented Oct 15, 2024

jglick Oct 28, 2024

dwnusbaum Oct 28, 2024

dwnusbaum Oct 28, 2024

dwnusbaum Oct 28, 2024
