-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-65829] Fix WorkspaceCleanupThread to consider suffixed workspaces even if original is inexistent #9083
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workspacecleaner is broken in my eyes, already before and this change doesn't fix the problems. It totally neglects the fact of concurrent build directories. And the approach to check the timestamp of the workspace root is also problematic. The timestamp of a directory only changes when you create or delete files but if you just modify files the timestamp might not change. Also adding removing files in subdirectories will not lead to a change of the timestamp.
Another bug is the check if a job is currently building. It uses job.isBuilding()
but that only return true if the last build is running. When I have concurrent builds and my last build failed but the previous is still running it will falsely. The correct approach would probably be to loop over the executors of a node and check if any of them corresponds to the current job.
So worst case might be that you delete a workspace that is currently in use. The risk is not very high though, but I've had some situations where jenkins was not properly releasing the lease of workspaces. The result was that a build was running in a workspace with @2
on an agent that has only 1 executor. In such situations the risk of deleting an inuse workspace will be much higher.
} | ||
|
||
// if not the workspace or a workspace suffix | ||
if (!dir.getName().equals(workspaceBaseName) && !dir.getName().startsWith(workspaceBaseName + WorkspaceList.COMBINATOR)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also match against build directories of concurrent builds.
Another problem I see is that the deletion is not atomic (or near atomic). For large workspaces the deletion might take quite some time and in the meantime a new could have started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the last modified date of each directory makes sense to me. Thanks for the PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/label ready-for-merge
This PR is now ready for merge, after ~24 hours, we will merge it if there's no negative feedback.
Thanks!
See JENKINS-65829. The
WorkspaceCleanupThread
was not considering workspaces with suffixes if the original workspace did not exist. Typically@libs
workspaces on a controller.Reworked the
WorkspaceCleanupThread
to reduce the number of remoting calls by creating aMasterToSlaveFileCallable
that do filtering + deletion. There used to be multiple calls to check if directory exists, the last modified date, do a delete recursive, do a delete of the suffixes.Now reduced to only one remoting call.
https://github.com/Dohbedoh/jenkins/blob/JENKINS-65829/core/src/main/java/hudson/model/WorkspaceCleanupThread.java#L102
Note: the retention is check on each directory. Before, we used to check the last modified date of the WORKSPACE directory and if older than 30 day, we would delete it and all suffixed workspace. Now we check the last modified date of each directory. Which makes sense to me. WDYT ?
Testing done
Test 1
--> Validate that the
$WORKSPACE@libs
on the controller is removedTest 2
$WORKSPACE@tmp
,$WORKSPACE@test
.built-in
node - we need to to this because the cleanup thread preserve the workspace of the last build. So we want to test that everything on the agent (previous build) gets cleaned up.--> Validate that the
$WORKSPACE
,$WORKSPACE@libs
and$WORKSPACE@tmp
on the agent are removedProposed changelog entries
WorkspaceCleanupThread
to consider workspaces with suffixes even if the original is inexistentWorkspaceCleanupThread
Proposed upgrade guidelines
N/A
Submitter checklist
Desired reviewers
Before the changes are marked as
ready-for-merge
:Maintainer checklist