fix(controller): remove ArtifactGC finalizer when no artifacts. Fixes #13499 (#13500)
I'm not sure this is quite correct. A length of 0 could also mean that an ArtifactGC Pod has yet to run.
In your issue, the ArtifactGC Pod failed, and this is exactly what the forceFinalizerRemoval field is for:
argo-workflows/workflow/controller/operator.go Lines 242 to 250 in ddbb3c7
Hi @agilgur5, ArtifactGC only runs once task reconciliation has completed, so I think the Pod has finished running by this point. I also think deleting the workflow is a normal operation, no need to
TaskResult reconciliation should already be completed. In your issue, I'm pretty sure the ArtifactGC Pod itself fails (due to the architecture). As it failed, the Controller cannot be sure whether the artifact was deleted or not.
Hmm, no, ArtifactGC can run on
Or you're experiencing a different race condition from v3.5.3+ / v3.5.5+. This actually sounds like the root cause is #12993, which was recently fixed in #13454.
Yes, I know that the reason why I failed to run is because the image
Yes, the second half of my previous comment is relevant with regard to TaskResult reconciliation. This looks like it is caused by a bug in TaskResult reconciliation and is not specific to ArtifactGC.
Thanks @agilgur5, I got it.
Ok, this makes more sense now given your last comment on the issue, thanks again for elaborating! From my first comment above though:
I think we could potentially remove this variable entirely and just replace it with a check that all ArtifactGC Pods were recouped? Otherwise it looks like
cc @juliev0 here on the PR as well
Thanks for finding this. So, I commented in the issue here, but will add here as well: Looking at the
I'm surprised we haven't encountered this issue before, of a Workflow that creates no artifacts.
Thanks for the modification! The e2e tests are sometimes flaky, so if the e2e test failures are unrelated to your changes, feel free to push empty commits until they pass.
@juliev0 did you see my comment above? I thought the check might be to make sure that there were no leftover ArtifactGC Pods that were not yet recouped. But if it's even simpler than that, that works.
Given that we have 2 approvers here, it'd be more efficient for one of us to just rerun the unsuccessful jobs (and any member can do
It's very possible people have been using
Speaking of, it'd be nice to add an E2E test for this.
@chengjoey do you think you could add a regression test for this scenario? If I'm not mistaken, you should be able to use any main container that intentionally fails while having output.artifacts specified.
Ok, I will add an e2e test.
Hmm, does it accomplish that? I'm having trouble following if so. I think it was just to eliminate work.
Thanks for adding the test @chengjoey. If I'm not mistaken, I think you should be able to add it as another case to this test. This test verifies that the finalizer is removed by checking that the workflow actually gets deleted.
mostly consistency comments on the test below
test/e2e/testdata/artifactgc/artgc-has-gc-but-failed-no-artifacts.yaml
…removed Signed-off-by: joey <[email protected]>
Thanks @agilgur5, all done.
LGTM, thanks for the iteration!
Will let Julie take another look at the tests before merging
Yes, thank you @chengjoey!
Fixes #13499
Motivation
If the workflow has zero Pods, anyPodSuccess should be true, so that removeFinalizer can be executed without force.
Modifications
anyPodSuccess := len(pods) == 0
Verification
Run examples/artifact-gc-workflow.yaml on an M1 Mac, and then delete the workflow.
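To make the one-line change in the description concrete, here is a minimal, self-contained sketch. The gcPod type and the loop over Pods are assumptions for illustration only; they are not the controller's actual types or code, and only the initialisation `len(pods) == 0` comes from this PR's description.

```go
package main

import "fmt"

// gcPod is a hypothetical stand-in for an ArtifactGC Pod, not an Argo Workflows type.
type gcPod struct {
	Name      string
	Succeeded bool
}

// anyPodSuccess follows the one-line change proposed in the description:
// with no Pods at all it starts out true; otherwise it needs at least
// one Pod that succeeded (the loop itself is an assumption).
func anyPodSuccess(pods []gcPod) bool {
	success := len(pods) == 0
	for _, p := range pods {
		if p.Succeeded {
			success = true
		}
	}
	return success
}

func main() {
	// No Pods at all.
	fmt.Println(anyPodSuccess(nil))
	// A single failed Pod.
	fmt.Println(anyPodSuccess([]gcPod{{Name: "gc-1", Succeeded: false}}))
	// One failed Pod and one successful Pod.
	fmt.Println(anyPodSuccess([]gcPod{
		{Name: "gc-1", Succeeded: false},
		{Name: "gc-2", Succeeded: true},
	}))
}
```

In this sketch an empty list yields true regardless of why it is empty, so it cannot tell "no ArtifactGC Pod was needed" apart from "the ArtifactGC Pod has yet to run", which is the concern raised in the first review comment.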