Use finalizers to spin up AppRepo clean-up jobs #6647
Conversation
Signed-off-by: Antonio Gamez Diaz <[email protected]>
✅ Deploy Preview for kubeapps-dev canceled. Built without sensitive environment variables.
…rrencyPolicy` configuration. Signed-off-by: Antonio Gamez Diaz <[email protected]>
Adding some in-line comments. Most changes are just refactoring and separating the code into different files... it was becoming too messy to deal with 😅
// if the object is not being deleted, check the finalizers
if apprepo.GetDeletionTimestamp().IsZero() {
	// check if it contains the finalizer, if not, add it and update the object
	if !containsFinalizer(apprepo, finalizerName) {
		log.Infof("the AppRepository %q doesn't have a finalizer yet, adding one...", apprepo.GetName())

		ok := addFinalizer(apprepo, finalizerName)
		if !ok {
			return fmt.Errorf("error adding finalizer to the AppRepository %q", apprepo.GetName())
		}
		_, err = c.apprepoclientset.KubeappsV1alpha1().AppRepositories(namespace).Update(context.TODO(), apprepo, metav1.UpdateOptions{})
		if err != nil {
			return fmt.Errorf("error updating the AppRepository %q: %v", apprepo.GetName(), err)
		}
	}
} else {
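For reference, the `containsFinalizer` / `addFinalizer` helpers called above are not part of this hunk. A minimal sketch of what such helpers might look like, written here against the generic `metav1.Object` interface (names and signatures assumed from the calls above, not copied from the PR):

```go
package apprepo

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// containsFinalizer reports whether the object already carries the given finalizer.
func containsFinalizer(obj metav1.Object, finalizerName string) bool {
	for _, f := range obj.GetFinalizers() {
		if f == finalizerName {
			return true
		}
	}
	return false
}

// addFinalizer appends the finalizer to the object's metadata and reports
// whether the finalizer list was actually modified.
func addFinalizer(obj metav1.Object, finalizerName string) bool {
	if containsFinalizer(obj, finalizerName) {
		return false
	}
	obj.SetFinalizers(append(obj.GetFinalizers(), finalizerName))
	return true
}
```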
This is where we need more privileges: an AppRepo update is required to add the finalizers.
// if the object is being deleted and contains a finalizer
if !apprepo.GetDeletionTimestamp().IsZero() && containsFinalizer(apprepo, finalizerName) {
	// if the object is being deleted and contains a finalizer and the event is not a deletion event,
	// then handle the finalizer-derived clean-up tasks and remove the finalizer
	if !shouldDelete {
		log.Infof("Starting the clean-up tasks derived from the finalizer of the AppRepository %q", apprepo.GetName())

if apprepo.Spec.FilterRule.JQ != "" {
	rulesJSON, err := json.Marshal(apprepo.Spec.FilterRule)
	if err != nil {
		log.Errorf("Unable to parse filter rules for %s: %v", apprepo.Name, err)
	} else {
		args = append(args, "--filter-rules", string(rulesJSON))
	}
}
// start the clean-up
err = c.cleanUpAppRepo(apprepo, apprepo.GetNamespace(), apprepo.GetName())
I know it sounds a bit hacky/tricky... but when testing IRL (and then confirmed in the k8s repo), a deletion also triggers update events... so I ended up handling it that way.
I did my best at explaining the code, but if you can think of a better solution, let me know :P
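For context on the "deletions also trigger update events" point: with a finalizer in place, a delete only sets `metadata.deletionTimestamp`, which reaches the informer as an update; the delete callback only fires once the object is actually removed. A rough, hypothetical sketch of that wiring (the `enqueue` callback and its `shouldDelete` flag are placeholders, not the PR's actual code):

```go
package apprepo

import (
	"k8s.io/client-go/tools/cache"
)

// registerHandlers sketches why a finalizer-guarded deletion is first observed
// as an update: setting the deletionTimestamp arrives via UpdateFunc, while
// DeleteFunc only fires once the object is really gone from the store.
func registerHandlers(informer cache.SharedIndexInformer, enqueue func(obj interface{}, shouldDelete bool)) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { enqueue(obj, false) },
		UpdateFunc: func(oldObj, newObj interface{}) { enqueue(newObj, false) },
		DeleteFunc: func(obj interface{}) { enqueue(obj, true) },
	})
}
```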
c.Flags().Int32Var(&serveOpts.SuccessfulJobsHistoryLimit, "successful-jobs-history-limit", 3, "Number of successful finished jobs to retain")
c.Flags().Int32Var(&serveOpts.FailedJobsHistoryLimit, "failed-jobs-history-limit", 1, "Number of failed finished jobs to retain")
c.Flags().StringVar(&serveOpts.ConcurrencyPolicy, "concurrency-policy", "Replace", "How to treat concurrent executions of a Job. Valid values are: 'Allow', 'Forbid' and 'Replace'")
I have also added these extra flags just in case we want to configure the cronjob. For now, I left the existing default values, so the behavior doesn't actually change.
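To make the mapping concrete, here is a minimal sketch (not the PR's actual code) of how the three flag values could be projected onto the corresponding fields of a `batch/v1` CronJob spec; `cronJobOpts` is a hypothetical stand-in for the relevant part of `serveOpts`:

```go
package apprepo

import (
	batchv1 "k8s.io/api/batch/v1"
)

// cronJobOpts is a hypothetical stand-in for the flag-backed fields in serveOpts.
type cronJobOpts struct {
	SuccessfulJobsHistoryLimit int32  // --successful-jobs-history-limit (default 3)
	FailedJobsHistoryLimit     int32  // --failed-jobs-history-limit (default 1)
	ConcurrencyPolicy          string // --concurrency-policy: Allow, Forbid or Replace
}

// newCronJobSpec maps the flag values onto the native CronJobSpec fields.
func newCronJobSpec(opts cronJobOpts, schedule string) batchv1.CronJobSpec {
	return batchv1.CronJobSpec{
		Schedule:                   schedule,
		ConcurrencyPolicy:          batchv1.ConcurrencyPolicy(opts.ConcurrencyPolicy),
		SuccessfulJobsHistoryLimit: &opts.SuccessfulJobsHistoryLimit,
		FailedJobsHistoryLimit:     &opts.FailedJobsHistoryLimit,
		// JobTemplate (the sync job itself) is omitted in this sketch.
	}
}
```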
Yeah - might be worthwhile. I think our finalizer shouldn't cause issues like others do (in that it's not doing anything complex), but right, we all think that and then something unexpected stops it (a DB connection or something). Updating the FAQ sounds like a good idea (once this is out).
Hmm. I don't have a problem with our controller having write access to app repos in all namespaces? It's our CR, I don't think people generally have an issue with the controller having write access across all namespaces. Or did you mean something else?
Signed-off-by: Antonio Gamez Diaz <[email protected]>
Great, I just wanted to double-check in case I was missing something. I agree, the CR controller's SA should be able to manage the resources cluster-wide. I have performed the change at 30861d4. I created a separate ClusterRole because the apprepo-...-write one was also granting …
Done at 1241da1 :)
Unreal - thanks Antonio for going the extra mile to do this properly so we don't accumulate more tech-debt!
kind: ClusterRole
metadata:
  name: {{ template "kubeapps.apprepository.fullname" . }}
  namespace: {{ .Release.Namespace | quote }}
Not that it matters much, but I don't think we need to specify a namespace for a clusterrole? (as in, it'll have no effect, I think?). Maybe we've specified them in our templates elsewhere and that's why it's here. Fine either way, just not sure if it could cause confusion. Same for the ClusterRoleBinding below.
Oops, I thought I had actually removed them. They don't have any effect, as you pointed out.
Ah, I know what happened... I copied and pasted the metadata so that all the objects had the same, and I forgot to remove the namespace again! Thanks for catching it!
@@ -261,18 +262,18 @@ func (c *Controller) processNextWorkItem() bool {
 		// Forget here else we'd go into a loop of attempting to
 		// process a work item that is invalid.
 		c.workqueue.Forget(obj)
-		runtime.HandleError(fmt.Errorf("expected string in workqueue but got %#v", obj))
+		runtime.HandleError(fmt.Errorf("Expected string in workqueue but got %#v", obj))
Any reason for capitalizing these? I know our codebase is not consistent, but if we were making it consistent, I'd go the other way. See https://google.github.io/styleguide/go/decisions.html#error-strings
No reason apart from internal consistency... having mixed-up capitalization in the log traces was killing me... so I just normalized them using the most common choice (in this piece of code). But I agree, log traces should be lower-cased instead.
If we start changing things little by little, at some point we might end up with a consistent codebase :P, which will be helpful if we ever think of exporting metrics with OpenTelemetry or any other stack.
For the record, the regex `(debug|warn|Error|Info)(f?)\(("|')\p{Lu}` yields 549 upper-cased occurrences across 66 .go files. For now, I'm going to lower-case the ones in the apprepo controller.
Just a note: I think the Google Go style guide has error strings as lower case, while log lines are upper case (the next section after the link), so I think the log lines should remain upper case, and just the errors like the one here should be lower-cased?
Mmm, you're right... I was mixing up the error strings (fmt.Errorf(...)) and the log error traces (log.Errorf(...)), so:
- log.* --> uppercased
- fmt.Errorf --> lowercased
Reverting the changes in 1696b83 then :P
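To make the agreed-upon convention concrete, a small self-contained example (file path and messages are made up): error strings lower-cased so they can be wrapped mid-sentence, log lines capitalized since they stand on their own.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// loadConfig returns a lower-cased error string, as recommended for errors
// that callers may wrap into a longer message.
func loadConfig(path string) error {
	if _, err := os.Stat(path); err != nil {
		return fmt.Errorf("unable to read config %q: %w", path, err)
	}
	return nil
}

func main() {
	// Log lines stand alone, so starting with a capital letter is fine.
	if err := loadConfig("/tmp/does-not-exist.yaml"); err != nil {
		log.Printf("Failed to load configuration: %v", err)
	}
}
```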
"--database-url=" + config.DBURL, | ||
"--database-user=" + config.DBUser, | ||
"--database-name=" + config.DBName, | ||
} | ||
} |
Just noting that there's a lot of deletions in this file - guessing they'll be re-added below. EDIT: yep, see them in the job_utils. Mild preference for keeping moves of code like this to separate PRs, but realise it's not always possible when the code is changing too. Sometimes I do the code changes and refrain from moving/reorganising until a follow-up PR, just so it's easier to follow what has actually changed.
Yeah... I should have done that too. In the beginning, I wasn't planning to move the code... but while coding I noticed the endless single file was becoming unbearable, so I just split it apart. Sorry for the inconvenience when reviewing it 😅
			}
		})
	}
}
Thanks for all the new tests for the new functionality too.
        image: docker.io/kubeapps/asset-syncer:latest
        name: delete
EOF
```
And thanks for the doc update!
Signed-off-by: Antonio Gamez Diaz <[email protected]>
Force-pushed from 7ceb5ce to eb46c13.
Description of the change
Even though the sync jobs can be given a security context (by means of each AppRepo CRD), this information was not available for cleanup jobs. This is mainly because those jobs are spun up once a NotFound error is thrown when fetching an AppRepo.
However, Kubernetes does have a native approach for dealing with these scenarios: finalizers.
In #6605 we proposed a simplistic workaround based on adding more params to the controller... but as suggested in #6605 (comment), moving to finalizers is a better long-term solution.
Benefits
Cleanup jobs are now handled within an existing AppRepo context, meaning the whole syncJobTemplate is available to be used (i.e., securityPolicies and so on), as sketched below.
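As a rough illustration of that benefit (hypothetical types and field names; the real AppRepository spec lives in the kubeapps APIs package and may differ), the cleanup Job can now be built from the repository's own pod template instead of from controller-level defaults:

```go
package apprepo

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// appRepoSpec is a hypothetical stand-in for the relevant part of the
// AppRepository spec; security context, node selectors, etc. travel with it.
type appRepoSpec struct {
	SyncJobPodTemplate corev1.PodTemplateSpec
}

// newCleanupJob sketches how the finalizer-triggered cleanup Job can reuse the
// AppRepository's own pod template, since the CR still exists at that point.
func newCleanupJob(name, namespace string, spec appRepoSpec) *batchv1.Job {
	podTemplate := spec.SyncJobPodTemplate.DeepCopy()
	podTemplate.Spec.RestartPolicy = corev1.RestartPolicyNever
	podTemplate.Spec.Containers = []corev1.Container{{
		Name:  "delete",
		Image: "docker.io/kubeapps/asset-syncer:latest",
		// args ("--database-url=...", "--database-user=...", etc.) omitted;
		// see the args construction shown elsewhere in this diff.
	}}
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "apprepo-cleanup-" + name + "-",
			Namespace:    namespace,
		},
		Spec: batchv1.JobSpec{Template: *podTemplate},
	}
}
```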
Possible drawbacks
When dealing with finalizers in the past, I often found it really annoying when they get stuck and prevent the resource from being deleted. I wonder if we should add some info in the FAQ on how to manually remove the finalizers.
Additionally, and this might be something important: for the AppRepo controller to be able to `update` AppRepos in other namespaces != kubeapps (to add the finalizer), it now needs extra RBAC. Before we were just granting `...-apprepos-read`, but now we would need to grant `...-write` as well... and I'm not sure we really want to do so. WDYT, @absoludity?
Another idea is using an admission policy... but not sure, again, if we want to deal with that...
(I haven't modified the RBAC yet in this PR) Changes have been performed finally.
Applicable issues
Additional information
This PR is based on top of #6646, but the main change to review is 6e70910
The rest is just moving code into separate files, mostly.
Also, I have been taking a look at `kubebuilder` to create a new controller, based on `sigs.k8s.io/controller-runtime` rather than on the workqueues we currently have. While it is pretty easy to start with (see the quickstart), I think it adds too much complexity (using kustomize, adding RBAC proxies, Prometheus metrics, etc.). I also quickly tried the k8s codegen scripts, but ran into some issues with my setup... perhaps that's the best option, though.
IMO, at some point we should start thinking about moving towards a new state-of-the-art k8s controller boilerplate.