Use finalizers to spin up AppRepo clean-up jobs #6647
Conversation
Signed-off-by: Antonio Gamez Diaz <[email protected]>
✅ Deploy Preview for kubeapps-dev canceled. Built without sensitive environment variables.
…rrencyPolicy` configuration. Signed-off-by: Antonio Gamez Diaz <[email protected]>
Adding some in-line comments. Most changes are just refactoring and separating the code into different files... it was becoming too messy to deal with 😅
// if the object is not being deleted, check the finalizers
if apprepo.GetDeletionTimestamp().IsZero() {
	// check if it contains the finalizer, if not, add it and update the object
	if !containsFinalizer(apprepo, finalizerName) {
		log.Infof("the AppRepository %q doesn't have a finalizer yet, adding one...", apprepo.GetName())

		ok := addFinalizer(apprepo, finalizerName)
		if !ok {
			return fmt.Errorf("error adding finalizer to the AppRepository %q", apprepo.GetName())
		}
		_, err = c.apprepoclientset.KubeappsV1alpha1().AppRepositories(namespace).Update(context.TODO(), apprepo, metav1.UpdateOptions{})
		if err != nil {
			return fmt.Errorf("error updating the AppRepository %q: %v", apprepo.GetName(), err)
		}
	}
} else {
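For reference, the `containsFinalizer` / `addFinalizer` helpers called above are not part of this hunk. A minimal sketch of what such helpers might look like, written here against the generic `metav1.Object` interface (names and signatures assumed from the calls above, not copied from the PR):

```go
package apprepo

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// containsFinalizer reports whether the object already carries the given finalizer.
func containsFinalizer(obj metav1.Object, finalizerName string) bool {
	for _, f := range obj.GetFinalizers() {
		if f == finalizerName {
			return true
		}
	}
	return false
}

// addFinalizer appends the finalizer to the object's metadata and reports
// whether the finalizer list was actually modified.
func addFinalizer(obj metav1.Object, finalizerName string) bool {
	if containsFinalizer(obj, finalizerName) {
		return false
	}
	obj.SetFinalizers(append(obj.GetFinalizers(), finalizerName))
	return true
}
```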
This is where we need more privileges: an AppRepo update is required to add the finalizers.
// if the object is being deleted and contains a finalizer
if !apprepo.GetDeletionTimestamp().IsZero() && containsFinalizer(apprepo, finalizerName) {
	// if the object is being deleted and contains a finalizer and the event is not a deletion event,
	// then handle the finalizer-derived clean-up tasks and remove the finalizer
	if !shouldDelete {
		log.Infof("Starting the clean-up tasks derived from the finalizer of the AppRepository %q", apprepo.GetName())

if apprepo.Spec.FilterRule.JQ != "" {
	rulesJSON, err := json.Marshal(apprepo.Spec.FilterRule)
	if err != nil {
		log.Errorf("Unable to parse filter rules for %s: %v", apprepo.Name, err)
	} else {
		args = append(args, "--filter-rules", string(rulesJSON))
	}
}
// start the clean-up
err = c.cleanUpAppRepo(apprepo, apprepo.GetNamespace(), apprepo.GetName())
I know it sounds a bit hacky/tricky... but when testing IRL (and then confirmed in the k8s repo), a deletion also triggers update events... so I ended up handling it that way.
I did my best at explaining the code, but if you can think of a better solution, let me know :P
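For context on the "deletions also trigger update events" point: with a finalizer in place, a delete only sets `metadata.deletionTimestamp`, which reaches the informer as an update; the delete callback only fires once the object is actually removed. A rough, hypothetical sketch of that wiring (the `enqueue` callback and its `shouldDelete` flag are placeholders, not the PR's actual code):

```go
package apprepo

import (
	"k8s.io/client-go/tools/cache"
)

// registerHandlers sketches why a finalizer-guarded deletion is first observed
// as an update: setting the deletionTimestamp arrives via UpdateFunc, while
// DeleteFunc only fires once the object is really gone from the store.
func registerHandlers(informer cache.SharedIndexInformer, enqueue func(obj interface{}, shouldDelete bool)) {
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { enqueue(obj, false) },
		UpdateFunc: func(oldObj, newObj interface{}) { enqueue(newObj, false) },
		DeleteFunc: func(obj interface{}) { enqueue(obj, true) },
	})
}
```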
c.Flags().Int32Var(&serveOpts.SuccessfulJobsHistoryLimit, "successful-jobs-history-limit", 3, "Number of successful finished jobs to retain")
c.Flags().Int32Var(&serveOpts.FailedJobsHistoryLimit, "failed-jobs-history-limit", 1, "Number of failed finished jobs to retain")
c.Flags().StringVar(&serveOpts.ConcurrencyPolicy, "concurrency-policy", "Replace", "How to treat concurrent executions of a Job. Valid values are: 'Allow', 'Forbid' and 'Replace'")
I have also added these extra flags just in case we want to configure the cronjob. For now, I left the existing default values, so the behavior doesn't actually change.
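To make the mapping concrete, here is a minimal sketch (not the PR's actual code) of how the three flag values could be projected onto the corresponding fields of a `batch/v1` CronJob spec; `cronJobOpts` is a hypothetical stand-in for the relevant part of `serveOpts`:

```go
package apprepo

import (
	batchv1 "k8s.io/api/batch/v1"
)

// cronJobOpts is a hypothetical stand-in for the flag-backed fields in serveOpts.
type cronJobOpts struct {
	SuccessfulJobsHistoryLimit int32  // --successful-jobs-history-limit (default 3)
	FailedJobsHistoryLimit     int32  // --failed-jobs-history-limit (default 1)
	ConcurrencyPolicy          string // --concurrency-policy: Allow, Forbid or Replace
}

// newCronJobSpec maps the flag values onto the native CronJobSpec fields.
func newCronJobSpec(opts cronJobOpts, schedule string) batchv1.CronJobSpec {
	return batchv1.CronJobSpec{
		Schedule:                   schedule,
		ConcurrencyPolicy:          batchv1.ConcurrencyPolicy(opts.ConcurrencyPolicy),
		SuccessfulJobsHistoryLimit: &opts.SuccessfulJobsHistoryLimit,
		FailedJobsHistoryLimit:     &opts.FailedJobsHistoryLimit,
		// JobTemplate (the sync job itself) is omitted in this sketch.
	}
}
```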
Yeah - might be worthwhile. I think our finalizer shouldn't cause issues like others do (in that it's not doing anything complex), but right, we all think that and then something unexpected stops it (a DB connection or something). Updating the FAQ sounds like a good idea (once this is out).
Hmm. I don't have a problem with our controller having write access to app repos in all namespaces? It's our CR, I don't think people generally have an issue with the controller having write access across all namespaces. Or did you mean something else?
Signed-off-by: Antonio Gamez Diaz <[email protected]>
Great, I just wanted to double-check in case I was missing something. I agree, the CR controller's SA should be able to manage the resources cluster-wide. I have performed the change at 30861d4. I created a separate ClusterRole because the apprepo-...-write one was also granting …
Done at 1241da1 :)
Unreal - thanks Antonio for going the extra mile to do this properly so we don't accumulate more tech-debt!
kind: ClusterRole
metadata:
  name: {{ template "kubeapps.apprepository.fullname" . }}
  namespace: {{ .Release.Namespace | quote }}
Not that it matters much, but I don't think we need to specify a namespace for a clusterrole? (as in, it'll have no effect, I think?). Maybe we've specified them in our templates elsewhere and that's why it's here. Fine either way, just not sure if it could cause confusion. Same for the ClusterRoleBinding below.
Oops, I thought I had actually removed them. They don't have any effect, as you pointed out.
Ah, I know what happened... I copied and pasted the metadata so that all the objects had the same, and I forgot to remove the namespace again! Thanks for catching it!
@@ -261,18 +262,18 @@ func (c *Controller) processNextWorkItem() bool {
 		// Forget here else we'd go into a loop of attempting to
 		// process a work item that is invalid.
 		c.workqueue.Forget(obj)
-		runtime.HandleError(fmt.Errorf("expected string in workqueue but got %#v", obj))
+		runtime.HandleError(fmt.Errorf("Expected string in workqueue but got %#v", obj))
Any reason for capitalizing these? I know our codebase is not consistent, but if we were making it consistent, I'd go the other way. See https://google.github.io/styleguide/go/decisions.html#error-strings
No reason apart from internal consistency... having mixed-up capitalization in the log traces was killing me... so I just normalized them using the most common choice (in this piece of code). But I agree, log traces should be lower-cased instead.
If we start changing things little by little, at some point we might end up with a consistent codebase :P, which will be helpful if we ever think of exporting metrics with OpenTelemetry or any other stack.
For the record, the regex `(debug|warn|Error|Info)(f?)\(("|')\p{Lu}` yields 549 upper-cased occurrences across 66 .go files. For now, I'm going to lower-case the ones in the apprepo controller.
Just a note: I think the Google Go style guide has error strings as lower case, while log lines are upper case (the next section after the link), so I think the log lines should remain upper case, and just the errors like the one here should be lower-cased?
Mmm, you're right... I was mixing up the error strings (fmt.Errorf(...)) and the log error traces (log.Errorf(...)), so:
- log.* --> uppercased
- fmt.Errorf --> lowercased
Reverting the changes in 1696b83 then :P
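To make the agreed-upon convention concrete, a small self-contained example (file path and messages are made up): error strings lower-cased so they can be wrapped mid-sentence, log lines capitalized since they stand on their own.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// loadConfig returns a lower-cased error string, as recommended for errors
// that callers may wrap into a longer message.
func loadConfig(path string) error {
	if _, err := os.Stat(path); err != nil {
		return fmt.Errorf("unable to read config %q: %w", path, err)
	}
	return nil
}

func main() {
	// Log lines stand alone, so starting with a capital letter is fine.
	if err := loadConfig("/tmp/does-not-exist.yaml"); err != nil {
		log.Printf("Failed to load configuration: %v", err)
	}
}
```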
"--database-url=" + config.DBURL, | ||
"--database-user=" + config.DBUser, | ||
"--database-name=" + config.DBName, | ||
} | ||
} |
Just noting that there's a lot of deletions in this file - guessing they'll be re-added below. EDIT: yep, see them in the job_utils. Mild preference for keeping moves of code like this to separate PRs, but realise it's not always possible when the code is changing too. Sometimes I do the code changes and refrain from moving/reorganising until a follow-up PR, just so it's easier to follow what has actually changed.
Yeah... I should have done that too. In the beginning, I wasn't planning to move the code... but while coding I noticed the endless single file was becoming unbearable, so I just split it apart. Sorry for the inconvenience when reviewing it 😅
			}
		})
	}
}
Thanks for all the new tests for the new functionality too.
        image: docker.io/kubeapps/asset-syncer:latest
        name: delete
EOF
```
And thanks for the doc update!
Signed-off-by: Antonio Gamez Diaz <[email protected]>
Force-pushed from 7ceb5ce to eb46c13.
Description of the change
Even though the sync jobs can be given a security context (by means of each AppRepo CRD), this information was not available for cleanup jobs. This is mainly because those jobs are spun up once a NotFound error is thrown when fetching an AppRepo.
However, Kubernetes does have a native approach for dealing with these scenarios: finalizers.
In #6605 we proposed a simplistic workaround based on adding more params to the controller... but as suggested in #6605 (comment), moving to finalizers is a better long-term solution.
Benefits
Cleanup jobs are now handled within an existing AppRepo context, meaning the whole syncJobTemplate is available to be used (i.e., securityPolicies and so on), as sketched below.
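As a rough illustration of that benefit (hypothetical types and field names; the real AppRepository spec lives in the kubeapps APIs package and may differ), the cleanup Job can now be built from the repository's own pod template instead of from controller-level defaults:

```go
package apprepo

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// appRepoSpec is a hypothetical stand-in for the relevant part of the
// AppRepository spec; security context, node selectors, etc. travel with it.
type appRepoSpec struct {
	SyncJobPodTemplate corev1.PodTemplateSpec
}

// newCleanupJob sketches how the finalizer-triggered cleanup Job can reuse the
// AppRepository's own pod template, since the CR still exists at that point.
func newCleanupJob(name, namespace string, spec appRepoSpec) *batchv1.Job {
	podTemplate := spec.SyncJobPodTemplate.DeepCopy()
	podTemplate.Spec.RestartPolicy = corev1.RestartPolicyNever
	podTemplate.Spec.Containers = []corev1.Container{{
		Name:  "delete",
		Image: "docker.io/kubeapps/asset-syncer:latest",
		// args ("--database-url=...", "--database-user=...", etc.) omitted;
		// see the args construction shown elsewhere in this diff.
	}}
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "apprepo-cleanup-" + name + "-",
			Namespace:    namespace,
		},
		Spec: batchv1.JobSpec{Template: *podTemplate},
	}
}
```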
Possible drawbacks
When dealing with finalizers in the past, I often found it really annoying when they get stuck and prevent the resource from being deleted. I wonder if we should add some info in the FAQ on how to manually remove the finalizers.
Additionally, and this might be something important: for the AppRepo controller to be able to `update` AppRepos in other namespaces != kubeapps (to add the finalizer), it now needs extra RBAC. Before we were just granting `...-apprepos-read`, but now we would need to grant `...-write` as well... and I'm not sure we really want to do so. WDYT, @absoludity?
Another idea is using an admission policy... but not sure, again, if we want to deal with that...
(I haven't modified the RBAC yet in this PR) Changes have been performed finally.
Applicable issues
Additional information
This PR is based on top of #6646, but the main change to review is 6e70910
The rest is just moving code into separate files, mostly.
Also, I have been taking a look at `kubebuilder` to create a new controller, based on `sigs.k8s.io/controller-runtime` rather than on the workqueues we currently have. While it is pretty easy to start with (see the quickstart), I think it adds too much complexity (using kustomize, adding RBAC proxies, Prometheus metrics, etc.). I also quickly tried the k8s codegen scripts, but ran into some issues with my setup... perhaps that's the best option, though.
IMO, at some point we should start thinking about moving towards a new state-of-the-art k8s controller boilerplate.