Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add delete backup cmd using finalizer and simplify GC process #252

Merged
merged 3 commits into from
Jan 2, 2018

Conversation

skriss
Copy link
Contributor

@skriss skriss commented Dec 15, 2017

Signed-off-by: Steve Kriss [email protected]

Fixes #212
Fixes #77

@ncdc PTAL and provide feedback before I fix unit tests. I've manually tested successfully.

@skriss skriss requested review from ncdc and nrb December 15, 2017 01:03
Copy link
Contributor

@ncdc ncdc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. Need to do another pass to review the actual gc changes

@@ -0,0 +1,41 @@
/*
Copyright 2017 Heptio Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

project ark contribs

@@ -229,6 +229,9 @@ func (controller *backupController) processBackup(key string) error {
// set backup version
backup.Status.Version = backupVersion

// add GC finalizer
backup.Finalizers = append(backup.Finalizers, gcFinalizer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only add if it doesn't exist

var _ Interface = &gcController{}
backupInformer.Informer().AddEventHandler(
cache.ResourceEventHandlerFuncs{
UpdateFunc: func(old interface{}, new interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a real function so we can unit test it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since new is a reserved keyword, it might be better not to use it. Looking at the kube source, I've seen

  • func(oldObj, newObj interface{})
  • func(old, cur interface{})

I think I prefer the first form.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since you're not using old, you can just do func(_, newObj interface{})

}
log.Debugf("Backup has finalizers %s", backup.Finalizers)

for x, finalizer := range backup.Finalizers {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd extract this to a function that just returns a bool. No need for idx.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return
}

// no more finalizers: we don't need to issue another delete
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be "There are other finalizers, so we won't issue a delete as other controllers still need to process" along with a len >= 1 check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure on how this is supposed to work. I'm going off of https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/#finalizers which says:

...Each controller then removes its finalizer from the list and issues the delete request again. This request only deletes the object if the list of finalizers is now empty, meaning all finalizers are done.

However, the behavior I was seeing was that if I issued a delete after removing the last/only finalizer, it would fail with object not found; the object appeared to be deleted once the last finalizer had been removed.

@skriss skriss force-pushed the delete-backup branch 4 times, most recently from 4e95706 to 9ccb14e Compare December 15, 2017 21:16
@skriss skriss changed the title [WIP] add delete backup cmd using finalizer and simplify GC process add delete backup cmd using finalizer and simplify GC process Dec 18, 2017
@skriss skriss added this to the v0.7.0 milestone Dec 18, 2017
Copy link
Contributor

@ncdc ncdc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: CRD finalizers require kube 1.7.5 or newer.

backupCommand.Aliases = []string{"backups"}

c.AddCommand(
backupCommand,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wire in schedules and restores too? Or in a follow-up?

log.WithError(err).Error("Error marshaling finalizers patch")
}

upd, err := c.backupClient.Backups(api.DefaultNamespace).Patch(backup.Name, types.MergePatchType, patchBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use backup.Namespace instead of api.DefaultNamespace so we don't have to come back later and change it (to support an arbitrary namespace for ark)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to pull in client-go's retry package and use retry.RetryOnConflict. Or we could just stop here on conflict and let the next run of the loop try again.

c.processBackups()
}
now := c.clock.Now()
c.logger.Info("Garbage-collecting backups that have expired as of now")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove "that have expired as of now" 😄

for _, backup := range backups {
log := c.logger.WithField("backup", kube.NamespaceAndName(backup))
if backup.Status.Expiration.Time.After(now) {
log.Info("Backup has not expired yet, skipping")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we need this at the info level still?

// since backups have a finalizer, this will actually have the effect of setting a deletionTimestamp and calling
// an update. The update will be handled by this controller and will result in a deletion of the obj storage
// files and the API object.
if err := c.backupClient.Backups(api.DefaultNamespace).Delete(backup.Name, &metav1.DeleteOptions{}); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backup.Namespace

return
}

// no more finalizers: we don't need to issue another delete
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to api machinery: no need to issue additional deletes. The only action we need to perform is the patch to remove our finalizer.

}

log.Infof("Garbage-collecting backup")
if err := c.garbageCollect(backup, log); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this call errors once, it's entirely possible (likely even?) that it will continue to error the next time we try it for the same backup. It might be worth considering switching to a rate limited queue (like our other controllers), or keeping track of how many times we've tried to delete (via annotations) and stopping after some fixed limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is PR-blocking?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No

expectedObjectStorageDeletions: sets.NewString("backup-1"),
},
{
name: "if snapshot deletion fails, don't delete kube backup and return error",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't deleting the kube backup from garbageCollect any more

snapshots sets.String
backupFiles sets.String
backupMetadataFiles sets.String
restores []*api.Restore
nilSnapshotService bool
expectedErr bool
expectedRestoreDeletes []string
expectedBackupDelete string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used any more by this test

snapshots sets.String
backupFiles sets.String
backupMetadataFiles sets.String
restores []*api.Restore
nilSnapshotService bool
expectedErr bool
expectedRestoreDeletes []string
expectedBackupDelete string
expectedSnapshots sets.String
expectedObjectStorageDeletions sets.String
}{
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may just want to convert this to a few test cases:

  1. nil snapshot server, backup has snapshots, return error (ensure that no DeleteSnapshot, DeleteBackupDir, restore Delete calls are issued (handled by mock.AssertExpectations etc))
  2. happy path
  3. do as much as you can (some DeleteSnapshot errors & successes, DeleteBackupDir error & success, some restore Delete errors & successes) and return error(s) encountered

@ncdc
Copy link
Contributor

ncdc commented Dec 20, 2017

Need unit test coverage on handleFinalizer

upd, err := c.backupClient.Backups(api.DefaultNamespace).Patch(backup.Name, types.MergePatchType, patchBytes)
if err != nil {
if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
_, err = c.backupClient.Backups(backup.Namespace).Patch(backup.Name, types.MergePatchType, patchBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use RetryOnConflict, we have to regenerate the patchMap in the anonymous func, based on an updated copy of the backup (either from the lister, or using a real client to fetch the data from the apiserver). Otherwise we'll keep sending the same patch and getting the same conflict each time. That's why I was suggesting it might be easier to just let the next iteration handle it.

@skriss
Copy link
Contributor Author

skriss commented Dec 20, 2017

addressed everything except the rate-limited queue comment.

@ncdc
Copy link
Contributor

ncdc commented Dec 21, 2017

This also fixes #77, doesn't it?

}
log.Debugf("Backup has finalizers %s", backup.Finalizers)

// only process the GC finalizer when it's the first item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the kube code and can't find any place where it only acts on a finalizer if it's the first element in the slice. Let's go back to proceeding if the slice contains our finalizer, regardless of position. Sorry for the flip-flop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np, easy to switch back.

continue
restoreLog.Info("Deleting restore referencing backup")
if err := c.restoreClient.Restores(restore.Namespace).Delete(restore.Name, &metav1.DeleteOptions{}); err != nil {
restoreLog.WithError(errors.WithStack(err)).Error("Error deleting restore")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this should just be an error so we consider the GC of the backup a failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, in my opinion no -- I'm not too worried about orphaned restores (since they don't cost any cloud resources), plus that would have the result of leaving the backup API object around, despite its tarball/etc. potentially having been deleted, which could make a user believe it's still restorable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wfm

@skriss
Copy link
Contributor Author

skriss commented Dec 21, 2017

Re: #77 , it doesn't address it directly but does mean that Ark won't orphan files in obj storage. However, if a user manually deletes a backup API obj, or manually adds files to obj storage, Ark won't be able to clean them up.

Actually, if a user manually deletes an API obj, it'll get re-synced so that case is fine. So the only case we don't handle is files manually added to the bucket outside of Ark. I don't think we need to handle that.

@ncdc
Copy link
Contributor

ncdc commented Dec 21, 2017

So the only case we don't handle is files manually added to the bucket outside of Ark

I'm pretty sure BackupService.DeleteBackupDir deletes everything

@skriss
Copy link
Contributor Author

skriss commented Dec 21, 2017

everything within a backup dir, yes. I think we can consider #77 addressed and if some other related issue pops up we can create a new one.

@nrb
Copy link
Contributor

nrb commented Dec 21, 2017

Probably worth making a release note for the version, though I don't think we currently have a process for doing so til release time.

cmd.CheckError(err)

if !serverVersion.AtLeast(controller.MinVersionForDelete) {
cmd.CheckError(fmt.Errorf("this command requires the Kubernetes server version to be at least %s", controller.MinVersionForDelete))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to do errors.Errorf for consistency with the rest of the code base? (Apparently we're still using it in pkg/restore/restore.go when adding warnings & errors to the restore result.)

@ncdc
Copy link
Contributor

ncdc commented Dec 22, 2017

1 minor q

@skriss
Copy link
Contributor Author

skriss commented Dec 22, 2017

sure, updated.

@skriss
Copy link
Contributor Author

skriss commented Jan 2, 2018

@ncdc did you have anything further here? this is ready to merge from my perspective.

@ncdc
Copy link
Contributor

ncdc commented Jan 2, 2018

Nope, merging

@ncdc ncdc merged commit 6b0b637 into vmware-tanzu:master Jan 2, 2018
@skriss skriss deleted the delete-backup branch January 2, 2018 19:27
shubham-pampattiwar pushed a commit to shubham-pampattiwar/velero that referenced this pull request Apr 6, 2023
…ore-priority

Add vsb to restore priority
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants