Graceful deletion of a VirtualMachineInstance #145
davidvossel
left a comment
great start!
I left a few comments in line and have one kind of higher level comment.
We need a drain timeout: if a drain doesn't complete after x minutes (regardless of how many times the drain is retried), we delete the VMI anyway. This timeout ensures that we won't block the removal of a VMI indefinitely when an infra node is being drained.
| // KubeVirt will set the EvacuationNodeName field in case of guest node eviction. If the field is not set, there is
| // nothing to do.
| nodeName := vmi.Status.EvacuationNodeName
| if len(nodeName) == 0 { // no need to drain
In addition to the nodeName != "" check here, we should also have EvictionStrategy == External.
This is kind of a forward looking check that ensures in the future we only attempt to drain nodes that aren't live migratable.
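A minimal sketch of the combined check suggested here, using local stand-in types rather than the real KubeVirt API types. The `vmiView` struct, the `shouldDrain` helper and the constant names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// EvictionStrategy is a local stand-in for the eviction strategy field of a VMI.
// It is not the KubeVirt API type.
type EvictionStrategy string

const (
	EvictionStrategyLiveMigrate EvictionStrategy = "LiveMigrate"
	EvictionStrategyExternal    EvictionStrategy = "External"
)

// vmiView holds only the two fields this check needs (hypothetical struct).
type vmiView struct {
	EvictionStrategy   *EvictionStrategy
	EvacuationNodeName string
}

// shouldDrain reports whether the controller should drain the guest node:
// the eviction strategy must be External and an evacuation node must be set.
func shouldDrain(v vmiView) bool {
	if v.EvictionStrategy == nil || *v.EvictionStrategy != EvictionStrategyExternal {
		return false
	}
	return v.EvacuationNodeName != ""
}

func main() {
	external := EvictionStrategyExternal
	migrate := EvictionStrategyLiveMigrate

	fmt.Println(shouldDrain(vmiView{EvictionStrategy: &external, EvacuationNodeName: "node-1"}))
	fmt.Println(shouldDrain(vmiView{EvictionStrategy: &migrate, EvacuationNodeName: "node-1"}))
	fmt.Println(shouldDrain(vmiView{EvictionStrategy: &external}))
}
```

Checking the strategy first means a VMI that could be live migrated is never drained by this controller, even if the evacuation node name happens to be set.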
| // now, when the node is drained, we can safely delete the VMI
| propagationPolicy := metav1.DeletePropagationForeground
| err = r.Delete(ctx, vmi, &client.DeleteOptions{PropagationPolicy: &propagationPolicy})
I think it would make sense to wrap this section in an `if vmi.DeletionTimestamp == nil` check and only delete if the VMI is not already marked for deletion.
Agreed, this can be validated even before calling drainNode.
| if !apierrors.IsAlreadyExists(err) {
|     return ctrl.Result{RequeueAfter: 20 * time.Second}, err
| }
if we check for the DeletionTimestamp, we shouldn't have to qualify what error occurred here. We also shouldn't need the RequeueAfter.
| @@ -0,0 +1,212 @@
| package controllers
I think we need to give this file a more meaningful name, something like 'vmievacuation_controller.go'.
| return ctrl.Result{}, err
| }
|
| nodeDrained, retryDuration, err := r.drainNode(ctx, cluster, nodeName, logger)
Why should there be cases returning nodeDrained=false instead of just returning the error?
> Why should there be cases returning `nodeDrained=false` instead of just returning the error?
Because in most cases (but not all of them) we don't want to retry. Returning an error together with the result causes a retry, and the calling function can't know which errors to return with the result and which to ignore.
davidvossel
left a comment
great stuff, i left one comment in line.
Have you thought about how to e2e test this?
| if err = kubedrain.RunNodeDrain(drainer, node.Name); err != nil {
|     // Machine will be re-reconciled after a drain failure.
|     logger.Error(err, "Drain failed, retry in 20s", "node name", nodeName)
|     return false, 20 * time.Second, nil
we need a global timeout for drain as well. Something that ensures we eventually give up attempting to drain and delete the VMI after x minutes. 10 minutes would likely be a pretty conservative number to use.
An annotation on the VMI could be used to track when the drain begins. The timeout could be anchored to the start time recorded in the annotation.
/ok-to-test
| })
| }).WithTimeout(time.Minute*5).
|     WithPolling(time.Second*10).
|     Should(Succeed(), "pod eviction failed")
it's okay for eviction to fail, this is what we check for in the kubevirt e2e test that exercises similar behavior
err = virtClient.CoreV1().Pods(vmi.Namespace).EvictV1beta1(context.Background(), &policyv1beta1.Eviction{ObjectMeta: metav1.ObjectMeta{Name: pod.Name}})
// The "too many requests" err is what gets returned when an
// eviction would invalidate a pdb. This is what we want to see here.
Expect(errors.IsTooManyRequests(err)).To(BeTrue())
| vmiName := chosenVMI.Name
| vmiUID := chosenVMI.GetUID()
I found it confusing to declare a value for this vmiUID and then have it immediately overwritten 3 lines down.
| DeleteOptions: &metav1.DeleteOptions{
|     GracePeriodSeconds: pointer.Int64(60 * 10), // 10 minutes
| },
| })).ShouldNot(Succeed())
to be accurate, we're looking for a specific error here, Expect(errors.IsTooManyRequests(err)).To(BeTrue()).
| vmi, err = virtClient.VirtualMachineInstance(namespace).Get(vmiName, &metav1.GetOptions{})
| g.Expect(err).ShouldNot(HaveOccurred())
|
| vmiDebugPrintout(vmi)
|
| g.Expect(vmi.Status.EvacuationNodeName).ShouldNot(BeEmpty())
| g.Expect(vmi.DeletionTimestamp).ShouldNot(BeNil())
Since we have a controller that deletes a VMI when the evacuation node name is set, I think you should put a finalizer on the VMI in order to reliably observe EvacuationNodeName. Otherwise it's possible the VMI could disappear before we see it in the functional test.
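The finalizer idea can be sketched on a plain string slice, which is how finalizers are stored in object metadata. The finalizer name and both helpers below are hypothetical; a real test would apply the change through the API client and remove the finalizer during cleanup.

```go
package main

import "fmt"

// observeFinalizer is a hypothetical finalizer name used only in this sketch.
const observeFinalizer = "example.com/e2e-observe-evacuation"

// addFinalizer returns finalizers with name appended, unless it is already present.
func addFinalizer(finalizers []string, name string) []string {
	for _, f := range finalizers {
		if f == name {
			return finalizers
		}
	}
	return append(finalizers, name)
}

// removeFinalizer returns a copy of finalizers without name.
func removeFinalizer(finalizers []string, name string) []string {
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	// Before triggering the eviction: hold the object so it stays observable.
	finalizers := addFinalizer(nil, observeFinalizer)
	fmt.Println(finalizers)

	// After the assertions on EvacuationNodeName and DeletionTimestamp:
	// release the object so the deletion can complete.
	finalizers = removeFinalizer(finalizers, observeFinalizer)
	fmt.Println(len(finalizers))
}
```

While the finalizer is present, a deleted object keeps existing with its DeletionTimestamp set, so the test can read both fields without racing the controller.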
| By("Read the worker node from the tenant cluster, and validate its IP")
| Eventually(func(g Gomega) bool {
|     // reading the node and the VMI again and again, because it takes time to the IPs to be synchronized
|     node, err := clientSet.CoreV1().Nodes().Get(context.Background(), vmiName, metav1.GetOptions{})
|     g.Expect(err).ToNot(HaveOccurred())
|
|     var nodeIp string
|     for _, address := range node.Status.Addresses {
|         if address.Type == "InternalIP" {
|             nodeIp = address.Address
|         }
|     }
|
|     g.Expect(nodeIp).ShouldNot(BeEmpty(), "node's IP is not set")
|
|     vmi, err := virtClient.VirtualMachineInstance(namespace).Get(chosenVMI.Name, &metav1.GetOptions{})
|
|     g.Expect(err).ShouldNot(HaveOccurred())
|     g.Expect(vmi).ShouldNot(BeNil())
|
|     for _, ifs := range vmi.Status.Interfaces {
|         for _, ip := range ifs.IPs {
|             if ip == nodeIp {
|                 return true
|             }
|         }
|     }
|     return false
|
| }).WithTimeout(5 * time.Minute).WithPolling(10 * time.Second).Should(BeTrue())
i think it would make sense to move this to a helper function
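As an illustration of what such a helper could look like, here is a sketch that separates the pure matching logic from the polling. The `nodeAddress` and `vmiInterface` types are simplified stand-ins for the Kubernetes and KubeVirt status types, and the function names are hypothetical.

```go
package main

import "fmt"

// nodeAddress and vmiInterface are simplified stand-ins for the node address
// and VMI interface status types read in the test.
type nodeAddress struct {
	Type    string
	Address string
}

type vmiInterface struct {
	IPs []string
}

// internalIP returns the first address of type "InternalIP", or "" if none is set.
// (The loop in the test keeps the last match; the difference only matters if a
// node reports several internal addresses.)
func internalIP(addresses []nodeAddress) string {
	for _, a := range addresses {
		if a.Type == "InternalIP" {
			return a.Address
		}
	}
	return ""
}

// vmiHasIP reports whether any interface of the VMI carries the given IP.
func vmiHasIP(interfaces []vmiInterface, ip string) bool {
	if ip == "" {
		return false
	}
	for _, ifc := range interfaces {
		for _, candidate := range ifc.IPs {
			if candidate == ip {
				return true
			}
		}
	}
	return false
}

func main() {
	addresses := []nodeAddress{
		{Type: "Hostname", Address: "worker-0"},
		{Type: "InternalIP", Address: "10.244.0.12"},
	}
	interfaces := []vmiInterface{{IPs: []string{"10.244.0.12", "fd00::12"}}}

	fmt.Println(vmiHasIP(interfaces, internalIP(addresses)))
}
```

With the matching pulled out, the `Eventually` body only has to fetch the node and the VMI and return the result of the helper.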
| nodeDrained, retryDuration, err := r.drainNode(ctx, cluster, vmi.Status.EvacuationNodeName, logger)
| if err != nil {
|     return ctrl.Result{RequeueAfter: retryDuration}, err
| }
|
| if !nodeDrained && r.waitingForTimeout(ctx, vmi, logger) {
|     return ctrl.Result{RequeueAfter: retryDuration}, nil
| }
The drain can block indefinitely due to the way this logic is structured. The timeout needs to be checked either regardless of whether drainNode() returns an error, or before drainNode() is even called. If the global timeout expires, the VMI should be deleted.
We don't want anything within the guest cluster to indefinitely block an infra cluster node from being drained.
When a host node is deleted, the guest VMIs are evicted, killing the running tenant node in this VMI. This PR gracefully deletes the VMI by first draining the guest node. Only if the drain is successful will CAPK delete the VMI. The PR is based on a feature in KubeVirt where, by setting the eviction strategy to "external", KubeVirt will not delete the VMI but will set the `vmi.Status.EvacuationNodeName` field to signal to the external controller (CAPK in this case) that the VMI should be evicted. Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
Also, add unit tests. The logic that gets the guest cluster client had to change to use the workloadClient type, because the previous code was not testable. Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
If drain is failing for more than 10 minutes, delete the VMI anyway. Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
davidvossel
left a comment
/lgtm
/approve
excellent work 👍
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: davidvossel, nunnatsa
When a host node is deleted, the guest VMIs are evicted, killing the
running tenant node in this VMI.
This PR gracefully deletes the VMI by first draining the guest node. Only
if the drain is successful will CAPK delete the VMI.
The PR is based on a feature in KubeVirt where, by setting the eviction
strategy to "external", KubeVirt will not delete the VMI but will set
the `vmi.Status.EvacuationNodeName` field to signal to the external
controller (CAPK in this case) that the VMI should be evicted.
Known Issues
Signed-off-by: Nahshon Unna-Tsameret nunnatsa@redhat.com
Release notes: