Add vcluster snapshot create command #3164
base: main
Conversation
Force-pushed from 75a63d3 to dc28fef
pkg/cli/snapshot_helm.go (Outdated)
responseBody := &bytes.Buffer{}
_, err = responseBody.ReadFrom(response.Body)
if err != nil {
	return fmt.Errorf("failed to read response body: %w", err)
}
snapshotRequestResultJSON := responseBody.Bytes()
var snapshotRequestResult snapshot.Request
err = json.Unmarshal(snapshotRequestResultJSON, &snapshotRequestResult)
if err != nil {
	return fmt.Errorf("failed to unmarshal snapshot request result: %w", err)
}
nit: You can accomplish the body reading and unmarshalling with fewer intermediate steps. Something like
var snapshotRequestResult snapshot.Request
if err = json.NewDecoder(response.Body).Decode(&snapshotRequestResult); err != nil {
return fmt.Errorf("failed to unmarshal snapshot request result: %w", err)
}
Thanks, will check this out!
pkg/server/routes/snapshots.go (Outdated)
// create the snapshot request Secret and ConfigMap
// - for shared nodes, create resources in the host cluster in the vCluster namespace
// - for private nodes, create resources in the virtual cluster in the kube-system namespace
For private nodes, can we not use the vCluster namespace?
For private nodes, the snapshot request resources are created inside the virtual cluster, so theoretically we can create them in any namespace. I would save them in a namespace like vcluster-system, but AFAIK we don't have that, so I went with kube-system.
For shared nodes, OTOH, the snapshot request resources are created inside the host cluster, so the vCluster namespace makes the most sense.
P.S. Theoretically we could also use the vCluster namespace on the host for private nodes as well, but during the design phase (multiple meetings) we decided to use the virtual cluster for private nodes and the host cluster for shared nodes.
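A minimal sketch of that split, with hypothetical names (chooseSnapshotRequestTarget, privateNodes, hostClient, virtualClient); it only illustrates the rule described above, not the actual route handler:

package routes

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// chooseSnapshotRequestTarget returns the client and namespace in which the
// snapshot request Secret and ConfigMap are created:
//   - shared nodes:  host cluster client, vCluster namespace
//   - private nodes: virtual cluster client, kube-system namespace
func chooseSnapshotRequestTarget(privateNodes bool, hostClient, virtualClient kubernetes.Interface, vClusterNamespace string) (kubernetes.Interface, string) {
	if privateNodes {
		return virtualClient, metav1.NamespaceSystem // "kube-system"
	}
	return hostClient, vClusterNamespace
}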
pkg/snapshot/request.go (Outdated)
ObjectMeta: metav1.ObjectMeta{
	Namespace: vClusterNamespace,
	Labels: map[string]string{
		RequestLabel: "",
Wouldn't it be good to have the vcluster name in the label value, in case we want to filter down snapshot requests from a specific vCluster?
You can filter them by vcluster namespace, e.g.:
kubectl get configmaps -n $VCLUSTER_NAME -l vcluster.loft.sh/snapshot-request
NAME                     DATA   AGE
snapshot-request-2jvb9   1      15h
snapshot-request-6cxg8   1      15s
snapshot-request-bv7pm   1      14h
snapshot-request-c8dbt   1      18s
snapshot-request-clqkf   1      19h
snapshot-request-ftnhd   1      18h
snapshot-request-n6rc9   1      15h
snapshot-request-vm5dg   1      14h
We should also add a new command for listing the snapshots (and in-progress snapshot requests), e.g. vcluster snapshot list, so checking the snapshots should be even easier.
We can add the vcluster label in any case.
"Also we should also add a new command for listing the snapshots"
That is the reason I was asking, but I guess many snapshots can happen in parallel, and it would also be useful to see the history of snapshots that have been taken.
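A minimal sketch of what that could look like, assuming a hypothetical vcluster.loft.sh/vcluster-name label key alongside the vcluster.loft.sh/snapshot-request label shown above (the package and function names are illustrative, not the actual pkg/snapshot code):

package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const (
	// RequestLabel marks snapshot request ConfigMaps/Secrets (from the PR).
	RequestLabel = "vcluster.loft.sh/snapshot-request"
	// VClusterNameLabel is a hypothetical label carrying the vCluster name.
	VClusterNameLabel = "vcluster.loft.sh/vcluster-name"
)

// listSnapshotRequests lists snapshot request ConfigMaps for a single vCluster
// by combining the request marker label with the name label.
func listSnapshotRequests(ctx context.Context, client kubernetes.Interface, namespace, vClusterName string) error {
	selector := fmt.Sprintf("%s,%s=%s", RequestLabel, VClusterNameLabel, vClusterName)
	configMaps, err := client.CoreV1().ConfigMaps(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return fmt.Errorf("failed to list snapshot requests: %w", err)
	}
	for _, configMap := range configMaps.Items {
		fmt.Println(configMap.Name)
	}
	return nil
}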
localPort := RandomPort()
errorChan := make(chan error)
go func() {
	errorChan <- portforward.StartPortForwardingWithRestart(ctx, kubeConfig, "127.0.0.1", podName, vCluster.Namespace, strconv.Itoa(localPort), "8443", make(chan struct{}), portForwardingOptions.StdOut, portForwardingOptions.StdErr, log)
What would happen with the port forwarding if the command is interrupted? Would this process get cancelled?
Not sure TBH, I have just used the existing function (moved it here so it can be reused) for getting the virtual cluster config, so I didn't dig into it to see how it works in detail.
I see. I was wondering because we have an issue where the vcluster was being paused and not resumed due to the restore command being interrupted: https://linear.app/loft/issue/ENG-8545/bug-vcluster-restore-leaves-pod-scaled-down-if-interrupted-preventing
Here specifically, if you interrupt the CLI command while it's trying to get the kubeconfig for accessing the virtual cluster, then the snapshot request shouldn't even be sent, because it first has to connect in order to get access to the virtual server.
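As a general pattern, a self-contained sketch of how a long-running background task such as port forwarding can be tied to command interruption via a signal-aware context; whether StartPortForwardingWithRestart itself honors context cancellation is exactly the open question above, so the forwarder here is a stand-in:

package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Cancel the context when the process receives SIGINT/SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	errorChan := make(chan error, 1)
	go func() {
		// stand-in for portforward.StartPortForwardingWithRestart(ctx, ...)
		errorChan <- runUntilCancelled(ctx)
	}()

	select {
	case <-ctx.Done():
		fmt.Println("interrupted, shutting down port forwarding")
	case err := <-errorChan:
		if err != nil {
			fmt.Println("port forwarding failed:", err)
		}
	}
}

// runUntilCancelled simulates a forwarder that respects context cancellation.
func runUntilCancelled(ctx context.Context) error {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			// keep forwarding...
		}
	}
}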
This reverts commit f1413bc.
Co-authored-by: Ryan Swanson <[email protected]>
Force-pushed from 550cdb2 to 35ca0c4
Left a few suggestions for improvement in the e2e tests
})
// check deployment is deleted
Eventually(func() error {
	_, err := f.VClusterClient.CoreV1().Secrets(testNamespaceName).List(f.Context, metav1.ListOptions{
I guess we should check if the deployment was deleted here
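A minimal sketch of that check as a fragment in the style of the surrounding test; deploymentName is hypothetical, and it assumes f.VClusterClient also exposes AppsV1() and that k8s.io/apimachinery/pkg/api/errors is imported as kerrors:

// check deployment is deleted: expect NotFound instead of listing Secrets
Eventually(func() error {
	_, err := f.VClusterClient.AppsV1().Deployments(testNamespaceName).Get(f.Context, deploymentName, metav1.GetOptions{})
	if kerrors.IsNotFound(err) {
		return nil
	}
	if err != nil {
		return err
	}
	return fmt.Errorf("deployment %s still exists", deploymentName)
}).WithTimeout(time.Minute).WithPolling(time.Second).Should(Succeed())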
if container.State.Running == nil || !container.Ready {
	return fmt.Errorf("pod %s container %s is not running", pod.Name, container.Name)
}
if err != nil {
Should we check for not found?
Containers: []corev1.Container{
	{
		Name:  "example-container",
		Image: "nginx:1.25.0",
Nit: move this into a constant
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
Resources: corev1.VolumeResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceStorage: resource.MustParse("5Gi"),
Nit: move this to a constant
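A minimal sketch covering both nits (the image and the storage size), with hypothetical constant names:

// test fixture values flagged in the review nits above
const (
	testContainerImage = "nginx:1.25.0"
	testPVCStorageSize = "5Gi"
)

// usage:
//   Image: testContainerImage,
//   corev1.ResourceStorage: resource.MustParse(testPVCStorageSize),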
return fmt.Errorf("pod %s has no container status", pod.Name) | ||
} | ||
// check configmap is deleted | ||
Eventually(func() error { |
Maybe we can move this function to a generic one, e.g. verifyResourceDeleted(resourceType) error.
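A minimal sketch of such a helper, with a getter closure passed in so one function covers Deployments, ConfigMaps, Secrets, etc.; the names and the adapted signature are illustrative, and it assumes Gomega plus k8s.io/apimachinery/pkg/api/errors imported as kerrors:

// verifyResourceDeleted polls until get returns a NotFound error,
// i.e. until the named resource has been deleted.
func verifyResourceDeleted(resourceType, name string, get func() error) {
	Eventually(func() error {
		err := get()
		if kerrors.IsNotFound(err) {
			return nil
		}
		if err != nil {
			return err
		}
		return fmt.Errorf("%s %s still exists", resourceType, name)
	}).WithTimeout(time.Minute).WithPolling(time.Second).Should(Succeed())
}

// usage, e.g. for the ConfigMap check:
// verifyResourceDeleted("configmap", configMapName, func() error {
// 	_, err := f.VClusterClient.CoreV1().ConfigMaps(testNamespaceName).Get(f.Context, configMapName, metav1.GetOptions{})
// 	return err
// })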
What issue type does this pull request address? (keep at least one, remove the others)
/kind enhancement

What does this pull request do? Which issues does it resolve? (use resolves #<issue_number> if possible)
resolves ENG-8530 and ENG-8531

Please provide a short message that should be published in the vcluster release notes
Add vcluster snapshot create command that creates snapshots asynchronously.

What else do we need to know?
- vcluster snapshot create creates a snapshot request, which is then reconciled by a vCluster snapshot controller that creates snapshots in the background.
- The snapshot request resources carry the vcluster.loft.sh/snapshot-request label (ATM the ConfigMap is used mostly…).
- The vcluster snapshot create command sends a POST request to the vCluster /vcluster/snapshots route.
- The /vcluster/snapshots route handler creates the snapshot request resources.
- The old vcluster snapshot command has been deprecated, and it continues to work like before.
- The e2e snapshot tests have been updated to test both the old CLI-based vcluster snapshot command and the new controller-based vcluster snapshot create command.
- The platform still uses the old vcluster snapshot command, so it's important to keep running the tests for the old command until the platform is updated to use the new vcluster snapshot create command.
- vcluster snapshot create creates the snapshot in the container at the path container:///snapshot-data/snapshot.tar.gz. The snapshot container path is backed by a PVC (created in the CI) that is mounted to the vcluster container, so that the restore pod can use it when restoring. The /data/... container path could not be used, because some e2e tests (e.g. the HA setup) use emptyDir for /data, which is not accessible by a separate restore pod.
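For reference, a minimal sketch of the client-side flow described above (POST to the /vcluster/snapshots route, then decode the returned request); the SnapshotRequest type, URL construction, and lack of authentication are simplifications, not the actual vcluster CLI code:

package cli

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// SnapshotRequest is a stand-in for the real snapshot.Request type.
type SnapshotRequest struct {
	Name string `json:"name,omitempty"`
}

// createSnapshot POSTs snapshot options to the vCluster snapshots endpoint and
// decodes the created snapshot request from the response body.
func createSnapshot(ctx context.Context, client *http.Client, baseURL string, options any) (*SnapshotRequest, error) {
	body, err := json.Marshal(options)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal snapshot options: %w", err)
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/vcluster/snapshots", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")

	response, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("failed to send snapshot request: %w", err)
	}
	defer response.Body.Close()

	if response.StatusCode != http.StatusOK && response.StatusCode != http.StatusCreated {
		return nil, fmt.Errorf("unexpected status code %d", response.StatusCode)
	}

	var snapshotRequest SnapshotRequest
	if err := json.NewDecoder(response.Body).Decode(&snapshotRequest); err != nil {
		return nil, fmt.Errorf("failed to unmarshal snapshot request result: %w", err)
	}
	return &snapshotRequest, nil
}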