Conversation

@nprokopic nprokopic commented Sep 2, 2025

What issue type does this pull request address? (keep at least one, remove the others)
/kind enhancement

What does this pull request do? Which issues does it resolve? (use resolves #<issue_number> if possible)
resolves ENG-8530 and ENG-8531

Please provide a short message that should be published in the vcluster release notes
Add vcluster snapshot create command that creates snapshots asynchronously.

What else do we need to know?
vcluster snapshot create creates a snapshot request, which is then reconciled by a vCluster snapshot controller that creates snapshots in the background.

  • A snapshot request consists of two resources:
    • ConfigMap: the vCluster snapshot controller reconciles ConfigMaps with the special vcluster.loft.sh/snapshot-request label (ATM the ConfigMap is used mostly …)
    • Secret: used to temporarily store the snapshot options (including credentials) so that the vCluster snapshot controller can access the snapshot location. The snapshot request Secret is deleted as soon as the snapshot request has been processed (either completed successfully or failed).
  • The vcluster snapshot create command sends a POST request to the vCluster /vcluster/snapshots route.
  • The /vcluster/snapshots route handler creates the snapshot request resources:
    • For a vCluster with shared nodes, the snapshot request resources are created in the host cluster.
    • For a vCluster with private nodes, the snapshot request resources are created in the virtual cluster.

The old vcluster snapshot command has been deprecated, but it continues to work as before.

The e2e snapshot tests have been updated to test both the old CLI-based vcluster snapshot command and the new controller-based vcluster snapshot create command.

  • The latest platform release uses the old vcluster snapshot command, so it's important to keep running the tests for the old command until the platform is updated to use the new vcluster snapshot create command.
  • The test specs are identical for the CLI-based and the controller-based snapshot commands.
  • The test for vcluster snapshot create creates the snapshot in the container at the path container:///snapshot-data/snapshot.tar.gz. The snapshot container path is backed by a PVC (created in the CI) that is mounted into the vCluster container, so that the restore pod can use it when restoring. The /data/... container path could not be used, because some e2e tests (e.g. the HA setup) use an emptyDir for /data, which is not accessible from a separate restore pod.

@nprokopic nprokopic force-pushed the feature/snapshot-controller/etcd-snapshot branch from 75a63d3 to dc28fef on September 3, 2025 10:45
@nprokopic nprokopic marked this pull request as ready for review September 9, 2025 23:17
@cbron cbron requested a review from jjaferson September 10, 2025 19:34
Comment on lines 101 to 134
responseBody := &bytes.Buffer{}
_, err = responseBody.ReadFrom(response.Body)
if err != nil {
	return fmt.Errorf("failed to read response body: %w", err)
}
snapshotRequestResultJSON := responseBody.Bytes()
var snapshotRequestResult snapshot.Request
err = json.Unmarshal(snapshotRequestResultJSON, &snapshotRequestResult)
if err != nil {
	return fmt.Errorf("failed to unmarshal snapshot request result: %w", err)
}
Contributor

nit: You can accomplish the body reading and unmarshalling with fewer intermediate steps. Something like

var snapshotRequestResult snapshot.Request
if err = json.NewDecoder(response.Body).Decode(&snapshotRequestResult); err != nil {
  return fmt.Errorf("failed to unmarshal snapshot request result: %w", err)
}

Contributor Author

Thanks, will check this out!


// create the snapshot request Secret and ConfigMap
// - for shared nodes, create resources in the host cluster in the vCluster namespace
// - for private nodes, create resources in the virtual cluster in the kube-system namespace
Contributor

For private nodes can we not use the vCluster namespace?

Contributor Author

For private nodes, the snapshot request resources are created inside the virtual cluster, so theoretically we can create them in any namespace. I would save them in a namespace like vcluster-system, but AFAIK we don't have that, so I went with kube-system.

For shared nodes OTOH, the snapshot request resources are created inside the host cluster, so the vCluster namespace makes the most sense.

P.S. Theoretically we could also use the vCluster namespace on the host for private nodes as well, but during the design phase (multiple meetings) we decided to use the virtual cluster for private nodes and the host cluster for shared nodes.

ObjectMeta: metav1.ObjectMeta{
	Namespace: vClusterNamespace,
	Labels: map[string]string{
		RequestLabel: "",
Contributor

Wouldn't it be good to have the vcluster name in the label value, in case we want to filter down snapshot requests from a specific vCluster?

Contributor Author

You can filter them by vcluster namespace, e.g.

kubectl get configmaps -n $VCLUSTER_NAME -l vcluster.loft.sh/snapshot-request
NAME                     DATA   AGE
snapshot-request-2jvb9   1      15h
snapshot-request-6cxg8   1      15s
snapshot-request-bv7pm   1      14h
snapshot-request-c8dbt   1      18s
snapshot-request-clqkf   1      19h
snapshot-request-ftnhd   1      18h
snapshot-request-n6rc9   1      15h
snapshot-request-vm5dg   1      14h

We should also add a new command for listing the snapshots (and in-progress snapshot requests), e.g. vcluster snapshot list, so checking the snapshots will be even easier.

Contributor Author

we can add the vcluster label in any case

Contributor

Also we should also add a new command for listing the snapshots

That is the reason I was asking, but I guess many snapshots can happen in parallel, and it would also help to see the history of snapshots that have been taken.

localPort := RandomPort()
errorChan := make(chan error)
go func() {
errorChan <- portforward.StartPortForwardingWithRestart(ctx, kubeConfig, "127.0.0.1", podName, vCluster.Namespace, strconv.Itoa(localPort), "8443", make(chan struct{}), portForwardingOptions.StdOut, portForwardingOptions.StdErr, log)
Contributor

What would happen with the port forwarding if the command is interrupted? Would this process get cancelled?

Contributor Author

@nprokopic nprokopic Sep 11, 2025

Not sure TBH, I have just used the existing function (moved it here so it can be reused) for getting the virtual cluster config, so I didn't dig into it to see how it works in detail.

Contributor

I see, I was wondering because we have an issue where the vcluster was being paused and not resumed due to the restore command being interrupted https://linear.app/loft/issue/ENG-8545/bug-vcluster-restore-leaves-pod-scaled-down-if-interrupted-preventing

Contributor Author

Here specifically, if you interrupt the CLI command while it's trying to get the kubeconfig for accessing the virtual cluster, then the snapshot request shouldn't even be sent, because the CLI first has to connect in order to get access to the virtual cluster API server.

@nprokopic nprokopic force-pushed the feature/snapshot-controller/etcd-snapshot branch from 550cdb2 to 35ca0c4 on September 11, 2025 11:15
hidalgopl
hidalgopl previously approved these changes Sep 11, 2025
Contributor

@jjaferson jjaferson left a comment

Left a few suggestions for improvement in the e2e tests

})
// check deployment is deleted
Eventually(func() error {
_, err := f.VClusterClient.CoreV1().Secrets(testNamespaceName).List(f.Context, metav1.ListOptions{
Contributor

I guess we should check if the deployment was deleted here

if container.State.Running == nil || !container.Ready {
return fmt.Errorf("pod %s container %s is not running", pod.Name, container.Name)
}
if err != nil {
Contributor

Should we check for not found?

Containers: []corev1.Container{
{
Name: "example-container",
Image: "nginx:1.25.0",
Contributor

Nit: move this into a constant

AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
Resources: corev1.VolumeResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceStorage: resource.MustParse("5Gi"),
Contributor

Nit: move this to a constant

return fmt.Errorf("pod %s has no container status", pod.Name)
}
// check configmap is deleted
Eventually(func() error {
Contributor

Maybe we can move this into a generic function, e.g. verifyResourceDeleted(resourceType) error.
