
[Epic] Backup Replication #103

Open
1 task

jrnt30 opened this issue Sep 25, 2017 · 25 comments
Labels: Breaking change · Enhancement/User · Epic · Icebox · kind/requirement · Reviewed Q2 2021

Comments

jrnt30 (Contributor) commented Sep 25, 2017

User Stories
As a cluster administrator, I would like to define a replication policy for my backups which will ensure that copies exist in other availability zones or regions. This will allow me to restore a cluster in case of an AZ or region failure.

Non-Goals

  1. Cross-cloud replication of backups
  2. Cross-account replication of backups

Features

  • ?

Original Issue Description

There are a few different dimensions of a DR strategy that may be worth consideration. For AWS deployments, the trade-offs and complexity of running Multi-AZ are fairly negligible if you stay within the same region. As such, a Single-Region/Multi-AZ deployment is extremely common.

An additional requirement is often the ability to restore into another region, with more relaxed RTO/RPO, in the case of an entire region going down.

Looking over #101 brought a few things to mind, and a large wish list might include:

  • Ability to specify additional block storage providers for syncing to additional regions (or a different type of block storage provider that would simply execute the clone to a different region)
  • Ability to map AZs for a restoration (maybe similar to Namespaces but preferably just transparently for the user) to allow for something like us-east-1a -> us-west-2b.
  • Writing backup data to an additional bucket in an alternate region

Some of these are certainly available to users today (copying snapshots and S3 data), but they require additional external integrations to function properly. As a user, it would be more convenient if this could be done in a consolidated way.
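
For illustration only, a purely hypothetical sketch of what such a replication policy could look like as a Kubernetes resource — the kind, field names, and values below are invented for discussion and do not exist in Ark/Velero:

# Hypothetical resource, not a real Ark/Velero API — everything here is invented.
apiVersion: ark.heptio.com/v1
kind: BackupReplicationPolicy
metadata:
  name: replicate-to-us-west-2
spec:
  # Replicate object-storage backup data to an additional bucket.
  objectStorage:
    bucket: my-ark-backups-us-west-2
    region: us-west-2
  # Copy volume snapshots into the alternate region.
  volumeSnapshots:
    destinationRegion: us-west-2
  # Map source AZs to target AZs at restore time (e.g. us-east-1a -> us-west-2b).
  availabilityZoneMappings:
    us-east-1a: us-west-2b
    us-east-1b: us-west-2c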

ncdc (Contributor) commented Sep 25, 2017

@jbeda some of what @jrnt30 is describing sounds similar to your idea of "backup targets"

jimzim commented Nov 13, 2017

I was just about to post this as a feature request. :)

I just tried to do this from eastus to westus in Azure and started thinking about how we could copy the snapshot and create the disk in the correct region. Perhaps we could have a restore target config? I also like the idea of creating multiple backups in other regions in case a region goes down, or a cluster and its resources get deleted.
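
For reference, a rough sketch of the kind of copy this would involve with the az CLI (this assumes a CLI version that supports incremental snapshots and cross-region copy via --copy-start; all resource names are placeholders):

# Sketch only — placeholders throughout; assumes az CLI support for
# incremental snapshots and cross-region copy via --copy-start.

# 1. Snapshot the disk in the source region (eastus).
az snapshot create \
  --resource-group my-backup-rg \
  --name my-disk-snap-eastus \
  --incremental \
  --source /subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Compute/disks/my-disk

# 2. Copy the snapshot into the target region (westus).
az snapshot create \
  --resource-group my-backup-rg \
  --name my-disk-snap-westus \
  --location westus \
  --incremental \
  --copy-start \
  --source /subscriptions/<sub-id>/resourceGroups/my-backup-rg/providers/Microsoft.Compute/snapshots/my-disk-snap-eastus

# 3. Create a disk from the copied snapshot in the target region.
az disk create \
  --resource-group my-backup-rg \
  --name my-restored-disk \
  --location westus \
  --source my-disk-snap-westus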

ncdc (Contributor) commented Nov 13, 2017

@jimzim this is definitely something we need to spec out and do! We've been kicking around the idea of a "backup target", which would replace the current Config kind. You could define as many targets as you wish, and when you perform a backup, you would then specify which target to use. There are some UX issues to reason through here...
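
For context, a rough sketch of how per-backup target selection might look from the CLI — this resembles the storage/snapshot location model that later shipped in Velero; bucket and region names are placeholders, and note that selecting targets this way does not by itself replicate existing backups or snapshots across regions:

# Sketch — names and regions are placeholders.

# Define an additional backup target (object storage) in another region.
velero backup-location create secondary-us-west-2 \
  --provider aws \
  --bucket my-velero-backups-us-west-2 \
  --config region=us-west-2

# Define an additional volume snapshot location.
velero snapshot-location create ebs-us-west-2 \
  --provider aws \
  --config region=us-west-2

# Pick the targets per backup.
velero backup create dr-backup \
  --storage-location secondary-us-west-2 \
  --volume-snapshot-locations ebs-us-west-2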

jimzim commented Nov 29, 2017

@ncdc Maybe we can discuss this briefly at KubeCon? I have begun to make this work on Azure, but before I go too much further it would be good to talk about what your planned architecture is.

ncdc (Contributor) commented Nov 29, 2017 via email

jbeda commented Dec 1, 2017

This is very much what I'm thinking. We need to think about backup targets, restore sources, and ways to munge stuff with a pipeline. It sounds like we are all thinking along similar lines.

@rocketraman

On Azure, you can create a snapshot into a different resource group than the one that the persistent disk is on, which means the snapshots could be created directly into the AZURE_BACKUP_RESOURCE_GROUP instead of AZURE_RESOURCE_GROUP.

Then, cross-RG restores should be quite simple as the source of the data will always be consistent and there should be no refs to AZURE_RESOURCE_GROUP.

I'm not sure if same-Location is a limitation of this -- I've only tried this on two resource groups that are in the same Azure Location.

The command/output I used to test this:

az snapshot create --name foo --resource-group Ark_Dev-Kube --source '/subscriptions/xxx/resourceGroups/my-Dev-Kube1/providers/Microsoft.Compute/disks/devkube1-dynamic-pvc-0bbf7e11-9e82-11e7-a717-000d3af4357e'
  DiskSizeGb  Location    Name    ProvisioningState    ResourceGroup    TimeCreated
------------  ----------  ------  -------------------  ---------------  --------------------------------
           5  canadaeast  foo     Succeeded            Ark_Dev-Kube     2018-01-09T16:21:58.398476+00:00

and the foo snapshot was created in Ark_Dev-Kube even though the disk is in my-Dev-Kube1.
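
A sketch of the corresponding restore step (the disk name is a placeholder; this assumes the snapshot and the new disk live in the same resource group and Location):

# Create a new managed disk in the backup resource group from the snapshot.
az disk create \
  --resource-group Ark_Dev-Kube \
  --name devkube1-restored-disk \
  --source foo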

@ncdc ncdc self-assigned this Mar 9, 2018
@ncdc ncdc added this to the v0.8.0 milestone Mar 9, 2018
@ncdc ncdc modified the milestones: v0.8.0, v0.9.0 Apr 24, 2018
@ncdc ncdc removed this from the v0.9.0 milestone Jun 8, 2018
@rosskukulinski rosskukulinski added the Enhancement/User End-User Enhancement to Velero label Jun 24, 2018
@rosskukulinski rosskukulinski added this to the v1.0.0 milestone Jun 24, 2018
@rosskukulinski (Contributor)

For reference, this is the current Ark Backup Replication design.

nrb (Contributor) commented Jul 10, 2018

We've created a document of scenarios that we'll use to inform the design decisions for this project.

We also have a document where we're discussing more detailed changes to the Ark codebase from which we'll generate a list of specific work items.

Members of the [email protected] google group have comment access to both of these documents for anyone who would like to share their thoughts on these.

@nrb nrb modified the milestones: v1.0.0, v0.10.0 Jul 18, 2018
@rosskukulinski rosskukulinski added Epic and removed Epic labels Jul 18, 2018
dijitali commented Jun 3, 2019

Similar scenario for us, I think, and we are using the following manual workaround:

# Make a backup on the first cluster
kubectx my-first-cluster
velero backup create my-backup

# Switch to new cluster and restore the backup
kubectx my-second-cluster
velero restore create --from-backup my-backup

# Find the restored disk name
gcloud config configurations activate my-second-project
gcloud compute disks list

# Move the disk to the necessary region
gcloud compute disks move restore-xyz --destination-zone my-second-cluster-zone

# Ensure the PV is set to use the retain reclaim policy then delete the old resources
kubectl patch pv mongo-volume-mongodb-0 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl delete statefulset mongodb
kubectl delete pvc mongo-volume-mongodb-0

# Recreate the restored stateful set with references for the new volume
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  selector:
    matchLabels:
      app: mongodb
  serviceName: "mongodb"
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongo
          image: mongo
          command:
            - mongod
            - "--bind_ip"
            - 0.0.0.0
            - "--smallfiles"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-volume
              mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: mongo-volume
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
      storageClassName: ""
      volumeName: "mongo-volume-mongodb-0"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-volume-mongodb-0
spec:
  storageClassName: ""
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: "restore-xyz"
    fsType: ext4

EOF

jujugrrr commented Jun 4, 2020

Hi, is there any ETA for this? It sounds like a basic feature to be able to use a backup to recover from an AZ failure.

https://docs.google.com/document/d/1vGz53OVAPynrgi5sF0xSfKKr32NogQP-xgXA1PB6xMc/edit#heading=h.yuq6zfblfpvs sounded promising

skriss (Contributor) commented Jun 4, 2020

@jujugrrr we have cross-AZ/region backup & restore on our roadmap. If you're interested in contributing in any way (requirements, design work, etc), please let us know!

cc @stephbman

kmadel commented Aug 10, 2020

You don't need backup replication to support multi-zone and multi-region for GCP/GKE with the K8s VolumeSnapshot beta support of Velero v1.4. See #1624 (comment)
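
For anyone trying that route, a rough sketch of the VolumeSnapshotClass involved — this assumes Velero installed with --features=EnableCSI plus the CSI plugin and the GCE PD CSI driver; the storage-locations parameter follows that driver's documentation, and the value below is a placeholder:

cat <<EOF | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: pd-csi-multi-region
  labels:
    # Tells Velero's CSI support to use this class for snapshots.
    velero.io/csi-volumesnapshot-class: "true"
driver: pd.csi.storage.gke.io
deletionPolicy: Retain
parameters:
  # Store snapshots in a multi-regional location instead of a single region.
  storage-locations: us
EOF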

@nrb nrb mentioned this issue Aug 12, 2020
@nrb nrb removed this from the v2.0 milestone Dec 8, 2020
@eleanor-millman eleanor-millman added the Icebox We see the value, but it is not slated for the next couple releases. label May 3, 2021
@fluffyf-x

Hey, I was wondering if there was any update on this? Or a breakdown of tasks required to complete this epic?

My team is running an AKS cluster with the CSI plugin; we've tried restic as well as restoring the VHD from blob to move the snapshots into another region, which resulted in:

StatusCode: 409, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 409, RawError: {
  "error": {
    "code": "OperationNotAllowed",
    "message": "Addition of a blob based disk to VM with managed disks is not supported.",
    "target": "dataDisk"
  }
}

jmontleon pushed a commit to jmontleon/velero that referenced this issue Jul 7, 2021
@jkupidura14

Is there any update on this? I feel like this could be easily solved by not storing the specific volume ID (snapshot ID in the case of AWS) to restore from, but instead applying a custom tag with a randomly generated ID that Velero uses as a reference when restoring. That way, no matter which region or AZ you copy the snapshot to, Velero would still be able to restore from it as long as it carries the correct ID tag. Just a thought.
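
To illustrate the idea with the AWS CLI — the tag key and values here are made up for the example, not anything Velero does today:

# Copy the snapshot to another region, carrying the random reference ID along
# as a tag (tag key/value are hypothetical).
aws ec2 copy-snapshot \
  --source-region us-east-1 \
  --source-snapshot-id snap-0123456789abcdef0 \
  --region us-west-2 \
  --tag-specifications 'ResourceType=snapshot,Tags=[{Key=backup-ref,Value=6f1c2d3e}]'

# At restore time, resolve the snapshot by tag instead of by snapshot ID.
aws ec2 describe-snapshots \
  --region us-west-2 \
  --owner-ids self \
  --filters Name=tag:backup-ref,Values=6f1c2d3e \
  --query 'Snapshots[0].SnapshotId' \
  --output text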

@joostvdg

Any update to this?

We are looking into helping customers replicate volume backups across cloud regions (e.g., AWS us-east-1 to us-west-1) with Velero. We did some AWS-specific investigation, but it was closed because you have something else lined up. Is this ticket the place where we can track this?

@johnroach

Hi, are there any updates regarding this? Is there any way someone can help with this?

jglick commented Nov 10, 2021

My very limited understanding, from comments by @dsu-igeek at the community meeting of 2021-11-02, is that this sort of feature is on hold pending #4077 and a rewrite of volume snapshotters to a new architecture based on Astrolabe. While it is not particularly hard to implement replication in a particular plugin without a general framework, subtle timing issues (#2888) could lead to anomalous behavior in applications that do not tolerate a simple copy of volumes.

@iamsamwood

Hello, also wondering if there are any updates on this and how I can help.

@eleanor-millman eleanor-millman added the 1.10-candidate The label used for 1.10 planning discussion. label May 25, 2022
@eleanor-millman eleanor-millman removed the 1.10-candidate The label used for 1.10 planning discussion. label Jun 2, 2022
@jcockroft64

I too am wondering about an update. Was this accepted into 1.10?

@antonmatsiuk

any updates on the topic?

veerendra2 commented Mar 2, 2024

Hello, any updates on this? We are hoping to get this feature soon.

Right now we are trying to implement this by copying the Azure disk snapshots to another region with shell/Python scripts and updating the Velero output files (to make restores smooth, just in case).

I was also wondering: has anyone tried using CSI Snapshot Data Movement to make backups available cross-region?

UPDATE 16.05.2024
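
On the CSI Snapshot Data Movement question above, a minimal sketch assuming Velero >= 1.12 (where the --snapshot-move-data flag is available): because the volume data is uploaded to the backup storage location, a cluster in another region pointed at the same (or replicated) bucket can restore it without copying disk snapshots. Namespace and backup names are placeholders.

# On the source cluster: back up and move snapshot data to object storage.
velero backup create cross-region-backup \
  --include-namespaces my-app \
  --snapshot-move-data

# On a cluster in the target region, configured against the same bucket:
velero restore create --from-backup cross-region-backup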

alromeros pushed a commit to alromeros/velero that referenced this issue Oct 25, 2024