
Support restoring a PV into the same cluster #192

Closed · ncdc opened this issue Nov 8, 2017 · 39 comments · Fixed by #1779

Labels: Enhancement/User (End-User Enhancement to Velero), Help wanted

ncdc (Contributor) commented Nov 8, 2017

Forked from #189

Scenario:

  • PVC in namespace N1, bound to PV P1
  • Create backup (snapshotting the PV)
  • Restore backup, remapping namespace N1 to N2
  • Have the PVC in N2 reference a new PV P2, created from P1's snapshot

Currently, this is not possible, because Ark does not change the name of the PV when restoring it. The result is 2 PVCs both trying to claim the same (original) PV P1.
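For illustration, the conflict looks roughly like this after such a restore (namespace, claim, and volume names are placeholders):

kubectl get pvc --all-namespaces -o custom-columns=NS:.metadata.namespace,PVC:.metadata.name,VOLUME:.spec.volumeName
# NS   PVC        VOLUME
# n1   my-claim   p1
# n2   my-claim   p1   <- the restored claim still references the original PV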

Questions:

  • Should this be done automatically?
  • Should this be optional?
  • What sort of controls/flags should we expose to the user?
  • How do we cover the 80-90% use case?

cc @evaldasou

ncdc (Contributor, Author) commented Nov 8, 2017

@rosskukulinski would you find this useful?

dgoodwin (Contributor) commented Nov 8, 2017

Our planned usage is to always delete a namespace after backup (as we're using Ark to archive the namespace), so I don't think we'd hit a scenario where we had backed up but still had a live namespace using the PV.

However, restoring with a remapped namespace while preserving the PV would definitely be useful. We don't plan to do this for a bit, but it is on the radar. (I forget whether Ark can handle a remapped namespace with PV snapshots if nothing is still using the PV.)

rosskukulinski (Contributor) commented:

@ncdc reading through #189, yes, I think this would be a useful (and, IMO, expected) feature for CI/CD and testing use cases.

As a more specific example, this feature request enables what you describe in the ephemeral CI testing environment with Ark blog post, but supports that behavior within a single cluster.

evaldasou commented Nov 8, 2017

Wow, awesome! So, as @rosskukulinski mentioned, it sounds like your blog post, @ncdc (ephemeral CI testing environment with Ark), shows that this type of restore from one cluster to another is supported (without deleting the original PV/PVC)? I am going to test this!

Regarding the new feature, I think there should be an optional flag so we can choose whether or not to re-create the PVC/PV, something like a --restore-volumes-recreate flag.

ncdc (Contributor, Author) commented Dec 15, 2017

Current thinking is that ark restore create NAME --clone-volumes would take each PV p in the backup, restore it as PV p', and ensure that any PVC being restored that references p is updated to reference p'.
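For illustration only, a rough manual equivalent of what such a flag would automate (all names are placeholders, and the provider-specific step of creating a new disk from P1's snapshot is omitted):

# 1. Start from the original PV's definition.
kubectl get pv p1 -o yaml > p1-clone.yaml
# 2. Edit p1-clone.yaml: rename metadata.name to p1-clone, point it at the disk
#    restored from P1's snapshot, and drop spec.claimRef and status.
# 3. Create the renamed PV so the PVC restored into N2 can bind to it instead of P1.
kubectl apply -f p1-clone.yaml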

ncdc (Contributor, Author) commented Mar 8, 2018

cc @jbeda on UX

jbeda commented Mar 8, 2018

It might be worthwhile to think about this in terms of scenarios and the flags that will usually be used together. 90% of the time that users remap namespaces, they'll want to rename/clone PVs too. We don't want people to have to flip a zillion flags for the common case.

I'm reminded of the flags for rsync:

-a, --archive
    This is equivalent to -rlptgoD. It is a quick way of saying you want recursion and
    want to preserve almost everything.

Another analogy/pattern: git. While git isn't the best or most usable thing, it has the idea of plumbing vs. porcelain: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain. Think about what the underlying granular toolkit is (plumbing) and what the usable outer shell is (porcelain).

temujin9 commented:

Is this on a near-term roadmap, or should I be finding a workaround?

temujin9 commented:

(Also, are you interested in outside contribution, or are you tackling this internally?)

ncdc (Contributor, Author) commented Apr 20, 2018 via email

rosskukulinski (Contributor) commented:

Following up on this: I think this is a critical feature. The key use case is enabling duplication of namespaces within the same cluster.

This should not be done automatically (for cross-cluster restores or for DR, the PV names should stay the same).

Suggestion: --unique-pv. If other features/flags are required to enable the expected behavior for namespace duplication, we can create a shorthand that enables multiple flags (like rsync does).

agolomoodysaada commented:

Are there any plans to implement this? I'm happy to submit a PR given a few guidelines from the authors.
Thanks for the great work you're doing ❤️

ncdc (Contributor, Author) commented Sep 18, 2018

@agolomoodysaada we do hope to be able to include this feature. We'll be doing some more project planning in the next week or two, and hopefully we'll have time to discuss this one. We'd also be thrilled if you wanted to contribute a PR for this. But before starting, I do think this is a feature in particular that needs a solid design for UI/UX. Please read the comments in this issue, and let us know if you have any ideas. Thanks!

ncdc modified the milestones (v1.0.0, TBD) on Nov 27, 2018
jengo commented Mar 20, 2019

Any updates on this?

arianitu commented:

@skriss, any update on this? Have you found an approach? Would it be a good idea for us to start looking at an implementation for this?

skriss (Contributor) commented Jul 24, 2019

@arianitu I've been looking into it. Definitely interested in your view on what the right UX is. Should this be automatic behavior when using the --namespace-mappings flag? Should it be a separate flag on velero restore create, e.g. --rename-pvs? Something else?

robertgates55 commented:

The three use cases where you'd use --namespace-mappings that I see:

  • Cross-cluster copy (with a namespace name change)
  • In-cluster fork (a new namespace that looks the same as another)
  • In-cluster move (rename a namespace)

Only the fork scenario requires new PV names, I think, so an additional --rename-pvs flag would work nicely for that case?

agolomoodysaada commented Jul 24, 2019

I like your first suggestion, @skriss.

velero backup create --include-namespaces ns1 ns1-backup
# in the same cluster
velero restore create --namespace-mappings ns1:ns2 --from-backup ns1-backup

The problem with same-namespace restores and the --rename-pvs flag is that you would need to rename more than just PVs: Deployment names, StatefulSet names, etc. I think it's okay to release an early implementation of this feature that requires a --namespace-mappings entry for every PV that is to be restored; it could fail if any PV being restored does not have a namespace mapping.

Unless... unless we can do something like --prefix-names restored- which would rename all resources.

velero restore create --prefix-names "restored-" --from-backup ns1-backup
# mydeployment -> restored-mydeployment
# mypvc -> restored-mypvc
# mystatefulset -> restored-mystatefulset

arianitu commented:

@skriss I think the behaviour should be automatic if you're going from one namespace to another. In most cases, if you are using a PVC in one namespace, the new namespace is not going to work correctly, since you'll almost always run into an issue where containers complain about the PVC already being mounted.

Would there be a specific reason to have a --rename-pvs flag? Also, --rename-pvs makes it sound like the old PVs are no longer going to be used, when I think we want new PVs created from a snapshot to be used while keeping the old PVs around. (Is this possible with Velero?)

skriss (Contributor) commented Jul 25, 2019

@robertgates55 re: #192 (comment) -- you make a good point that it's only necessary to rename the PV in one of those scenarios; however, do you see a problem if we also rename the PV for the other scenarios? Given that PV names are usually system-generated random IDs, I wonder how much it matters if the name gets changed on restore (as long as its related PVC gets updated as well).

One option is to have this be automatic behavior when using the --namespace-mappings flag, but to only rename the PV if there's already one in the target cluster with the current name.
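For illustration, that conflict check on the target cluster could look roughly like this (the PV name is a placeholder):

# Only generate a new PV name on restore if the backed-up name is already taken here:
if kubectl get pv pvc-1234abcd >/dev/null 2>&1; then
  echo "PV name already exists in this cluster; the restore would rename it"
fi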

agolomoodysaada commented:

I'm not sure if it's applicable here, but cloning a volume could be done using the new PVC dataSource field (aka volume cloning), introduced in Kubernetes 1.15 as an alpha feature.
My understanding is that you'll be able to clone PVCs into new PVs that have the same disk contents. Perhaps a tight integration with this feature might resolve this issue?

More info: https://github.com/kubernetes/enhancements/pull/1147/files
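For reference, a minimal sketch of that cloning API (names and the storage class are placeholders; it requires a CSI driver with cloning support, and the source PVC must live in the same namespace as the clone):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: my-csi-sc
  resources:
    requests:
      storage: 10Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: source-pvc
EOF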

robertgates55 commented:

@skriss I think renaming the PV only if the existing name is already in use in the cluster would be ideal.

arianitu commented:

Thank you for your work, @skriss. I'm looking forward to testing this; is there a version I can test on our systems?

skriss (Contributor) commented Aug 29, 2019

Yes! You could try using the gcr.io/heptio-images/velero:master image in your deployment. Note that this is a dev build, so I wouldn't recommend using it in production.

You shouldn't need an updated client, as the changes are all server-side.
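For example, one way to switch a test installation to the dev image (assuming the standard install, i.e. a deployment and container both named velero in the velero namespace):

# Point the server at the dev build; not for production use.
kubectl -n velero set image deployment/velero velero=gcr.io/heptio-images/velero:master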

arianitu commented May 3, 2020

@skriss Sorry for swinging back to this after a long time, but I am on:

Client:
	Version: v1.3.1
	Git commit: -
Server:
	Version: v1.2.0

But I still get

  Type     Reason         Age    From                         Message
  ----     ------         ----   ----                         -------
  Warning  ClaimMisbound  5m26s  persistentvolume-controller  Two claims are bound to the same volume, this one is bound incorrectly

Do you know which Velero version supports this?

arianitu commented May 3, 2020

Also to describe what we are doing:

  1. We have a namespace A that we create a backup with.
  2. We restore backup to namespace B.
  3. Namespace A still has the original PV and PVC.

Is this use case supported? We are basically looking for a namespace clone for testing purposes.
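For reference, the clone flow described here maps onto the flags already mentioned in this thread, roughly as follows (backup and namespace names are placeholders):

velero backup create ns-a-backup --include-namespaces ns-a
velero restore create --from-backup ns-a-backup --namespace-mappings ns-a:ns-b
# namespace ns-a, with its original PV and PVC, is left untouched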

viktor1298-dev commented:

@skriss Hey, is it still not possible to restore a backup with PVCs into a new namespace in the same cluster?

jpuskar commented Feb 25, 2022

I found this issue because I want to verify that my restic restores actually contain the data which I expect. But there seems to be no way for me to inspect at the file level, or to restore without first deleting the PV which I want to verify.

There's an old saying: "backups that haven't been tested don't exist".

I'm just trying to figure out how to safely test the backups.

aaronariasperez (Contributor) commented:

Hello @jpuskar! I have the same case; I don't know how to verify the backups without deleting/modifying the original deployment.
Did you find a way to check them?
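One possible way to exercise a backup without touching the original namespace, building on the --namespace-mappings flag discussed above (a sketch; all names are placeholders):

velero restore create verify-restore \
  --from-backup prod-backup \
  --namespace-mappings prod:prod-verify
kubectl -n prod-verify exec deploy/my-app -- ls /data   # spot-check the restored files
kubectl delete namespace prod-verify                    # clean up when done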

lupant commented May 21, 2023

Still facing the same problem here

liogate commented Oct 14, 2023

Hello 👋,

It appears that Velero with Restic doesn't support restoring a PV/PVC into the same cluster but a different namespace. I'm currently using Velero version v1.12.0.

Is there any update regarding support for this feature? 🙏

If there's nothing in progress on your end, please let me know and I'll try to add support for this feature. Thanks!

sseago (Collaborator) commented Oct 17, 2023

@liogate We already support namespace mappings on restore -- so this should do what you want. Have you tested this? Maybe you've found a bug with this functionality?

liogate commented Oct 19, 2023

Hi @sseago, thank you for your feedback. We're testing a PV with a Retain policy on the Scaleway provider. Here are the steps we've followed:

  • Create a cluster-wide backup with Restic enabled, ensuring the namespace containing my PVC is included.
  • Successfully delete and restore my namespace.
  • Try to restore the backup using the namespace mapping <ns-a>:<ns-b>, including the PV, PVC, pods, and a few extra resources in the restore:
velero restore create \
  --from-backup $VELERO_SELECTED_BACKUP \
  --include-namespaces $VELERO_SELECTED_NAMESPACE \
  --namespace-mappings $VELERO_SELECTED_NAMESPACE:$VELERO_RESTORE_NAMESPACE \
  --include-resources pods,configmaps,secrets,serviceaccount,persistentvolumeclaims,persistentvolumes \
  --restore-volumes=true

The result: the PVC restore is stuck with could not restore, PersistentVolume "pvc-xxx" already exists. We have exactly the same PV ID in both namespaces, VELERO_SELECTED_NAMESPACE and VELERO_RESTORE_NAMESPACE.

Thanks.

sseago (Collaborator) commented Oct 19, 2023

@liogate Hmm. That seems like a bug, possibly a recent regression. See the PR linked above (#1779 ) -- that was the fix for the problem you are describing. It might be worth opening a new issue reporting this as a regression. It's possible that the kopia refactor broke this, but I'm not really sure.

mikephung commented:

@sseago I'm also currently on version v1.9.1 and tried to restore using namespace mapping.

velero restore create --from-backup backup-20.10.2023 --namespace-mappings NS1:NS2

I restored it on cluster 2 after backing it up from cluster 1. The restore worked, but I can't use the PVC because the PV pvc-786d37a0-796d-4fd3-8236-5b9c107fe1f5 already exists in cluster 1, and now I'm trying to use the identically named PV from cluster 2. I think this causes problems for the CSI driver, since it looks in the datastore of cluster 1. Is there a Velero feature or workaround to handle PV naming conflicts across clusters?

Any quick help would be greatly appreciated. Thank you!

Log while trying to mount:
Warning FailedAttachVolume 47s (x26 over 37m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-786d37a0-796d-4fd3-8236-5b9c107fe1f5" : rpc error: code = Internal desc = failed to attach disk: "9f4748ef-7378-4363-9f6d-04e84d1b5bfb" with node: "42068c42-3ede-3321-2bd4-b2dffb1233c1" err ServerFaultCode: NoPermission

sseago (Collaborator) commented Oct 24, 2023

@mikephung Hmm. It's possible that this use case doesn't work properly with CSI: while the fix associated with this (now closed) issue worked fine with native snapshots and fs backup, there may be a separate problem with CSI snapshots and namespace mappings. This issue was fixed and closed four years ago, before the CSI plugin even existed. I'd suggest that if you're hitting a CSI-specific problem, you open a new bug specific to the problem you're hitting.
