
Conditional PV backup opt-in. #317

Closed
jacobstr opened this issue Feb 15, 2018 · 12 comments
Labels
Breaking change (Impacts backwards compatibility) · Enhancement/User (End-User Enhancement to Velero) · Needs Product (Blocked needing input or feedback from Product) · Volumes (Relating to volume backup and restore)

Comments

jacobstr commented Feb 15, 2018

Feature Suggestion

The thought has come up that we may want users to opt in to PV backups, e.g. via a label or annotation. We advise folks to use PVs in general to guarantee disk space for things such as local scratch space. We do a lot of image processing, which results in large, fungible cache PVs.

Host disk usage is currently not bounded by resource limits the way CPU and memory are. The effort to improve this is under way in kubernetes/community#306.

ncdc (Contributor) commented May 8, 2018

@jbeda would appreciate any input you might have on this one

rosskukulinski (Contributor) commented:

Hi @jacobstr - the use-case you're describing makes sense. Do you have additional details around how PV/PVCs tend to be provisioned in your environments?

I'm wondering whether the existing --selector flag (with or without negation), or maybe a new --ignore-selector flag, would be sufficient for these uses.

Teams could label their PVs with backup: false and then define a negative selector on their backups.
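A minimal sketch of that workaround, assuming the --selector flag accepts inequality-based expressions (the "with or without negation" question above); the PV name and the Ark-era CLI are illustrative:

```sh
# Label the scratch/cache PVs that should be skipped.
kubectl label pv scratch-cache-pv backup=false

# Back up everything that is not labeled backup=false. In Kubernetes
# label-selector semantics, backup!=false also matches resources that
# have no backup label at all.
ark backup create nightly --selector 'backup!=false'
```

The catch, discussed below: dynamically provisioned PVs come up with no labels, so something still has to apply the label after provisioning.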

rosskukulinski added the Enhancement/User and Waiting for info labels Jun 17, 2018
ncdc (Contributor) commented Jun 19, 2018

@rosskukulinski I don't think the existing selector flag is sufficient for this use case. It sounds like there needs to be an additional flag that selects just PVCs/PVs.

ncdc (Contributor) commented Jun 26, 2018

@rosskukulinski another data point - let's say you have the following items:

  1. Pod mysql, labeled app=mysql, using PVC data
  2. PVC data, labeled backup=false, referencing PV pvc-23423-23-2342-34-23
  3. PV pvc-23423-23-2342-34-23, dynamically provisioned, no labels

If you do ark backup create --selector app=mysql, Ark:

  1. Uses the label selector and gets pod mysql
  2. Runs backup item actions for the pod, specifically podAction, which sees the pod has the PVC data and tells Ark to make sure to back up that PVC
  3. Ark immediately processes the PVC data, runs backup item actions for it, specifically backupPVAction, which finds the PV pvc-23423-23-2342-34-23 associated with the PVC and tells Ark to back it up
  4. Ark immediately processes PV pvc-23423-23-2342-34-23 and backs it up (snapshot or restic)

This is how we ensure that when you use a label selector to back up things (such as pods), the associated PVCs and PVs are included. Before we added this logic, our original documentation & example required the user to manually label the PVC (if needed) and the PV so they'd be backed up, which was especially painful for dynamically provisioned PVs.
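For concreteness, a minimal sketch of the pod and PVC from that example (the image and sizes are made up; the PV is created by the provisioner, not by the user, which is why it has no labels):

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  labels:
    backup: "false"   # the label lives on the PVC only
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  containers:
  - name: mysql
    image: mysql:5.7   # illustrative
    volumeMounts:
    - name: data
      mountPath: /var/lib/mysql
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data
EOF
# The storage provisioner then creates the PV (pvc-23423-23-2342-34-23
# in the example) with no labels of its own, so a label-selector backup
# only reaches it via the item-action chain described above.
```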

xref #591

ncdc added the Breaking change label Jun 27, 2018
skriss (Contributor) commented Oct 12, 2018

xref #929

ncdc added this to the v1.x milestone Nov 9, 2018
skriss modified the milestones: v1.x → v1.0.0 Nov 28, 2018
skriss removed this from the v1.0.0 milestone Mar 14, 2019
stanislavb commented:
I have a hard time excluding a PV that was auto-provisioned from a PVC. Label-based filtering at the pod and PVC level does not help, since the PV does not inherit the labels. Ideally, for my use case, label-based filtering of the PVC would determine whether the associated PV gets backed up, but judging by the test code that is not currently a supported feature.
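As a stopgap, the label can be copied onto the bound PV by hand. A minimal sketch, assuming the backup=false convention suggested earlier (PVC and namespace names are illustrative):

```sh
# Labels don't propagate from PVC to PV, so look up the bound PV
# via the claim's spec.volumeName and label it directly.
PV_NAME=$(kubectl get pvc data -n my-namespace -o jsonpath='{.spec.volumeName}')
kubectl label pv "$PV_NAME" backup=false
```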

karlkfi commented Aug 16, 2019

We have some related use cases:

  • Some jobs use PVCs for scratch disks that are larger than the node disk size and/or to isolate between jobs, allowing higher pod-per-node density. These get deleted when the pod exits and never need to be backed up by Velero.
  • Some jobs use PVs for caching that needs to be shared between consecutive or concurrent jobs. These caches are giant but can be recreated easily, so they also never need to be backed up by Velero.

I don't want Velero to spend tebibytes of backup space and hours of time backing up disks we don't need backed up, but we have plenty of other disks that do need backing up.

carlisia added the Needs info label and removed the Waiting for info label Dec 14, 2019
mithuns commented Dec 18, 2019

Building on the example scenario given by @ncdc above:
What can be done when the PVC needs to be backed up but the PV needs to be excluded?
We have an operator that creates STS/PVC/PV, but since PVs don't inherit labels from PVCs, we are not able to exclude the PVs.
Having another operator that deals only with PVs and applies labels to them directly isn't a great solution either, because PVs are cluster-scoped and we don't want to build an operator with cluster scope.
Any suggestions would be welcome.

Update: I think I got the answer from here
#2151
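For anyone else landing here: #2151 discusses the velero.io/exclude-from-backup label. Assuming your Velero version supports it, a sketch of applying it to the PV from the example above:

```sh
# Velero skips any resource carrying this label, so the PVC can still
# be backed up while the PV is excluded.
kubectl label pv pvc-23423-23-2342-34-23 velero.io/exclude-from-backup=true
```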

mithuns commented Feb 25, 2020

Just out of curiosity, is this on a roadmap or planned yet?

skriss (Contributor) commented Feb 26, 2020

@mithuns it is not currently prioritized. If you're interested in this feature, could you provide some more information on your use case? Thanks!

mithuns commented Feb 26, 2020

So, as described above by @karlkfi and @stanislavb, there are multiple cases where an auto-provisioned PV is hard to exclude or to apply the Velero exclude label to.
In my use case, the application is such that the PV it creates never really needs to be backed up (while the rest of the cluster state needs to be backed up on a schedule).
Thus, I had to create a Kubernetes operator just to watch for PV creation and PVC update events, so that PVs (from certain PVCs with certain labels on them) can be marked for exclusion from backups.

Ideally there would be a way to set up such rules in Velero itself. Kind of like a matrix (rule book, config, whatever we wish to call it); see the hypothetical sketch after this list:

  1. PVs created from PVCs with these labels will automatically be marked for exclusion.
  2. PVCs created from StatefulSets with these labels will automatically be marked for exclusion.
  3. StatefulSets created from Custom Resources with these specific labels will automatically be marked for exclusion.

I think Velero is already running at cluster scope (some might argue that Velero shouldn't listen for more cluster-scoped or namespace-scoped events); maybe this could be a feature turned on or off with a flag?
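To make the rule-book idea concrete, a purely hypothetical sketch; none of these fields exist in Velero today, it only illustrates the shape of the proposal:

```sh
# HYPOTHETICAL config: invented field names, not a real Velero API.
cat > exclusion-rules.yaml <<'EOF'
excludeRules:
- kind: PersistentVolume
  whenBoundClaimMatches:
    labels:
      backup: "false"
- kind: PersistentVolumeClaim
  whenOwnerMatches:
    kind: StatefulSet
    labels:
      backup: "false"
EOF
```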

dsu-igeek added the Needs Product label and removed the Needs info label Oct 21, 2020
carlisia added the Volumes label Oct 22, 2020
nrb (Contributor) commented May 3, 2021

I don't think this will be something Velero tackles any time soon. It may be solved as part of our move to Astrolabe.

nrb closed this as completed May 3, 2021