Delete snapshots before deleting RBD image #3416
Comments
@vriabyk: @Rakshith-R, yes, the snapshot was created manually via the rbd CLI. Is there any problem with adding this logic on your side? We can create a simple PR for that. At a minimum you could implement an option, disabled by default, that when enabled deletes snapshots before deleting the image. Otherwise, ceph-csi shouldn't delete the PVC/PV from k8s if the image has snapshots in Ceph, and should instead throw an error message like: "Can't delete PVC because it has snapshots in Ceph".
@vriabyk cc @ceph/ceph-csi-contributors
Yes, I agree with the above point: if the user created the snapshots, it is the user's responsibility to delete them before deleting the PVC or snapshots. I don't think we should purge the snapshots in the RBD image before deleting it.
This does sound like something that can be handled at the CSI level; provisioning and deprovisioning volumes are the main things expected of an interface. Tools in the Ceph ecosystem usually handle snapshots that way: they are transparently purged upon RBD volume deletion unless they are protected.

The most common scenario here is not that snapshots are "manually" created by some user, but that some backup routine created them by talking directly to Ceph, in order to have a reliable versioned source volume. It made its backups, and will only need the snapshots for diffing purposes if this volume continues to exist and needs to be backed up again.

How do you imagine this working automatically at any scale? When a PVC gets deleted from Kubernetes, ceph-csi successfully removes the k8s objects; nothing but the CSI controller receives the deletion request, and it does virtually nothing about it, deceiving the user into thinking the volume has been deleted while it has not. So if anyone is expected to approach this issue in an automated way, the solution would have to be either some sort of cronjob or sourcing data by dumpster diving, both of which sound like extreme workarounds for something that can be handled properly.

It would seem logical that a CSI implementation should be able to properly handle volume deletion in this "edge" case. If this is deemed too dangerous, it could be an opt-in flag, or even something more granular like an annotation.
IMHO it is not the best option to handle this at the ceph-csi level when the snapshot was created by some other entity. If some external component performs an operation at the Ceph level, we should expect it to undo that operation before the volume is deleted. Providing this functionality behind a flag is not a problem, but it doesn't sound right: if snapshots are created by some external entity, we should expect it to delete them as well.
How would the entity responsible for snapshot creation know to delete the snapshots before ceph-csi attempts to delete the volume? As far as I'm aware, there is currently no way to configure an external webhook dependency for the finalizer that would cause ceph-csi to wait until the snapshots are deleted, either. The issue here is that it falls within the CSI implementation's scope of responsibility either to do something about this or to delegate the problem to something external; at the moment it does neither.
@crabique It is not responsible for anything that is done at the Ceph level by another user/entity on the images created by ceph-csi. They need to clean up after themselves, maybe by having a routine that checks images in the trash for snapshots it created and purges the related snapshots. Snapshot purge in ceph-csi does not sound like a good idea: ceph-csi also creates/deletes k8s snapshots and clones of PVCs, which have snapshot links to the parent images and continue to exist after the parent PVCs are deleted.
Ceph CSI couldn't remove a PVC whose image has snapshots that were created directly in Ceph using rbd or other clients. As a result, the RBD image remains permanently in the trash. Fixes: ceph#3416 Signed-off-by: ruslanloman <[email protected]>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation. |
Describe the feature you'd like to have
I want image snapshots to be deleted on an image delete request. So basically something like this:
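The original snippet is not preserved in this thread. A minimal sketch of the requested behavior, assuming the standard `rbd` CLI is available (the function names here are hypothetical, not part of ceph-csi), might look like:

```python
# Hypothetical sketch: purge all (unprotected) snapshots of an RBD image,
# then remove the image, by shelling out to the `rbd` CLI. This is NOT
# ceph-csi's actual implementation, just an illustration of the request.
import subprocess


def delete_image_commands(pool: str, image: str) -> list[list[str]]:
    """Build the command sequence: purge snapshots first, then remove the image."""
    spec = f"{pool}/{image}"
    return [
        ["rbd", "snap", "purge", spec],  # deletes all unprotected snapshots
        ["rbd", "rm", spec],             # now the image can actually be removed
    ]


def delete_image(pool: str, image: str) -> None:
    # Note: `rbd snap purge` fails if any snapshot is protected (e.g. backing
    # a clone), so this does not silently destroy protected snapshots.
    for cmd in delete_image_commands(pool, image):
        subprocess.run(cmd, check=True)
```

In other words: the equivalent of running `rbd snap purge <pool>/<image>` before `rbd rm <pool>/<image>`, instead of moving an image with snapshots to the trash where it gets stuck.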
What is the value to the end user? (why is it a priority?)
It is important because images that have snapshots cannot be deleted from Ceph and get stuck in the trash. If you try to delete a k8s PVC (PV) whose image has RBD snapshots in Ceph, the image won't actually be deleted and will remain in the trash, and the ceph-mgr logs will fill with messages like this:
How will we know we have a good solution? (acceptance criteria)
The image doesn't get stuck in the trash.
Additional context
As far as I can see, this problem was solved in the Ceph dashboard some time ago:
https://tracker.ceph.com/issues/36404
So they delete the snapshots first and then delete the image.
Ceph CSI stops provisioning new volumes once the Ceph trash is full. k8s PVCs just wait in the Provisioning state, and the logs fill with messages like: