Wait and retry if VM is locked #95

Open
1 task
dani opened this issue Dec 9, 2024 · 5 comments
Comments

@dani

dani commented Dec 9, 2024

What happened?

I'm using autosnap on a 10-node cluster with ~100 VMs running. It mostly works great, but sometimes it fails when a VM that should be snapshotted is locked. In my case, the lock usually comes from one of these cases:

  • A backup is running
  • Another cv4pve-autosnap is running (e.g. daily snap vs hourly snap)

This cluster runs on Ceph, which might not be the fastest at creating snapshots (although its overall performance is more than decent: 40 NVMe OSDs on a dedicated 2x25 Gbps network).

I tried to spread the various jobs across different times (e.g. hourly runs at 4 minutes past each hour, daily at 00:08, weekly at 00:12 on Sunday, etc., and backups only start at xx:20), but I still get errors from time to time.

Expected behavior

cv4pve-autosnap could wait and retry later if a VM is locked
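
Until something like this exists in the tool, the requested behavior can be approximated from the outside. The following is a minimal sketch, not a cv4pve-autosnap feature: `retry_locked` is a hypothetical helper name, and it assumes the tool exits non-zero when a snapshot fails (which the `|| echo "autosnap failed"` pattern used with cron suggests). Note that it retries the whole command, so VMs that were snapshotted successfully on the first attempt would be snapshotted again.

```shell
#!/bin/sh
# retry_locked (hypothetical helper): run a command, retrying with a pause.
# Usage: retry_locked MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry_locked() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        # Give up once the attempt budget is used.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry_locked 3 120 /usr/local/bin/cv4pve-autosnap ...` would try up to three times, two minutes apart; both numbers are arbitrary placeholders.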

Relevant log output

No response

Proxmox VE Version

8.2.8

Version (bug)

1.1.11

Version (working)

No response

On what operating system are you experiencing the issue?

Linux

Pull Request

  • I would like to do a Pull Request
@franklupo
Member

If the VM is busy with another operation, you can't take a snapshot, even from the web GUI. How long should it wait?
Why would multiple cv4pve-autosnap runs overlap?

@dani
Author

dani commented Dec 9, 2024

I have overlaps because I run several cron jobs, one for each snapshot label, e.g.:

*/30 * * * * root systemd-cat /usr/local/bin/cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label frequently --keep 6 --only-running || echo "autosnap failed"
4 * * * * root systemd-cat /usr/local/bin/cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label hourly --keep 6 --only-running || echo "autosnap failed"
8 0 * * * root systemd-cat /usr/local/bin/cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label daily --keep 4 --only-running || echo "autosnap failed"
12 0 * * 0 root systemd-cat /usr/local/bin/cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label weekly --keep 2 --only-running || echo "autosnap failed"

On rare occasions, the first job (frequently) is still running when the second (hourly) starts. There are also cases where backups are still running: say I start the backup at 20:20; when the frequently cv4pve-autosnap job is triggered at 20:30, the backups are not always finished and still hold a lock on the VM.
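
The overlap between the autosnap jobs themselves could be avoided by serializing them with `flock` from util-linux, so that a later job waits for the earlier one instead of starting alongside it. This is only a sketch: the lock file path, the 600-second wait, and the `run_serialized` name are assumptions, the lock only covers jobs started on the same host, and it does nothing about locks held by backups.

```shell
#!/bin/sh
# Hypothetical lock file shared by all autosnap cron jobs on this host.
AUTOSNAP_LOCK=${AUTOSNAP_LOCK:-/run/lock/cv4pve-autosnap.lock}
# How long a job waits for the previous one to finish, in seconds (assumed value).
AUTOSNAP_LOCK_WAIT=${AUTOSNAP_LOCK_WAIT:-600}

# run_serialized (hypothetical helper): run a command while holding an
# exclusive lock, waiting up to AUTOSNAP_LOCK_WAIT seconds for it.
# flock exits non-zero without running the command if the wait times out.
run_serialized() {
    flock -w "$AUTOSNAP_LOCK_WAIT" "$AUTOSNAP_LOCK" "$@"
}
```

In the crontab above, this amounts to prefixing each command with `flock -w 600 /run/lock/cv4pve-autosnap.lock`.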

@franklupo
Member

Yes, backups also fail if a snapshot is being created. You could use a script that skips snapshots while backups are running, or take the snapshots at the end of the backup.

https://git.proxmox.com/?p=pve-manager.git;a=blob;f=vzdump-hook-script.pl;h=a93eeec80bd09128e70a4a9775438ab658da2191;hb=refs/heads/master
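
One way to read this suggestion, sketched in shell rather than Perl: a vzdump hook script receives the phase name as its first argument (`job-start`, `job-end` and `job-abort` among others), so it can maintain a flag file that a snapshot wrapper checks before running. The flag directory and both function names are hypothetical. On a multi-node cluster the directory would have to be visible to whichever host runs cv4pve-autosnap, which this sketch does not solve by itself.

```shell
#!/bin/sh
# Directory holding one flag file per node with a running backup job.
# Hypothetical path: it must be readable by the host running cv4pve-autosnap.
FLAG_DIR=${FLAG_DIR:-/run/autosnap-backup-flags}

# vzdump_hook PHASE: create or remove this node's flag for job-level phases.
# Other phases (backup-start, backup-end, ...) are ignored.
vzdump_hook() {
    flag="$FLAG_DIR/$(uname -n).flag"
    case "$1" in
        job-start)
            mkdir -p "$FLAG_DIR" && : > "$flag"
            ;;
        job-end|job-abort)
            rm -f "$flag"
            ;;
    esac
}

# backup_running: succeeds if any node has flagged a running backup job.
backup_running() {
    for f in "$FLAG_DIR"/*.flag; do
        [ -e "$f" ] && return 0
    done
    return 1
}
```

The hook script itself would end with `vzdump_hook "$1"`, and a wrapper around the cron command would only call cv4pve-autosnap when `backup_running` fails. A flag left behind by a crashed job would block snapshots until removed, so a real setup would probably want a staleness check as well.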

@franklupo
Member

Any news?

@dani
Author

dani commented Jan 3, 2025

As a workaround, I now disable autosnap during the backup window.
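
The comment does not say how the window is enforced. One possible way, as a sketch, is a small time check in front of the cron command. The helper names are hypothetical, and so are any window bounds passed to them: the thread only mentions backups starting at 20:20 and does not say when they end.

```shell
#!/bin/sh
# to_minutes HHMM: convert a four-digit time such as 0008 to minutes since midnight.
to_minutes() {
    hh=${1%??}
    mm=${1#??}
    # Strip one leading zero so values like 08 are not parsed as octal.
    echo $(( ${hh#0} * 60 + ${mm#0} ))
}

# in_window NOW START END: succeeds if NOW lies in [START, END).
# All three are HHMM strings; the window may wrap past midnight.
in_window() {
    now=$(to_minutes "$1")
    start=$(to_minutes "$2")
    end=$(to_minutes "$3")
    if [ "$start" -le "$end" ]; then
        [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
    else
        # Window wraps past midnight, e.g. 2300 to 0200.
        [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
    fi
}
```

A wrapper could then run something like `in_window "$(date +%H%M)" 2020 2200 || cv4pve-autosnap ...`, where `2200` is a placeholder end time.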
