metrics suggestion: backup jobs, replication jobs #112

steveej · 2022-05-03T12:06:28Z

hey @znerol, thank you for creating this helpful exporter 🙌

i'd like to track and set up alerts for failed or absent backups and replications, and for high IO delay (the one that's displayed in the webui for each node).

cheers 👋

znerol · 2022-05-03T19:22:01Z

This exporter is using the PVE REST API. Looking through the API docs I have found the following interesting routes possibly covering your requirements (at least partly):

absent backups:
/cluster/backup-info/not-backed-up lists all guests (qemu and lxc) which are not covered by any backup plan.
failed backups:
Maybe this is extractable from /cluster/backup.
failed replications:
Maybe this is extractable from /cluster/replication.
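To illustrate the "absent backups" route, here is a minimal sketch of how it could be queried and reduced to a list of uncovered guests. It is not code from this exporter. The function names are made up for illustration, and the response shape (a `data` list whose entries carry `vmid` and `type`) as well as the API token header format are assumptions that should be checked against the PVE API docs.

```python
import json
import urllib.request


def fetch_not_backed_up(base_url, token):
    """Fetch the raw JSON payload of the not-backed-up route.

    base_url: e.g. "https://pve.example.org:8006" (hypothetical host).
    token: assumed to look like "user@realm!tokenid=secret".
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api2/json/cluster/backup-info/not-backed-up",
        headers={"Authorization": "PVEAPIToken=" + token},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def guests_without_backup(payload):
    """Return sorted (type, vmid) pairs for guests not covered by any backup job.

    Assumes every entry in payload["data"] has "type" and "vmid" keys.
    """
    return sorted((g["type"], int(g["vmid"])) for g in payload.get("data", []))
```

An alert could then fire whenever the returned list is non-empty, or a gauge per guest could be exported.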

Regarding high IO delay, I recommend taking a look at node_exporter. For node-level metrics, this is usually the better option.

steveej · 2022-05-04T11:23:00Z

thanks @znerol

> /cluster/backup-info/not-backed-up lists all guests (qemu and lxc) which are not covered by any backup plan.

while i originally meant backup jobs that for some reason didn't execute, i also like the idea of alerting when a VM doesn't have a backup job at all.

for the rest i'll also have a look at the API to see which items would be useful to add.

> Regarding high IO delay, I recommend taking a look at node_exporter. For node-level metrics, this is usually the better option.

indeed, thanks! i thought PVE was doing something special but according to the frontend code it evaluates the system's wait load, which can be gathered otherwise.
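For reference, a rough sketch of how an IO wait share can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. This is only an approximation of what the webui shows, not PVE's actual code, and the helper names are invented for illustration. node_exporter exposes the same underlying counters as `node_cpu_seconds_total` with `mode="iowait"`.

```python
def parse_cpu_line(line):
    """Parse the aggregate "cpu" line of /proc/stat into integer counters.

    Only the first eight counters are used (user, nice, system, idle,
    iowait, irq, softirq, steal); guest time is already included in user/nice.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:9]]


def iowait_fraction(before, after):
    """Share of CPU time spent in iowait between two samples (0.0 to 1.0)."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter.
    return deltas[4] / total
```

In practice the two lines would be read from `/proc/stat` a few seconds apart.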

xziy · 2023-10-01T17:41:07Z

Hello everyone, is there any progress? I am facing a similar problem. I need to know which machines were left without a backup, or whose backup ended with an error.

StarkZarn · 2024-02-19T19:14:52Z

IO wait would be a very useful metric to have, IMO, if possible -- especially for those using ZFS for backing storage.

znerol · 2024-02-20T07:37:55Z

> IO wait would be a very useful metric to have, IMO, if possible -- especially for those using ZFS for backing storage.

Please use node_exporter for the iowait metric. Take a look at this blog post for a start.

StarkZarn · 2024-02-21T01:04:08Z

> > IO wait would be a very useful metric to have, IMO, if possible -- especially for those using ZFS for backing storage.
>
> Please use node_exporter for the iowait metric. Take a look at this blog post for a start.

Thank you!

Add replication metrics as requested in issue #112.

* Replication metrics are fetched per node
* The metrics can be enabled or disabled

Based on the original PR #166, adapted to the new file structure.

---------

Signed-off-by: Sven Gerber <[email protected]>
Co-authored-by: znerol <[email protected]>
Co-authored-by: Marian Koreniuk <[email protected]>
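As an illustration of "fetched per node", the following sketch turns one node's replication job status list into Prometheus text-format lines. It is not the merged implementation. The metric names (prefixed `example_`) are hypothetical, and the job fields `id`, `fail_count` and `last_sync` are assumptions about what the per-node replication status route returns.

```python
def replication_metric_lines(node, jobs):
    """Render replication job status for a single node as exposition lines.

    node: node name used as a label value.
    jobs: list of dicts, assumed to carry "id" and optionally
          "fail_count" and "last_sync" (unix timestamp).
    """
    lines = []
    for job in jobs:
        labels = 'id="%s",node="%s"' % (job["id"], node)
        lines.append(
            "example_replication_failed_syncs{%s} %d"
            % (labels, int(job.get("fail_count", 0)))
        )
        # Only emit the timestamp once a sync has been recorded.
        if "last_sync" in job:
            lines.append(
                "example_replication_last_sync_timestamp_seconds{%s} %d"
                % (labels, int(job["last_sync"]))
            )
    return lines
```

A collector would call this once per cluster node and could skip the call entirely when replication metrics are disabled.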

znerol · 2024-04-27T10:13:27Z

Thanks to @svengerber and @themoriarti, replication metrics are available as of release v3.3.0.

znerol changed the title ~~metrics suggestion: backup jobs, replication jobs, and IO delay~~ metrics suggestion: backup jobs, replication jobs Feb 20, 2024

svengerber mentioned this issue Apr 18, 2024

Add ZFS replication metrics #243

Merged
