LXD cluster does not return metrics for a project if no instance from that project is running on the queried node. #12775

tregubovav-dev · 2024-01-26T06:17:31Z

Required information

Distribution: Ubuntu
Distribution version: 23.10 (Mantic) (arm64)
The output of "lxc info" or if that fails:
- Kernel version: 6.5.0-1009-raspi # 12-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 17 11:45:08 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
- LXD version: 5.19
- Storage backend in use: microceph

Issue description

The LXD metrics API does not return metrics for a project if no instance from that project is running on the queried node. The query returns metrics only for projects that have instances running on the queried node.

Steps to reproduce

Create an LXD cluster with 3+ nodes.
Create one or more additional projects in the cluster.
Deploy and start several instances in all of the projects. Make sure that each node hosts instances from every project.
Run lxc query /1.0/metrics on all nodes and confirm that the query returns metrics for all instances in all projects in the cluster.
Stop the instances from project "default" hosted on one of the nodes (make sure that other instances from project "default" keep running on other nodes), then run lxc query /1.0/metrics on that node. The query returns metrics for all instances from all projects except project "default".
Run lxc query /1.0/metrics on the other nodes and confirm that the query still returns metrics for all instances in all projects in the cluster.
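
The per-node check in the steps above can be automated. Below is a minimal sketch, assuming the output of lxc query /1.0/metrics has been saved to a text file per node and that each instance sample carries a project="..." label (as in the output quoted later in this thread). The function names projects_in_metrics and missing_projects are hypothetical helpers, not part of LXD.

```python
import re
from typing import Set

# Matches the project label inside a Prometheus-style sample line, e.g.
# lxd_procs_total{name="c2",project="default",state="STOPPED",type="container"} 0
PROJECT_LABEL = re.compile(r'[{,]project="([^"]*)"')


def projects_in_metrics(text: str) -> Set[str]:
    """Return the set of project names that appear in a metrics dump."""
    projects = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = PROJECT_LABEL.search(line)
        if match:
            projects.add(match.group(1))
    return projects


def missing_projects(text: str, expected: Set[str]) -> Set[str]:
    """Return the expected projects that have no sample in the dump."""
    return expected - projects_in_metrics(text)
```

Calling missing_projects on each node's dump with the full set of project names would show which node drops which project. On an affected node in the scenario described here, the result would contain "default".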

This behavior corrupts the metrics collected by external scrapers and shown in external dashboards such as Prometheus + Grafana.


tomponline · 2024-02-21T13:17:23Z

@tregubovav-dev is this still an issue with LXD 5.20?

@simondeziel would you mind seeing if you can validate if this remains an issue?

simondeziel · 2024-03-13T18:41:41Z

This bug has been fixed since the introduction of the metrics_instances_count extension. Here's how I did the initial reproduction with 5.19/stable:

$ lxc launch ubuntu-daily:22.04 c1 -c security.nesting=true -c security.devlxd.images=true
$ lxc shell c1
# snap refresh lxd --channel 5.19/stable
lxd (5.19/stable) 5.19-8635f82 from Canonical✓ refreshed
# lxd init --auto
# lxc init ubuntu-minimal-daily:22.04 c2
# lxc query /1.0/metrics | grep -v ^lxd_go | grep -v ^#
lxd_operations_total 1
lxd_warnings_total 3
lxd_uptime_seconds 65.457576337

This confirms that stopped instances are not reported. Now with 5.21/edge, which includes the metrics_instances_count extension, offline instances are reported:

# snap refresh lxd --channel 5.21/edge
# lxc query /1.0/metrics | grep -v ^lxd_go | grep -v ^# | grep -wF c2
lxd_cpu_seconds_total{cpu="0",mode="system",name="c2",project="default",state="STOPPED",type="container"} 0
lxd_cpu_seconds_total{cpu="0",mode="user",name="c2",project="default",state="STOPPED",type="container"} 0
lxd_cpu_effective_total{name="c2",project="default",state="STOPPED",type="container"} -1
lxd_filesystem_avail_bytes{device="",fstype="zfs",mountpoint="/",name="c2",project="default",state="STOPPED",type="container"} 1.5333982208e+11
lxd_filesystem_free_bytes{device="",fstype="zfs",mountpoint="/",name="c2",project="default",state="STOPPED",type="container"} 1.5333982208e+11
lxd_filesystem_size_bytes{device="",fstype="zfs",mountpoint="/",name="c2",project="default",state="STOPPED",type="container"} 1.54700218368e+11
lxd_memory_Active_bytes{name="c2",project="default",state="STOPPED",type="container"} 0
lxd_memory_Inactive_bytes{name="c2",project="default",state="STOPPED",type="container"} 0
lxd_memory_MemAvailable_bytes{name="c2",project="default",state="STOPPED",type="container"} 3.1642516001e+10
lxd_memory_MemFree_bytes{name="c2",project="default",state="STOPPED",type="container"} 3.1642516001e+10
lxd_memory_MemTotal_bytes{name="c2",project="default",state="STOPPED",type="container"} 3.1642516e+10
lxd_memory_Swap_bytes{name="c2",project="default",state="STOPPED",type="container"} -1
lxd_memory_OOM_kills_total{name="c2",project="default",state="STOPPED",type="container"} -1
lxd_procs_total{name="c2",project="default",state="STOPPED",type="container"} 0
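
To check which instances a node reports and in which state, the samples above can be reduced to one entry per instance. This is a minimal sketch, assuming the label layout shown in the output above (name, project, and state on every instance sample) and label values that contain no "}" character. The helper name instance_states is hypothetical.

```python
import re
from typing import Dict, Tuple

# One sample line: metric name, {labels}, then the value.
SAMPLE = re.compile(r'^[A-Za-z_:][A-Za-z0-9_:]*\{(?P<labels>[^}]*)\}\s+\S+$')
# One key="value" pair inside the braces.
LABEL = re.compile(r'(\w+)="([^"]*)"')


def instance_states(text: str) -> Dict[Tuple[str, str], str]:
    """Map (project, instance name) to the state label found in the dump."""
    states = {}
    for line in text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, and samples without labels
        labels = dict(LABEL.findall(match.group("labels")))
        if "name" in labels and "state" in labels:
            states[(labels.get("project", ""), labels["name"])] = labels["state"]
    return states
```

For the output above, the result has a single entry mapping ("default", "c2") to "STOPPED".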

So I believe your specific bug is fixed, but since I have not used the exact same reproduction steps (cluster setup), please do re-open the bug if it is not fixed in 5.21 or later.

tomponline assigned simondeziel Feb 21, 2024

simondeziel closed this as completed Mar 13, 2024

simondeziel mentioned this issue Mar 13, 2024

Bogus metrics for offline instances #13136

Closed

simondeziel mentioned this issue Mar 26, 2024

lxd_containers and lxd_vms metrics are not showing in lxc query /1.0/metrics #13217

Closed
