Prometheus metrics request #3307

LinFor · 2024-10-06T08:57:27Z

Thanks for the great Prometheus exporter feature!

Please add the following metrics for alerting purposes:

- The current raw (unfiltered) value. This is needed to detect the case where prev-value is mistakenly set far ahead, so value stops updating and it is unclear whether the counter is actually moving. Exporting the raw value makes that movement visible, so changes in the raw value while value stays unchanged for some time are a good alerting target.
- A counter of failed (or, conversely, successful) rounds. Together with ai_on_the_edge_device_rounds_total, such a counter makes it possible to single out failed rounds and alert when a failure threshold is exceeded.
- The status of the last round ([9d07h48m01s] 2024-10-06T11:23:19 <INF> [POSTPROC] main: Raw: 10473.9835, Value: 10473.9835, **Status: no error**). I suggest exporting it as a label on a metric with a constant value of 1.
- The duration of the entire last round (as a gauge), or alternatively the cumulative time spent in all rounds (as a counter). Compared with ai_on_the_edge_device_uptime_seconds, this metric makes it possible to assess the load on the system and alert when a threshold is exceeded.
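
To make the request more concrete, here is a rough sketch of how the four metrics could look in the Prometheus text exposition format. It is only an illustration in Python, not the firmware's exporter code. The metric names `_raw_value`, `_failed_rounds_total`, `_last_round_status` and `_last_round_duration_seconds` are my own suggestions and do not exist yet; only the `ai_on_the_edge_device` prefix is taken from the existing metrics. The `RoundState` type and the `render` helper are hypothetical as well.

```python
from dataclasses import dataclass

# Prefix taken from the existing metric names; everything appended below is a
# hypothetical naming suggestion, not something the exporter provides today.
PREFIX = "ai_on_the_edge_device"


@dataclass
class RoundState:
    """Hypothetical snapshot of what the device knows after a round."""
    raw_value: float            # unfiltered reading of the last round
    failed_rounds_total: int    # rounds that ended with an error
    last_status: str            # status text of the last round
    last_round_seconds: float   # wall-clock duration of the last round


def escape_label(text: str) -> str:
    """Escape a label value as required by the text exposition format."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(state: RoundState) -> str:
    """Render the proposed metrics as Prometheus text exposition lines."""
    status = escape_label(state.last_status)
    lines = [
        f"# TYPE {PREFIX}_raw_value gauge",
        f"{PREFIX}_raw_value {state.raw_value}",
        f"# TYPE {PREFIX}_failed_rounds_total counter",
        f"{PREFIX}_failed_rounds_total {state.failed_rounds_total}",
        # Status as a label on a constant "1" sample.
        f"# TYPE {PREFIX}_last_round_status gauge",
        f'{PREFIX}_last_round_status{{status="{status}"}} 1',
        f"# TYPE {PREFIX}_last_round_duration_seconds gauge",
        f"{PREFIX}_last_round_duration_seconds {state.last_round_seconds}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Raw value and status come from the log line above; the failure count
    # and the duration are made-up illustration values.
    print(render(RoundState(10473.9835, 3, "no error", 42.5)), end="")
```

With metrics like these, an alert on the failure ratio could compare the increase of the hypothetical `ai_on_the_edge_device_failed_rounds_total` with the increase of the existing `ai_on_the_edge_device_rounds_total` over the same time window.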

LinFor added the enhancement New feature or request label Oct 6, 2024
