#3666 - Queue Monitoring - Enable Prometheus Metrics #4012
sources/packages/backend/apps/queue-consumers/src/queues-metrics.module.module.ts
if (!this.monitoredQueueProviders) {
  const queues = await this.queueService.queueConfigurationModel();
  this.monitoredQueueProviders = queues
    .filter((queueModel) => queueModel.isActive)
Are we not ignoring the FT queues here?
The full-time toggle is not checked here. If the queues are not executing they will not produce any event.
The events will be associated but never triggered. I do not see any harm in it. Please let me know if further discussion would be required on this.
Not required and I am good based on what we discussed.
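To illustrate the point made in this thread, here is a minimal sketch of why attaching listeners to a queue that never executes is harmless. Everything in it is a hypothetical stand-in and not the PR's code: `QueueModel`, `FakeQueue`, `monitorActiveQueues`, and the queue names are assumptions made only for this example.

```typescript
// Hypothetical stand-ins: none of these types or names come from the PR.
interface QueueModel {
  name: string;
  isActive: boolean;
}

type Listener = () => void;

// Minimal fake queue that can register listeners and fire a "completed" event.
class FakeQueue {
  private readonly listeners: Listener[] = [];
  constructor(readonly name: string) {}
  onCompleted(listener: Listener): void {
    this.listeners.push(listener);
  }
  emitCompleted(): void {
    this.listeners.forEach((listener) => listener());
  }
}

// Attach a counting listener to every active queue, whether or not it ever runs.
function monitorActiveQueues(models: QueueModel[]): {
  queues: Map<string, FakeQueue>;
  counts: Map<string, number>;
} {
  const queues = new Map<string, FakeQueue>();
  const counts = new Map<string, number>();
  for (const model of models.filter((queueModel) => queueModel.isActive)) {
    const queue = new FakeQueue(model.name);
    counts.set(model.name, 0);
    queue.onCompleted(() =>
      counts.set(model.name, (counts.get(model.name) ?? 0) + 1),
    );
    queues.set(model.name, queue);
  }
  return { queues, counts };
}

const monitored = monitorActiveQueues([
  { name: "part-time-queue", isActive: true },
  { name: "full-time-queue", isActive: true }, // monitored, but never executes below
  { name: "disabled-queue", isActive: false }, // filtered out by isActive
]);

// Only one queue actually processes a job.
monitored.queues.get("part-time-queue")?.emitCompleted();
```

In this sketch the idle queue keeps its listener but its counter never moves, which matches the "associated but never triggered" behaviour described above.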
 */
setGlobalMetricsConfigurations(): void {
  register.setDefaultLabels({ app: DEFAULT_METRICS_APP_LABEL });
  collectDefaultMetrics({ labels: { app: DEFAULT_METRICS_APP_LABEL } });
For collectDefaultMetrics, is it not necessary to set the register? Is it because we are using the default registry? Let me know if I am missing something.
collectDefaultMetrics({
  labels: { app: DEFAULT_METRICS_APP_LABEL },
  register,
});
Yes, there is no need to set the register unless using a custom one.
https://github.com/siimon/prom-client?tab=readme-ov-file#default-metrics
Great work and amazing solution to share metrics to Sysdig.
Thanks for doing the change. 👍
Great Work @andrewsignori-aot 👍 Thank you for the PR walkthrough.
- `/metrics` endpoint on `queue-consumers` to expose Prometheus metrics following BC Gov docs.
- [...] `active`, `completed`, `failed`, `delayed`, and `waiting` jobs. This represents the same status observed in the Bull Board. The gauge metrics rely on Redis queries and will always get the most updated values from Redis.
- [...] `metrics` endpoint is invoked. This metric is not needed to achieve queue monitoring but seems a great addition and useful data to support future analysis.
- [...] `queueName`, `queueEvent`, and `queueType` to allow querying in Sysdig.

Sysdig POCs
The Sysdig configurations were created to support the validation of the code in this PR and should not be considered final or part of the PR evaluation.
Alerts generated for a queue with a failed job.
The same alerts are configured to be sent using an email channel.
Sample Dashboard
The Queues Overview dashboard has some examples of data but should not be considered final or part of the PR evaluation.
Note: the Sysdig users and roles were updated and added to this PR, and the change is already deployed to both tools environments. If time allows, further effort can be made to enhance the current process, but any action beyond the user list update is not part of this PR.
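For readers unfamiliar with what a scrape of the `/metrics` endpoint returns, the following is a rough sketch of per-queue job counts rendered in the Prometheus text exposition format. It is not the PR's implementation, which relies on prom-client. The metric name `queue_job_counts_current`, the helper names, the sample queue, and the choice to carry the job status in the `queueEvent` label are all assumptions made for illustration. Only the label names `queueName` and `queueEvent` and the five job statuses come from the description above.

```typescript
// Job statuses listed in the PR description.
type JobStatus = "active" | "completed" | "failed" | "delayed" | "waiting";
type JobCounts = Record<JobStatus, number>;

// Hypothetical metric name, chosen only for this sketch.
const METRIC_NAME = "queue_job_counts_current";
const STATUSES: JobStatus[] = ["active", "completed", "failed", "delayed", "waiting"];

// Escape backslash, double quote, and newline as required for label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one gauge sample per queue and status in the text exposition format.
function renderQueueGauges(countsByQueue: Record<string, JobCounts>): string {
  const lines: string[] = [
    `# HELP ${METRIC_NAME} Current number of jobs per queue and status.`,
    `# TYPE ${METRIC_NAME} gauge`,
  ];
  for (const [queueName, counts] of Object.entries(countsByQueue)) {
    for (const status of STATUSES) {
      // Assumption: the status is exposed through the queueEvent label.
      const labels = `queueName="${escapeLabelValue(queueName)}",queueEvent="${status}"`;
      lines.push(`${METRIC_NAME}{${labels}} ${counts[status]}`);
    }
  }
  return lines.join("\n") + "\n";
}

// Hypothetical sample data standing in for counts read from Redis.
const sample = renderQueueGauges({
  "sample-queue": { active: 1, completed: 40, failed: 2, delayed: 0, waiting: 5 },
});
```

With labels shaped like this, an alert on failed jobs could select samples where `queueEvent="failed"` and the value is greater than zero, which is the kind of query the labels in the description are meant to support.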