Whether it's done through kube-state-metrics support for monitoring Custom Resources, or whether we actually go to the trouble of emitting Prometheus metrics ourselves, we should have some AlertManager controls around "what if things go wrong".
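For the kube-state-metrics route, a minimal sketch of the CustomResourceState config, assuming a hypothetical `Database` custom resource in group `example.com` with an RFC3339 `status.lastRefreshTime` field (none of those names are settled):

```yaml
# Passed to kube-state-metrics via --custom-resource-state-config-file.
# The Database kind, its group, and the status path are placeholders.
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: example.com
        version: v1
        kind: Database
      metrics:
        - name: "database_last_refresh_time"
          help: "Unix timestamp of the last successful refresh, read from status"
          each:
            type: Gauge
            gauge:
              # RFC3339 timestamps are parsed to unix epoch seconds
              path: [status, lastRefreshTime]
```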
I think the web server should emit the health metrics, and it should be passively monitoring the database.
However, that conflicts with the serverless design, which says that nothing passively monitors the database on a continuous basis. (How do you expect to get wall clock usage down if we're doing continuous monitoring? No: we load test at rollout time, do canary analysis, and then scale down until an event requires scaling back up...)
Anyway, in the context of all of that, we need an alert that will tell us when Production is not staying fresh.
It may be that Production is kept fresh by the GHA workflows from #22 – the alert should not fire simply because we don't see a cronjob that has run recently.
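Something like the rule below could capture that: alert on the freshness timestamp itself, not on cronjob recency, so it doesn't matter whether a cronjob or a GHA workflow did the refreshing. A minimal sketch, assuming the hypothetical `database_last_refresh_time` gauge above (kube-state-metrics prefixes it with `kube_customresource_` by default) and an arbitrary six-hour staleness budget:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: production-freshness
spec:
  groups:
    - name: freshness
      rules:
        - alert: ProductionNotFresh
          # Fires only when the data is actually stale, regardless of
          # which mechanism was supposed to be keeping it fresh.
          expr: time() - max(kube_customresource_database_last_refresh_time) > 6 * 3600
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Production data has not been refreshed in over six hours
```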
Once we have KEDA, we can dial its polling frequency back to match what we've told GitHub, and monitor kube-state-metrics without any resident process required: just something that reconciles once before the health checking times out.
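A hedged sketch of how that could look with KEDA's Prometheus scaler: the ScaledObject polls the query on its own interval (nothing resident in our workload), and scale-to-zero is allowed between events. The target name, server address, query, interval, and threshold are all placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-scaler
spec:
  scaleTargetRef:
    name: web            # placeholder Deployment name
  minReplicaCount: 0     # nothing resident between events
  maxReplicaCount: 1
  pollingInterval: 300   # seconds; dial this back to match the GHA schedule
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090
        # Hypothetical query against the kube-state-metrics gauge sketched
        # above: seconds since the last refresh; wake the workload when stale.
        query: time() - max(kube_customresource_database_last_refresh_time)
        threshold: "21600"
```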
Not sure how to handle this yet. Punting to a later release, as I'm out of release tokens for today.
kingdonb pushed a commit to kingdonb/bootstrap-repo that referenced this issue on Jun 27, 2023.