Merged
25 changes: 25 additions & 0 deletions manifests/0000_90_ingress-operator_03_prometheusrules.yaml
@@ -0,0 +1,25 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-operator
  namespace: openshift-ingress-operator
  labels:
    role: alert-rules
spec:
  groups:
    - name: openshift-ingress.rules
      rules:
        - alert: HAProxyReloadFail
          expr: increase(template_router_reload_fails[5m]) > 0

How often does it reload? A failure every 5 minutes still seems OK to me.
If we alert on the first failure, I'm afraid it will be noisy.

Contributor Author

HAProxy reloads often. How can we change this expression to not alert on the first failure?

Contributor Author

The problem is, if HAProxy fails to reload, no successive reloads will succeed and the router will become "wedged". So I think we want to alert on the first reload failure. Thoughts?

Contributor Author

openshift/router#190 needs to be back-ported to 4.5 to reduce the reload alert noise.

Oh, so if it fails once, every subsequent reload is going to fail as well?

Contributor Author

Ideally yes.
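
For comparison with the expression under review, here is a minimal sketch of a rule that would not fire on the first failure, which is the alternative asked about earlier in this thread. It is not part of this PR: the alert name, the 15-minute window, and the threshold of 3 are all assumptions chosen for illustration, and it assumes `template_router_reload_fails` is a counter, as the `increase()` call in the PR implies.

```yaml
# Hypothetical variant, not part of this PR: fire only after repeated failures.
- alert: HAProxyReloadFailRepeated   # hypothetical name
  # Window (15m) and threshold (3) are illustrative assumptions.
  expr: increase(template_router_reload_fails[15m]) >= 3
  for: 5m
  labels:
    severity: warning
```

If a single failed reload really does wedge the router, as described above, a threshold like this only delays the alert, which supports keeping `> 0` in the PR.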

          for: 5m
          labels:
            severity: critical
          annotations:
            message: "HAProxy reloads have failed on {{ $labels.pod }}. Router is not respecting recently created or modified routes"
        - alert: HAProxyDown
          expr: haproxy_up == 0
          for: 5m
          labels:
            severity: critical

Maybe let's add a warning if more are up than down

Contributor Author

@RiRa12621 the default ingress controller runs 2 HAProxy instances. With that in mind, isn't it a good idea to alert when any HAProxy instance reports haproxy_up == 0?
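
The warning suggested in this thread could be sketched roughly as follows. This is not part of the PR: the alert name `HAProxyPartiallyDown` and the message text are hypothetical, and the sketch assumes each router pod exposes its own `haproxy_up` series.

```yaml
# Hypothetical rule, not part of this PR: warn when some HAProxy instances are
# down but more are still up than down.
- alert: HAProxyPartiallyDown   # hypothetical name
  # count() over an empty selection returns no series, so this comparison
  # yields a result only when at least one instance is down and at least
  # one is up.
  expr: count(haproxy_up == 1) > count(haproxy_up == 0)
  for: 5m
  labels:
    severity: warning
  annotations:
    message: "Some HAProxy instances report down, but more are up than down"
```

With the two instances mentioned above, one up and one down gives `1 > 1`, which is false, so this warning would never fire in the default configuration. That is one argument for alerting on any instance reporting `haproxy_up == 0`.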

          annotations:
            message: "HAProxy metrics are reporting that the router is down"
24 changes: 24 additions & 0 deletions pkg/manifests/bindata.go