NE-199 Phase 2: Add periodic canary route HTTP checks w/ metrics & basic status reporting #493
14a2028 to
1db6ee0
Compare
1db6ee0 to
30f7e61
Compare
/test e2e-aws-operator so I can verify the e2e tests I drafted
Soliciting initial feedback 🚢
4920a28 to
fd7eac0
Compare
Need to investigate why the new canary e2e test is not passing.
5af1d60 to
c7dfe44
Compare
|
/retest |
f57b083 to
c0085e8
Compare
c0085e8 to
0f519d7
Compare
/retest |
1 similar comment
|
/retest |
0f519d7 to
ae2d98c
Compare
|
/retest |
)

var (
    CanaryRequestTime = prometheus.NewHistogramVec(
@RiRa12621 Hey if you have a second could you glance at these metrics and make sure they look good to you? Asking since we've collaborated on Ingress metrics in the past 😄
|
/retest |
|
Looks good, but #498 has caused a conflict. |
Add tcnksm/go-httpstat to go.mod and vendor/. The go-httpstat dep wraps the Go httptrace library with boilerplate timing definitions. Add prometheus/client_golang to go.mod and vendor/. This allows the ingress operator to create its own metrics.
Use the stop channel used to control the operator in cmd/ingress-operator/start.go to control the Canary metrics and canary route polling loop.
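The stop-channel pattern described in this commit message can be sketched as follows. This is a minimal illustration, not the PR's code: `runChecks`, `demoInterval`, and the `check` callback are hypothetical names, and the real polling interval is not stated in this conversation.

```go
package main

import (
	"fmt"
	"time"
)

// demoInterval is an assumed polling interval chosen for illustration only.
const demoInterval = 10 * time.Millisecond

// runChecks calls check once per interval until stop is closed.
// It returns the number of checks that ran before shutdown.
func runChecks(interval time.Duration, stop <-chan struct{}, check func()) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	n := 0
	for {
		select {
		case <-stop:
			// The operator is shutting down, so end the polling loop.
			return n
		case <-ticker.C:
			check()
			n++
		}
	}
}

func main() {
	stop := make(chan struct{})
	done := make(chan int)

	go func() {
		done <- runChecks(demoInterval, stop, func() {
			fmt.Println("probing canary route")
		})
	}()

	// Let a few checks run, then signal shutdown through the shared channel.
	time.Sleep(35 * time.Millisecond)
	close(stop)
	fmt.Println("checks run:", <-done)
}
```

Sharing one stop channel means the polling loop exits together with the rest of the operator, with no separate shutdown path to maintain.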
ecdc75c to
b8cee02
Compare
Fixed! Ready to go 🚢 |
|
CI is failing on VPC limits |
Overall LGTM, just one case in the tests where we don't handle an err that needs fixing. The rest are observations, comments, and suggestions.
pkg/operator/controller/canary: Add basic HTTP checks for the canary route created in NE-199 phase 1. Add canary metrics to the canary controller and populate them with the results of the canary checks. Add a canary check status condition to the default ingress controller to mark the ingress operator as degraded when canary checks are unsuccessful for a sustained period of time.
pkg/operator/controller/canary/http.go: New file to define the canary route check HTTP funcs using go-httpstat.
pkg/operator/controller/canary/metrics.go: New file to define the canary route check metrics to be collected via Prometheus.
pkg/operator/controller/ingress/controller.go: Define the canary status condition name. pkg/operator/controller/ingress/status.go: Mark the default ingress controller as degraded when the canary status condition has been set to false for a sustained period of time. pkg/operator/controller/ingress/status_test.go: Add test cases to verify the default ingress controller canary status condition changes.
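The "degraded after a sustained period" rule described for status.go can be sketched roughly as below. The `condition` type, `canaryDegraded`, `demoGrace`, and `demoNow` are all hypothetical stand-ins; the actual condition type and grace period used by the operator are not given in this conversation.

```go
package main

import (
	"fmt"
	"time"
)

// condition is a simplified, hypothetical stand-in for a status condition.
type condition struct {
	Status             bool      // true when canary checks are succeeding
	LastTransitionTime time.Time // when Status last changed
}

// demoGrace is an assumed grace period; the real value is not stated in the PR text.
const demoGrace = time.Minute

// demoNow is a fixed reference time so the example is deterministic.
var demoNow = time.Date(2020, time.November, 1, 12, 0, 0, 0, time.UTC)

// canaryDegraded reports true only when the condition has been false
// for at least gracePeriod, so brief failures do not flip the operator status.
func canaryDegraded(c condition, now time.Time, gracePeriod time.Duration) bool {
	if c.Status {
		return false
	}
	return now.Sub(c.LastTransitionTime) >= gracePeriod
}

func main() {
	brief := condition{Status: false, LastTransitionTime: demoNow.Add(-demoGrace / 2)}
	sustained := condition{Status: false, LastTransitionTime: demoNow.Add(-2 * demoGrace)}

	fmt.Println("brief failure degraded:", canaryDegraded(brief, demoNow, demoGrace))
	fmt.Println("sustained failure degraded:", canaryDegraded(sustained, demoNow, demoGrace))
}
```

Keying the decision on the last transition time, instead of counting failures, lets the check stay stateless between reconciliations.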
Add a test to test/e2e/operator_test.go that verifies the ingress clusteroperator's status related objects. Add 2 tests to a new test file, test/e2e/canary_test.go. The first test verifies that the Ingress canary route and echo pod (hello-openshift) work as intended. The second test verifies that the default ingress controller reports the canary check success status condition after a short period of time.
`origin-hello-openshift` is now publicly available on quay.io. Use the publicly available image in `manifests/image-references` (instead of the outdated `hello-openshift` image hosted on DockerHub). Note: This only affects ingress operator development, as the CVO overrides this reference to use the hello-openshift image in the release payload anyway.
b8cee02 to
91b9929
Compare
|
#493 (comment) made me realize, |
I believe so. I'll make a note to fix that in a follow up. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Miciah, sgreene570
|
|
/retest |
|
/refresh |
|
@sgreene570: The following tests failed, say
|
One more to go! |
NE-199 Phase 2: Add periodic canary route HTTP checks w/ metrics & basic status reporting
Follow up PR to #476 in support of NE-199.
This PR adds logic to periodically test the canary route. Canary probe results are reported very verbosely via the ingress operator's logs, somewhat verbosely via newly declared Prometheus metrics, and very granularly through the default ingress controller's status conditions.
This PR adds the go-httpstat dependency, which provides boilerplate code for measuring HTTP latencies on top of Go's native httptrace library.
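To illustrate the kind of timing that go-httpstat layers over Go's standard library, here is a minimal sketch that uses `net/http/httptrace` directly. It does not use go-httpstat and is not the PR's http.go: `probeResult`, `probe`, and `probeLocalServer` are hypothetical names, and the local test server with its response body is a placeholder for the real canary route.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

// probeResult is a hypothetical result type holding coarse request timings.
type probeResult struct {
	StatusCode      int
	Connect         time.Duration // start until a connection was obtained
	TimeToFirstByte time.Duration // start until the first response byte
	Total           time.Duration // start until the body was fully read
}

// probe issues one GET against url and records timings through httptrace hooks.
func probe(client *http.Client, url string) (probeResult, error) {
	var res probeResult
	var gotConn, firstByte time.Time

	trace := &httptrace.ClientTrace{
		GotConn:              func(httptrace.GotConnInfo) { gotConn = time.Now() },
		GotFirstResponseByte: func() { firstByte = time.Now() },
	}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return res, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		return res, err
	}
	defer resp.Body.Close()

	// Drain the body so Total covers the complete response.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return res, err
	}

	res.StatusCode = resp.StatusCode
	res.Connect = gotConn.Sub(start)
	res.TimeToFirstByte = firstByte.Sub(start)
	res.Total = time.Since(start)
	return res, nil
}

// probeLocalServer probes a throwaway local server standing in for the canary route.
func probeLocalServer() (probeResult, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "placeholder echo response")
	}))
	defer srv.Close()
	return probe(srv.Client(), srv.URL)
}

func main() {
	res, err := probeLocalServer()
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Printf("status=%d connect=%v ttfb=%v total=%v\n",
		res.StatusCode, res.Connect, res.TimeToFirstByte, res.Total)
}
```

Durations like these are the sort of values that could then be observed into a latency histogram, though how the PR maps timings to its metrics is not shown in this conversation.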