Add liveness/readiness probes #602
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,7 @@ import ( | |
| "flag" | ||
| "fmt" | ||
| "os" | ||
| "time" | ||
|
|
||
| configv1 "github.com/openshift/api/config/v1" | ||
| "github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1" | ||
|
|
@@ -14,6 +15,7 @@ import ( | |
| "github.com/openshift/machine-api-operator/pkg/version" | ||
| "k8s.io/klog" | ||
| "sigs.k8s.io/controller-runtime/pkg/client/config" | ||
| "sigs.k8s.io/controller-runtime/pkg/healthz" | ||
| "sigs.k8s.io/controller-runtime/pkg/manager" | ||
| "sigs.k8s.io/controller-runtime/pkg/runtime/signals" | ||
| ) | ||
|
|
@@ -26,6 +28,11 @@ func main() { | |
| watchNamespace := flag.String("namespace", "", "Namespace that the controller watches to reconcile machine-api objects. If unspecified, the controller watches for machine-api objects across all namespaces.") | ||
| metricsAddress := flag.String("metrics-bind-address", metrics.DefaultMachineMetricsAddress, "Address for hosting metrics") | ||
| flag.Set("logtostderr", "true") | ||
| healthAddr := flag.String( | ||
| "health-addr", | ||
| ":9440", | ||
| "The address for health checking.", | ||
| ) | ||
| flag.Parse() | ||
|
|
||
| if printVersion { | ||
|
|
@@ -34,9 +41,12 @@ func main() { | |
| } | ||
|
|
||
| cfg := config.GetConfigOrDie() | ||
| syncPeriod := 10 * time.Minute | ||
|
Author
At some point this fix was lost.
Member
Can you please link the specific commit where this was lost and put it back in its own commit?
Author
@enxebre fixed
Member
Thanks a lot for splitting the commits! FWIW I didn't mean to necessarily cherry-pick back the syncPeriod commit, but rather to add a link in the description pointing to the commit which dropped it by accident. Can you please link the counterpart PRs for the actuators in the PR description and elaborate a bit on the motivation behind this change (OCPCLOUD-785 is not something public), and also at minimum explain which other providers will be affected by this, e.g. openstack/rhv/metal3. That, along with the commits as they are broken down now, would have dramatically reduced the friction and time to review this PR in the first place. It would also make it much easier for people getting here with less context to understand the reasoning behind the change. People with less context includes ourselves a month from now or when context switching from other repos. Usually we elaborate the reasoning behind a change in
Author
They won't be affected immediately, even if it gets merged. I'll open issues in other repos.
Member
mm, wouldn't they break as soon as this gets merged, as the health check will fail for them when the MAO runs?
Author
I see, let me open a couple of issues. |
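To illustrate the concern in this thread: once the operator attaches HTTP probes to a provider's container, that binary has to answer on the probed paths, otherwise the probes fail. The following is a minimal sketch using only the Go standard library as a stand-in for controller-runtime's health probe server. `newHealthMux` and `probe` are hypothetical helpers, and the always-OK handler only mimics a ping-style check.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newHealthMux is a hypothetical helper: it serves the two paths that the
// probes in this PR request, always answering 200 like a ping-style check.
func newHealthMux() *http.ServeMux {
	mux := http.NewServeMux()
	ok := func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, "ok")
	}
	mux.HandleFunc("/healthz", ok)
	mux.HandleFunc("/readyz", ok)
	return mux
}

// probe issues an in-memory GET against the mux and returns the status code.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newHealthMux()
	for _, path := range []string{"/healthz", "/readyz", "/missing"} {
		fmt.Println(path, probe(mux, path))
	}
}
```

A binary that serves neither path behaves like the `/missing` case: the HTTP probe gets a non-2xx response (or a refused connection if nothing listens on the port) and is counted as failed.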
||
|
|
||
| opts := manager.Options{ | ||
| MetricsBindAddress: *metricsAddress, | ||
| HealthProbeBindAddress: *healthAddr, | ||
| SyncPeriod: &syncPeriod, | ||
| } | ||
| if *watchNamespace != "" { | ||
| opts.Namespace = *watchNamespace | ||
|
|
@@ -70,6 +80,14 @@ func main() { | |
|
|
||
| capimachine.AddWithActuator(mgr, machineActuator) | ||
|
|
||
| if err := mgr.AddReadyzCheck("ping", healthz.Ping); err != nil { | ||
| klog.Fatal(err) | ||
| } | ||
|
|
||
| if err := mgr.AddHealthzCheck("ping", healthz.Ping); err != nil { | ||
| klog.Fatal(err) | ||
| } | ||
|
|
||
| if err := mgr.Start(signals.SetupSignalHandler()); err != nil { | ||
| klog.Fatalf("Failed to run manager: %v", err) | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -16,6 +16,7 @@ import ( | |||||||||||||
| apierrors "k8s.io/apimachinery/pkg/api/errors" | ||||||||||||||
| "k8s.io/apimachinery/pkg/api/resource" | ||||||||||||||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||||||||||||||
| "k8s.io/apimachinery/pkg/util/intstr" | ||||||||||||||
| "k8s.io/apimachinery/pkg/util/wait" | ||||||||||||||
| "k8s.io/utils/pointer" | ||||||||||||||
| ) | ||||||||||||||
|
|
@@ -30,6 +31,9 @@ const ( | |||||||||||||
| machineExposeMetricsPort = 8441 | ||||||||||||||
| machineSetExposeMetricsPort = 8442 | ||||||||||||||
| machineHealthCheckExposeMetricsPort = 8444 | ||||||||||||||
| defaultMachineHealthPort = 9440 | ||||||||||||||
| defaultMachineSetHealthPort = 9441 | ||||||||||||||
| defaultMachineHealthCheckHealthPort = 9442 | ||||||||||||||
|
Comment on lines +34 to +36
Contributor
Might be being a bit pedantic, but would it be a pain to make these the same as the metrics ports but +1000 for consistency? WDYT?
Author
That would break PRs across 5 repos 😁
Contributor
Oh really? 😓 We should really be using constants for this, but I'll let that be a future improvement |
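For reference, the "+1000" convention suggested in this thread (which was not adopted in this PR) could be sketched as below. The metrics port values are the ones shown in the diff. `healthPortOffset` and `healthPortFor` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

const (
	// Metrics ports as they appear in the diff.
	machineExposeMetricsPort            = 8441
	machineSetExposeMetricsPort         = 8442
	machineHealthCheckExposeMetricsPort = 8444

	// healthPortOffset is a hypothetical constant encoding the suggested
	// "metrics port + 1000" convention.
	healthPortOffset = 1000
)

// healthPortFor derives a health port from the matching metrics port.
func healthPortFor(metricsPort int) int {
	return metricsPort + healthPortOffset
}

func main() {
	fmt.Println("machine:", healthPortFor(machineExposeMetricsPort))
	fmt.Println("machineset:", healthPortFor(machineSetExposeMetricsPort))
	fmt.Println("machinehealthcheck:", healthPortFor(machineHealthCheckExposeMetricsPort))
}
```

Under this scheme the health ports would be 9441, 9442 and 9444 rather than the 9440, 9441 and 9442 chosen in the PR, which is why switching would require coordinated changes in the provider repositories.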
||||||||||||||
| kubeRBACConfigName = "config" | ||||||||||||||
| certStoreName = "machine-api-controllers-tls" | ||||||||||||||
| ) | ||||||||||||||
|
|
@@ -379,6 +383,26 @@ func newContainers(config *OperatorConfig, features map[string]bool) []corev1.Co | |||||||||||||
| Name: "webhook-server", | ||||||||||||||
| ContainerPort: 8443, | ||||||||||||||
| }, | ||||||||||||||
| { | ||||||||||||||
| Name: "healthz", | ||||||||||||||
| ContainerPort: defaultMachineSetHealthPort, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| ReadinessProbe: &corev1.Probe{ | ||||||||||||||
| Handler: corev1.Handler{ | ||||||||||||||
| HTTPGet: &corev1.HTTPGetAction{ | ||||||||||||||
| Path: "/healthz", | ||||||||||||||
| Port: intstr.Parse("healthz"), | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| LivenessProbe: &corev1.Probe{ | ||||||||||||||
| Handler: corev1.Handler{ | ||||||||||||||
| HTTPGet: &corev1.HTTPGetAction{ | ||||||||||||||
| Path: "/readyz", | ||||||||||||||
| Port: intstr.Parse("healthz"), | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| VolumeMounts: []corev1.VolumeMount{ | ||||||||||||||
| { | ||||||||||||||
|
|
@@ -404,6 +428,26 @@ func newContainers(config *OperatorConfig, features map[string]bool) []corev1.Co | |||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| Ports: []corev1.ContainerPort{{ | ||||||||||||||
| Name: "healthz", | ||||||||||||||
| ContainerPort: defaultMachineHealthPort, | ||||||||||||||
| }}, | ||||||||||||||
| ReadinessProbe: &corev1.Probe{ | ||||||||||||||
| Handler: corev1.Handler{ | ||||||||||||||
| HTTPGet: &corev1.HTTPGetAction{ | ||||||||||||||
| Path: "/healthz", | ||||||||||||||
| Port: intstr.Parse("healthz"), | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| LivenessProbe: &corev1.Probe{ | ||||||||||||||
| Handler: corev1.Handler{ | ||||||||||||||
| HTTPGet: &corev1.HTTPGetAction{ | ||||||||||||||
| Path: "/readyz", | ||||||||||||||
| Port: intstr.Parse("healthz"), | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| { | ||||||||||||||
| Name: "nodelink-controller", | ||||||||||||||
|
|
@@ -418,6 +462,28 @@ func newContainers(config *OperatorConfig, features map[string]bool) []corev1.Co | |||||||||||||
| Command: []string{"/machine-healthcheck"}, | ||||||||||||||
| Args: args, | ||||||||||||||
| Resources: resources, | ||||||||||||||
| Ports: []corev1.ContainerPort{ | ||||||||||||||
| { | ||||||||||||||
| Name: "healthz", | ||||||||||||||
| ContainerPort: defaultMachineHealthCheckHealthPort, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| ReadinessProbe: &corev1.Probe{ | ||||||||||||||
|
Member
Why does this show in both this and the previous commit? |
||||||||||||||
| Handler: corev1.Handler{ | ||||||||||||||
| HTTPGet: &corev1.HTTPGetAction{ | ||||||||||||||
| Path: "/healthz", | ||||||||||||||
| Port: intstr.Parse("healthz"), | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| LivenessProbe: &corev1.Probe{ | ||||||||||||||
| Handler: corev1.Handler{ | ||||||||||||||
| HTTPGet: &corev1.HTTPGetAction{ | ||||||||||||||
| Path: "/readyz", | ||||||||||||||
| Port: intstr.Parse("healthz"), | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| }, | ||||||||||||||
| } | ||||||||||||||
| return containers | ||||||||||||||
|
|
||||||||||||||
This is a raw number here but it's a constant here 6c8fa98#diff-fa45321336db7ad1cedc28bf643a4f97R34
Can't we have a constant that we use everywhere?

I can do this only for that PR without introducing a circular dependency and having to forcefully revendor MAO from a local branch for each of the 5 pending provider implementations. I'm going to change it only here for now, if we still want to merge it all at once.

Also, in order to do this change I'll have to remove the rest of the glog usages, or it breaks flag initialization.

I can't do this right now. Let's prioritize glog usage removal.

I'm not sure how this relates to any revendor. The code I pointed to all lives in this repo.
Anyway, not a blocker to me.

+1, I think making this a constant would be a good improvement for a followup
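A rough sketch of what that follow-up could look like, assuming a single exported constant that both the operator and the controller binary import. The name `DefaultMachineHealthPort`, its location, and the `bindAddress` helper are hypothetical. Only the value 9440 and the `health-addr` flag come from the diff.

```go
package main

import (
	"flag"
	"fmt"
)

// DefaultMachineHealthPort is a hypothetical shared constant. The operator
// would use it for the container port, the controller for its flag default.
const DefaultMachineHealthPort = 9440

// bindAddress turns a port number into a listen address such as ":9440".
func bindAddress(port int) string {
	return fmt.Sprintf(":%d", port)
}

func main() {
	fs := flag.NewFlagSet("machine-controller", flag.ContinueOnError)
	healthAddr := fs.String(
		"health-addr",
		bindAddress(DefaultMachineHealthPort),
		"The address for health checking.",
	)
	if err := fs.Parse(nil); err != nil {
		fmt.Println("flag parsing failed:", err)
		return
	}
	fmt.Println(*healthAddr)
}
```

With this shape the raw ":9440" string in the flag default and the numeric constant in the operator cannot drift apart, since both derive from the same value.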