Bug 1709958: Use graceful shutdown for router by Miciah · Pull Request #240 · openshift/cluster-ingress-operator

Miciah · 2019-05-17T02:34:01Z

Configure the router deployment to ensure graceful shutdown:

Set a lifecycle pre-stop handler that runs the haproxy-router image's shutdown-haproxy script.
Set a 90-second termination grace period.
Create a pod disruption budget with a maximum of 25% unavailable pods.
manifests/00-cluster-role.yaml: Allow the operator to manage PodDisruptionBudget resources.
pkg/operator/controller/controller.go (ensureIngressController): Call ensureRouterPodDisruptionBudget.
pkg/operator/controller/controller_router_deployment.go (desiredRouterDeployment): Set a pre-stop handler. Set a grace period of 90 seconds.
(deploymentConfigChanged): Compare the current and expected container lifecycle handlers and pod termination grace period, and set them in the updated deployment.
pkg/operator/controller/controller_router_deployment_test.go (TestDeploymentConfigChanged): Add test cases for container lifecycle handlers and pod termination grace period.
pkg/operator/controller/names.go (RouterPodDisruptionBudgetName): New function.
pkg/operator/controller/router_poddisruptionbudget.go: New file.
(ensureRouterPodDisruptionBudget): New method. Ensure the appropriate pod disruption budget exists for the given ingress controller. Get the current one using currentRouterPodDisruptionBudget. If none exists, create one using the desiredRouterPodDisruptionBudget function and createRouterPodDisruptionBudget method. If one already exists, update it if necessary using updateRouterPodDisruptionBudget.
(desiredRouterPodDisruptionBudget): New function.
(currentRouterPodDisruptionBudget): New method.
(createRouterPodDisruptionBudget): New method.
(updateRouterPodDisruptionBudget): New method. Use podDisruptionBudgetChanged to determine whether an update is needed.
(podDisruptionBudgetChanged): New function.

Depends on openshift/router#28.

openshift-ci-robot · 2019-05-17T02:34:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [Miciah]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

danehans · 2019-05-20T16:19:53Z

pkg/operator/controller/controller_router_deployment_test.go

 	for _, tc := range testCases {
 		nineteen := int32(19)
 		fourTwenty := int32(420) // = 0644 octal.
+		thirty := int64(30)


Is thirty expected to match the termination grace period of openshift/router#28? If so, should this be documented here?

I only chose it because it's the default value, which means that it is the value that the operator will see on an existing ingress controller's deployment when the operator is upgraded. It really could be any value (including nil), as long as the "if the deployment terminationGracePeriodSeconds is changed" test case sets a distinct value.

Configure the router deployment to ensure graceful shutdown: * Set a lifecycle pre-stop handler that runs the haproxy-router image's shutdown-haproxy script. * Set a 120-second termination grace period. * Create a pod disruption budget with a maximum of 25% unavailable pods. * Create a NodePort service for the router pod's healthz endpoint, and use that NodePort service for the LoadBalancer service's health-check node-port. This commit is related to bug 1709958. https://bugzilla.redhat.com/show_bug.cgi?id=1709958 * assets/router/deployment.yaml: Use /healthz/live for the liveness probe and /healthz/ready for the readiness probe. * manifests/00-cluster-role.yaml: Allow the operator to manage PodDisruptionBudget resources. * pkg/manifests/bindata.go: Regenerate. * pkg/operator/controller/controller.go (ensureIngressController): Call ensureHealthzService. Pass healthz service to ensureLoadBalancerService. Call ensureRouterPodDisruptionBudget. * pkg/operator/controller/controller_healthz_service.go: New file. (ensureHealthzService): New method. Ensure a healthz NodePort service exists for the given ingress controller if needed. Get the current one using currentHealthzService. If one needs to be created, create one using the desiredHealthzService function and createHealthzService method. If one already exists and none is desired, delete it using deleteHealthzService. (currentHealthzService): New method. (desiredHealthzService): New function. (createHealthzService): New method. (deleteHealthzService): New method. * pkg/operator/controller/controller_internal_service.go: (ensureInternalIngressControllerService): Add a return value indicating whether the service exists. Use the new createInternalService and deleteInternalService methods in order to handle creation and deletion. (currentInternalIngressControllerService): Rename... (currentInternalService): ...to this. Add a return value indicating whether a service exists. (desiredInternalIngressControllerService): Add a return value indicating whether a service is desired. (createInternalService): New method. (deleteInternalService): New method. * pkg/operator/controller/controller_lb.go (ensureLoadBalancerService): Add parameters for specifying whether a NodePort healthz service should be used for the LoadBalancer service. Add a return value indicating whether the service exists. Really return the current service, even in error cases, in order to be consistent with the godoc. Return an error if a healthz service should be used but the provided one has no port. Pass the healthz service node port to desiredLoadBalancerService. Use the new createLoadBalancerService, deleteLoadBalancerService, and updateLoadBalancerService methods in order to handle creation, updating (in particular for .spec.healthCheckNodePort), and deletion. (loadBalancerServiceName): Rename, and move to names.go. (desiredLoadBalancerService): Add a parameter for healthz port. If it is provided, use the parameter value for the LoadBalancer service's HealthCheckNodePort. Add a return value indicating whether a service is desired. (currentLoadBalancerService): Add a return value indicating whether a service exists. (createLoadBalancerService): New method. (deleteLoadBalancerService): New method. (updateLoadBalancerService): New method. Use loadBalancerServicesEqual to determine whether an update is needed. (loadBalancerServicesEqual): New function. For now, only check .spec.healthCheckNodePort. * pkg/operator/controller/controller_router_deployment.go (desiredRouterDeployment): Set a pre-stop handler. Set a grace period of 120 seconds. (deploymentConfigChanged): Compare the current and expected container lifecycle handlers, pod termination grace period, and liveness and readiness probes, and set them in the updated deployment. * pkg/operator/controller/controller_router_deployment_test.go (TestDeploymentConfigChanged): Add test cases for container lifecycle handlers, pod termination grace period, and liveness and readiness probes. * pkg/operator/controller/names.go (RouterPodDisruptionBudgetName): (HealthzServiceName): New functions. (LoadBalancerServiceName): Moved from controller_controller_lb.go. * pkg/operator/controller/router_poddisruptionbudget.go: New file. (ensureRouterPodDisruptionBudget): New method. Ensure the appropriate pod disruption budget exists for the given ingress controller. Get the current one using currentRouterPodDisruptionBudget. If none exists, create one using the desiredRouterPodDisruptionBudget function and createRouterPodDisruptionBudget method. If one already exists, update it if necessary using updateRouterPodDisruptionBudget, or delete it using deleteRouterPodDisruptionBudget if none is desired. (desiredRouterPodDisruptionBudget): New function. (currentRouterPodDisruptionBudget): New method. (createRouterPodDisruptionBudget): New method. (deleteRouterPodDisruptionBudget): New method. (updateRouterPodDisruptionBudget): New method. Use podDisruptionBudgetChanged to determine whether an update is needed. (podDisruptionBudgetChanged): New function.

openshift-ci-robot · 2019-06-05T19:58:23Z

@Miciah: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/prow/unit	`863a6c4`	link	`/test unit`
ci/prow/e2e-aws	`863a6c4`	link	`/test e2e-aws`
ci/prow/e2e-aws-operator	`863a6c4`	link	`/test e2e-aws-operator`
ci/prow/e2e-aws-upgrade	`863a6c4`	link	`/test e2e-aws-upgrade`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci-robot · 2019-06-11T17:41:32Z

@Miciah: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Miciah · 2019-10-23T23:53:00Z

/close
Superseded by #280.

openshift-ci-robot · 2019-10-23T23:53:01Z

@Miciah: Closed this PR.

Details

In response to this:

/close
Superseded by #280.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 17, 2019

openshift-ci-robot requested review from ironcladlou and knobunc May 17, 2019 02:34

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 17, 2019

danehans reviewed May 20, 2019

View reviewed changes

Miciah mentioned this pull request May 21, 2019

Bug 1709958: test/e2e/upgrade: Add ingress controller test openshift/origin#22852

Closed

Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch 5 times, most recently from 2d6fcbf to c6b5a68 Compare May 29, 2019 01:41

openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 29, 2019

Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch from c6b5a68 to c751959 Compare May 29, 2019 01:50

openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 31, 2019

Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch from c751959 to 034d05c Compare June 4, 2019 00:55

openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 4, 2019

Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch from 034d05c to f7ed23c Compare June 5, 2019 15:33

Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch from f7ed23c to 863a6c4 Compare June 5, 2019 17:56

openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2019

Miciah mentioned this pull request Jul 22, 2019

Add a pod disruption budget for ingress controllers #272

Merged

openshift-ci-robot closed this Oct 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug 1709958: Use graceful shutdown for router#240

Bug 1709958: Use graceful shutdown for router#240
Miciah wants to merge 1 commit intoopenshift:masterfrom
Miciah:BZ1709958-use-graceful-shutdown-for-router

Miciah commented May 17, 2019 •

edited

Loading

Uh oh!

openshift-ci-robot commented May 17, 2019

Uh oh!

danehans May 20, 2019

Uh oh!

Miciah May 20, 2019

Uh oh!

openshift-ci-robot commented Jun 5, 2019

Uh oh!

openshift-ci-robot commented Jun 11, 2019

Uh oh!

Miciah commented Oct 23, 2019

Uh oh!

openshift-ci-robot commented Oct 23, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Miciah commented May 17, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci-robot commented May 17, 2019

Uh oh!

danehans May 20, 2019

Choose a reason for hiding this comment

Uh oh!

Miciah May 20, 2019

Choose a reason for hiding this comment

Uh oh!

openshift-ci-robot commented Jun 5, 2019

Uh oh!

openshift-ci-robot commented Jun 11, 2019

Uh oh!

Miciah commented Oct 23, 2019

Uh oh!

openshift-ci-robot commented Oct 23, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Miciah commented May 17, 2019 •

edited

Loading