Skip to content

Bug 1709958: Use graceful shutdown for router#240

Closed
Miciah wants to merge 1 commit intoopenshift:masterfrom
Miciah:BZ1709958-use-graceful-shutdown-for-router
Closed

Bug 1709958: Use graceful shutdown for router#240
Miciah wants to merge 1 commit intoopenshift:masterfrom
Miciah:BZ1709958-use-graceful-shutdown-for-router

Conversation

@Miciah
Copy link
Contributor

@Miciah Miciah commented May 17, 2019

Configure the router deployment to ensure graceful shutdown:

  • Set a lifecycle pre-stop handler that runs the haproxy-router image's shutdown-haproxy script.

  • Set a 90-second termination grace period.

  • Create a pod disruption budget with a maximum of 25% unavailable pods.

  • manifests/00-cluster-role.yaml: Allow the operator to manage PodDisruptionBudget resources.

  • pkg/operator/controller/controller.go (ensureIngressController): Call ensureRouterPodDisruptionBudget.

  • pkg/operator/controller/controller_router_deployment.go (desiredRouterDeployment): Set a pre-stop handler. Set a grace period of 90 seconds.
    (deploymentConfigChanged): Compare the current and expected container lifecycle handlers and pod termination grace period, and set them in the updated deployment.

  • pkg/operator/controller/controller_router_deployment_test.go (TestDeploymentConfigChanged): Add test cases for container lifecycle handlers and pod termination grace period.

  • pkg/operator/controller/names.go (RouterPodDisruptionBudgetName): New function.

  • pkg/operator/controller/router_poddisruptionbudget.go: New file.
    (ensureRouterPodDisruptionBudget): New method. Ensure the appropriate pod disruption budget exists for the given ingress controller. Get the current one using currentRouterPodDisruptionBudget. If none exists, create one using the desiredRouterPodDisruptionBudget function and createRouterPodDisruptionBudget method. If one already exists, update it if necessary using updateRouterPodDisruptionBudget.
    (desiredRouterPodDisruptionBudget): New function.
    (currentRouterPodDisruptionBudget): New method.
    (createRouterPodDisruptionBudget): New method.
    (updateRouterPodDisruptionBudget): New method. Use podDisruptionBudgetChanged to determine whether an update is needed.
    (podDisruptionBudgetChanged): New function.


Depends on openshift/router#28.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 17, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 17, 2019
for _, tc := range testCases {
nineteen := int32(19)
fourTwenty := int32(420) // = 0644 octal.
thirty := int64(30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is thirty expected to match the termination grace period of openshift/router#28? If so, should this be documented here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only chose it because it's the default value, which means that it is the value that the operator will see on an existing ingress controller's deployment when the operator is upgraded. It really could be any value (including nil), as long as the "if the deployment terminationGracePeriodSeconds is changed" test case sets a distinct value.

@Miciah Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch 5 times, most recently from 2d6fcbf to c6b5a68 Compare May 29, 2019 01:41
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 29, 2019
@Miciah Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch from c6b5a68 to c751959 Compare May 29, 2019 01:50
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 31, 2019
@Miciah Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch from c751959 to 034d05c Compare June 4, 2019 00:55
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 4, 2019
@Miciah Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch from 034d05c to f7ed23c Compare June 5, 2019 15:33
Configure the router deployment to ensure graceful shutdown:

* Set a lifecycle pre-stop handler that runs the haproxy-router image's
  shutdown-haproxy script.
* Set a 120-second termination grace period.
* Create a pod disruption budget with a maximum of 25% unavailable pods.
* Create a NodePort service for the router pod's healthz endpoint, and use
  that NodePort service for the LoadBalancer service's health-check
  node-port.

This commit is related to bug 1709958.

https://bugzilla.redhat.com/show_bug.cgi?id=1709958

* assets/router/deployment.yaml: Use /healthz/live for the liveness probe
and /healthz/ready for the readiness probe.
* manifests/00-cluster-role.yaml: Allow the operator to manage
PodDisruptionBudget resources.
* pkg/manifests/bindata.go: Regenerate.
* pkg/operator/controller/controller.go (ensureIngressController): Call
ensureHealthzService.  Pass healthz service to ensureLoadBalancerService.
Call ensureRouterPodDisruptionBudget.
* pkg/operator/controller/controller_healthz_service.go: New file.
(ensureHealthzService): New method.  Ensure a healthz NodePort service
exists for the given ingress controller if needed.  Get the current one
using currentHealthzService.  If one needs to be created, create one using
the desiredHealthzService function and createHealthzService method.  If one
already exists and none is desired, delete
it using deleteHealthzService.
(currentHealthzService): New method.
(desiredHealthzService): New function.
(createHealthzService): New method.
(deleteHealthzService): New method.
* pkg/operator/controller/controller_internal_service.go:
(ensureInternalIngressControllerService): Add a return value indicating
whether the service exists.  Use the new createInternalService and
deleteInternalService methods in order to handle creation and deletion.
(currentInternalIngressControllerService): Rename...
(currentInternalService): ...to this.  Add a return value indicating
whether a service exists.
(desiredInternalIngressControllerService): Add a return value indicating
whether a service is desired.
(createInternalService): New method.
(deleteInternalService): New method.
* pkg/operator/controller/controller_lb.go
(ensureLoadBalancerService): Add parameters for specifying whether a
NodePort healthz service should be used for the LoadBalancer service.  Add
a return value indicating whether the service exists.  Really return the
current service, even in error cases, in order to be consistent with the
godoc.  Return an error if a healthz service should be used but the
provided one has no port.  Pass the healthz service node port to
desiredLoadBalancerService.  Use the new createLoadBalancerService,
deleteLoadBalancerService, and updateLoadBalancerService methods in order
to handle creation, updating (in particular for .spec.healthCheckNodePort),
and deletion.
(loadBalancerServiceName): Rename, and move to names.go.
(desiredLoadBalancerService): Add a parameter for healthz port.  If it is
provided, use the parameter value for the LoadBalancer service's
HealthCheckNodePort.  Add a return value indicating whether a service is
desired.
(currentLoadBalancerService): Add a return value indicating whether a
service exists.
(createLoadBalancerService): New method.
(deleteLoadBalancerService): New method.
(updateLoadBalancerService): New method.  Use loadBalancerServicesEqual to
determine whether an update is needed.
(loadBalancerServicesEqual): New function.  For now, only check
.spec.healthCheckNodePort.
* pkg/operator/controller/controller_router_deployment.go
(desiredRouterDeployment): Set a pre-stop handler.  Set a grace period of
120 seconds.
(deploymentConfigChanged): Compare the current and expected container
lifecycle handlers, pod termination grace period, and liveness and
readiness probes, and set them in the updated deployment.
* pkg/operator/controller/controller_router_deployment_test.go
(TestDeploymentConfigChanged): Add test cases for container lifecycle
handlers, pod termination grace period, and liveness and readiness probes.
* pkg/operator/controller/names.go (RouterPodDisruptionBudgetName):
(HealthzServiceName): New functions.
(LoadBalancerServiceName): Moved from controller_controller_lb.go.
* pkg/operator/controller/router_poddisruptionbudget.go: New file.
(ensureRouterPodDisruptionBudget): New method.  Ensure the appropriate pod
disruption budget exists for the given ingress controller.  Get the current
one using currentRouterPodDisruptionBudget.  If none exists, create one
using the desiredRouterPodDisruptionBudget function and
createRouterPodDisruptionBudget method.  If one already exists, update it
if necessary using updateRouterPodDisruptionBudget, or delete it using
deleteRouterPodDisruptionBudget if none is desired.
(desiredRouterPodDisruptionBudget): New function.
(currentRouterPodDisruptionBudget): New method.
(createRouterPodDisruptionBudget): New method.
(deleteRouterPodDisruptionBudget): New method.
(updateRouterPodDisruptionBudget): New method.  Use
podDisruptionBudgetChanged to determine whether an update is needed.
(podDisruptionBudgetChanged): New function.
@Miciah Miciah force-pushed the BZ1709958-use-graceful-shutdown-for-router branch from f7ed23c to 863a6c4 Compare June 5, 2019 17:56
@openshift-ci-robot
Copy link
Contributor

@Miciah: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/unit 863a6c4 link /test unit
ci/prow/e2e-aws 863a6c4 link /test e2e-aws
ci/prow/e2e-aws-operator 863a6c4 link /test e2e-aws-operator
ci/prow/e2e-aws-upgrade 863a6c4 link /test e2e-aws-upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link
Contributor

@Miciah: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2019
@Miciah
Copy link
Contributor Author

Miciah commented Oct 23, 2019

/close
Superseded by #280.

@openshift-ci-robot
Copy link
Contributor

@Miciah: Closed this PR.

Details

In response to this:

/close
Superseded by #280.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants