
Add shell script for graceful shutdown #28

Closed

Miciah wants to merge 1 commit into openshift:master from Miciah:BZ1709958-add-shell-script-for-graceful-shutdown

Conversation

@Miciah (Contributor) commented May 17, 2019

Add a shutdown-haproxy shell script, which can be used as the pre-stop handler for a router deployment in order to provide graceful shutdown.

Whereas a hard shutdown simply sends a SIGTERM signal to all HAProxy processes to make them shut down immediately, a graceful shutdown does the following (a sketch follows the list):

  1. Create a marker file that tells the reload-haproxy script to exit immediately.

  2. Sleep for 30 seconds to allow other proxies such as kube-proxy and any cloud load-balancers to drain connections.

  3. Send a SIGUSR1 signal to all HAProxy processes to make them stop accepting new connections.

  4. Wait up to 30 seconds for all HAProxy processes to terminate.

  5. Perform a hard shutdown of any remaining HAProxy processes.
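
For concreteness, here is a minimal sketch of that sequence, under stated assumptions; it is not the exact script from this pull request. The marker path comes from the file list below, the 30-second values come from the steps above, and the pgrep/pkill process matching and the polling loop are assumptions:

  #!/bin/bash
  # Sketch of shutdown-haproxy; not the script shipped in this PR.
  set -uo pipefail

  TERMINATION_MARKER=/var/lib/haproxy/run/terminating
  DRAIN_PERIOD=30
  GRACE_PERIOD=30

  # Per the file list below: if the marker already exists, a graceful
  # shutdown is already under way, so perform a hard shutdown instead.
  if [[ -e "$TERMINATION_MARKER" ]]; then
      pkill -TERM haproxy || true
      exit 0
  fi

  # 1. Tell the reload-haproxy script to exit immediately from now on.
  touch "$TERMINATION_MARKER"

  # 2. Let kube-proxy and any cloud load-balancers drain connections.
  sleep "$DRAIN_PERIOD"

  # 3. Ask HAProxy to stop accepting new connections (graceful stop).
  pkill -USR1 haproxy || true

  # 4. Wait up to GRACE_PERIOD seconds for the HAProxy processes to exit.
  for ((i = 0; i < GRACE_PERIOD; i++)); do
      pgrep haproxy >/dev/null || exit 0
      sleep 1
  done

  # 5. Hard shutdown for any remaining HAProxy processes.
  pkill -TERM haproxy || true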

This commit is related to bug 1709958.

https://bugzilla.redhat.com/show_bug.cgi?id=1709958

  • images/router/haproxy/reload-haproxy: Exit immediately if the file /var/lib/haproxy/run/terminating is present.
  • images/router/haproxy/shutdown-haproxy: New file. If the file /var/lib/haproxy/run/terminating is present, perform a hard shutdown, or else perform a graceful shutdown as described above.
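
The corresponding change to reload-haproxy amounts to an early-exit guard. A hedged sketch (the marker path is from the bullet above; the exact placement and wording in the real script are not shown on this page):

  # Hypothetical guard near the top of reload-haproxy:
  if [[ -f /var/lib/haproxy/run/terminating ]]; then
      # A shutdown is in progress; do not reload HAProxy.
      exit 0
  fi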

@openshift-ci-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) and size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) labels on May 17, 2019
@lihongan (Contributor)

If a long-lived connection such as a WebSocket exists in the current HAProxy process, how is it handled? Wait 30 seconds and then perform a hard shutdown? That does not sound like a graceful shutdown.

  # Long timeout for WebSocket connections.
  timeout tunnel 1h

@danehans (Contributor)

/test e2e-aws

if [[ "$graceful_shutdown" = 'true' ]]; then
echo 'Performing a graceful shutdown:'
echo " - Sleeping $DRAIN_PERIOD seconds to let connections drain..."
sleep "$DRAIN_PERIOD"
Contributor

Granted, the reload script will be no-op'ing, but if we haven't issued USR1 to haproxy, it's not clear to me how connection draining is implemented with this sleep.

# GRACE_PERIOD is the number of seconds that this script will wait for HAProxy
# processes to terminate after it sends the SIGUSR1 signal, before it sends
# SIGTERM to any remaining HAProxy processes.
GRACE_PERIOD=30
Contributor

Should GRACE_PERIOD and DRAIN_PERIOD be configurable?

Contributor (Author)

They easily could be, but I wasn't planning to add any API for these parameters to the ingress controller resource, so I don't know that it would be much use to make them configurable here.

Contributor

The ingress controller is necessarily aware of how to configure the haproxy-router, but I'm not sure that implies we have to expose this as a public API. I was just suggesting that if we ever need to tweak these values, we might want to do it through an operator release rather than an haproxy-router release. Especially if we end up tweaking these values dynamically in the operator...

Contributor

The stop hook script might even be possible to define in the operator itself rather than here, but that could be a little too coupled?

Contributor (Author)

> I was just suggesting that if we ever need to tweak these values, we might want to do it through an operator release rather than an haproxy-router release. Especially if we end up tweaking these values dynamically in the operator...

Hm, that's a good point—the operator might know that one cloud provider needs a shorter or longer drain period than the typical cloud provider, in which case the operator could set a platform-appropriate value.

> The stop hook script might even be possible to define in the operator itself rather than here, but that could be a little too coupled?

You mean with a configmap volume? We could do that, but (1) the reload and shutdown scripts are loosely coupled by the /var/lib/haproxy/run/terminating marker, and (2) it would be another piece that could break, so it adds real operational complexity to reduce hypothetical build complexity.

@Miciah (Contributor, Author) commented May 21, 2019

Latest push makes TERMINATION_MARKER, GRACE_PERIOD, and DRAIN_PERIOD configurable.
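
For reference, the usual shell idiom for this kind of configurability is an environment override with a default, as in the following sketch (the variable names are the ones discussed in this thread; whether the pushed script uses exactly this expansion is not shown here):

  # Let the environment (for example, the router deployment spec)
  # override each value; the := expansion keeps the default otherwise.
  : "${TERMINATION_MARKER:=/var/lib/haproxy/run/terminating}"
  : "${DRAIN_PERIOD:=30}"
  : "${GRACE_PERIOD:=30}"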

@Miciah force-pushed the BZ1709958-add-shell-script-for-graceful-shutdown branch 2 times, most recently from 507b9c1 to 1c15371 on May 23, 2019 at 23:16
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Miciah force-pushed the BZ1709958-add-shell-script-for-graceful-shutdown branch from 1c15371 to b83f97f on May 23, 2019 at 23:25
Add a shutdown-haproxy shell script, which can be used as the pre-stop
handler for a router deployment in order to provide graceful shutdown.

Whereas a hard shutdown simply sends a SIGTERM signal to all HAProxy
processes to make them shut down immediately, a graceful shutdown does
the following:

1. Create a marker file that tells the router's /healthz health check to
   return false.

2. Sleep for 90 seconds to allow time for external load-balancers to
   observe the failing health check and drain connections to the router.

3. Create a marker file that tells the reload-haproxy script to exit
   immediately.

4. Send a SIGUSR1 signal to all HAProxy processes to make them stop
   accepting new connections.

5. Wait up to 30 seconds for all HAProxy processes to terminate.

6. Perform a hard shutdown of any remaining HAProxy processes.

This commit is related to bug 1709958.

https://bugzilla.redhat.com/show_bug.cgi?id=1709958

* images/router/haproxy/reload-haproxy: Exit immediately if the file
/var/lib/haproxy/run/terminating is present.
* images/router/haproxy/shutdown-haproxy: New file.  Perform a graceful
shutdown as described above.
* pkg/cmd/infra/router/template.go (Run): Configure a listener for
readiness checks specifically intended for external load balancers.  In
particular, these checks include the new IsDraining check.
* pkg/router/metrics/health.go (IsDraining): New function.  Return a
healthz check that fails if the "draining" marker that the shutdown script
creates exists.
* pkg/router/metrics/metrics.go (Listener): Rename LiveChecks to
PodLiveChecks.  Rename ReadyChecks to PodReadyChecks.  Add a new field,
LBReadyChecks for readiness checks specific to external load balancers.
(handler): Register LBReadyChecks as /healthz, PodLiveChecks as
/healthz/live, and PodReadyChecks as /healthz/ready.
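
To illustrate the intended split between those endpoints, here is a hypothetical walk-through from inside a router pod; the stats port 1936 and the marker path are assumptions, while the endpoint paths come from the changelog above:

  # Before draining, every check passes.
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:1936/healthz        # 200
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:1936/healthz/ready  # 200

  # The shutdown script creates the "draining" marker (path assumed)...
  touch /var/lib/haproxy/run/draining

  # ...after which the LB-facing check fails while the pod-facing checks
  # still pass, so external load balancers drain the router without
  # Kubernetes restarting or killing the pod.
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:1936/healthz        # failing
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:1936/healthz/ready  # 200
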
@Miciah force-pushed the BZ1709958-add-shell-script-for-graceful-shutdown branch from b83f97f to 77dc2da on May 29, 2019 at 00:07
@openshift-ci-robot (Contributor)

@Miciah: The following test failed, say /retest to rerun them all:

Test name: ci/prow/e2e-aws-upgrade
Commit: 77dc2da
Rerun command: /test e2e-aws-upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@smarterclayton (Contributor)

What’s the status of this?

@Miciah (Contributor, Author) commented Jun 22, 2019

> What’s the status of this?

It does not work as is. As soon as the pod is deleted, the pod's endpoints are deleted, even if the pod has a pre-stop lifecycle handler. The service proxy observes that the node has no local endpoints for the LB service, and it starts blackholing traffic to the node.
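
For anyone who wants to reproduce this, one hypothetical way to watch the behavior on a live cluster (the namespace, service name, and pod name are assumptions for a typical OpenShift 4 ingress setup, not taken from this PR):

  # Watch the router service's endpoints while deleting a router pod.
  # The endpoint disappears as soon as the pod is marked for deletion,
  # even though the pre-stop hook is still running inside the pod.
  oc -n openshift-ingress get endpoints router-default -w &
  oc -n openshift-ingress delete pod <some-router-pod>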

To make draining work, we must initiate it before the pod is deleted. To do this, we need to make the load balancer's health check start failing before deleting the pod. Right now, the health check for LB services is serviced by the service proxy, so we would need to make the health check configurable (proof of concept: Miciah/origin@108a036).

Alternatively, we have explored using surge during rollout, using affinity rules to schedule new replicas on the same nodes as old replicas to ensure that local endpoints never disappear during a rollout (proof of concept: Miciah/cluster-ingress-operator@93517fd). This approach has other challenges, the most salient being that we need specific behavior when the old replicas are descheduled, namely to prefer descheduling old replicas on nodes where the new replicas are ready. (Someone suggested a change along these lines here: kubernetes/kubernetes#4301 (comment).) My next step is to create a proof-of-concept change to the replicaset controller to give us the desired behavior.

@Miciah (Contributor, Author) commented Oct 23, 2019

Superseded by openshift/cluster-ingress-operator#280.
/close

@openshift-ci-robot (Contributor)

@Miciah: Closed this PR.


In response to this:

> Superseded by openshift/cluster-ingress-operator#280.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
