[nginx] Rejected requests during update of ingress deployment #322

foxylion · 2017-02-22T14:39:49Z

We are using nginx-ingress and hope to get zero downtime deployments. This means that nginx should also be upgraded without any downtime.

Currently we have specified some things in our deployment to ensure that there is always an nginx pod running.

spec:
  ...
  replicas: 2
  minReadySeconds: 10
  template:
    terminationGracePeriodSeconds: 60
    containers:
      - name: ginx-ingress
        ...
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1

Kubernetes now ensures that there is at least on pod of nginx-ingress running.
But when using tools like siege or jmeter we still see a small window where requests get rejected (about half of a second).

        Date & Time,  Trans,  Elap Time,  Data Trans,  Resp Time,  Trans Rate,  Throughput,  Concurrent,    OKAY,   Failed
2017-02-22 15:06:36,   1178,      45.18,           0,       0.10,       26.07,        0.00,        2.52,    1178,       2

Is this caused by ingress or do we have something mis configured in our Kubernetes cluster?

The text was updated successfully, but these errors were encountered:

georgecrawford · 2017-02-23T21:57:43Z

I'm seeing exactly the same thing. I set replicas: 2, and during a deployment this scales to 4 then back to two. However, tailing the nginx logs with --v=2, I see a single config diff where two IPs are added and two are removed - at this point, curl fails for a second or two.

glerchundi · 2017-02-24T07:38:32Z

I don't know if you're already using it, but depending which version of k8s are you using, default rolling update didn't worked as expected, so I needed to explicitly write the strategy:

[...]
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  minReadySeconds: 10
[...]

And of course follow best practices in terms of SIGTERMs, blabla...

georgecrawford · 2017-02-24T09:37:10Z

Thanks - I got to that same conclusion last night, and things are better now. I do wonder if there are improvements coming to this project which will make this easier?

foxylion · 2017-02-24T12:23:13Z

@glerchundi In my case I updated the ingress-nginx deployment and not one of my own deployments. I would expect that ingress does handle an update (new version) of itself without any downtime. Or am I wrong?

Also something about the configuration

    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

Normally it should be enough to leave that to the default. This would result in terminating one instance, creating a new one and so on. This should graceful update the deployment when replicas is >= 2. Right?

foxylion · 2017-03-23T16:46:49Z

Okay, I found a solution for this particular problem. Delaying the termination of the controller pod for a second will not result in any lost requests. But I think this may be general thing in Kubernetes. If someone wan't to try this too, add this to the deployment configuration:

spec:
  replicas: 2
    spec:
      containers:
        ...
        lifecycle:
          preStop:
            exec:
              command: ["sleep, "1"]

aledbf · 2017-03-23T16:52:36Z

If someone wan't to try this too, add this to the deployment configuration:

Maybe you can add this as an example?

foxylion · 2017-03-23T16:56:31Z

@aledbf I'm not so sure it looks for me like it is not only an ingress problem. More of a general Kuberenetes issue. But it is also possible I'm misunderstanding something. Nonetheless it seems to work.

But sure I can add an example if you want.

I created an issue here: kubernetes/kubernetes#43576

foxylion · 2017-04-28T14:41:01Z

So here is a final update for everyone who is wondering how to achieve zero downtime deployments with the nginx ingress controller.

You have to delay the termination process of a pod for a significant amount of time.

In the worst case the nginx configuration is only updated every 10 seconds. This means it can take up to ten seconds until a terminating pod to be removed from the upstreams in nginx.

It is as mentioned above pretty easy to achieve this. Simply sleep for 15 seconds in a preStop hook:

lifecycle:
  preStop:
    exec:
      command: ["sleep, "15"]

Additionally your backend has to do a graceful shutdown. This means it must process all in flight requests and close keep alive connections correctly. (nginx, apache httpd and tomcat for example can handle this properly out of the box).

tyranron · 2017-06-21T13:28:11Z

@aledbf can we mention the info provided by @foxylion somewhere in docs?

Current Nginx Ingress Controller docs have section "Why endpoints and not services", but it's not obvious (especially for Kubernetes noobs) that this leads to the problem that rolling restart of the Pods does not work as expected out-of-the-box.

aledbf · 2017-06-21T14:04:14Z

@tyranron sure. Can you send a PR?

tyranron · 2017-06-27T16:22:22Z

@aledbf done.

philipbjorge · 2017-07-12T23:44:02Z

For anyone that looks at this thread, we've achieved zero downtime deploys by running the following code as a sidecar container in our pod and using the sleep command. Just using the sleep command was insufficient because traffic would still get sent to the pod during the sleep period and when the sleep ran out, those connections would get dropped.

package main

import (
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// SIGTERM Handler
	var SIGTERM = false
	c := make(chan os.Signal)
	signal.Notify(c, syscall.SIGTERM)
	go func() {
		<-c
		SIGTERM = true
		fmt.Println("SIGTERM received...")
	}()

	var netClient = &http.Client{
		Timeout: time.Second * 10,
	}

	// Proxying healthcheck
	http.HandleFunc("/healthcheck", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if SIGTERM {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}

		resp, err := netClient.Get(os.Args[2])
		if err != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			fmt.Println("Error getting proxied healthcheck")
			return
		}

		w.WriteHeader(resp.StatusCode)
	}))

	http.ListenAndServe(":"+os.Args[1], nil)
}

embano1 · 2017-07-14T21:17:27Z

@philipbjorge i implemented this (also in go) without the sidecar just by using a http pre-stop hook and a http readiness probe. of course, the server process must handle the sigterm gracefully so it does not drop connections. easy these days with http.shutdown.

this combination will definitely work (if you correctly set the timings for pre-stop and probes) for scale-ins (scale down replicaset) as well as rolling updates (even with max unavailable !=0).

the flow goes like this:

start scale-in or rolling update
kubernetes will call prestop hook ("/makeprestop")
the service will immediately handle this hook, in my case respond to readiness probes ("/ready") with 5xx and sleep the "/makeprestop" handler for 10s (so the SIGTERM will be delayed)
in parallel, kubernetes continuous to send readiness probes and gets the 5xx
it takes the service out of the endpoint list, kube-proxy will locally reflect this in iptables (all things async/ distributed)
after the 10s handler delay, SIGTERM will be send - since the http server will handle this with http.shutdown open connections will be gracefully drained.

basic code example below:

func makeReady(s *status) func(http.ResponseWriter, *http.Request) {
	return func(w http.ResponseWriter, r *http.Request) {
		s.RLock()
		defer s.RUnlock()
		if s.ready {
			w.WriteHeader(http.StatusOK)
			w.Write([]byte("OK"))
		} else {
			w.WriteHeader(http.StatusInternalServerError)
		}
	}
}

func makePreStop(s *status) func(http.ResponseWriter, *http.Request) {
	return func(w http.ResponseWriter, r *http.Request) {
		s.Lock()
		log.Println("Prestop hook called, disabling healthy ready feedback")
		s.ready = false
		s.Unlock()

		// Give readiness probe (called in parallel) time to update (read ready == false)
		time.Sleep(10 * time.Second)
	}
}

cc/ @timoreimann

timoreimann · 2017-07-16T21:42:43Z

@embano1 thanks for sharing. The pre-stop hook is a clever way to avoid having to move the 10 second sleep / delay into the application. 👍

Did you verify that you really have to make the readiness check fail artificially though? The docs state it shouldn't be necessary (last paragraph), though that might be incorrect.

My local tests do show that an explicit readiness check failure seems necessary. Can you confirm? I'm inclined to file a bug report (if it doesn't exist yet) in that case.

timoreimann · 2017-07-16T22:36:34Z

On further investigation, it looks like everything may work as documented: I got confused by the fact that a pod's Ready condition apparently does not flip to false once the termination sequence kicks off. I can see the endpoints being updated and the terminating pod getting removed as soon as the SIGTERM is sent.

Would be great to have someone double-check my observations.

caseylucas · 2017-07-17T20:33:16Z

I was also under the impression that we don't have to explicitly handle the prestop hook in the application (like @embano1 did). We are just using sleep to delay the SIGTERM and then letting the app exit on SIGTERM.

command: ["sleep", "15"]

My assumption (not validated) is that before the prestop is started (or maybe concurrently) kubernetes will mark the pod in such a way that the ingress controller will remove it from the list of upstreams. So as long as the pod is removed from the upstream list before it is sent SIGTERM, we don't lose transactions because the pod is effectively idle.

embano1 · 2017-07-17T21:06:25Z

Hi guys, thx for having this discussion. Here´s my setup (sorry for not strictly following ingress although this issue is about ingress):

Running K8s v1.6.4
Simple web server (go) with graceful shutdown on SIGTERM
Readiness probes (/healthz) and pre-stop hook as described
Hammering a rolling update or deployment scale in with 200k http requests ("hey")
I used various test setups, but only the artificially failing /healthz with a sufficient (graceful shutdown) enough pre-stop hook delay (>readiness delay) gave me 0 request lost.

With other attempts, I always had ~500 requests failing (out of 200k, minimal but still with UX impact in a real-world scenario).

My assumption (different from the docs): due to the distributed nature of involved k8s components (kube-proxy, service endpoints, controller mgr, API server, etcd, etc.), I´m hitting async issues causing the draining "flow" to break. If you could try to replicate this with an http hammer, that would be great.

My setup might be different due to directly exposing go http.server with shutdown instead of using nginx.

[UPDATE]
I just played with the (pre-stop/ ready) handlers as well as timing (at least 10s) a little bit and did several tests (rolling update to new image, scale out, scale in, rollback). As you already said, it (all scenarios with hey 200k requests, 0% loss) works by just delaying the prestop hook (10s in my safety case) w/out having to fail ready probe.

@timoreimann
I was also little bit confused by the various pod states during my tests. But the docs are right and this is the flow:

Pod is said to terminate (e.g. scale in)
PreStop called (sync call)
Pod enters "terminating" state, remains fully active (i.e. "ready")
At the same time, endpoints (on the API side) are immediately updated (e.g. scaling from 10 to 1 pod in a deployment immediately removes 9 endpoints, w/out delay on the server side)
After preStop delay exceeded, pod gets TERM (in my case shuts down gracefully although not really needed since after 10s every session should be gone...safety net)
Pod enters "not ready" state

@caseylucas
In my demo lab, I had to handle the delay (10s) in the handler since I´m running FROM scratch :)

Thx again for your feedback!

timoreimann · 2017-07-18T07:44:10Z

@embano1 appreciate the in-depth double-check. 👏 I'll go out on a limb and say this could be one of the most detailed and recent verifications we have on the subject as of today. 🎉

The thing I was kinda hoping Kubernetes to provide for me though was that you could have a single container inside a pod responsible for postponing the TERM signal through a pre-stop hook for all other containers inside the pod. That way, you wouldn't need to insert the delay/sleep in the primary application container but could move the responsibility to an auxiliary side-car, making things further decoupled. I ran a few tests myself to make sure it's really not possible, which unfortunately does not seem to be the case.

For containers coming with some minimalistic shell environment, this might not be a biggie as you can always do a sleep <seconds>. FROM scratch containers (i.e., most of Go)
don't have this (rather undesirable) luxury, however.

embano1 · 2017-07-18T09:48:28Z

@timoreimann really appreciate your feedback and comment. thx a ton! when i was benchmarking this setup, i was also thinking about using a sidecar w/out adding that handler to the app (depends on your setup, sidecars also consume resources and might lead to scheduling issues, e.g. insufficient resources when not configured correctly). but since i got stuck in the prestop handling, i focused on the app handler.

i can also reproduce your observation: the sidecar remains in "terminating" state for 10s (my delay) but the web container is getting killed almost immediately. the propagation of the service endpoint removal is not fast enough (by design) so that requests time out/ run into conn refused.

and imho the documentation is wrong, since (6) is not correct in case you run multiple containers and only one has a prestop hook attached.

timoreimann · 2017-07-18T10:20:39Z

@embano1 agree with you that the docs are misleading: the part you are citing should be focused on containers instead of pods. This misalignment is exactly what sparked my hope yesterday for a pod-centric pre-stop handler only be the crushed by reality. 😄

I'll try to get around to filing a PR that makes the docs more specific.

Thanks! 👏

kfox1111 · 2017-07-18T15:41:03Z

I put in the same request for orchestrating the prestop months ago. Currently, I have to add a bunch of prestop hooks and watch for temporary file creation to block termination. it works, but is a lot of work to get right.

timoreimann · 2017-07-18T16:20:44Z

@kfox1111 request as in feature request? If so, do you have a link?

kfox1111 · 2017-07-18T17:56:20Z

kubernetes/kubernetes#27618

foxylion · 2017-07-19T07:36:21Z

@embano1 @timoreimann Back then, when I implemented zero downtime in our deployments, I aimed for a minimal solution.
The "sleep 15" preStop hook is the minimalistic solution I found. When a container is terminated it is flagged as "terminating". This indication leads to a removal of the endpoint in the ingress controller. So there is really no readiness probe required.

In my opinion the readiness probe should be used to indicate if the application is ready to serve requests or if it temporarily can't serve new requests.

timoreimann · 2017-07-19T08:25:06Z

@foxylion full ACK to everything you said.

@embano1 and I kind of expanded the discussion to the question on where the pre-stop hook handler currently has to live (on the container that's supposed to shut down gracefully) vs. where it might be nice to have (a separate side-car container just responsible for delaying the transmission of the SIGTERM, nicely decoupled from the primary container).

Through @kfox1111's referenced issue, I found the proposal on deferContainers which seems to also target the simplified termination handling we'd love to have.

simonklb · 2017-08-20T01:55:02Z

The preStop sleep hook seem to work fine for backend pods. Doing rolling updates on the ingress-controller itself still causes timeouts though. Is this the case for you guys as well or are you able to update the ingress-controller with zero downtime?

My ingress-controller manifest:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ingress-controller
  namespace: default
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: ingress-controller
        layer: ingress
    spec:
      containers:
      - image: gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.8
        name: nginx
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 1
        ports:
        - containerPort: 80
        - containerPort: 443
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
        - --configmap=$(POD_NAMESPACE)/ingress-controller-config
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "10"]

simonklb · 2017-08-20T12:29:11Z

After some further investigation I found that the culprit was the external-traffic: OnlyLocal annotation which is used on loadbalancer services to preserve the source IP of requests.

Removing it from the service made the sleep sufficient also for the ingress-controller.

foxylion · 2017-09-08T22:44:41Z

@simonklb Great to hear that others are also using the sleep solution successfully. :)

metral · 2017-09-17T19:14:10Z

Is it safe to assume that in order to have zero downtime at the moment, the lifecycle preStop sleep hack is a requirement of the primary containers in the backend Pods of the ingress controller?

Also IIUC, this seems to be a general k8s issue, and not specific to backend Pods of an ingress controllers only. In particular, this applies to any rolling-updates (ReplicationControllers) or roll out of Deployments, e.g. an nginx-ingress-controller itself needs to be updated. Is this accurate?

FWIW I've also been tracking this issue across:

timoreimann · 2017-09-17T20:43:20Z

@metral it's either the preStop sleep hack or a termination-delaying SIGTERM handler built into the containers in question.

And yes, it applies not only to Ingress controllers but all Pods in general that need to make sure requests are being drained prior to completing the shutdown procedure. Besides the rolling-upgrade use case, this might also be required due to other shutdown-inducing events including auto-scaling, pod evictions, and node cordoning.

HK250211 · 2021-11-03T10:27:52Z

Hello All, can you give me solution, when i deleted one pods, then zero down time will have take time create new pod

prydonius mentioned this issue Feb 22, 2017

Ingress controller causes short amount of downtime helm/monocular#179

Closed

foxylion changed the title ~~[nginx] Rejecting requests when doing a rolling update~~ [nginx] Rejected requests during update of ingress deployment Feb 24, 2017

foxylion closed this as completed Mar 23, 2017

philipbjorge mentioned this issue Jun 22, 2017

[Nginx ingress controller] Downtime of service exposed via Ingress while doing upgrade. kubernetes-retired/contrib#2098

Closed

tyranron mentioned this issue Jun 27, 2017

Add mention about achieving zero-downtime rolling updates #912

Closed

embano1 mentioned this issue Jul 16, 2017

Add more information to readiness probes gravitational/workshop#17

Closed

timoreimann mentioned this issue Jul 18, 2017

Document recommended way to not fail requests during rolling update kubernetes/kubernetes#20473

Closed

simonklb mentioned this issue Aug 20, 2017

Timeouts when draining a node while using external-traffic: OnlyLocal annotation on loadbalancer kubernetes/kubernetes#50977

Closed

This was referenced Sep 17, 2017

Graceful shutdown for hook kubernetes/test-infra#3396

Closed

[Proposal] deferContainers - Pod Termination Semantics kubernetes/community#483

Closed

aledbf mentioned this issue Oct 25, 2017

Retry upstream when using service-upstream annotation #1586

Closed

ebabani mentioned this issue Nov 24, 2017

[stable/nginx-ingress] Add lifecycle hooks to the deployment template helm/charts#2855

Merged

kylegato mentioned this issue Nov 27, 2017

kube-dns: dnsmasq intermittent connection refused kubernetes/kubernetes#45976

Closed

aledbf mentioned this issue May 3, 2018

Ingress controller dropping websocket connections when performing backend reload #2461

Closed

dvlato mentioned this issue May 21, 2018

Allow connection pool to gradually drain down ExpediaGroup/styx#55

Closed

TrevinTeacutter mentioned this issue Nov 12, 2018

Adding a runner for automating sonobuoy runs. vmware-tanzu/sonobuoy#557

Closed

aledbf mentioned this issue Jul 25, 2019

Small amount of 502s during rollouts with NEG even with "mitigations" kubernetes/ingress-gce#769

Closed

BuddhiWathsala mentioned this issue Jul 29, 2019

NGINX 502 Bad Gateway when using a single replication #4375

Closed

r0bj mentioned this issue Jul 19, 2021

Add initial-delay parameter to shutdown-manager projectcontour/contour#3897

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[nginx] Rejected requests during update of ingress deployment #322

[nginx] Rejected requests during update of ingress deployment #322

foxylion commented Feb 22, 2017 •

edited

Loading

georgecrawford commented Feb 23, 2017

glerchundi commented Feb 24, 2017

georgecrawford commented Feb 24, 2017

foxylion commented Feb 24, 2017 •

edited

Loading

foxylion commented Mar 23, 2017

aledbf commented Mar 23, 2017

foxylion commented Mar 23, 2017 •

edited

Loading

foxylion commented Apr 28, 2017

tyranron commented Jun 21, 2017

aledbf commented Jun 21, 2017 •

edited

Loading

tyranron commented Jun 27, 2017

philipbjorge commented Jul 12, 2017

embano1 commented Jul 14, 2017 •

edited

Loading

timoreimann commented Jul 16, 2017 •

edited

Loading

timoreimann commented Jul 16, 2017

caseylucas commented Jul 17, 2017

embano1 commented Jul 17, 2017 •

edited

Loading

timoreimann commented Jul 18, 2017 •

edited

Loading

embano1 commented Jul 18, 2017

timoreimann commented Jul 18, 2017

kfox1111 commented Jul 18, 2017

timoreimann commented Jul 18, 2017

kfox1111 commented Jul 18, 2017

foxylion commented Jul 19, 2017 •

edited

Loading

timoreimann commented Jul 19, 2017

simonklb commented Aug 20, 2017

simonklb commented Aug 20, 2017

foxylion commented Sep 8, 2017

metral commented Sep 17, 2017 •

edited

Loading

timoreimann commented Sep 17, 2017

HK250211 commented Nov 3, 2021

[nginx] Rejected requests during update of ingress deployment #322

[nginx] Rejected requests during update of ingress deployment #322

Comments

foxylion commented Feb 22, 2017 • edited Loading

georgecrawford commented Feb 23, 2017

glerchundi commented Feb 24, 2017

georgecrawford commented Feb 24, 2017

foxylion commented Feb 24, 2017 • edited Loading

foxylion commented Mar 23, 2017

aledbf commented Mar 23, 2017

foxylion commented Mar 23, 2017 • edited Loading

foxylion commented Apr 28, 2017

tyranron commented Jun 21, 2017

aledbf commented Jun 21, 2017 • edited Loading

tyranron commented Jun 27, 2017

philipbjorge commented Jul 12, 2017

embano1 commented Jul 14, 2017 • edited Loading

timoreimann commented Jul 16, 2017 • edited Loading

timoreimann commented Jul 16, 2017

caseylucas commented Jul 17, 2017

embano1 commented Jul 17, 2017 • edited Loading

timoreimann commented Jul 18, 2017 • edited Loading

embano1 commented Jul 18, 2017

timoreimann commented Jul 18, 2017

kfox1111 commented Jul 18, 2017

timoreimann commented Jul 18, 2017

kfox1111 commented Jul 18, 2017

foxylion commented Jul 19, 2017 • edited Loading

timoreimann commented Jul 19, 2017

simonklb commented Aug 20, 2017

simonklb commented Aug 20, 2017

foxylion commented Sep 8, 2017

metral commented Sep 17, 2017 • edited Loading

timoreimann commented Sep 17, 2017

HK250211 commented Nov 3, 2021

foxylion commented Feb 22, 2017 •

edited

Loading

foxylion commented Feb 24, 2017 •

edited

Loading

foxylion commented Mar 23, 2017 •

edited

Loading

aledbf commented Jun 21, 2017 •

edited

Loading

embano1 commented Jul 14, 2017 •

edited

Loading

timoreimann commented Jul 16, 2017 •

edited

Loading

embano1 commented Jul 17, 2017 •

edited

Loading

timoreimann commented Jul 18, 2017 •

edited

Loading

foxylion commented Jul 19, 2017 •

edited

Loading

metral commented Sep 17, 2017 •

edited

Loading