This repository has been archived by the owner on Apr 17, 2019. It is now read-only.

[Nginx ingress controller] Downtime of service exposed via Ingress while doing upgrade. #2098

Closed
AlexRRR opened this issue Nov 30, 2016 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@AlexRRR

AlexRRR commented Nov 30, 2016

Test System:

  • Baremetal CoreOS
  • Network: Flannel
  • K8s 1.4.6
  • nginx-ingress-controller:0.8.3

Steps to reproduce

create a Deployment consisting of a simple service like a bare nginx

apiVersion: extensions/v1beta1                                                      
kind: Deployment                                                                    
metadata:                                                                           
  name: weather-app                                                                 
spec:                                                                               
  replicas: 3                                                                       
  strategy:                                                                         
    rollingUpdate:                                                                  
      maxSurge: 1                                                                   
      maxUnavailable: 1                                                             
  template:                                                                         
    metadata:                                                                       
      labels:                                                                       
        app: weather-app                                                            
        track: stable                                                               
        attempt: "1"                                                                
    spec:                                                                           
      containers:                                                                   
      - name: weather-app                                                           
        lifecycle:
          preStop:
            exec:
              command: ["/usr/sbin/nginx", "-s", "quit"]
        image: nginx:latest                                                         
        ports:                                                                      
        - containerPort: 80                                                         
        imagePullPolicy: Always                                                     
        readinessProbe:                                                             
          httpGet:                                                                  
            path: /                                                                 
            port: 80                                                                
          failureThreshold: 3                                                       
          successThreshold: 1                                                       
          periodSeconds: 5                                                          
          timeoutSeconds: 1                                                         
          initialDelaySeconds: 0                                                    
        livenessProbe:                                                              
          httpGet:                                                                  
            path: /                                                                 
            port: 80                                                                
          failureThreshold: 3                                                       
          successThreshold: 1                                                       
          periodSeconds: 10                                                         
          timeoutSeconds: 1                                                         
          initialDelaySeconds: 15 

Service

apiVersion: v1
kind: Service
metadata:
  labels:
    app: weather-app
  name: weather-app
  namespace: k8-demo
spec:
  clusterIP: 10.3.0.47
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: weather-app
  sessionAffinity: None
  type: ClusterIP

Ingress Resource

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: weather-demo
  namespace: k8-demo
spec:
  rules:
  - host: weather-demo.internal.ch
    http:
      paths:
      - backend:
          serviceName: weather-app
          servicePort: 80
        path: /

Perform the update by simply increasing the attempt label and running kubectl apply -f deploy.yaml,
while in another terminal fetching the NGINX web page every second with watch -n 1 "curl http://weather-demo.internal.ch".
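
A minimal sketch of that reproduction loop, assuming the Deployment above is saved as deploy.yaml and weather-demo.internal.ch resolves to the ingress controller:

# bump the attempt label in deploy.yaml (e.g. "1" -> "2"), then re-apply
kubectl apply -f deploy.yaml

# in a second terminal, poll the app through the ingress once per second;
# a 504 here corresponds to the downtime window described below
watch -n 1 "curl -s -o /dev/null -w '%{http_code}\n' http://weather-demo.internal.ch"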

Expected results

The Deployment is updated with no downtime

Actual results

  • New pods are created
  • Old pods are deleted once new ones have Running state
  • New pods answer
  • URL returns a 504 error from the NGINX ingress controller for a few seconds
  • Service is re-established.

If at the same time I expose the service via NodePort and curl there, the service is always available.
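
For the side-by-side comparison, a NodePort Service along these lines can be used (the name and nodePort value are illustrative assumptions, not taken from the original setup):

apiVersion: v1
kind: Service
metadata:
  name: weather-app-nodeport   # assumed name for the comparison Service
  namespace: k8-demo
spec:
  type: NodePort
  selector:
    app: weather-app
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080            # assumed port in the default NodePort range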

@AlexRRR AlexRRR changed the title Downtime of service exposed via Ingress while doing upgrade. [Nginx ingress controller] Downtime of service exposed via Ingress while doing upgrade. Nov 30, 2016
@philipbjorge

I believe a fix for this can be found at kubernetes/ingress-nginx#322 (comment)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2017
@edevil

edevil commented Jan 22, 2018

Is this actually fixed? I still see this behaviour but I'm using a relatively old version of the nginx ingress (0.8.3).

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 21, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@MahaGamal

MahaGamal commented May 14, 2018

I'm seeing exactly the same thing. I set replicas: 3, maxSurge: "100%", maxUnavailable: 0.
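
For reference, that corresponds to a rolling-update strategy block like the following (a sketch of the settings described above, not the commenter's full manifest):

spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "100%"     # bring up a full replacement set before removing anything
      maxUnavailable: 0    # never take an old pod down until a new one is ready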

@joekohlsdorf

joekohlsdorf commented May 18, 2018

Try this in your Deployment manifest:

lifecycle:
  preStop:
    exec:
      command: ["sleep", "15"]

Kubernetes sends the TERM signal to your container and, at the same time, sends the request to remove it from service to the apiserver. It takes a little while for the container to be removed from the proxies; until that has happened (usually ~1 s) the container keeps receiving new connections.

This is due to the distributed nature of the system. Sadly, there is no state machine that guarantees a container is removed from service before the TERM signal is sent, and the Kubernetes maintainers have spoken out against adding one for now.

Another problem is that, in the worst case, the nginx configuration only gets updated every 10 seconds.

The preStop hook delays the TERM signal and gives the proxies and nginx time to update their configurations.
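
A sketch of how this slots into the weather-app Deployment from the original report (probes omitted for brevity; the terminationGracePeriodSeconds value is an assumption and only needs to exceed the preStop sleep):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: weather-app
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: weather-app
    spec:
      terminationGracePeriodSeconds: 30   # assumed; must be longer than the preStop sleep
      containers:
      - name: weather-app
        image: nginx:latest
        ports:
        - containerPort: 80
        lifecycle:
          preStop:
            exec:
              # delay SIGTERM so endpoint removal and the ingress reload can catch up
              command: ["sleep", "15"]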

@deviluper

/reopen

@k8s-ci-robot
Contributor

@deviluper: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
