This repository has been archived by the owner on Apr 17, 2019. It is now read-only.

[Nginx ingress controller] Downtime of service exposed via Ingress while doing upgrade. #2098

Closed
AlexRRR opened this issue Nov 30, 2016 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@AlexRRR

AlexRRR commented Nov 30, 2016

Test System:

  • Baremetal CoreOS
  • Network: Flannel
  • K8s 1.4.6
  • nginx-ingress-controller:0.8.3

Steps to reproduce

create a Deployment consisting of a simple service like a bare nginx

apiVersion: extensions/v1beta1                                                      
kind: Deployment                                                                    
metadata:                                                                           
  name: weather-app                                                                 
spec:                                                                               
  replicas: 3                                                                       
  strategy:                                                                         
    rollingUpdate:                                                                  
      maxSurge: 1                                                                   
      maxUnavailable: 1                                                             
  template:                                                                         
    metadata:                                                                       
      labels:                                                                       
        app: weather-app                                                            
        track: stable                                                               
        attempt: "1"                                                                
    spec:                                                                           
      containers:                                                                   
      - name: weather-app                                                           
        lifecycle:
          preStop:
            exec:
              command: ["/usr/sbin/nginx", "-s", "quit"]
        image: nginx:latest                                                         
        ports:                                                                      
        - containerPort: 80                                                         
        imagePullPolicy: Always                                                     
        readinessProbe:                                                             
          httpGet:                                                                  
            path: /                                                                 
            port: 80                                                                
          failureThreshold: 3                                                       
          successThreshold: 1                                                       
          periodSeconds: 5                                                          
          timeoutSeconds: 1                                                         
          initialDelaySeconds: 0                                                    
        livenessProbe:                                                              
          httpGet:                                                                  
            path: /                                                                 
            port: 80                                                                
          failureThreshold: 3                                                       
          successThreshold: 1                                                       
          periodSeconds: 10                                                         
          timeoutSeconds: 1                                                         
          initialDelaySeconds: 15 

Service

apiVersion: v1
kind: Service
metadata:
  labels:
    app: weather-app
  name: weather-app
  namespace: k8-demo
spec:
  clusterIP: 10.3.0.47
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: weather-app
  sessionAffinity: None
  type: ClusterIP

Ingress Resource

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: weather-demo
  namespace: k8-demo
spec:
  rules:
  - host: weather-demo.internal.ch
    http:
      paths:
      - backend:
          serviceName: weather-app
          servicePort: 80
        path: /

Perform the update by simply increasing the attempt label and running kubectl apply -f deploy.yaml,
while in another terminal fetching the NGINX web page every second with watch -n 1 "curl http://weather-demo.internal.ch".
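
A minimal sketch of that reproduction loop, assuming the Deployment above is saved as deploy.yaml and weather-demo.internal.ch resolves to the ingress controller:

# bump the attempt label in deploy.yaml (e.g. "1" -> "2"), then re-apply
kubectl apply -f deploy.yaml

# in a second terminal, poll the app through the ingress once per second;
# a 504 here corresponds to the downtime window described below
watch -n 1 "curl -s -o /dev/null -w '%{http_code}\n' http://weather-demo.internal.ch"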

Expected results

The Deployment is updated with no downtime

Actual results

  • New pods are created
  • Old pods are deleted once new ones have Running state
  • New pods answer
  • URL returns a 504 error from the NGINX ingress controller for a few seconds
  • Service is re-established.

If at the same time I expose the service via NodePort and curl there, the service is always available.
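
For the side-by-side comparison, a NodePort Service along these lines can be used (the name and nodePort value are illustrative assumptions, not taken from the original setup):

apiVersion: v1
kind: Service
metadata:
  name: weather-app-nodeport   # assumed name for the comparison Service
  namespace: k8-demo
spec:
  type: NodePort
  selector:
    app: weather-app
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080            # assumed port in the default NodePort range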

@AlexRRR AlexRRR changed the title Downtime of service exposed via Ingress while doing upgrade. [Nginx ingress controller] Downtime of service exposed via Ingress while doing upgrade. Nov 30, 2016
@philipbjorge

I believe a fix for this can be found at kubernetes/ingress-nginx#322 (comment)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2017
@edevil

edevil commented Jan 22, 2018

Is this actually fixed? I still see this behaviour but I'm using a relatively old version of the nginx ingress (0.8.3).

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 21, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@MahaGamal

MahaGamal commented May 14, 2018

I'm seeing exactly the same thing. I set replicas: 3, maxSurge: "100%", maxUnavailable: 0.
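
For reference, that corresponds to a rolling-update strategy block like the following (a sketch of the settings described above, not the commenter's full manifest):

spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "100%"     # bring up a full replacement set before removing anything
      maxUnavailable: 0    # never take an old pod down until a new one is ready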

@joekohlsdorf

joekohlsdorf commented May 18, 2018

Try this in your Deployment manifest:

lifecycle:
  preStop:
    exec:
      command: ["sleep", "15"]

Kubernetes sends the TERM signal to your container and, at the same time, sends the request to remove it from service to the apiserver. It takes a little while for the container to be removed from the proxies; until that has happened (usually ~1 s) the container keeps receiving new connections.

This is due to the distributed nature of the system. Sadly, there is no state machine that guarantees a container is removed from service before the TERM signal is sent, and the Kubernetes maintainers have spoken out against adding one for now.

Another problem is that, in the worst case, the nginx configuration only gets updated every 10 seconds.

The preStop hook delays the TERM signal and gives the proxies and nginx time to update their configurations.
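
A sketch of how this slots into the weather-app Deployment from the original report (probes omitted for brevity; the terminationGracePeriodSeconds value is an assumption and only needs to exceed the preStop sleep):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: weather-app
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: weather-app
    spec:
      terminationGracePeriodSeconds: 30   # assumed; must be longer than the preStop sleep
      containers:
      - name: weather-app
        image: nginx:latest
        ports:
        - containerPort: 80
        lifecycle:
          preStop:
            exec:
              # delay SIGTERM so endpoint removal and the ingress reload can catch up
              command: ["sleep", "15"]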

@deviluper

/reopen

@k8s-ci-robot
Contributor

@deviluper: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
