Retry upstream when using service-upstream annotation #1586

Closed
mbugeia opened this issue Oct 25, 2017 · 6 comments

mbugeia commented Oct 25, 2017

Is this a BUG REPORT or FEATURE REQUEST?
FEATURE REQUEST

NGINX Ingress controller version:
0.9.0-beta.15

When using the service-upstream annotation, a race condition can occur when an underlying pod is removed from the service endpoints (e.g. node lost, rolling update, ...): nginx tries the service only once, which can cause 5xx responses on the client side.

My proposal would be to declare the same server multiple times (at least 3, maybe configurable?) in the nginx upstream. This would allow nginx to retry the request and mitigate the risk of serving 5xx responses to the client.

Current generated configuration:

    upstream prod-myservice-80 {
        # Load balance algorithm; empty for round robin, which is the default
        least_conn;
        keepalive 32;
        server 100.96.97.153:80 max_fails=0 fail_timeout=0;
    }

Proposed generated configuration:

    upstream prod-myservice-80 {
        # Load balance algorithm; empty for round robin, which is the default
        least_conn;
        keepalive 32;
        server 100.96.97.153:80 max_fails=0 fail_timeout=0;
        server 100.96.97.153:80 max_fails=0 fail_timeout=0;
        server 100.96.97.153:80 max_fails=0 fail_timeout=0;
    }

This would work because the following line is already set in every location block:
proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
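
For illustration, here is a minimal sketch of a location block combining that directive with the duplicated entries. The proxy_next_upstream_tries directive is standard nginx (1.7.5+) and is added here only for illustration; it is not part of the generated config:

    location / {
        proxy_pass http://prod-myservice-80;
        # Retry the next (duplicated) upstream entry on these conditions
        proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
        # Cap attempts so a request tries at most the three duplicated entries
        proxy_next_upstream_tries 3;
    }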

Note: this is kind of related to #1488

aledbf commented Oct 25, 2017

@mbugeia this is exactly the reason why the default uses endpoints and not services.
This works as expected. Please check https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/annotations.md#known-issues

aledbf closed this as completed Oct 25, 2017

mbugeia commented Oct 25, 2017

I understand, but using endpoints introduces other issues in our production workload. Making nginx reload whenever a pod is lost, updated, or scaled up causes a large increase in RAM usage in our case, and we end up losing nginx pods. As things stand we cannot achieve zero-downtime deployments:

  • if we use services, there is no retry
  • if we use endpoints, reloads kill the ingress pods by using too much RAM

My solution, while not perfect, would at least mitigate the number of lost requests through retries.

aledbf commented Oct 25, 2017

@mbugeia I am sorry but you need to choose one of the modes.
That being said, you can fork the code and add the logic you want or just use a custom template.
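
For reference, a minimal sketch of the custom-template route: in a copy of nginx.tmpl, repeat the server line inside the loop over the upstream's endpoints. Variable names such as $upstream and $server are assumptions for illustration, not the exact code of the shipped template:

    upstream {{ $upstream.Name }} {
        least_conn;
        keepalive 32;
        {{ range $server := $upstream.Endpoints }}
        # Hypothetical: emit each endpoint three times so that
        # proxy_next_upstream can retry the same address
        server {{ $server.Address }}:{{ $server.Port }} max_fails=0 fail_timeout=0;
        server {{ $server.Address }}:{{ $server.Port }} max_fails=0 fail_timeout=0;
        server {{ $server.Address }}:{{ $server.Port }} max_fails=0 fail_timeout=0;
        {{ end }}
    }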

aledbf commented Oct 25, 2017

Keep in mind that by using services you are adding an additional component, kube-proxy, into the mix.

mbugeia commented Oct 25, 2017

I understand. It seems that #912 can mitigate the behavior, at least for voluntary disruptions.

aledbf commented Oct 25, 2017

@mbugeia please check the latest comment #322 (comment)
