Retry upstream when using service-upstream annotation #1586

Closed
mbugeia opened this issue Oct 25, 2017 · 6 comments

mbugeia commented Oct 25, 2017

Is this a BUG REPORT or FEATURE REQUEST?
FEATURE REQUEST

NGINX Ingress controller version:
0.9.0-beta.15

When using the service-upstream annotation, a race condition can occur when an underlying pod is removed from the service endpoints (e.g. node lost, rolling update, ...): nginx tries the service only once, which can cause 5xx responses on the client side.

My proposal would be to declare the same server multiple times (at least 3, maybe configurable?) in the nginx upstream. This would allow nginx to retry the request and mitigate the risk of serving 5xx responses to the client.

Current generated configuration:

    upstream prod-myservice-80 {
        # Load balance algorithm; empty for round robin, which is the default
        least_conn;
        keepalive 32;
        server 100.96.97.153:80 max_fails=0 fail_timeout=0;
    }

Proposed generated configuration:

    upstream prod-myservice-80 {
        # Load balance algorithm; empty for round robin, which is the default
        least_conn;
        keepalive 32;
        server 100.96.97.153:80 max_fails=0 fail_timeout=0;
        server 100.96.97.153:80 max_fails=0 fail_timeout=0;
        server 100.96.97.153:80 max_fails=0 fail_timeout=0;
    }

This would work because the following line is already set in every location block:
proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
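
For illustration, here is a minimal sketch of a location block combining that directive with the duplicated entries. The proxy_next_upstream_tries directive is standard nginx (1.7.5+) and is added here only for illustration; it is not part of the generated config:

    location / {
        proxy_pass http://prod-myservice-80;
        # Retry the next (duplicated) upstream entry on these conditions
        proxy_next_upstream error timeout invalid_header http_502 http_503 http_504;
        # Cap attempts so a request tries at most the three duplicated entries
        proxy_next_upstream_tries 3;
    }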

Note: this is kind of related to #1488

aledbf commented Oct 25, 2017

@mbugeia this is exactly the reason why the default uses endpoints and not services.
This works as expected. Please check https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/annotations.md#known-issues

aledbf closed this as completed Oct 25, 2017

mbugeia commented Oct 25, 2017

I understand, but using endpoints introduces other issues in our production workload. Making nginx reload whenever a pod is lost, updated, or scaled up causes a large increase in RAM usage in our case, and we end up losing nginx pods. As things stand we cannot achieve zero-downtime deployments:

  • if we use services, there is no retry
  • if we use endpoints, reloads kill the ingress pods by using too much RAM

My solution, while not perfect, would at least mitigate the number of lost requests through retries.

aledbf commented Oct 25, 2017

@mbugeia I am sorry but you need to choose one of the modes.
That being said, you can fork the code and add the logic you want or just use a custom template.
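
For reference, a minimal sketch of the custom-template route: in a copy of nginx.tmpl, repeat the server line inside the loop over the upstream's endpoints. Variable names such as $upstream and $server are assumptions for illustration, not the exact code of the shipped template:

    upstream {{ $upstream.Name }} {
        least_conn;
        keepalive 32;
        {{ range $server := $upstream.Endpoints }}
        # Hypothetical: emit each endpoint three times so that
        # proxy_next_upstream can retry the same address
        server {{ $server.Address }}:{{ $server.Port }} max_fails=0 fail_timeout=0;
        server {{ $server.Address }}:{{ $server.Port }} max_fails=0 fail_timeout=0;
        server {{ $server.Address }}:{{ $server.Port }} max_fails=0 fail_timeout=0;
        {{ end }}
    }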

aledbf commented Oct 25, 2017

Keep in mind that by using services you are adding an additional component, kube-proxy, into the mix.

mbugeia commented Oct 25, 2017

I understand. It seems that #912 can mitigate the behavior, at least for voluntary disruptions.

aledbf commented Oct 25, 2017

@mbugeia please check the latest comment #322 (comment)
