
Long time-to-first-byte problem #245

Closed
djensen47 opened this issue May 1, 2018 · 9 comments
@djensen47

I've been experiencing long wait times for time-to-first-byte (TTFB) using the ingress-gce on GKE.

I compared going through ingress-gce versus connecting directly to a pod. Going directly to the pod via a port-forward, TTFB times are in the 300 ms range.

Via the ingress I have noticed:

  • TTFB times between 1 and 5 s
  • Happens a lot on GET and OPTIONS calls, but not always
  • Occurs randomly on other calls
  • These are all fetch calls (AJAX-style), and within a single browser reload it happens only once

We have two rules in our configuration (three when the echoserver is up), plus TLS.

I also tried this against the "echo server" and I see long (>300 ms) TTFB on GET /favicon.ico.

My best guess at reproduction is to:

  • Set up a cluster
  • Deploy the "echo server" gcr.io/google_containers/echoserver:1.4
  • Deploy another webserver that the ingress can communicate with
  • Create an ingress that has two backends and tls
  • Open Chrome with developer tools open to the Network tab
  • Hit the echo server
  • Try this several times
  • Notice that favicon.ico will vary between acceptable TTFB times of <100 ms and >300 ms, possibly even as high as 1 s
@djensen47
Author

djensen47 commented May 1, 2018

Here is our ingress config:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: brewd-ingress
spec:
  tls:
  - hosts:
    - stage-api2.example.com
    - stage-app2.example.com
    - stage-echo.example.com
    secretName: redacted
  rules:
  - host: stage-api2.example.com
    http:
      paths:
      - backend:
          serviceName: gateway-service
          servicePort: 7000
  - host: stage-app2.example.com
    http:
      paths:
      - backend:
          serviceName: web-service
          servicePort: 8080
  - host: stage-echo.example.com
    http:
      paths:
      - backend:
          serviceName: echoserver
          servicePort: 8080

And the gateway service config (the web service is similar):

apiVersion: v1
kind: Service
metadata:
  name: gateway-service
  labels: 
    app: gateway
spec:
  type: NodePort
  ports:
  - port: 7000
  selector:
    app: gateway
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: gateway-deployment
spec:
  selector:
    matchLabels:
      app: gateway
  replicas: 1
  template:
    metadata:
      labels:
        app: gateway
    spec:
      containers:
      - name: gateway
        image: us.gcr.io/redacted/gateway:1.3.0
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
        env:
        - name: REDACTED_ENV
          value: stage

@nicksardo
Contributor

As Ahmet mentioned in https://groups.google.com/forum/#!topic/kubernetes-users/omg-b8_FcBM, this is better answered by GCP support.

Could you try simplifying the repro by testing outside the context of GKE/Kubernetes? Spin up an instance running echoserver and create an L7 LB through the GCP Console.

@djensen47
Author

I now have a GCP support ticket open.

However, at least two others have now chimed in that they are experiencing the same problem with a similar setup.

@nicksardo
Contributor

Copying my response to kubernetes-users, which djensen47 indicates worked for him.

I created an HTTP LB setup on GCP using a Go HTTP server without Kubernetes and was able to see rare long-tail latencies of >1 second. After I set IdleTimeout to longer than ten minutes, I stopped seeing those slow responses. The echoheaders image uses nginx and doesn't set keepalive_timeout (sent a PR to update this).
This expected timeout behavior is explained in the GCP documentation at https://cloud.google.com/compute/docs/load-balancing/http/#timeouts_and_retries

@djensen47
Author

djensen47 commented May 18, 2018 via email

@djensen47
Author

Hi, this problem has returned. I'm not sure why I didn't experience the issue immediately after setting the IdleTimeout, but a few days later the problem came back. Not only that, a few others are experiencing the same problem.

I've tried to bring this up with paid support, but all they're doing is deflecting the ticket.

@b99andla

b99andla commented Jan 8, 2020

@djensen47 Any news on this? We have the same problem: a WordPress deployment that intermittently gets 5-6 second latencies when using GCP Ingress...

@djensen47
Author

I did what @nicksardo suggested and it eventually worked; I think that, plus an upgrade to the cluster, is when it started working. Not sure how to fix it for WordPress. (My recommendation: don't use WordPress 😉.)

@fsjones

fsjones commented Feb 11, 2022

> @djensen47 Any news on this? We have the same problem, wordpress deployment that gets 5-6 second latencies intermittently when using GCP Ingress...

Did you find a solution for this for wordpress? I may be having a similar issue.
