From b46f54f75a7bbf431b05b7c92ba5727c88f8409d Mon Sep 17 00:00:00 2001
From: Michael Pleshakov
Date: Mon, 9 Oct 2023 15:50:39 -0400
Subject: [PATCH 1/7] Add zero downtime upgrade test plan and results

Problem:
- We don't know if it is possible to upgrade NGF from a previous version without downtime.

Solution:
- Prepare a test plan to test for zero-downtime upgrades of NGF.
- Test NGF and share the results (TO-DO).

SOLVES -- https://github.com/nginxinc/nginx-gateway-fabric/issues/950
---
 tests/zero-downtime-upgrades/README.md | 85 ++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100644 tests/zero-downtime-upgrades/README.md

diff --git a/tests/zero-downtime-upgrades/README.md b/tests/zero-downtime-upgrades/README.md
new file mode 100644
index 000000000..c21a64988
--- /dev/null
+++ b/tests/zero-downtime-upgrades/README.md
@@ -0,0 +1,85 @@
+# Zero-Downtime Upgrades
+
+This document describes a test plan for testing zero-downtime upgrades of NGF.
+
+*Zero-downtime upgrades* means that during an NGF upgrade clients don't experience any
+interruptions to the traffic they send to applications exposed via NGF.
+
+## Goals
+
+- Ensure that upgrading NFG doesn't lead to any loss of traffic flowing through the data plane.
+- Ensure that after an upgrade, NGF can process changes to resources.
+
+## Non-Goals
+
+During an upgrade, Kubernetes will shut down existing NGF pods by sending a SIGTERM. If the pod doesn't terminate in 30
+seconds (the default period) , Kubernetes will send a SIGKILL.
+
+When proxying Websocket or any long-lived connections, NGINX will not terminate until
+that connection is closed by either the client or the backend. This means that unless all those connections are closed
+by clients/backends before or during an upgrade (which is highly unlikely), NGINX will not terminate, which means
+Kubernetes will kill NGINX. As a result, the clients will see the connections abruptly closed and thus experience
+downtime.
+ +As a result, we *will not* use any long-live connections in this test, because NGF cannot support zero-downtime upgrades +in this case. + +## Test Environment + +- A Kubernetes cluster with 3 nodes on GKE + - Node: e2-medium (2 vCPU, 4GB memory) + - Enabled GKE logging. +- Tester VMs: + - Configuration: + - Debian + - Install packages: wrk + - Location - same zone as the Kubernetes cluster. + - First VM - for HTTP traffic + - Second VM - for sending HTTPs traffic +- NGF + - Deployment with 2 replicas + - Exposed via a Service with type LoadBalancer, private IP + - Gateway, two listeners - HTTP and HTTPs + - Two apps: + - Coffee - 3 replicas + - Tea - 3 replicas + - Two HTTPRoutes + - Coffee (HTTP) + - Tea (HTTPS) + +## Steps + +### Start + +- Create a cluster +- Deploy NGF, previous latest stable version with 2 replicas. +- Expose NGF via Service Load Balancer, internal (only accessible within the Google Cloud region) +- Deploy backend apps +- Configure Gateway +- Expose apps via HTTPRoutes +- Check statuses of the Gateway and HTTPRoutes +- Start sending traffic using wrk from tester VMs: HTTP and HTTPs. +- Check that NGINX access logs look good (all responses 200) +- Check that there are no errors in NGINX errors logs and NGF error logs. + +### Upgrade + +- Upgrade to the new version build from main branch. For example, `kubectl apply -f` the latest manifests. +- Check that the new pods are running and the old one are removed. + +## After Upgrade + +- Update the Gateway resource - add one listener -- and make sure NGF processed the update (it updates the status of + the Gateway resource accordingly). 
+ +### Analyze + +- Stop wrk and save the output +- Check old pods logs (in Google Monitoring) + - NGINX Access logs - no errors + - NGINX Error logs - no errors + - NGF logs - no errors +- Check New pods (in Google Monitoring) + - NGINX Access logs - no errors + - NGINX Error logs - no errors + - NGF logs - no errors From 7121d7fc0b1b02290a150beb9fe34afabd97b1c3 Mon Sep 17 00:00:00 2001 From: Michael Pleshakov Date: Fri, 13 Oct 2023 19:38:40 -0400 Subject: [PATCH 2/7] Update the draft plan and add test results --- tests/zero-downtime-upgrades/README.md | 85 --- .../manifests/cafe-routes.yaml | 37 ++ .../manifests/cafe-secret.yaml | 8 + .../manifests/cafe.yaml | 65 ++ .../manifests/gateway-updated.yaml | 24 + .../manifests/gateway.yaml | 20 + tests/zero-downtime-upgrades/requests-plot.gp | 21 + .../results/1.0.0/1.0.0.md | 231 +++++++ .../results/1.0.0/http.csv | 600 ++++++++++++++++++ .../results/1.0.0/http.png | Bin 0 -> 6548 bytes .../results/1.0.0/https.csv | 600 ++++++++++++++++++ .../results/1.0.0/https.png | Bin 0 -> 6635 bytes .../zero-downtime-upgrades.md | 252 ++++++++ 13 files changed, 1858 insertions(+), 85 deletions(-) delete mode 100644 tests/zero-downtime-upgrades/README.md create mode 100644 tests/zero-downtime-upgrades/manifests/cafe-routes.yaml create mode 100644 tests/zero-downtime-upgrades/manifests/cafe-secret.yaml create mode 100644 tests/zero-downtime-upgrades/manifests/cafe.yaml create mode 100644 tests/zero-downtime-upgrades/manifests/gateway-updated.yaml create mode 100644 tests/zero-downtime-upgrades/manifests/gateway.yaml create mode 100644 tests/zero-downtime-upgrades/requests-plot.gp create mode 100644 tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md create mode 100644 tests/zero-downtime-upgrades/results/1.0.0/http.csv create mode 100644 tests/zero-downtime-upgrades/results/1.0.0/http.png create mode 100644 tests/zero-downtime-upgrades/results/1.0.0/https.csv create mode 100644 tests/zero-downtime-upgrades/results/1.0.0/https.png 
create mode 100644 tests/zero-downtime-upgrades/zero-downtime-upgrades.md diff --git a/tests/zero-downtime-upgrades/README.md b/tests/zero-downtime-upgrades/README.md deleted file mode 100644 index c21a64988..000000000 --- a/tests/zero-downtime-upgrades/README.md +++ /dev/null @@ -1,85 +0,0 @@ -# Zero-Downtime Upgrades - -This document describes a test plan for testing zero-downtime upgrades of NGF. - -*Zero-downtime upgrades* means that during an NGF upgrade clients don't experience any -interruptions to the traffic they send to applications exposed via NGF. - -## Goals - -- Ensure that upgrading NFG doesn't lead to any loss of traffic flowing through the data plane. -- Ensure that after an upgrade, NGF can process changes to resources. - -## Non-Goals - -During an upgrade, Kubernetes will shut down existing NGF pods by sending a SIGTERM. If the pod doesn't terminate in 30 -seconds (the default period) , Kubernetes will send a SIGKILL. - -When proxying Websocket or any long-lived connections, NGINX will not terminate until -that connection is closed by either the client or the backend. This means that unless all those connections are closed -by clients/backends before or during an upgrade (which is highly unlikely), NGINX will not terminate, which means -Kubernetes will kill NGINX. As a result, the clients will see the connections abruptly closed and thus experience -downtime. - -As a result, we *will not* use any long-live connections in this test, because NGF cannot support zero-downtime upgrades -in this case. - -## Test Environment - -- A Kubernetes cluster with 3 nodes on GKE - - Node: e2-medium (2 vCPU, 4GB memory) - - Enabled GKE logging. -- Tester VMs: - - Configuration: - - Debian - - Install packages: wrk - - Location - same zone as the Kubernetes cluster. 
- - First VM - for HTTP traffic - - Second VM - for sending HTTPs traffic -- NGF - - Deployment with 2 replicas - - Exposed via a Service with type LoadBalancer, private IP - - Gateway, two listeners - HTTP and HTTPs - - Two apps: - - Coffee - 3 replicas - - Tea - 3 replicas - - Two HTTPRoutes - - Coffee (HTTP) - - Tea (HTTPS) - -## Steps - -### Start - -- Create a cluster -- Deploy NGF, previous latest stable version with 2 replicas. -- Expose NGF via Service Load Balancer, internal (only accessible within the Google Cloud region) -- Deploy backend apps -- Configure Gateway -- Expose apps via HTTPRoutes -- Check statuses of the Gateway and HTTPRoutes -- Start sending traffic using wrk from tester VMs: HTTP and HTTPs. -- Check that NGINX access logs look good (all responses 200) -- Check that there are no errors in NGINX errors logs and NGF error logs. - -### Upgrade - -- Upgrade to the new version build from main branch. For example, `kubectl apply -f` the latest manifests. -- Check that the new pods are running and the old one are removed. - -## After Upgrade - -- Update the Gateway resource - add one listener -- and make sure NGF processed the update (it updates the status of - the Gateway resource accordingly). 
- -### Analyze - -- Stop wrk and save the output -- Check old pods logs (in Google Monitoring) - - NGINX Access logs - no errors - - NGINX Error logs - no errors - - NGF logs - no errors -- Check New pods (in Google Monitoring) - - NGINX Access logs - no errors - - NGINX Error logs - no errors - - NGF logs - no errors diff --git a/tests/zero-downtime-upgrades/manifests/cafe-routes.yaml b/tests/zero-downtime-upgrades/manifests/cafe-routes.yaml new file mode 100644 index 000000000..e679756d6 --- /dev/null +++ b/tests/zero-downtime-upgrades/manifests/cafe-routes.yaml @@ -0,0 +1,37 @@ +apiVersion: gateway.networking.k8s.io/v1beta1 +kind: HTTPRoute +metadata: + name: coffee +spec: + parentRefs: + - name: gateway + sectionName: http + hostnames: + - "cafe.example.com" + rules: + - matches: + - path: + type: PathPrefix + value: /coffee + backendRefs: + - name: coffee + port: 80 +--- +apiVersion: gateway.networking.k8s.io/v1beta1 +kind: HTTPRoute +metadata: + name: tea +spec: + parentRefs: + - name: gateway + sectionName: https + hostnames: + - "cafe.example.com" + rules: + - matches: + - path: + type: PathPrefix + value: /tea + backendRefs: + - name: tea + port: 80 diff --git a/tests/zero-downtime-upgrades/manifests/cafe-secret.yaml b/tests/zero-downtime-upgrades/manifests/cafe-secret.yaml new file mode 100644 index 000000000..4510460bb --- /dev/null +++ b/tests/zero-downtime-upgrades/manifests/cafe-secret.yaml @@ -0,0 +1,8 @@ +apiVersion: v1 +kind: Secret +metadata: + name: cafe-secret +type: kubernetes.io/tls +data: + tls.crt: 
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNzakNDQVpvQ0NRQzdCdVdXdWRtRkNEQU5CZ2txaGtpRzl3MEJBUXNGQURBYk1Sa3dGd1lEVlFRRERCQmoKWVdabExtVjRZVzF3YkdVdVkyOXRNQjRYRFRJeU1EY3hOREl4TlRJek9Wb1hEVEl6TURjeE5ESXhOVEl6T1ZvdwpHekVaTUJjR0ExVUVBd3dRWTJGbVpTNWxlR0Z0Y0d4bExtTnZiVENDQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFECmdnRVBBRENDQVFvQ2dnRUJBTHFZMnRHNFc5aStFYzJhdnV4Q2prb2tnUUx1ek10U1Rnc1RNaEhuK3ZRUmxIam8KVzFLRnMvQVdlS25UUStyTWVKVWNseis4M3QwRGtyRThwUisxR2NKSE50WlNMb0NEYUlRN0Nhck5nY1daS0o4Qgo1WDNnVS9YeVJHZjI2c1REd2xzU3NkSEQ1U2U3K2Vab3NPcTdHTVF3K25HR2NVZ0VtL1Q1UEMvY05PWE0zZWxGClRPL051MStoMzROVG9BbDNQdTF2QlpMcDNQVERtQ0thaEROV0NWbUJQUWpNNFI4VERsbFhhMHQ5Z1o1MTRSRzUKWHlZWTNtdzZpUzIrR1dYVXllMjFuWVV4UEhZbDV4RHY0c0FXaGRXbElweHlZQlNCRURjczN6QlI2bFF1OWkxZAp0R1k4dGJ3blVmcUVUR3NZdWxzc05qcU95V1VEcFdJelhibHhJZVVDQXdFQUFUQU5CZ2txaGtpRzl3MEJBUXNGCkFBT0NBUUVBcjkrZWJ0U1dzSnhLTGtLZlRkek1ISFhOd2Y5ZXFVbHNtTXZmMGdBdWVKTUpUR215dG1iWjlpbXQKL2RnWlpYVE9hTElHUG9oZ3BpS0l5eVVRZVdGQ2F0NHRxWkNPVWRhbUloOGk0Q1h6QVJYVHNvcUNOenNNLzZMRQphM25XbFZyS2lmZHYrWkxyRi8vblc0VVNvOEoxaCtQeDljY0tpRDZZU0RVUERDRGh1RUtFWXcvbHpoUDJVOXNmCnl6cEJKVGQ4enFyM3paTjNGWWlITmgzYlRhQS82di9jU2lyamNTK1EwQXg4RWpzQzYxRjRVMTc4QzdWNWRCKzQKcmtPTy9QNlA0UFlWNTRZZHMvRjE2WkZJTHFBNENCYnExRExuYWRxamxyN3NPbzl2ZzNnWFNMYXBVVkdtZ2todAp6VlZPWG1mU0Z4OS90MDBHUi95bUdPbERJbWlXMGc9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg== + tls.key: 
LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQzZtTnJSdUZ2WXZoSE4KbXI3c1FvNUtKSUVDN3N6TFVrNExFeklSNS9yMEVaUjQ2RnRTaGJQd0ZuaXAwMFBxekhpVkhKYy92TjdkQTVLeApQS1VmdFJuQ1J6YldVaTZBZzJpRU93bXF6WUhGbVNpZkFlVjk0RlAxOGtSbjl1ckV3OEpiRXJIUncrVW51L25tCmFMRHF1eGpFTVBweGhuRklCSnYwK1R3djNEVGx6TjNwUlV6dnpidGZvZCtEVTZBSmR6N3Rid1dTNmR6MHc1Z2kKbW9RelZnbFpnVDBJek9FZkV3NVpWMnRMZllHZWRlRVJ1VjhtR041c09va3R2aGxsMU1udHRaMkZNVHgySmVjUQo3K0xBRm9YVnBTS2NjbUFVZ1JBM0xOOHdVZXBVTHZZdFhiUm1QTFc4SjFINmhFeHJHTHBiTERZNmpzbGxBNlZpCk0xMjVjU0hsQWdNQkFBRUNnZ0VBQnpaRE50bmVTdWxGdk9HZlFYaHRFWGFKdWZoSzJBenRVVVpEcUNlRUxvekQKWlV6dHdxbkNRNlJLczUyandWNTN4cU9kUU94bTNMbjNvSHdNa2NZcEliWW82MjJ2dUczYnkwaVEzaFlsVHVMVgpqQmZCcS9UUXFlL2NMdngvSkczQWhFNmJxdFRjZFlXeGFmTmY2eUtpR1dzZk11WVVXTWs4MGVJVUxuRmZaZ1pOCklYNTlSOHlqdE9CVm9Sa3hjYTVoMW1ZTDFsSlJNM3ZqVHNHTHFybmpOTjNBdWZ3ZGRpK1VDbGZVL2l0K1EvZkUKV216aFFoTlRpNVFkRWJLVStOTnYvNnYvb2JvandNb25HVVBCdEFTUE05cmxFemIralQ1WHdWQjgvLzRGY3VoSwoyVzNpcjhtNHVlQ1JHSVlrbGxlLzhuQmZ0eVhiVkNocVRyZFBlaGlPM1FLQmdRRGlrR3JTOTc3cjg3Y1JPOCtQClpoeXltNXo4NVIzTHVVbFNTazJiOTI1QlhvakpZL2RRZDVTdFVsSWE4OUZKZnNWc1JRcEhHaTFCYzBMaTY1YjIKazR0cE5xcVFoUmZ1UVh0UG9GYXRuQzlPRnJVTXJXbDVJN0ZFejZnNkNQMVBXMEg5d2hPemFKZUdpZVpNYjlYTQoybDdSSFZOcC9jTDlYbmhNMnN0Q1lua2Iwd0tCZ1FEUzF4K0crakEyUVNtRVFWNXA1RnRONGcyamsyZEFjMEhNClRIQ2tTazFDRjhkR0Z2UWtsWm5ZbUt0dXFYeXNtekJGcnZKdmt2eUhqbUNYYTducXlpajBEdDZtODViN3BGcVAKQWxtajdtbXI3Z1pUeG1ZMXBhRWFLMXY4SDNINGtRNVl3MWdrTWRybVJHcVAvaTBGaDVpaGtSZS9DOUtGTFVkSQpDcnJjTzhkUVp3S0JnSHA1MzRXVWNCMVZibzFlYStIMUxXWlFRUmxsTWlwRFM2TzBqeWZWSmtFb1BZSEJESnp2ClIrdzZLREJ4eFoyWmJsZ05LblV0YlhHSVFZd3lGelhNcFB5SGxNVHpiZkJhYmJLcDFyR2JVT2RCMXpXM09PRkgKcmppb21TUm1YNmxhaDk0SjRHU0lFZ0drNGw1SHhxZ3JGRDZ2UDd4NGRjUktJWFpLZ0w2dVJSSUpBb0dCQU1CVApaL2p5WStRNTBLdEtEZHUrYU9ORW4zaGxUN3hrNXRKN3NBek5rbWdGMU10RXlQUk9Xd1pQVGFJbWpRbk9qbHdpCldCZ2JGcXg0M2ZlQ1Z4ZXJ6V3ZEM0txaWJVbWpCTkNMTGtYeGh3ZEVteFQwVit2NzZGYzgwaTNNYVdSNnZZR08KditwVVovL0F6UXdJcWZ6dlVmV2ZxdStrMHlhVXhQOGNlcFBIRyt0bEFv
R0FmQUtVVWhqeFU0Ym5vVzVwVUhKegpwWWZXZXZ5TW54NWZyT2VsSmRmNzlvNGMvMHhVSjh1eFBFWDFkRmNrZW96dHNpaVFTNkN6MENRY09XVWxtSkRwCnVrdERvVzM3VmNSQU1BVjY3NlgxQVZlM0UwNm5aL2g2Tkd4Z28rT042Q3pwL0lkMkJPUm9IMFAxa2RjY1NLT3kKMUtFZlNnb1B0c1N1eEpBZXdUZmxDMXc9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K diff --git a/tests/zero-downtime-upgrades/manifests/cafe.yaml b/tests/zero-downtime-upgrades/manifests/cafe.yaml new file mode 100644 index 000000000..9c1a83548 --- /dev/null +++ b/tests/zero-downtime-upgrades/manifests/cafe.yaml @@ -0,0 +1,65 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: coffee +spec: + replicas: 3 + selector: + matchLabels: + app: coffee + template: + metadata: + labels: + app: coffee + spec: + containers: + - name: coffee + image: nginxdemos/nginx-hello:plain-text + ports: + - containerPort: 8080 +--- +apiVersion: v1 +kind: Service +metadata: + name: coffee +spec: + ports: + - port: 80 + targetPort: 8080 + protocol: TCP + name: http + selector: + app: coffee +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: tea +spec: + replicas: 3 + selector: + matchLabels: + app: tea + template: + metadata: + labels: + app: tea + spec: + containers: + - name: tea + image: nginxdemos/nginx-hello:plain-text + ports: + - containerPort: 8080 +--- +apiVersion: v1 +kind: Service +metadata: + name: tea +spec: + ports: + - port: 80 + targetPort: 8080 + protocol: TCP + name: http + selector: + app: tea diff --git a/tests/zero-downtime-upgrades/manifests/gateway-updated.yaml b/tests/zero-downtime-upgrades/manifests/gateway-updated.yaml new file mode 100644 index 000000000..c501ce839 --- /dev/null +++ b/tests/zero-downtime-upgrades/manifests/gateway-updated.yaml @@ -0,0 +1,24 @@ +apiVersion: gateway.networking.k8s.io/v1beta1 +kind: Gateway +metadata: + name: gateway +spec: + gatewayClassName: nginx + listeners: + - name: http + port: 80 + protocol: HTTP + hostname: "*.example.com" + - name: http-new + port: 80 + protocol: HTTP + hostname: "*.example.org" + - name: https + 
port: 443 + protocol: HTTPS + hostname: "*.example.com" + tls: + mode: Terminate + certificateRefs: + - kind: Secret + name: cafe-secret diff --git a/tests/zero-downtime-upgrades/manifests/gateway.yaml b/tests/zero-downtime-upgrades/manifests/gateway.yaml new file mode 100644 index 000000000..593d17e49 --- /dev/null +++ b/tests/zero-downtime-upgrades/manifests/gateway.yaml @@ -0,0 +1,20 @@ +apiVersion: gateway.networking.k8s.io/v1beta1 +kind: Gateway +metadata: + name: gateway +spec: + gatewayClassName: nginx + listeners: + - name: http + port: 80 + protocol: HTTP + hostname: "*.example.com" + - name: https + port: 443 + protocol: HTTPS + hostname: "*.example.com" + tls: + mode: Terminate + certificateRefs: + - kind: Secret + name: cafe-secret diff --git a/tests/zero-downtime-upgrades/requests-plot.gp b/tests/zero-downtime-upgrades/requests-plot.gp new file mode 100644 index 000000000..b08003d70 --- /dev/null +++ b/tests/zero-downtime-upgrades/requests-plot.gp @@ -0,0 +1,21 @@ +set terminal png size 800,600 +set output "graph.png" +set title "Request Outcomes Over Time" + +set xdata time +set timefmt "%Y-%m-%d %H:%M:%S" +set datafile separator "," + +# Y-axis settings +set yrange [-0.5:1.5] # Provide some padding around 0 and 1 for better visualization +set ytics ("Failed" 0, "Success" 1) +set grid ytics # Gridlines for Y + +# Define the palette: 0 for red (Failure) and 1 for green (Success) +set palette defined (0 "red", 1 "green") + +# Hide the colorbox +unset colorbox + +# Plotting data +plot "results.csv" using 1:2:2 with points palette pointtype 7 pointsize 1.5 title "Request Status" diff --git a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md new file mode 100644 index 000000000..4bc9e4cc9 --- /dev/null +++ b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md @@ -0,0 +1,231 @@ +# Results + + +- [Results](#results) + - [Versions](#versions) + - [Start](#start) + - [Upgrades](#upgrades) + - 
[Analyze](#analyze) + - [Tester VMs](#tester-vms) + - [Old pods](#old-pods) + - [New pods](#new-pods) + - [Opened Issues](#opened-issues) + - [Future Improvements](#future-improvements) + + +## Versions + +Kubernetes: + +```text +Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3-gke.100", GitCommit:"6466b51b762a5c49ae3fb6c2c7233ffe1c96e48c", GitTreeState:"clean", BuildDate:"2023-06-23T09:27:28Z", GoVersion:"go1.20.5 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"} +``` + +Old NGF version: + +```text +"version":"0.6.0" +"commit":"803e6d612a9574362bda28868d4410943ffaf66a" +"date":"2023-08-31T16:32:37Z" +``` + +New NGF version: + +```text +version: "edge" +commit: "5324908e6e1145bec5f2f0ab80b312a809ad1744" +date: "2023-10-13T18:29:23Z" +``` + +## Start + +Deployed Pods: + +```text +NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES +nginx-gateway-55cb958549-nbdqz 2/2 Running 0 11s 10.112.7.8 gke-michael-2-default-pool-18ad0f59-19nb +nginx-gateway-55cb958549-nj95k 2/2 Running 0 11s 10.112.6.25 gke-michael-2-default-pool-18ad0f59-zfw4 +``` + +Logs check: + +- nginx-gateway-55cb958549-nbdqz + - NGINX logs - no errors + - NGF logs - 2 errors like below with failure to update status of a resource. 
+
+    ```json
+    {
+      "name": "nginx",
+      "ts": "2023-10-13T21:50:29Z",
+      "logger": "statusUpdater",
+      "error": "Operation cannot be fulfilled on gatewayclasses.gateway.networking.k8s.io \"nginx\": the object has been modified; please apply your changes to the latest version and try again",
+      "kind": "GatewayClass",
+      "namespace": "",
+      "msg": "Failed to update status",
+      "stacktrace": "github.com/nginxinc/nginx-kubernetes-gateway/internal/framework/status.(*updaterImpl).update\n\t/home/runner/work/nginx-kubernetes-gateway/nginx-kubernetes-gateway/internal/framework/status/updater.go:166\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/framework/status.(*updaterImpl).Update\n\t/home/runner/work/nginx-kubernetes-gateway/nginx-kubernetes-gateway/internal/framework/status/updater.go:90\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/mode/static.(*eventHandlerImpl).HandleEventBatch\n\t/home/runner/work/nginx-kubernetes-gateway/nginx-kubernetes-gateway/internal/mode/static/handler.go:101\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/framework/events.(*EventLoop).Start.func1.1\n\t/home/runner/work/nginx-kubernetes-gateway/nginx-kubernetes-gateway/internal/framework/events/loop.go:68",
+      "level": "error"
+    }
+    ```
+
+    Such errors are expected in 0.6.0: we run two replicas, and 0.6.0 does not implement leader election yet.
+    As a result, both replicas try to update the same statuses and conflict with each other.
+- nginx-gateway-55cb958549-nj95k
+  - NGINX logs - no errors
+  - NGF logs - 6 errors related to status updates.
+
+## Upgrades
+
+New pods:
+
+```text
+nginx-gateway-578b49bc58-hmx5x 2/2 Running 0 11s 10.112.1.9 gke-michael-2-default-pool-18ad0f59-7dqr
+nginx-gateway-578b49bc58-r4ckb 2/2 Running 0 17s 10.112.5.26 gke-michael-2-default-pool-18ad0f59-l0cq
+```
+
+Note: as intended, the new pods were scheduled on different nodes from the old pods.
+
+Check that one of the NGF pods became the leader:
+
+```text
+I1013 22:02:32.226414 7 leaderelection.go:250] attempting to acquire leader lease nginx-gateway/nginx-gateway-leader-election...
+I1013 22:02:32.248857 7 leaderelection.go:260] successfully acquired lease nginx-gateway/nginx-gateway-leader-election
+{"level":"info","ts":"2023-10-13T22:02:32Z","logger":"leaderElector","msg":"Started leading"}
+```
+
+Pod nginx-gateway-578b49bc58-r4ckb is the leader.
+
+The Gateway status was updated with the new listener.
+
+### Analyze
+
+#### Tester VMs
+
+Tester 1 wrk output:
+
+```text
+Running 1m test @ http://cafe.example.com/coffee
+  2 threads and 100 connections
+  Thread Stats   Avg      Stdev     Max   +/- Stdev
+    Latency    29.23ms   39.24ms   1.15s    95.23%
+    Req/Sec     1.47k   765.16     4.46k    82.25%
+  Latency Distribution
+     50%   21.66ms
+     75%   36.03ms
+     90%   56.05ms
+     99%  113.86ms
+  175177 requests in 1.00m, 62.59MB read
+  Socket errors: connect 0, read 25, write 102, timeout 0
+Requests/sec:   2918.55
+Transfer/sec:      1.04MB
+```
+
+There are socket errors, but no timeout or connect errors.
+
+Tester 1 graph:
+
+![http.png](http.png)
+
+As we can see, there is a period when curl failed to send requests to the coffee app (in `http.csv`, failures
+occur intermittently between about 22:02:34 and 22:03:02).
+
+Tester 2 wrk output:
+
+```text
+Running 1m test @ https://cafe.example.com/tea
+  2 threads and 100 connections
+  Thread Stats   Avg      Stdev     Max   +/- Stdev
+    Latency    27.59ms   20.92ms 276.16ms   75.13%
+    Req/Sec     1.40k   717.86     3.85k    80.83%
+  Latency Distribution
+     50%   22.65ms
+     75%   36.58ms
+     90%   55.70ms
+     99%   96.40ms
+  166933 requests in 1.00m, 58.48MB read
+  Socket errors: connect 85, read 43, write 0, timeout 0
+Requests/sec:   2780.49
+Transfer/sec:      0.97MB
+```
+
+There are socket errors, including 85 connect errors.
+
+Tester 2 graph:
+
+![https.png](https.png)
+
+As we can see, there is a period when curl failed to send requests to the tea app.
+
+#### Old pods
+
+- nginx-gateway-55cb958549-nbdqz
+  - NGF logged an error with a stack trace before exiting:
+
+    ```text
+    INFO 2023-10-13T22:02:33.321381968Z [resource.labels.containerName: nginx-gateway] {"level":"info", "msg":"Stopping and waiting for caches", "ts":"2023-10-13T22:02:33Z"}
+    INFO 2023-10-13T22:02:33.321821473Z [resource.labels.containerName: nginx-gateway] {"level":"info", "msg":"Stopping and waiting for webhooks", "ts":"2023-10-13T22:02:33Z"}
+    INFO 2023-10-13T22:02:33.322852013Z [resource.labels.containerName: nginx-gateway] {"level":"info", "msg":"Stopping and waiting for HTTP servers", "ts":"2023-10-13T22:02:33Z"}
+    ERROR 2023-10-13T22:02:33.323409193Z [resource.labels.containerName: nginx-gateway] [controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
+    ERROR 2023-10-13T22:02:33.323442347Z [resource.labels.containerName: nginx-gateway] Detected at:
+    ERROR 2023-10-13T22:02:33.323451863Z [resource.labels.containerName: nginx-gateway] > goroutine 67 [running]:
+    ERROR 2023-10-13T22:02:33.323457043Z [resource.labels.containerName: nginx-gateway] > runtime/debug.Stack()
+    ERROR 2023-10-13T22:02:33.323462448Z [resource.labels.containerName: nginx-gateway] > $GOROOT/src/runtime/debug/stack.go:24 +0x5e
+    ERROR 2023-10-13T22:02:33.323467879Z [resource.labels.containerName: nginx-gateway] > sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
+    ERROR 2023-10-13T22:02:33.323472975Z [resource.labels.containerName: nginx-gateway] > pkg/mod/sigs.k8s.io/controller-runtime@v0.16.0/pkg/log/log.go:60 +0xcd
+    ERROR 2023-10-13T22:02:33.323478263Z [resource.labels.containerName: nginx-gateway] > sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Enabled(0xc0003d6dc0, 0xc000288418?)
+    ERROR 2023-10-13T22:02:33.323483535Z [resource.labels.containerName: nginx-gateway] > pkg/mod/sigs.k8s.io/controller-runtime@v0.16.0/pkg/log/deleg.go:111 +0x32
+    ERROR 2023-10-13T22:02:33.323488655Z [resource.labels.containerName: nginx-gateway] > github.com/go-logr/logr.Logger.Enabled(...)
+    ERROR 2023-10-13T22:02:33.323494496Z [resource.labels.containerName: nginx-gateway] > pkg/mod/github.com/go-logr/logr@v1.2.4/logr.go:261
+    ERROR 2023-10-13T22:02:33.323499859Z [resource.labels.containerName: nginx-gateway] > github.com/go-logr/logr.Logger.Info({{0x1b58a48?, 0xc0003d6dc0?}, 0x1b55578?}, {0x19139af, 0x35}, {0x0, 0x0, 0x0})
+    ERROR 2023-10-13T22:02:33.323504827Z [resource.labels.containerName: nginx-gateway] > pkg/mod/github.com/go-logr/logr@v1.2.4/logr.go:274 +0x72
+    ERROR 2023-10-13T22:02:33.323509715Z [resource.labels.containerName: nginx-gateway] > sigs.k8s.io/controller-runtime/pkg/metrics/server.(*defaultServer).Start.func1()
+    ERROR 2023-10-13T22:02:33.323514683Z [resource.labels.containerName: nginx-gateway] > pkg/mod/sigs.k8s.io/controller-runtime@v0.16.0/pkg/metrics/server/server.go:231 +0x74
+    ERROR 2023-10-13T22:02:33.323519441Z [resource.labels.containerName: nginx-gateway] > created by sigs.k8s.io/controller-runtime/pkg/metrics/server.(*defaultServer).Start in goroutine 65
+    ERROR 2023-10-13T22:02:33.323524696Z [resource.labels.containerName: nginx-gateway] > pkg/mod/sigs.k8s.io/controller-runtime@v0.16.0/pkg/metrics/server/server.go:229 +0x825
+    INFO 2023-10-13T22:02:33.325625043Z [resource.labels.containerName: nginx-gateway] {"level":"info", "msg":"Wait completed, proceeding to shutdown the manager", "ts":"2023-10-13T22:02:33Z"}
+    ```
+
+    This error is specific to the 0.6.0 release: a graceful shutdown of the edge version used in this test (tried
+    separately) doesn't produce it.
+  - NGINX
+    - Access logs - all responses are 200.
+    - Error logs - no errors or warnings.
+- nginx-gateway-55cb958549-nj95k
+  - NGF - same error.
+  - NGINX
+    - Access logs - only 200 responses.
+    - Error logs - no errors or warnings.
+
+#### New pods
+
+- nginx-gateway-578b49bc58-hmx5x
+  - NGF - no errors
+  - NGINX
+    - Access logs - 46 responses like below:
+
+      ```text
+      INFO 2023-10-13T22:03:17.618556029Z [resource.labels.containerName: nginx] 10.128.0.9 - - [13/Oct/2023:22:03:17 +0000] "GET /coffee HTTP/1.1" 499 0 "-" "-"
+      . . .
+      INFO 2023-10-13T22:03:22.747023411Z [resource.labels.containerName: nginx] 10.128.15.241 - - [13/Oct/2023:22:03:22 +0000] "GET /tea HTTP/1.1" 499 0 "-" "-"
+      ```
+
+      A 499 status code means the client closed the connection before NGINX sent a response.
+      These requests came from wrk (curl requests have the `curl` user agent in the logs).
+      All the coffee requests happened at about the same time, and so did all the tea requests.
+      That is probably how wrk closed its connections when it exited after 60s.
+    - Error logs - no errors or warnings.
+- nginx-gateway-578b49bc58-r4ckb
+  - NGF - no errors.
+  - NGINX
+    - Access logs - 35 responses similar to those of the first pod. Same conclusion as above.
+    - Error logs - no errors or warnings.
+
+## Opened Issues
+
+- Traffic loss during an upgrade. (to be opened soon)
+
+## Future Improvements
+
+- Use Helm for the upgrade to catch any Helm-related bugs that could prevent an upgrade. In this test,
+  we didn't use Helm because the 0.6.0 release does not allow configuring the number of replicas
+  and pod affinity.
diff --git a/tests/zero-downtime-upgrades/results/1.0.0/http.csv b/tests/zero-downtime-upgrades/results/1.0.0/http.csv new file mode 100644 index 000000000..30d50c4c5 --- /dev/null +++ b/tests/zero-downtime-upgrades/results/1.0.0/http.csv @@ -0,0 +1,600 @@ +2023-10-13 22:02:16.401267754+00:00,1 +2023-10-13 22:02:16.517072322+00:00,1 +2023-10-13 22:02:16.630980807+00:00,1 +2023-10-13 22:02:16.746924746+00:00,1 +2023-10-13 22:02:16.864835368+00:00,1 +2023-10-13 22:02:16.981624780+00:00,1 +2023-10-13 22:02:17.099343751+00:00,1 +2023-10-13 22:02:17.215674226+00:00,1 +2023-10-13 22:02:17.332868389+00:00,1 +2023-10-13 22:02:17.449274400+00:00,1 +2023-10-13 22:02:17.565786594+00:00,1 +2023-10-13 22:02:17.704246902+00:00,1 +2023-10-13 22:02:17.845398819+00:00,1 +2023-10-13 22:02:17.981309833+00:00,1 +2023-10-13 22:02:18.145217228+00:00,1 +2023-10-13 22:02:18.278931776+00:00,1 +2023-10-13 22:02:18.417801725+00:00,1 +2023-10-13 22:02:18.548661822+00:00,1 +2023-10-13 22:02:18.699792112+00:00,1 +2023-10-13 22:02:18.835705391+00:00,1 +2023-10-13 22:02:18.974460460+00:00,1 +2023-10-13 22:02:19.116605364+00:00,1 +2023-10-13 22:02:19.254361903+00:00,1 +2023-10-13 22:02:19.391363458+00:00,1 +2023-10-13 22:02:19.534092239+00:00,1 +2023-10-13 22:02:19.673546496+00:00,1 +2023-10-13 22:02:19.810678816+00:00,1 +2023-10-13 22:02:19.945477351+00:00,1 +2023-10-13 22:02:20.082527062+00:00,1 +2023-10-13 22:02:20.216756932+00:00,1 +2023-10-13 22:02:20.355692658+00:00,1 +2023-10-13 22:02:20.493617689+00:00,1 +2023-10-13 22:02:20.618808138+00:00,1 +2023-10-13 22:02:20.752063198+00:00,1 +2023-10-13 22:02:20.909011702+00:00,1 +2023-10-13 22:02:21.047813748+00:00,1 +2023-10-13 22:02:21.181231455+00:00,1 +2023-10-13 22:02:21.347212639+00:00,1 +2023-10-13 22:02:21.477722536+00:00,1 +2023-10-13 22:02:21.609348537+00:00,1 +2023-10-13 22:02:21.744690966+00:00,1 +2023-10-13 22:02:21.887596612+00:00,1 +2023-10-13 22:02:22.023796658+00:00,1 +2023-10-13 22:02:22.164814593+00:00,1 +2023-10-13 
22:02:22.300710442+00:00,1 +2023-10-13 22:02:22.441841371+00:00,1 +2023-10-13 22:02:22.579973798+00:00,1 +2023-10-13 22:02:22.717057907+00:00,1 +2023-10-13 22:02:22.866475270+00:00,1 +2023-10-13 22:02:23.023339741+00:00,1 +2023-10-13 22:02:23.185375712+00:00,1 +2023-10-13 22:02:23.340773178+00:00,1 +2023-10-13 22:02:23.517860180+00:00,1 +2023-10-13 22:02:23.691238114+00:00,1 +2023-10-13 22:02:23.874298745+00:00,1 +2023-10-13 22:02:24.033022752+00:00,1 +2023-10-13 22:02:24.191817349+00:00,1 +2023-10-13 22:02:24.339531648+00:00,1 +2023-10-13 22:02:24.491559332+00:00,1 +2023-10-13 22:02:24.659885507+00:00,1 +2023-10-13 22:02:24.817504273+00:00,1 +2023-10-13 22:02:24.970862241+00:00,1 +2023-10-13 22:02:25.132621091+00:00,1 +2023-10-13 22:02:25.298381903+00:00,1 +2023-10-13 22:02:25.454528214+00:00,1 +2023-10-13 22:02:25.666781921+00:00,1 +2023-10-13 22:02:25.819514974+00:00,1 +2023-10-13 22:02:25.972053208+00:00,1 +2023-10-13 22:02:26.128448810+00:00,1 +2023-10-13 22:02:26.291277182+00:00,1 +2023-10-13 22:02:26.457262866+00:00,1 +2023-10-13 22:02:26.621374575+00:00,1 +2023-10-13 22:02:26.782638599+00:00,1 +2023-10-13 22:02:26.924320813+00:00,1 +2023-10-13 22:02:27.084548447+00:00,1 +2023-10-13 22:02:27.245900192+00:00,1 +2023-10-13 22:02:27.390848169+00:00,1 +2023-10-13 22:02:27.550779399+00:00,1 +2023-10-13 22:02:27.709282648+00:00,1 +2023-10-13 22:02:27.846770454+00:00,1 +2023-10-13 22:02:27.986144605+00:00,1 +2023-10-13 22:02:28.142165944+00:00,1 +2023-10-13 22:02:28.324334787+00:00,1 +2023-10-13 22:02:28.509048145+00:00,1 +2023-10-13 22:02:28.668261018+00:00,1 +2023-10-13 22:02:28.820619095+00:00,1 +2023-10-13 22:02:29.001766630+00:00,1 +2023-10-13 22:02:29.170908421+00:00,1 +2023-10-13 22:02:29.342051527+00:00,1 +2023-10-13 22:02:29.489414563+00:00,1 +2023-10-13 22:02:29.648566894+00:00,1 +2023-10-13 22:02:29.811374505+00:00,1 +2023-10-13 22:02:29.984628642+00:00,1 +2023-10-13 22:02:30.151872620+00:00,1 +2023-10-13 22:02:30.312363421+00:00,1 +2023-10-13 
22:02:30.470920586+00:00,1 +2023-10-13 22:02:30.639996237+00:00,1 +2023-10-13 22:02:30.796254353+00:00,1 +2023-10-13 22:02:30.992835066+00:00,1 +2023-10-13 22:02:31.177160462+00:00,1 +2023-10-13 22:02:31.348744188+00:00,1 +2023-10-13 22:02:31.514879298+00:00,1 +2023-10-13 22:02:31.669582967+00:00,1 +2023-10-13 22:02:31.831035610+00:00,1 +2023-10-13 22:02:31.989780701+00:00,1 +2023-10-13 22:02:32.162517034+00:00,1 +2023-10-13 22:02:32.326925680+00:00,1 +2023-10-13 22:02:32.499661198+00:00,1 +2023-10-13 22:02:32.649564862+00:00,1 +2023-10-13 22:02:32.843295959+00:00,1 +2023-10-13 22:02:32.969887599+00:00,1 +2023-10-13 22:02:33.126156503+00:00,1 +2023-10-13 22:02:33.287255016+00:00,1 +2023-10-13 22:02:33.457019900+00:00,1 +2023-10-13 22:02:34.675234820+00:00,0 +2023-10-13 22:02:36.690478761+00:00,1 +2023-10-13 22:02:36.875470301+00:00,1 +2023-10-13 22:02:37.085451651+00:00,1 +2023-10-13 22:02:37.261998248+00:00,1 +2023-10-13 22:02:37.442375520+00:00,0 +2023-10-13 22:02:39.458650544+00:00,0 +2023-10-13 22:02:41.473866940+00:00,0 +2023-10-13 22:02:43.487966100+00:00,0 +2023-10-13 22:02:45.502372654+00:00,0 +2023-10-13 22:02:47.516837206+00:00,0 +2023-10-13 22:02:49.530364255+00:00,0 +2023-10-13 22:02:51.544570033+00:00,0 +2023-10-13 22:02:53.558568756+00:00,1 +2023-10-13 22:02:53.698340492+00:00,1 +2023-10-13 22:02:53.868853750+00:00,0 +2023-10-13 22:02:55.883441757+00:00,0 +2023-10-13 22:02:57.898610598+00:00,0 +2023-10-13 22:02:59.914635782+00:00,0 +2023-10-13 22:03:01.930556357+00:00,0 +2023-10-13 22:03:03.946743117+00:00,1 +2023-10-13 22:03:04.086644904+00:00,1 +2023-10-13 22:03:04.217027306+00:00,1 +2023-10-13 22:03:04.342423437+00:00,1 +2023-10-13 22:03:04.513518511+00:00,1 +2023-10-13 22:03:04.636036685+00:00,1 +2023-10-13 22:03:04.761123289+00:00,1 +2023-10-13 22:03:04.912162539+00:00,1 +2023-10-13 22:03:05.106096044+00:00,1 +2023-10-13 22:03:05.232308411+00:00,1 +2023-10-13 22:03:05.354976094+00:00,1 +2023-10-13 22:03:05.475825973+00:00,1 +2023-10-13 
22:03:05.649688247+00:00,1 +2023-10-13 22:03:05.770811466+00:00,1 +2023-10-13 22:03:05.922898479+00:00,1 +2023-10-13 22:03:06.073930857+00:00,1 +2023-10-13 22:03:06.224405148+00:00,1 +2023-10-13 22:03:06.369697004+00:00,1 +2023-10-13 22:03:06.492797511+00:00,1 +2023-10-13 22:03:06.653686514+00:00,1 +2023-10-13 22:03:06.814933922+00:00,1 +2023-10-13 22:03:06.964783765+00:00,1 +2023-10-13 22:03:07.118853001+00:00,1 +2023-10-13 22:03:07.262523221+00:00,1 +2023-10-13 22:03:07.383893775+00:00,1 +2023-10-13 22:03:07.531555807+00:00,1 +2023-10-13 22:03:07.649608349+00:00,1 +2023-10-13 22:03:07.801743244+00:00,1 +2023-10-13 22:03:07.938675507+00:00,1 +2023-10-13 22:03:08.057765190+00:00,1 +2023-10-13 22:03:08.178800499+00:00,1 +2023-10-13 22:03:08.309831929+00:00,1 +2023-10-13 22:03:08.458225092+00:00,1 +2023-10-13 22:03:08.595358403+00:00,1 +2023-10-13 22:03:08.741849374+00:00,1 +2023-10-13 22:03:08.914195300+00:00,1 +2023-10-13 22:03:09.050413306+00:00,1 +2023-10-13 22:03:09.185885297+00:00,1 +2023-10-13 22:03:09.348828973+00:00,1 +2023-10-13 22:03:09.473147151+00:00,1 +2023-10-13 22:03:09.608633229+00:00,1 +2023-10-13 22:03:09.740745006+00:00,1 +2023-10-13 22:03:09.873536292+00:00,1 +2023-10-13 22:03:09.993061361+00:00,1 +2023-10-13 22:03:10.118795471+00:00,1 +2023-10-13 22:03:10.256107626+00:00,1 +2023-10-13 22:03:10.387025485+00:00,1 +2023-10-13 22:03:10.519129775+00:00,1 +2023-10-13 22:03:10.727938970+00:00,1 +2023-10-13 22:03:10.964045946+00:00,1 +2023-10-13 22:03:11.104379761+00:00,1 +2023-10-13 22:03:11.255892441+00:00,1 +2023-10-13 22:03:11.382794529+00:00,1 +2023-10-13 22:03:11.522340463+00:00,1 +2023-10-13 22:03:11.659804048+00:00,1 +2023-10-13 22:03:11.791812891+00:00,1 +2023-10-13 22:03:11.931213335+00:00,1 +2023-10-13 22:03:12.056934577+00:00,1 +2023-10-13 22:03:12.215684050+00:00,1 +2023-10-13 22:03:12.363564242+00:00,1 +2023-10-13 22:03:12.502069973+00:00,1 +2023-10-13 22:03:12.641964201+00:00,1 +2023-10-13 22:03:12.803468796+00:00,1 +2023-10-13 
22:03:12.939324324+00:00,1 +2023-10-13 22:03:13.105831517+00:00,1 +2023-10-13 22:03:13.262822099+00:00,1 +2023-10-13 22:03:13.396586107+00:00,1 +2023-10-13 22:03:13.527261475+00:00,1 +2023-10-13 22:03:13.668955579+00:00,1 +2023-10-13 22:03:13.812379073+00:00,1 +2023-10-13 22:03:13.953277964+00:00,1 +2023-10-13 22:03:14.144827802+00:00,1 +2023-10-13 22:03:14.378206687+00:00,1 +2023-10-13 22:03:14.513231935+00:00,1 +2023-10-13 22:03:14.639906755+00:00,1 +2023-10-13 22:03:14.787302797+00:00,1 +2023-10-13 22:03:14.929731038+00:00,1 +2023-10-13 22:03:15.081391365+00:00,1 +2023-10-13 22:03:15.233099182+00:00,1 +2023-10-13 22:03:15.375624567+00:00,1 +2023-10-13 22:03:15.569605419+00:00,1 +2023-10-13 22:03:15.741449851+00:00,1 +2023-10-13 22:03:15.937712820+00:00,1 +2023-10-13 22:03:16.074272503+00:00,1 +2023-10-13 22:03:16.218844393+00:00,1 +2023-10-13 22:03:16.354464203+00:00,1 +2023-10-13 22:03:16.497887921+00:00,1 +2023-10-13 22:03:16.646961409+00:00,1 +2023-10-13 22:03:16.805857945+00:00,1 +2023-10-13 22:03:16.964259876+00:00,1 +2023-10-13 22:03:17.114069451+00:00,1 +2023-10-13 22:03:17.260543437+00:00,1 +2023-10-13 22:03:17.410165017+00:00,1 +2023-10-13 22:03:17.548306648+00:00,1 +2023-10-13 22:03:17.686913059+00:00,1 +2023-10-13 22:03:17.819833338+00:00,1 +2023-10-13 22:03:17.939432599+00:00,1 +2023-10-13 22:03:18.063856984+00:00,1 +2023-10-13 22:03:18.186225002+00:00,1 +2023-10-13 22:03:18.311172535+00:00,1 +2023-10-13 22:03:18.453862667+00:00,1 +2023-10-13 22:03:18.575840579+00:00,1 +2023-10-13 22:03:18.700464978+00:00,1 +2023-10-13 22:03:18.830411559+00:00,1 +2023-10-13 22:03:18.956539606+00:00,1 +2023-10-13 22:03:19.092666874+00:00,1 +2023-10-13 22:03:19.221199267+00:00,1 +2023-10-13 22:03:19.351408377+00:00,1 +2023-10-13 22:03:19.477805768+00:00,1 +2023-10-13 22:03:19.603699312+00:00,1 +2023-10-13 22:03:19.732808383+00:00,1 +2023-10-13 22:03:19.858914693+00:00,1 +2023-10-13 22:03:19.980187466+00:00,1 +2023-10-13 22:03:20.109461663+00:00,1 +2023-10-13 
22:03:20.249326315+00:00,1 +2023-10-13 22:03:20.375360148+00:00,1 +2023-10-13 22:03:20.497820553+00:00,1 +2023-10-13 22:03:20.621289053+00:00,1 +2023-10-13 22:03:20.742800655+00:00,1 +2023-10-13 22:03:20.873062897+00:00,1 +2023-10-13 22:03:21.007924433+00:00,1 +2023-10-13 22:03:21.140099734+00:00,1 +2023-10-13 22:03:21.266642490+00:00,1 +2023-10-13 22:03:21.393285879+00:00,1 +2023-10-13 22:03:21.513091009+00:00,1 +2023-10-13 22:03:21.641472202+00:00,1 +2023-10-13 22:03:21.767076415+00:00,1 +2023-10-13 22:03:21.911954305+00:00,1 +2023-10-13 22:03:22.040241808+00:00,1 +2023-10-13 22:03:22.174208424+00:00,1 +2023-10-13 22:03:22.295018103+00:00,1 +2023-10-13 22:03:22.465931820+00:00,1 +2023-10-13 22:03:22.592940782+00:00,1 +2023-10-13 22:03:22.721731511+00:00,1 +2023-10-13 22:03:22.848523342+00:00,1 +2023-10-13 22:03:22.965041242+00:00,1 +2023-10-13 22:03:23.081484265+00:00,1 +2023-10-13 22:03:23.199838887+00:00,1 +2023-10-13 22:03:23.315903415+00:00,1 +2023-10-13 22:03:23.432998590+00:00,1 +2023-10-13 22:03:23.549290736+00:00,1 +2023-10-13 22:03:23.666451937+00:00,1 +2023-10-13 22:03:23.783512469+00:00,1 +2023-10-13 22:03:23.900062337+00:00,1 +2023-10-13 22:03:24.016975750+00:00,1 +2023-10-13 22:03:24.134122669+00:00,1 +2023-10-13 22:03:24.251166980+00:00,1 +2023-10-13 22:03:24.367795273+00:00,1 +2023-10-13 22:03:24.484597151+00:00,1 +2023-10-13 22:03:24.600395352+00:00,1 +2023-10-13 22:03:24.716741566+00:00,1 +2023-10-13 22:03:24.833061371+00:00,1 +2023-10-13 22:03:24.949702306+00:00,1 +2023-10-13 22:03:25.065614363+00:00,1 +2023-10-13 22:03:25.182807197+00:00,1 +2023-10-13 22:03:25.299959894+00:00,1 +2023-10-13 22:03:25.420025301+00:00,1 +2023-10-13 22:03:25.538973088+00:00,1 +2023-10-13 22:03:25.655301686+00:00,1 +2023-10-13 22:03:25.772412671+00:00,1 +2023-10-13 22:03:25.889647814+00:00,1 +2023-10-13 22:03:26.006769246+00:00,1 +2023-10-13 22:03:26.124124578+00:00,1 +2023-10-13 22:03:26.240372166+00:00,1 +2023-10-13 22:03:26.356742609+00:00,1 +2023-10-13 
22:03:26.473973209+00:00,1 +2023-10-13 22:03:26.590731969+00:00,1 +2023-10-13 22:03:26.707930673+00:00,1 +2023-10-13 22:03:26.824644560+00:00,1 +2023-10-13 22:03:26.941088293+00:00,1 +2023-10-13 22:03:27.058076025+00:00,1 +2023-10-13 22:03:27.174029492+00:00,1 +2023-10-13 22:03:27.290701184+00:00,1 +2023-10-13 22:03:27.407839184+00:00,1 +2023-10-13 22:03:27.524239718+00:00,1 +2023-10-13 22:03:27.641195050+00:00,1 +2023-10-13 22:03:27.760419381+00:00,1 +2023-10-13 22:03:27.877599436+00:00,1 +2023-10-13 22:03:27.993948763+00:00,1 +2023-10-13 22:03:28.110887333+00:00,1 +2023-10-13 22:03:28.227752397+00:00,1 +2023-10-13 22:03:28.344167278+00:00,1 +2023-10-13 22:03:28.460750869+00:00,1 +2023-10-13 22:03:28.578341685+00:00,1 +2023-10-13 22:03:28.694956575+00:00,1 +2023-10-13 22:03:28.811983845+00:00,1 +2023-10-13 22:03:28.928898015+00:00,1 +2023-10-13 22:03:29.045882114+00:00,1 +2023-10-13 22:03:29.162295662+00:00,1 +2023-10-13 22:03:29.280598889+00:00,1 +2023-10-13 22:03:29.398526455+00:00,1 +2023-10-13 22:03:29.514498638+00:00,1 +2023-10-13 22:03:29.631428088+00:00,1 +2023-10-13 22:03:29.748490625+00:00,1 +2023-10-13 22:03:29.864491856+00:00,1 +2023-10-13 22:03:29.981544037+00:00,1 +2023-10-13 22:03:30.100156070+00:00,1 +2023-10-13 22:03:30.216825443+00:00,1 +2023-10-13 22:03:30.332970597+00:00,1 +2023-10-13 22:03:30.449452949+00:00,1 +2023-10-13 22:03:30.566133743+00:00,1 +2023-10-13 22:03:30.682419377+00:00,1 +2023-10-13 22:03:30.799550992+00:00,1 +2023-10-13 22:03:30.916057435+00:00,1 +2023-10-13 22:03:31.033456469+00:00,1 +2023-10-13 22:03:31.149942289+00:00,1 +2023-10-13 22:03:31.266591655+00:00,1 +2023-10-13 22:03:31.383622595+00:00,1 +2023-10-13 22:03:31.499844231+00:00,1 +2023-10-13 22:03:31.616232982+00:00,1 +2023-10-13 22:03:31.733519544+00:00,1 +2023-10-13 22:03:31.849843448+00:00,1 +2023-10-13 22:03:31.967277483+00:00,1 +2023-10-13 22:03:32.083312271+00:00,1 +2023-10-13 22:03:32.200161935+00:00,1 +2023-10-13 22:03:32.317035405+00:00,1 +2023-10-13 
22:03:32.433863288+00:00,1 +2023-10-13 22:03:32.550372393+00:00,1 +2023-10-13 22:03:32.666806674+00:00,1 +2023-10-13 22:03:32.783835430+00:00,1 +2023-10-13 22:03:32.900507542+00:00,1 +2023-10-13 22:03:33.016851251+00:00,1 +2023-10-13 22:03:33.132950737+00:00,1 +2023-10-13 22:03:33.249277868+00:00,1 +2023-10-13 22:03:33.366254065+00:00,1 +2023-10-13 22:03:33.482444809+00:00,1 +2023-10-13 22:03:33.598638010+00:00,1 +2023-10-13 22:03:33.714924729+00:00,1 +2023-10-13 22:03:33.832095110+00:00,1 +2023-10-13 22:03:33.949655578+00:00,1 +2023-10-13 22:03:34.066460488+00:00,1 +2023-10-13 22:03:34.182882666+00:00,1 +2023-10-13 22:03:34.298995234+00:00,1 +2023-10-13 22:03:34.415613834+00:00,1 +2023-10-13 22:03:34.531904058+00:00,1 +2023-10-13 22:03:34.647668996+00:00,1 +2023-10-13 22:03:34.764135195+00:00,1 +2023-10-13 22:03:34.880264878+00:00,1 +2023-10-13 22:03:34.996519731+00:00,1 +2023-10-13 22:03:35.112868652+00:00,1 +2023-10-13 22:03:35.228814924+00:00,1 +2023-10-13 22:03:35.344849894+00:00,1 +2023-10-13 22:03:35.461895101+00:00,1 +2023-10-13 22:03:35.577794017+00:00,1 +2023-10-13 22:03:35.693502568+00:00,1 +2023-10-13 22:03:35.809818188+00:00,1 +2023-10-13 22:03:35.925565705+00:00,1 +2023-10-13 22:03:36.041577955+00:00,1 +2023-10-13 22:03:36.157562405+00:00,1 +2023-10-13 22:03:36.273611687+00:00,1 +2023-10-13 22:03:36.389839888+00:00,1 +2023-10-13 22:03:36.506451912+00:00,1 +2023-10-13 22:03:36.622311793+00:00,1 +2023-10-13 22:03:36.738495245+00:00,1 +2023-10-13 22:03:36.855095310+00:00,1 +2023-10-13 22:03:36.971341835+00:00,1 +2023-10-13 22:03:37.087735763+00:00,1 +2023-10-13 22:03:37.203878798+00:00,1 +2023-10-13 22:03:37.319836438+00:00,1 +2023-10-13 22:03:37.435822799+00:00,1 +2023-10-13 22:03:37.552114326+00:00,1 +2023-10-13 22:03:37.668341929+00:00,1 +2023-10-13 22:03:37.785602301+00:00,1 +2023-10-13 22:03:37.902336468+00:00,1 +2023-10-13 22:03:38.018729012+00:00,1 +2023-10-13 22:03:38.135622404+00:00,1 +2023-10-13 22:03:38.252226479+00:00,1 +2023-10-13 
22:03:38.368606362+00:00,1 +2023-10-13 22:03:38.485760253+00:00,1 +2023-10-13 22:03:38.601651841+00:00,1 +2023-10-13 22:03:38.718590077+00:00,1 +2023-10-13 22:03:38.835363940+00:00,1 +2023-10-13 22:03:38.952154931+00:00,1 +2023-10-13 22:03:39.068445383+00:00,1 +2023-10-13 22:03:39.185385667+00:00,1 +2023-10-13 22:03:39.300959581+00:00,1 +2023-10-13 22:03:39.417610737+00:00,1 +2023-10-13 22:03:39.533556011+00:00,1 +2023-10-13 22:03:39.649516250+00:00,1 +2023-10-13 22:03:39.765381369+00:00,1 +2023-10-13 22:03:39.882010410+00:00,1 +2023-10-13 22:03:39.998764045+00:00,1 +2023-10-13 22:03:40.114780059+00:00,1 +2023-10-13 22:03:40.230825292+00:00,1 +2023-10-13 22:03:40.347509169+00:00,1 +2023-10-13 22:03:40.463819742+00:00,1 +2023-10-13 22:03:40.579911691+00:00,1 +2023-10-13 22:03:40.695877615+00:00,1 +2023-10-13 22:03:40.811768605+00:00,1 +2023-10-13 22:03:40.928281302+00:00,1 +2023-10-13 22:03:41.045033479+00:00,1 +2023-10-13 22:03:41.161046106+00:00,1 +2023-10-13 22:03:41.277745277+00:00,1 +2023-10-13 22:03:41.393933707+00:00,1 +2023-10-13 22:03:41.510114180+00:00,1 +2023-10-13 22:03:41.626022414+00:00,1 +2023-10-13 22:03:41.742935880+00:00,1 +2023-10-13 22:03:41.859937458+00:00,1 +2023-10-13 22:03:41.975765066+00:00,1 +2023-10-13 22:03:42.092355947+00:00,1 +2023-10-13 22:03:42.209383000+00:00,1 +2023-10-13 22:03:42.326092155+00:00,1 +2023-10-13 22:03:42.442088974+00:00,1 +2023-10-13 22:03:42.558464691+00:00,1 +2023-10-13 22:03:42.674280248+00:00,1 +2023-10-13 22:03:42.791379253+00:00,1 +2023-10-13 22:03:42.908070489+00:00,1 +2023-10-13 22:03:43.026073412+00:00,1 +2023-10-13 22:03:43.143387681+00:00,1 +2023-10-13 22:03:43.260332007+00:00,1 +2023-10-13 22:03:43.377631840+00:00,1 +2023-10-13 22:03:43.494211447+00:00,1 +2023-10-13 22:03:43.611301480+00:00,1 +2023-10-13 22:03:43.727961656+00:00,1 +2023-10-13 22:03:43.844835615+00:00,1 +2023-10-13 22:03:43.961273953+00:00,1 +2023-10-13 22:03:44.077695217+00:00,1 +2023-10-13 22:03:44.193889980+00:00,1 +2023-10-13 
22:03:44.310945488+00:00,1 +2023-10-13 22:03:44.428692399+00:00,1 +2023-10-13 22:03:44.544730304+00:00,1 +2023-10-13 22:03:44.660930278+00:00,1 +2023-10-13 22:03:44.778067251+00:00,1 +2023-10-13 22:03:44.894214519+00:00,1 +2023-10-13 22:03:45.010329940+00:00,1 +2023-10-13 22:03:45.126184596+00:00,1 +2023-10-13 22:03:45.241934435+00:00,1 +2023-10-13 22:03:45.359568640+00:00,1 +2023-10-13 22:03:45.476953775+00:00,1 +2023-10-13 22:03:45.593264520+00:00,1 +2023-10-13 22:03:45.709174385+00:00,1 +2023-10-13 22:03:45.826104338+00:00,1 +2023-10-13 22:03:45.943084678+00:00,1 +2023-10-13 22:03:46.059576773+00:00,1 +2023-10-13 22:03:46.175956114+00:00,1 +2023-10-13 22:03:46.292341533+00:00,1 +2023-10-13 22:03:46.408506480+00:00,1 +2023-10-13 22:03:46.524077853+00:00,1 +2023-10-13 22:03:46.639982135+00:00,1 +2023-10-13 22:03:46.756035890+00:00,1 +2023-10-13 22:03:46.873474618+00:00,1 +2023-10-13 22:03:46.989418992+00:00,1 +2023-10-13 22:03:47.105524005+00:00,1 +2023-10-13 22:03:47.221085049+00:00,1 +2023-10-13 22:03:47.337085134+00:00,1 +2023-10-13 22:03:47.452806824+00:00,1 +2023-10-13 22:03:47.569265473+00:00,1 +2023-10-13 22:03:47.685746684+00:00,1 +2023-10-13 22:03:47.802134561+00:00,1 +2023-10-13 22:03:47.918292076+00:00,1 +2023-10-13 22:03:48.034881960+00:00,1 +2023-10-13 22:03:48.151203813+00:00,1 +2023-10-13 22:03:48.267733709+00:00,1 +2023-10-13 22:03:48.384411934+00:00,1 +2023-10-13 22:03:48.500953909+00:00,1 +2023-10-13 22:03:48.617464042+00:00,1 +2023-10-13 22:03:48.733106306+00:00,1 +2023-10-13 22:03:48.849462605+00:00,1 +2023-10-13 22:03:48.966071240+00:00,1 +2023-10-13 22:03:49.082465195+00:00,1 +2023-10-13 22:03:49.197986828+00:00,1 +2023-10-13 22:03:49.313859552+00:00,1 +2023-10-13 22:03:49.430306867+00:00,1 +2023-10-13 22:03:49.547776785+00:00,1 +2023-10-13 22:03:49.663773918+00:00,1 +2023-10-13 22:03:49.779911657+00:00,1 +2023-10-13 22:03:49.895967206+00:00,1 +2023-10-13 22:03:50.012298967+00:00,1 +2023-10-13 22:03:50.130216593+00:00,1 +2023-10-13 
22:03:50.247070556+00:00,1 +2023-10-13 22:03:50.363302052+00:00,1 +2023-10-13 22:03:50.479769887+00:00,1 +2023-10-13 22:03:50.596167860+00:00,1 +2023-10-13 22:03:50.712540010+00:00,1 +2023-10-13 22:03:50.829929144+00:00,1 +2023-10-13 22:03:50.945875310+00:00,1 +2023-10-13 22:03:51.061940002+00:00,1 +2023-10-13 22:03:51.178657450+00:00,1 +2023-10-13 22:03:51.294643419+00:00,1 +2023-10-13 22:03:51.410575223+00:00,1 +2023-10-13 22:03:51.526553594+00:00,1 +2023-10-13 22:03:51.643412049+00:00,1 +2023-10-13 22:03:51.759089152+00:00,1 +2023-10-13 22:03:51.875257397+00:00,1 +2023-10-13 22:03:51.991136009+00:00,1 +2023-10-13 22:03:52.107694196+00:00,1 +2023-10-13 22:03:52.223874536+00:00,1 +2023-10-13 22:03:52.340870362+00:00,1 +2023-10-13 22:03:52.456915567+00:00,1 +2023-10-13 22:03:52.572992006+00:00,1 +2023-10-13 22:03:52.689119749+00:00,1 +2023-10-13 22:03:52.805867281+00:00,1 +2023-10-13 22:03:52.921557120+00:00,1 +2023-10-13 22:03:53.038327880+00:00,1 +2023-10-13 22:03:53.154521521+00:00,1 +2023-10-13 22:03:53.270567345+00:00,1 +2023-10-13 22:03:53.387600861+00:00,1 +2023-10-13 22:03:53.504755014+00:00,1 +2023-10-13 22:03:53.620734064+00:00,1 +2023-10-13 22:03:53.736949981+00:00,1 +2023-10-13 22:03:53.855054496+00:00,1 +2023-10-13 22:03:53.971330261+00:00,1 +2023-10-13 22:03:54.087573799+00:00,1 +2023-10-13 22:03:54.204598724+00:00,1 +2023-10-13 22:03:54.320647952+00:00,1 +2023-10-13 22:03:54.436971449+00:00,1 +2023-10-13 22:03:54.553515908+00:00,1 +2023-10-13 22:03:54.669796528+00:00,1 +2023-10-13 22:03:54.786006060+00:00,1 +2023-10-13 22:03:54.901954444+00:00,1 +2023-10-13 22:03:55.017903480+00:00,1 +2023-10-13 22:03:55.134342837+00:00,1 +2023-10-13 22:03:55.250825934+00:00,1 +2023-10-13 22:03:55.366937071+00:00,1 +2023-10-13 22:03:55.483427642+00:00,1 +2023-10-13 22:03:55.599551801+00:00,1 +2023-10-13 22:03:55.715816011+00:00,1 +2023-10-13 22:03:55.833075311+00:00,1 +2023-10-13 22:03:55.950062560+00:00,1 +2023-10-13 22:03:56.066246570+00:00,1 +2023-10-13 
22:03:56.182844419+00:00,1 +2023-10-13 22:03:56.299472552+00:00,1 +2023-10-13 22:03:56.415931672+00:00,1 +2023-10-13 22:03:56.532222612+00:00,1 +2023-10-13 22:03:56.648741586+00:00,1 +2023-10-13 22:03:56.766476002+00:00,1 +2023-10-13 22:03:56.884079768+00:00,1 +2023-10-13 22:03:57.000162046+00:00,1 +2023-10-13 22:03:57.116598453+00:00,1 +2023-10-13 22:03:57.232377346+00:00,1 +2023-10-13 22:03:57.347984635+00:00,1 +2023-10-13 22:03:57.464166860+00:00,1 +2023-10-13 22:03:57.580052154+00:00,1 +2023-10-13 22:03:57.696762158+00:00,1 +2023-10-13 22:03:57.813942527+00:00,1 +2023-10-13 22:03:57.930403715+00:00,1 +2023-10-13 22:03:58.047200508+00:00,1 +2023-10-13 22:03:58.163766199+00:00,1 +2023-10-13 22:03:58.279836028+00:00,1 +2023-10-13 22:03:58.396179231+00:00,1 +2023-10-13 22:03:58.511771293+00:00,1 +2023-10-13 22:03:58.627676352+00:00,1 +2023-10-13 22:03:58.743988103+00:00,1 +2023-10-13 22:03:58.860144131+00:00,1 +2023-10-13 22:03:58.976318672+00:00,1 +2023-10-13 22:03:59.092089305+00:00,1 +2023-10-13 22:03:59.208609380+00:00,1 +2023-10-13 22:03:59.324888205+00:00,1 +2023-10-13 22:03:59.441815787+00:00,1 +2023-10-13 22:03:59.558670020+00:00,1 +2023-10-13 22:03:59.674762764+00:00,1 +2023-10-13 22:03:59.793043614+00:00,1 +2023-10-13 22:03:59.910217777+00:00,1 +2023-10-13 22:04:00.027179784+00:00,1 +2023-10-13 22:04:00.144635362+00:00,1 +2023-10-13 22:04:00.262162419+00:00,1 +2023-10-13 22:04:00.379458299+00:00,1 +2023-10-13 22:04:00.495714913+00:00,1 +2023-10-13 22:04:00.611902177+00:00,1 +2023-10-13 22:04:00.728601484+00:00,1 +2023-10-13 22:04:00.845492153+00:00,1 +2023-10-13 22:04:00.961872709+00:00,1 +2023-10-13 22:04:01.078414449+00:00,1 +2023-10-13 22:04:01.194583271+00:00,1 +2023-10-13 22:04:01.310817545+00:00,1 +2023-10-13 22:04:01.427235330+00:00,1 diff --git a/tests/zero-downtime-upgrades/results/1.0.0/http.png b/tests/zero-downtime-upgrades/results/1.0.0/http.png new file mode 100644 index 
0000000000000000000000000000000000000000..8139920d3f75fac17e25cdecdeea1cd8dd8508ce GIT binary patch literal 6548 zcmeHLX;f3mw%(1>+9J?S(4q{^HngCiAVy4_v0H`~5EKDHL768>7?S{Putl3_iy%Qj zL{Mf35J;Hp2tj5+K!%V;49FB_Vu0k_gRZ{!y><1|Vg%av5sLdhT-!HGt_^^OFdIpq8Yhq@_nC02!$S1O$Rg z9MsgnQRhCSQVqzsN;0zdGC2|eM+St9kdU{})YZ|hxWAjxeo9J8IyyQ=Mn+CfP9zfP z(W6InI=!;8vcA5a&*!5?)X%fhk#-l!Ni_VQejo zu7lwXFr)y3KLB3|K5T;b%Fw3*JT>5Mg`REDwF5eLLdR}sQ-`-2(6R@bwV-J)G-yM; z4%8li8a?0~gsMaE3Il%`KqVF`a8Py(N{<8cBov zNVkCJmXKxz&n`lWH6+_Wk{u*ohQ|&Ne+A+kA@&-?T!&~Ec;E{6-GS-}QC@J@2k!X7 zZ9lk0fCzuM83?}xL0AZoNkAe)NEihD27x!hKLQB1!0$Hr-T|Mx;1va)RB*o!t`EQ^ z8m`B{wODYBgDdgi@E9&9f?X2WB!hJdTzm#rX<+#rEYiU&6V7FUX%3j=f>A#FQUE^} zf*}J=7sE*=9501qWq_*yY$X`{1(;WGs0t2pK(7W4)PhbuXg9##CeUhzJuRT|7S!8d zcL(h3gdJV5tp~PpL5&A0eW3gvHhlmkKKw8U3PZ4A7}kx#+A;Wk9ONcJb_!%>VAU+F z6vDS6ke-KSiy$Rl$~6N$*#b0!zwi3Nooc4bU$&GYM#a)8%a0uo-DzjRTW8NmUGZWt_$FmOF`2sk%>Dl^DV)*Ta2I3- z-^EnVNDVsD98TL4l^!A*{(fGyJK6^6*rsu{_f<&A_`EJbPj!+yLMwZgZkKQ7to;); znV4cu-&u6Q-okHWjeJR7r**aT*0SbeT%2S)TeFU-?2##_IDDKN_ryp1Y^XO|zlSx# znbEplowc9u?KnM<{HYJm*)0OAX3!>5x61Cf&ldQA;y{YMGD=91JD=v^Cz6QQqT*QL$?b z)9me!-$SK;?mxfHuw!#=hcb4T!6hZ)?ZmQIaROz#Mq!kdXl({Ylj_F6uD@epMC#74 zu6Q@v_lB+DByYQ4GPb-jY=AFYeV7v~x;ZllVbYZ&m|e zsw96lvOz^ol}+vpx77Nc7xM9FD<@67%itt~*MzN7IwE%dBZd;hVvbd-4zW6|KZx`K zTazZ&&`t8@$L27XG^rX(Shw{1FRZ^YEA@zCNV^(36G4?|+;ZXB8HC7dUmC7XT;v2T z@FdTtJ(3R}xFNx8S@MvkwZ(}SJ`+V#s42{{^O!4Uw4w;8oxXqCvSW6%C~JN{B&T@m zzw>ze5R(4DUox8?+Y{+CSl%Spzg;xHtMbg$$911_$79!@y`Mvpir=nD?U4PaMMhzz z(>p@S;%$50y%y=PUaqIgobr!0Y9G1c0n{|Lpm=NMRMPpd zYx&7HzP7%BpKM5YN?+y<#a*S6p3biS*e8Q!xPn5N5937<;TONVojh1NFf-9MiSQQu zuOPMV2OppDh<$vDrH>hZ?M@1^PFVFkI%#)cqpPtzuip5kA2Tn*Wbg?s@l+5|BE(1} zT;?&{zsF?vJ#(S7o^P`I)vtA<@wK!!w!Sk5v!eEBEPwm6>3V~~9cdBR>%s^{GrEpM zkU?az>ZG-?$O5X{mL(M7+3F^Gx@}&kS~O%nj4qDriN&mto8*6^1Rg(RVYrlEU=NzTz>GaE{$?j=Hh%{dK$}wJIPlsoV z^xU{&%@?h0E~!N=J9$rPiO!EpyuVhz>hMnzBzQVnTRN7FulzN|GKd+?n#w(hw^(*- zYgtcgR_3r|+(US8H;FegEJ+|a=q1+oZ<{T4_Z25gx42S%@k0n5n{DxhbQv3)Mvr$5 
zTA5K(L$8C}PFV&9x2ncc>^$n+uw}Tod;{Aa;;K73OQNhg9umk;Ta3rNN2pg`vVgb?v?41mVTO~ zdH@Z_}kyU&l#0t*h)k_JvQCE*|YLUNq$8% z>4>guMbGV$BYtxm@DaxibxQw8_fS2WY*6m*oF>#_hYQBvAJ-xHkxUvJ%XUqFGBofI z2cwLuhsWy#6KNmi6)bt-Z?9d^RA`Pvkuy01zR&&#@*@N8kscjdk$q3o2L{rt(49cl z)~b-}RjhAgrOI#B5!T$up8J%F)kZ;3hKLfg%n*x*ZC=~oSvYd<^1|cG=W1?BtfSJA zkOzH=X7db{G?@`p&;ga$##vO*6-$C@cPH`2CulRXUjk<%b7Fq4sm9S?Q=o9|`je}^ zR=?`-H3_~>hOaBZ|6xV+KdnLFCmC>awW3IS-p39dtAUTA93lG5={^0zbZMzT3oE!_3iMVy~3R#qr{FJN~_eyhjkNw zxmYRLoUv9TN#gyyiGFyIDAHv0+tX#!EAw;aZ|2kXP<^ny#8UGToUS}-_72@|(XG1O zNhU?(N@;*-r-5)xjv#)DB}wX;#oZ0rjat|o{~NxDmxqKv`XeN`Tk6Gp_=zC?O6|;V zqtxW(KD$`$nuP9~t!vmqHBD;HL~o20R}dUJljyL4OB_$LBHo_e(OZmTrXoro+(I{d zWYoIwcw<5}dHJ_1_FdFZIwA617*4dbouSQm;+!2rZ*P@e`bj^x^r!r_`c;Sjha`|c zB^u2gHxzAPmv4I}v%HMOJ3NXx$HX^H{uPbXGJgm6+W z-wzhA`+K{0XsO3PEuI(D);3kxD*Wd(NTS6|RUgT&$jzxuA2KEs7+t_eauyG}5&hV? z4wf+%++HPoP|r~_U28&zoL&8StQ?JV@i4wFH`=V^ha$XsOprAuy3V_b`Qp-QYlCQd zgO=c(_Nc_X{I~@n!a#Y>%+meM`L-X=ly$2;k4w0tZ7?usPtUNX+@<_ptUcOWoZWSIn&C*GliFBpr#uCM+s5;m@wHXC zg=PUI42L#z+dI~J(UJBeX;THFDrIZR2144Da-7RN%S4CQx3Bj7KCCRpmU}p}_yFoG zwGrF?klP@!AgSnzx7DKzGv8@PS6wkHajWfm@skRz_I>D!!|p_H&O`eu?u+1vbQMBe z21~M{J35c@uf^Zt6q_~6z9}WNxZ6|SmQvTU+l57Y~9o{E0bbXDz-uB152pBgxcFW)hCa2 zz)Zh;(IE&`FwstJdYOf2WpjV<52k>eg)tHE0ERv)CO52MbBQhy=SN86OC*)G6v|cHQ3C;#Et~wMURD8Vb_|gtxKtr z$8g}9qwVuodZ$tMQ}?0*A~&6;%7@fr7NkBF-j_bC^I>uBbeoP zZ)!hj6CsB1kPu;%H>DoLWmwnQYoz(?`%D-OLfwOm)TvrQ`&oX0$jGc@GoGZ*>zg3D zF8)=n6y?~kuhzftNKkzfJ2itwQBD{d?NCQ`GVa8PiLWy71g3LfPcftO3S+gk{ukE! 
z+-A)+&9dWb-C39l=k^(N#w_{V@xM=fdxx&}#&#Q-mmkx+vtx#i)1ygD60v5FGOHiL$a_sKGxw1mk37W!2Qw%*@QLUcHLL z;T}JJTwY#YS6BDp!v`jlDNwO+>{uczvABO?!FJ)F2*RSrAjCH%EEduS0&Q?baL4V9 z->VvL|HpUy&Yk`ODgdqcwpkETM4+t`kxE8zmJ3l)h%u~11x!B-U6}a2hU8rtC$_vy z%Z^e+o&YO`^#Ho*+4Mw916yxYloUisNkth5T2$1vKEkqY+~yhCduLE!OpbiZb1E@Ix)kZ)zhDsA4nZm1IpyCXanL()qlvqOX zIViG*=eF?79`et_lM9gN2szG>?Fxj8ka-C*E<>6-Jo124Pe{2254_>NFC_Xwf1*Rl0sRUy(7*)fG8Ze*$x(@W-f=)eXzXQ!i&}ahnW>EV8$6Db?JD@t?a2Ke2 z1mzw$&<9Eb@Y5jtNQ1p}P-KAo2<#pOxp9zX!j4Iho`xT0VEY_=KM&j3uyqk6mOzXH z-z~#8T-dw{qHC~e9YpxTT+`E#s1RiExBjLMfdE_Uh1XXpDqH0Mq?1mcG_(oHn5T=~ z&$*C$((U^9+jbZpsmMb1&3yB8$xL8&_x(+4ACNI& zjxhN@+d4eB!3)jak9EvQ!H?4SaL`25Agjc~uBq}j{Afq8FGhwG(mYRBJy=NEWh2WG zyJD|r{oHDMdQoYa)8Obqu`1_)3yNKGnbM|oKHbjPN}Vs+G8Q{?oDb{ybG}jLwl84B}1Q!yZVeQb~8V&U)d>j;I~7ZgaN1r@_KnyRurQ z#E+rt{26(NAaKL|zJ7T-&#E7hc{Z zL$!kc>4}kpM&lEInwev7ReBhKO+-KD@Dz@-g4M{VWYJ?^h^u1H>bG6feK^u-Mh_8<%*E4NYud4nsr)@diw)(?2{KI_pl89~N#$2sq^>bWlk?A0i}Uk8(WO-gmtIYZ-scDB$5`8TGbSd6 zOM=5_WCqo9%wcT5#mj~*J0y~hzxPfI;KWCUZ$8+bs^s^z#tr1Vi&Kv3H`Tk+jgOOO_{F|G75T2yyv{S@i~B>V z+1=8)*q3B}@RkfzqMT6Y(Xr`CqmCZ2D?9f3-WTo4JV8!SY&OC!FKLW>CPFWMnvh#j-bKOr9Jgifr*$Zw`kfRZyD2WQsq+}8vQ#fog=S*IEm#SZHlcl z_`5tt>|^Ucky79N)8&Be!qmWDPv#zaJ1{n?&O%$OUu}(DGmB@8tAVPtJ~T=_EA{<3hl4Y#)W|JO8IBdo}cPH}`M_kE@zU@J;_qy5$p6 z%U#9O)kqt&XP0hu_IdUtv-KQSx(5R64T`@u?(`o1w7jr9RamNu_9<-PcHE_$3R~y1 zvFqzT&U^j>ADezs#49P5N9SI{`)TfuxVw^8-sm3T2%b+@i&GWHGfZqI-YkhU?Y*S$G2#_#ag@^jG!2p zqbuh;SNQCfaG5h-3zZVX%p%vesH~p)Oy=##!yOG=d*g0W&c?Y;Oe&t>@myK6`=>>C zJh3Z3TB;aYaSo~`PDbaLes%4q19zl^rhj?;;)j1HK?$j>pgBjo;@1ORRW-Eci0e+f zcx9VUCJnQgb`Nd2*eaak144P{yTieBBy%N*r^#31Zzw#%EQRpiE1~1jAg5{%v?sR`Ilw%2jOj`}Wb4Cx_O4 z;wsdupiqfq&$d)mvAlyb!F?a8u`8b$D;>TLwsgtC&e+or5-b{L-lt*8E(J8EuOD)8 zXv=X8U>jDiEQx4k$u);ESPf!%%7J66Z>K4Avv;d`R!({cRrJq2o<3@4(HoTfL$c2B zxW{61jX+(C91^!#?eazA=OasH-r2CBk%{GBNxJ09+K`&{@{?Ulyd@%H6ns&6tn*=!|& zMDSTE4|2z%<0Id{y{yR#oSxH}OUO?o}00laFjG=7AajqYsqmfTUQ)jh$G!hcdlYT zG)C-fYd)*?3jQMZEQ->IXu6Z?+1&8(V|Erj0=upMv^zkWf1ekT6~=ciGIEe|$0_xv 
z8uw)BZ+RU)mn25dmv`r~t=4dlx@tI;PLEFAWc+;SY{N`}&;{52^mnts--uzg$QpNd zl@xrpX={34sG?2Hr3%ri=AZ5*h`zTQHK)I8mM8)a{F5)lrOp_;K%r9`jrM`j|=W} z=Ur`zzP3`f2+iA79u*!Gy*};aT#cK(LVcp*NFunrPBL~<(7#n-`&#aBBJtVKN=J>h z*BzrFJA#uUZo}!hCOHJ@eIIwmo{0GaxK5wDm;aGZbd*RH?Lwt){ANq}d2lpPY`mRt z<_8lQHbZ;Bf4aGMKHna5<-Z2@w=3ey91rk1MGkJwClF96s@q;h_(y+q@6OzJN-Xd2 zC}(UnoX7w)Na?l z*&<^~**x=JWM|us+RdHzpXaQLg3->3^s4h!EB*`4MO?Z?#7BXUBUa&-pwL_NaJP~t zr}KB6WB`Y)v%b$+7vYE`xrlz*;C|WN|36ka%w(%btf2f|zauDkyF_F3X%yW#opWS^ z4m>XvA7Rpra!|x#c=UhQ%vm*?D04bWqRHd4Y}o&s*mwk0^K|XTR{#6qEh0%F8!Kgr z|EbM6xXxa8t5@_`Bf-H^*y`%kjC*<`HJJDOK;hL-rMcv1sd0zzd9PWVZ^7f`8q<8M z+|OIf7>B*Wn^1ernd_b^`2qSqqyS})F*^&UCsHm)6nE4~>2{NRy0pFdxt{!VNXB!Y zU^9m-oPKTSz{FxH{~uaPsIt>xUeu0Dkql@e(L zAvE)PuE{cM%6 zrgt&NBBuu;p?`)+K-9(noi#^|@yHeeyqV9-bbj-e}GDlE;WU z%J)g7PUcBUV(;`w%F#>R`!OLl8e1Lp1lD=y2k82=*xYL<&T4MiAek|P`Z4o1E)rF0 z0)o9`MZO3J6B1(HzVdWPSJrN~>$G}0{sw8>sz#ou`$DUN9@w?AIMVjgJYmM4`p_*` zLIA7vC#<#)-^Tp!V2|jXwLLp5{=NrqMv!Pxp&X{JxnA*>X`1uTb)mT?P{q2I#ja?C z@v8!&Ki~zl281D!C3-1);2qUca{28mB!LA?8y86-$s<};J>7kSlZ6T93*~N;=^EnR zym%FYjPA|KmgQ*)$GG|9y6@7q{t-$G8p+IhwSJ_5N0|2$M1ovUeRQtPCi3m)Cv9$n z)h#_+N^-W^$I@+8T*~j_)h7|c%ZWvm7CHRvV}3H zGrYgKAVVVTND9F}{BiJFU9E0r%gm>$?Wu*izVebfr{wvLQ9abqRGh+f^*JV0VkmQ~ z#$a$uY4@8#K^mGfefrCV6Gi`Kr5lp|R4^f#Hf?cwkAC^z2g!K}Ym#wpJrCCcP%45G PYdCFee)7eMi!uKKYl4f; literal 0 HcmV?d00001 diff --git a/tests/zero-downtime-upgrades/zero-downtime-upgrades.md b/tests/zero-downtime-upgrades/zero-downtime-upgrades.md new file mode 100644 index 000000000..6bf695673 --- /dev/null +++ b/tests/zero-downtime-upgrades/zero-downtime-upgrades.md @@ -0,0 +1,252 @@ +# Zero-Downtime Upgrades + +This document describes a test plan for testing zero-downtime upgrades of NGF. + +*Zero-downtime upgrades* means that during an NGF upgrade clients don't experience any +interruptions to the traffic they send to applications exposed via NGF. 
+ + + +- [Zero-Downtime Upgrades](#zero-downtime-upgrades) + - [Goals](#goals) + - [Non-Goals](#non-goals) + - [Test Environment](#test-environment) + - [Steps](#steps) + - [Start](#start) + - [Upgrade](#upgrade) + - [After Upgrade](#after-upgrade) + - [Analyze](#analyze) + - [Results](#results) + - [Appendix](#appendix) + - [Pod Affinity](#pod-affinity) + - [Converting Curl Output to a Graph](#converting-curl-output-to-a-graph) + + + +## Goals + +- Ensure that upgrading NFG doesn't lead to any loss of traffic flowing through the data plane. +- Ensure that after an upgrade, NGF can process changes to resources. +- Detect if any special instructions will be required to provide to users to perform + an upgrade. + +## Non-Goals + +During an upgrade, Kubernetes will shut down existing NGF pods by sending a SIGTERM. If the pod doesn't terminate in 30 +seconds (the default period), Kubernetes will send a SIGKILL. + +When proxying WebSocket or any long-lived connections, NGINX will not terminate until +that connection is closed by either the client or the backend. This means that unless all those connections are closed +by clients/backends before or during an upgrade (which is highly unlikely), NGINX will not terminate, which means +Kubernetes will kill NGINX. As a result, the clients will see the connections abruptly closed and thus experience +downtime. + +Therefore, we *will not* use any long-lived connections in this test, because NGF cannot support zero-downtime upgrades +in this case. + +## Test Environment + +- A Kubernetes cluster with 10 nodes on GKE + - Node: e2-medium (2 vCPU, 4GB memory) + - Enabled GKE logging. +- Tester VMs on Google Cloud Platform: + - Configuration: + - Debian + - Install packages: wrk, curl, gnuplot + - Location - same zone as the Kubernetes cluster. + - First VM - for sending HTTP traffic + - Second VM - for sending HTTPS traffic +- NGF + - Deployment with 2 replicas scheduled on different nodes.
+ - Exposed via a Service with type LoadBalancer, private IP + - Gateway, two listeners - HTTP and HTTPS + - Two backends: + - Coffee - 3 replicas + - Tea - 3 replicas + - Two HTTPRoutes + - Coffee (HTTP) + - Tea (HTTPS) + +Notes: + +- For sending traffic, we will use both wrk and curl. + - *wrk* will generate a lot of traffic continuously, and it will have a high chance of catching any + (however small) periods of downtime. + - *curl* will generate roughly 1 request every 0.1s. While it might not catch small periods of downtime, it will + give us a timeline of failed requests for long periods of downtime, which wrk doesn't do. +- We use pod anti-affinity to tell Kubernetes to schedule NGF pods on different nodes. We also use a 10-node cluster so + that the chance of Kubernetes scheduling new pods on the same + nodes as the old pods is minimal. Scheduling new pods on different nodes will help better catch + any interdependencies with an external load balancer (typically the node of a new pod will be added + to the pool in the load balancer, and the node of an old one will be removed). + +## Steps + +### Start + +1. Create a cluster. +2. Deploy the previous (latest stable) version of NGF with 2 replicas and the added [anti-affinity](#pod-affinity) rule. +3. Expose NGF via a Service of type LoadBalancer, internal (only accessible within the Google Cloud region), by adding the + `networking.gke.io/load-balancer-type: "Internal"` annotation to the Service. +4. Deploy backend apps: + + ```console + kubectl apply -f manifests/cafe.yaml + ``` + +5. Configure Gateway: + + ```console + kubectl apply -f manifests/cafe-secret.yaml + kubectl apply -f manifests/gateway.yaml + ``` + +6. Expose apps via HTTPRoutes: + + ```console + kubectl apply -f manifests/cafe-routes.yaml + ``` + +7. Check the statuses of the Gateway and HTTPRoutes for errors. +8. In Google Monitoring, check NGF and NGINX error logs for errors. +9.
On the tester VMs, update `/etc/hosts` to have an entry with the External IP of the NGF Service (`10.128.0.10` in this + case): + + ```text + 10.128.0.10 cafe.example.com + ``` + +### Upgrade + +1. Follow the [upgrade instructions](/docs/installation.md#upgrade-nginx-gateway-fabric-from-manifests) to: + 1. Upgrade the Gateway API version to the one supported by the new release. + 2. Upgrade NGF CRDs. +2. Start sending traffic from the tester VMs using wrk and curl for 1 minute: + - Tester VM 1: + - wrk: + + ```console + wrk -t2 -c100 -d60s --latency --timeout 2s http://cafe.example.com/coffee + ``` + + - curl: + + ```console + for i in `seq 1 600`; do printf "\nRequest $i\n" && date --rfc-3339=ns && curl -sS --connect-timeout 2 http://cafe.example.com/coffee 2>&1 && sleep 0.1s; done > results.txt + ``` + + - Tester VM 2: + - wrk: + + ```console + wrk -t2 -c100 -d60s --latency --timeout 2s https://cafe.example.com/tea + ``` + + - curl: + + ```console + for i in `seq 1 600`; do printf "\nRequest $i\n" && date --rfc-3339=ns && curl -k -sS --connect-timeout 2 https://cafe.example.com/tea 2>&1 && sleep 0.1s; done > results.txt + ``` + +3. **Immediately** upgrade NFG manifests by + following [upgrade instructions](/docs/installation.md#upgrade-nginx-gateway-fabric-from-manifests). + > Don't forget to modify the manifests to have 2 replicas and pod affinity. +4. Ensure the new pods are running and the old ones terminate. + +### After Upgrade + +1. Update the Gateway resource by adding one new listener `http-new`: + + ```console + kubectl apply -f manifests/gateway-updated.yaml + ``` + +2. Check that NGF has elected a leader among the new pods: + + ```console + kubectl -n nginx-gateway logs | grep leader + ``` + +3. Ensure the status of the Gateway resource includes the new listener. + +### Analyze + +- Tester VMs: + - Analyze the output of the wrk commands for errors and latencies.
+ - Create graphs from the curl output (see [instructions](#converting-curl-output-to-a-graph) in Appendix) and check for + any failures on them. +- Check the old pods' logs in Google Monitoring: + - NGINX Access logs - we expect only 200 responses. + Google Monitoring query: + + ```text + severity=INFO + "GET" "HTTP/1.1" -"200" + ``` + + - NGINX Error logs - we expect no errors or warnings. + Google Monitoring query: + + ```text + severity=ERROR + SEARCH("`[warn]`") OR SEARCH("`[error]`") + ``` + + - NGF logs - we expect no errors + - Specifically look at the NFG logs before it exited, to make sure all components shutdown correctly. +- Check the new pods (in Google Monitoring) + - NGINX Access logs - only 200 responses. + - NGINX Error logs - no errors or warnings. + - NGF logs - no errors + +## Results + +- [1.0.0](results/1.0.0/1.0.0.md) + +## Appendix + +### Pod Affinity + +- To ensure Kubernetes doesn't schedule NFG pods on the same nodes, use an anti-affinity rule: + + ```yaml + spec: + affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - topologyKey: kubernetes.io/hostname + labelSelector: + matchLabels: + app.kubernetes.io/name: nginx-gateway + ``` + +### Converting Curl Output to a Graph + +The output of a curl command is saved in `results.txt`. To convert it into a graph, +go through the following steps: + +1. Convert the output into a CSV file: + + ```console + awk ' + /Request [0-9]+/ { + getline + datetime = $0 + getline + if ($1 == "curl:") { + print datetime ",0" # Failed + } else { + print datetime ",1" # Success + } + }' results.txt > results.csv + ``` + +2. Plot a graph using the CSV file: + + ```console + gnuplot requests-plot.gp + ``` + + As a result, gnuplot will create `graph.png` with a graph. +3. Download the resulting `graph.png` to your local machine. +4. Also download `results.csv`.
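As a text complement to the graph, the failure windows can also be listed directly from `results.csv`. The following is a minimal sketch and not part of the original test plan. It assumes the `<timestamp>,<1|0>` format produced by the awk conversion step (1 = success, 0 = a `curl:` error). The function name `summarize_downtime` is hypothetical. Consecutive failed requests are grouped into one window, reported by the timestamps of its first and last failed request.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NGF): summarize downtime windows in a
# results.csv produced by the awk conversion step.
# Assumed input: one line per request, "<timestamp>,<1|0>",
# where 1 = success and 0 = failure.
set -eu

summarize_downtime() {
  local csv="${1:-results.csv}"
  awk -F',' '
    # A failed request opens a window or extends the current one.
    $2 == 0 {
      if (!in_window) { first = $1; count = 0; in_window = 1 }
      last = $1
      count++
      failed++
      next
    }
    # A successful request closes any open window.
    $2 == 1 {
      if (in_window) {
        printf "downtime: %s -> %s, failed requests: %d\n", first, last, count
        in_window = 0
      }
      ok++
    }
    END {
      # Report a window that is still open at the end of the file.
      if (in_window) {
        printf "downtime: %s -> %s, failed requests: %d\n", first, last, count
      }
      printf "total: %d succeeded, %d failed\n", ok, failed
    }
  ' "$csv"
}
```

Usage: `summarize_downtime results.csv`. The reported window is bounded by failed requests only, so the real outage may extend until the next successful request. Because the curl loop uses `--connect-timeout 2`, consecutive failures are expected to be roughly 2 seconds apart.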
From 59d3d066698322f3e1e77137ed8177a16a0a2e7b Mon Sep 17 00:00:00 2001 From: Michael Pleshakov Date: Mon, 16 Oct 2023 13:08:21 -0400 Subject: [PATCH 3/7] NFG -> NGF --- tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md | 2 +- tests/zero-downtime-upgrades/zero-downtime-upgrades.md | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md index 4bc9e4cc9..95ad8d7d0 100644 --- a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md +++ b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md @@ -159,7 +159,7 @@ to the coffee app. #### Old pods - nginx-gateway-55cb958549-nbdqz - - NFG handled a panic before exiting: + - NGG handled a panic before exiting: ```text INFO 2023-10-13T22:02:33.321381968Z [resource.labels.containerName: nginx-gateway] {"level":"info", "msg":"Stopping and waiting for caches", "ts":"2023-10-13T22:02:33Z"} diff --git a/tests/zero-downtime-upgrades/zero-downtime-upgrades.md b/tests/zero-downtime-upgrades/zero-downtime-upgrades.md index 6bf695673..f4b62f12d 100644 --- a/tests/zero-downtime-upgrades/zero-downtime-upgrades.md +++ b/tests/zero-downtime-upgrades/zero-downtime-upgrades.md @@ -25,7 +25,7 @@ interruptions to the traffic they send to applications exposed via NGF. ## Goals -- Ensure that upgrading NFG doesn't lead to any loss of traffic flowing through the data plane. +- Ensure that upgrading NGF doesn't lead to any loss of traffic flowing through the data plane. - Ensure that after an upgrade, NGF can process changes to resources. - Detect if any special instructions will be required to provide to users to perform an upgrade. @@ -148,7 +148,7 @@ Notes: for i in `seq 1 600`; do printf "\nRequest $i\n" && date --rfc-3339=ns && curl -k -sS --connect-timeout 2 https://cafe.example.com/tea 2>&1 && sleep 0.1s; done > results.txt ``` -3. **Immediately** upgrade NFG manifests by +3.
**Immediately** upgrade NGF manifests by following [upgrade instructions](/docs/installation.md#upgrade-nginx-gateway-fabric-from-manifests). > Don't forget to modify the manifests to have 2 replicas and pod affinity. 4. Ensure the new pods are running and the old ones terminate. @@ -193,7 +193,7 @@ Notes: ``` - NGF logs - we expect no errors - - Specifically look at the NFG logs before it exited, to make sure all components shutdown correctly. + - Specifically look at the NGF logs before it exited, to make sure all components shutdown correctly. - Check the new pods (in Google Monitoring) - NGINX Access logs - only 200 responses. - NGINX Error logs - no errors or warnings. @@ -207,7 +207,7 @@ Notes: ### Pod Affinity -- To ensure Kubernetes doesn't schedule NFG pods on the same nodes, use an anti-affinity rule: +- To ensure Kubernetes doesn't schedule NGF pods on the same nodes, use an anti-affinity rule: ```yaml spec: From cd798d65dbf8c2112454f3dbd5115e1ed53c54b8 Mon Sep 17 00:00:00 2001 From: Michael Pleshakov Date: Mon, 16 Oct 2023 13:13:02 -0400 Subject: [PATCH 4/7] pod -> Pod --- .../results/1.0.0/1.0.0.md | 18 +++++++-------- .../zero-downtime-upgrades.md | 22 +++++++++---------- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md index 95ad8d7d0..69d010786 100644 --- a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md +++ b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md @@ -7,8 +7,8 @@ - [Upgrades](#upgrades) - [Analyze](#analyze) - [Tester VMs](#tester-vms) - - [Old pods](#old-pods) - - [New pods](#new-pods) + - [Old Pods](#old-pods) + - [New Pods](#new-pods) - [Opened Issues](#opened-issues) - [Future Improvements](#future-improvements) @@ -75,16 +75,16 @@ Logs check: ## Upgrades -New pods: +New Pods: ```text nginx-gateway-578b49bc58-hmx5x 2/2 Running 0 11s 10.112.1.9 gke-michael-2-default-pool-18ad0f59-7dqr 
nginx-gateway-578b49bc58-r4ckb 2/2 Running 0 17s 10.112.5.26 gke-michael-2-default-pool-18ad0f59-l0cq ``` -Note: the new pods were scheduled on different from the old pods nodes, as we wanted. +Note: the new Pods were scheduled on different from the old Pods nodes, as we wanted. -Check that one of the NGF pods became the leader: +Check that one of the NGF Pods became the leader: ```text I1013 22:02:32.226414 7 leaderelection.go:250] attempting to acquire leader lease nginx-gateway/nginx-gateway-leader-election... @@ -156,7 +156,7 @@ Tester 2 graph: As we can see, there is period where curl failed to send requests to the coffee app. -#### Old pods +#### Old Pods - nginx-gateway-55cb958549-nbdqz - NGG handled a panic before exiting: @@ -196,7 +196,7 @@ to the coffee app. - Access logs - only 200 responses. - Error logs - no errors or warnings. -#### New pods +#### New Pods - nginx-gateway-578b49bc58-hmx5x - NGF - no errors @@ -217,7 +217,7 @@ to the coffee app. - nginx-gateway-578b49bc58-r4ckb - NGF - no errors. - NGINX - - Access logs - 35 responses similar to the first pod. Same conclusion as above. + - Access logs - 35 responses similar to the first Pod. Same conclusion as above. - Error logs - No errors or warnings. ## Opened Issues @@ -228,4 +228,4 @@ to the coffee app. - Use helm for upgrade, to catch any helm-related bugs preventing an upgrade. In this test, we didn't use helm because 0.6.0 release does not allow you to configure the number of replicas - and pod affinity. + and Pod affinity. diff --git a/tests/zero-downtime-upgrades/zero-downtime-upgrades.md b/tests/zero-downtime-upgrades/zero-downtime-upgrades.md index f4b62f12d..e2edd9afa 100644 --- a/tests/zero-downtime-upgrades/zero-downtime-upgrades.md +++ b/tests/zero-downtime-upgrades/zero-downtime-upgrades.md @@ -32,7 +32,7 @@ interruptions to the traffic they send to applications exposed via NGF. ## Non-Goals -During an upgrade, Kubernetes will shut down existing NGF pods by sending a SIGTERM. 
If the pod doesn't terminate in 30 +During an upgrade, Kubernetes will shut down existing NGF Pods by sending a SIGTERM. If the Pod doesn't terminate in 30 seconds (the default period) , Kubernetes will send a SIGKILL. When proxying Websocket or any long-lived connections, NGINX will not terminate until @@ -74,10 +74,10 @@ Notes: (however small) periods of downtime. - *curl* will generate 1 request every 0.1s. While it might not catch small periods of downtime, it will give us timeline of failed request for big periods of downtime, which wrk doesn't do. -- We use pod anti-affinity to tell Kubernetes to schedule NGF pods on different nodes. We also use a 10 node cluster so - that the chance of Kubernetes scheduling new pods on the same - nodes is minimal. Scheduling new pods on different nodes will help better catch - any interdependencies with an external load balancer (typically the node of a new pod will be added +- We use Pod anti-affinity to tell Kubernetes to schedule NGF Pods on different nodes. We also use a 10 node cluster so + that the chance of Kubernetes scheduling new Pods on the same + nodes is minimal. Scheduling new Pods on different nodes will help better catch + any interdependencies with an external load balancer (typically the node of a new Pod will be added to the pool in the load balancer, and the node of an old one will be removed). ## Steps @@ -150,8 +150,8 @@ Notes: 3. **Immediately** upgrade NGF manifests by following [upgrade instructions](/docs/installation.md#upgrade-nginx-gateway-fabric-from-manifests). - > Don't forget to modify the manifests to have 2 replicas and pod affinity. -4. Ensure the new pods are running and the old ones terminate. + > Don't forget to modify the manifests to have 2 replicas and Pod affinity. +4. Ensure the new Pods are running and the old ones terminate. ### After Upgrade @@ -161,7 +161,7 @@ Notes: kubectl apply -f manifests/gateway-updated.yaml ``` -2. 
Check that at NGF has a leader elected among the new pods: +2. Check that at NGF has a leader elected among the new Pods: ```console kubectl -n nginx-gateway logs | grep leader @@ -175,7 +175,7 @@ Notes: - Analyze the output of wrk commands for errors and latencies. - Create graphs from curl output (see [instructions](#converting-curl-output-to-a-graph) in Appendix) and check for any failures on them. -- Check the old pods logs in Google Monitoring +- Check the old Pods logs in Google Monitoring - NGINX Access logs - we expect only 200 responses. Google Monitoring query: @@ -194,7 +194,7 @@ Notes: - NGF logs - we expect no errors - Specifically look at the NGF logs before it exited, to make sure all components shutdown correctly. -- Check the new pods (in Google Monitoring) +- Check the new Pods (in Google Monitoring) - NGINX Access logs - only 200 responses. - NGINX Error logs - no errors or warnings. - NGF logs - no errors @@ -207,7 +207,7 @@ Notes: ### Pod Affinity -- To ensure Kubernetes doesn't schedule NGF pods on the same nodes, use an anti-affinity rule: +- To ensure Kubernetes doesn't schedule NGF Pods on the same nodes, use an anti-affinity rule: ```yaml spec: From 00567b57fa2f6d6ab10a08d01ab685c5caad8399 Mon Sep 17 00:00:00 2001 From: Michael Pleshakov Date: Mon, 16 Oct 2023 13:16:57 -0400 Subject: [PATCH 5/7] Improve leader election check --- tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md | 6 +++--- tests/zero-downtime-upgrades/zero-downtime-upgrades.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md index 69d010786..b974e755f 100644 --- a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md +++ b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md @@ -87,9 +87,9 @@ Note: the new Pods were scheduled on different from the old Pods nodes, as we wa Check that one of the NGF Pods became the leader: ```text -I1013 22:02:32.226414 
7 leaderelection.go:250] attempting to acquire leader lease nginx-gateway/nginx-gateway-leader-election... -I1013 22:02:32.248857 7 leaderelection.go:260] successfully acquired lease nginx-gateway/nginx-gateway-leader-election -{"level":"info","ts":"2023-10-13T22:02:32Z","logger":"leaderElector","msg":"Started leading"} +kubectl -n nginx-gateway get lease +NAME HOLDER AGE +nginx-gateway-leader-election nginx-gateway-578b49bc58-r4ckb 1m ``` Pod nginx-gateway-578b49bc58-r4ckb is the leader. diff --git a/tests/zero-downtime-upgrades/zero-downtime-upgrades.md b/tests/zero-downtime-upgrades/zero-downtime-upgrades.md index e2edd9afa..06c7bada3 100644 --- a/tests/zero-downtime-upgrades/zero-downtime-upgrades.md +++ b/tests/zero-downtime-upgrades/zero-downtime-upgrades.md @@ -164,7 +164,7 @@ Notes: 2. Check that at NGF has a leader elected among the new Pods: ```console - kubectl -n nginx-gateway logs | grep leader + kubectl -n nginx-gateway get lease ``` 3. Ensure the status of the Gateway resource includes the new listener. From b475b35e0ec59ccbf7f20007738150fe6612e973 Mon Sep 17 00:00:00 2001 From: Michael Pleshakov Date: Mon, 16 Oct 2023 13:18:20 -0400 Subject: [PATCH 6/7] Add a link to the opened issue --- tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md index b974e755f..d9a736d8d 100644 --- a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md +++ b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md @@ -222,7 +222,7 @@ to the coffee app. ## Opened Issues -- Traffic loss during an upgrade. 
(to be opened soon) +- Clients experience downtime during NGF upgrade -- https://github.com/nginxinc/nginx-gateway-fabric/issues/1143 ## Future Improvements From 314d4c90d91f20c8887a95b31df73f7009f28749 Mon Sep 17 00:00:00 2001 From: Michael Pleshakov Date: Mon, 16 Oct 2023 13:21:49 -0400 Subject: [PATCH 7/7] Fix linting --- tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md index d9a736d8d..f61124703 100644 --- a/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md +++ b/tests/zero-downtime-upgrades/results/1.0.0/1.0.0.md @@ -89,7 +89,7 @@ Check that one of the NGF Pods became the leader: ```text kubectl -n nginx-gateway get lease NAME HOLDER AGE -nginx-gateway-leader-election nginx-gateway-578b49bc58-r4ckb 1m +nginx-gateway-leader-election nginx-gateway-578b49bc58-r4ckb 1m ``` Pod nginx-gateway-578b49bc58-r4ckb is the leader.
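The "Check that NGF has a leader elected among the new Pods" step can be scripted rather than read by eye from the `kubectl get lease` table. Below is a minimal sketch, under several assumptions:

- The lease is `nginx-gateway-leader-election` in the `nginx-gateway` namespace, as in the results.
- The lease holder identity is the Pod name, which the HOLDER column in the results suggests.
- The NGF Pods carry the `app.kubernetes.io/name: nginx-gateway` label used in the anti-affinity rule.
- The old Pods have already terminated, so every listed Pod is a new one.

The function name `check_leader` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the NGF leader lease is held by a currently
# listed NGF Pod. Run it after the old Pods have terminated, so that every listed
# Pod is a new one.
check_leader() {
  local ns="${1:-nginx-gateway}"
  local lease="${2:-nginx-gateway-leader-election}"
  local holder pods pod

  # spec.holderIdentity is the value kubectl prints in the HOLDER column.
  holder=$(kubectl -n "$ns" get lease "$lease" -o jsonpath='{.spec.holderIdentity}') || return 1
  if [ -z "$holder" ]; then
    echo "no leader elected yet" >&2
    return 1
  fi

  # Space-separated names of the NGF Pods (same label as the anti-affinity rule).
  pods=$(kubectl -n "$ns" get pods -l app.kubernetes.io/name=nginx-gateway \
    -o jsonpath='{.items[*].metadata.name}') || return 1

  for pod in $pods; do
    if [ "$pod" = "$holder" ]; then
      echo "leader: $holder"
      return 0
    fi
  done

  echo "lease holder $holder is not a current NGF Pod" >&2
  return 1
}
```

A non-zero exit status makes the check easy to use in a retry loop while the new Pods are still starting. An example is `until check_leader; do sleep 2; done`.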