Merge branch 'main' into chore/release-process-netlify

nginxinc · Dec 8, 2023 · 619185f
2 parents 8aae6d5 + b77d74b

Showing 4 changed files with 114 additions and 16 deletions.
diff --git a/tests/reconfig/results/1.0.0/1.0.0.md b/tests/reconfig/results/1.0.0/1.0.0.md
@@ -56,7 +56,7 @@ NGF deployment:
 ## NumResources -> Total Resources
 
 | NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources |
-| ------------ | -------- | ------- | --------------- | ---------- | ---------------- | -------------------- | ---------- | --------------- |
+|--------------|----------|---------|-----------------|------------|------------------|----------------------|------------|-----------------|
 | x            | 1        | 1       | 1               | x+1        | 2x               | 2x                   | 3x         | <total>         |
 | 30           | 1        | 1       | 1               | 31         | 60               | 60                   | 90         | 244             |
 | 150          | 1        | 1       | 1               | 151        | 300              | 300                  | 450        | 1204            |
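
The `x` row of the table above defines the total as the sum of the per-kind counts. A minimal sketch of that arithmetic, assuming only the counts listed in the row (the `total_resources` helper name is hypothetical and is not part of the repository's scripts):

```shell
#!/usr/bin/env bash
# Sketch: total resources created for a NumResources value x, summing the
# per-kind counts from the "x" row of the table.
# total_resources is a hypothetical helper name used for illustration only.
total_resources() {
    local x=$1
    local gateways=1 secrets=1 reference_grants=1
    local namespaces=$((x + 1))
    local pods=$((2 * x))
    local services=$((2 * x))
    local http_routes=$((3 * x))
    echo $((gateways + secrets + reference_grants + namespaces + pods + services + http_routes))
}

total_resources 30    # prints 244
total_resources 150   # prints 1204
```

Both outputs match the Total Resources column for the 30 and 150 rows, which simplifies to `8x + 4`.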

diff --git a/tests/reconfig/results/1.1.0/1.1.0.md b/tests/reconfig/results/1.1.0/1.1.0.md
@@ -0,0 +1,92 @@
+# Reconfiguration testing Results
+
+<!-- TOC -->
+- [Reconfiguration testing Results](#reconfiguration-testing-results)
+  - [Summary](#summary)
+  - [Test environment](#test-environment)
+  - [Results Tables](#results-tables)
+    - [NGINX Reloads and Time to Ready](#nginx-reloads-and-time-to-ready)
+    - [Event Batch Processing](#event-batch-processing)
+  - [NumResources to Total Resources](#numresources-to-total-resources)
+  - [Observations](#observations)
+  - [Future Improvements](#future-improvements)
+<!-- TOC -->
+
+## Summary
+
+- Better reload times across all tests
+- Similar TimeToReadyTotal and TimeToReadyAvgSingle times
+- Similar event batch totals
+- Slightly better event batch processing average times
+- No new errors or issues
+
+## Test environment
+
+GKE cluster:
+
+- Node count: 4
+- Instance Type: n2d-standard-2
+- k8s version: 1.27.3-gke.100
+- Zone: us-west2-a
+- Total vCPUs: 8
+- Total RAM: 32GB
+- Max pods per node: 110
+
+NGF deployment:
+
+- NGF version: edge - git commit 3cab370a46bccd55c115c16e23a475df2497a3d2
+- NGINX Version: 1.25.3
+
+## Results Tables
+
+### NGINX Reloads and Time to Ready
+
+| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
+|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
+| 1           | 30           | 1.5                  | <1                       | 2             | 158.5                      | 100%     | 100%      |
+| 1           | 150          | 3.5                  | 1                        | 2             | 272.5                      | 100%     | 100%      |
+| 2           | 30           | 34                   | <1                       | 93            | 136                        | 100%     | 100%      |
+| 2           | 150          | 176.5                | <1                       | 451           | 203.98                     | 100%     | 100%      |
+| 3           | 30           | <1                   | 1                        | 93            | 125.7                      | 100%     | 100%      |
+| 3           | 150          | 1                    | 1                        | 453           | 126.71                     | 100%     | 100%      |
+
+
+### Event Batch Processing
+
+| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms |
+|-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|------------|
+| 1           | 30           | 70                | 5.12                                 | 100%     | 100%      | 100%      | 100%       | 100%       |
+| 1           | 150          | 309               | 2.14                                 | 100%     | 100%      | 100%      | 100%       | 100%       |
+| 2           | 30           | 442               | 35.4                                 | 100%     | 100%      | 100%      | 100%       | 100%       |
+| 2           | 150          | 2009              | 54.76                                | 100%     | 100%      | 100%      | 100%       | 100%       |
+| 3           | 30           | 373               | 35.72                                | 99.73%   | 99.73%    | 100%      | 100%       | 100%       |
+| 3           | 150          | 1813              | 39.46                                | 99.94%   | 99.94%    | 99.94%    | 99.94%     | 100%       |
+
+> Note: The outlier for test #3 is the event batch that contains the Gateway. It took ~13s to process.
+
+## NumResources to Total Resources
+
+| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Attached HTTPRoutes | Total Resources |
+|--------------|----------|---------|-----------------|------------|------------------|----------------------|------------|---------------------|-----------------|
+| x            | 1        | 1       | 1               | x+1        | 2x               | 2x                   | 3x         | 2x                  | <total>         |
+| 30           | 1        | 1       | 1               | 31         | 60               | 60                   | 90         | 60                  | 244             |
+| 150          | 1        | 1       | 1               | 151        | 300              | 300                  | 450        | 300                 | 1204            |
+
+> Note: Only 2x HTTPRoutes attach to the Gateway because the parentRef name in the `cafe-tls-redirect` HTTPRoute is incorrect. This will be fixed in the next release.
+
+## Observations
+
+1. The following issues still exist:
+
+   - https://github.com/nginxinc/nginx-gateway-fabric/issues/1124
+   - https://github.com/nginxinc/nginx-gateway-fabric/issues/1123
+
+2. All NGINX reloads were in the <= 500ms bucket. Reload time increased with the number of configured resources that result in NGINX configuration changes.
+
+3. No errors (NGF or NGINX) were observed in any test run.
+
+4. All event batches were processed in 500ms or less, except in the 3rd test. In the 3rd test, we create the Gateway resource after all the apps and routes. The batch that contains the Gateway is the only one that takes longer than 500ms. It takes ~13s.
+
+## Future Improvements
+
+1. Fix the parentRef name in the `cafe-tls-redirect` [HTTPRoute](/tests/reconfig/scripts/cafe-routes.yaml), so it matches the deployed Gateway.
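
The average-time columns in the results tables are derived by dividing a `_sum` metric by its `_count` metric, as `setup.md` describes. A rough sketch of that division over Prometheus text-format output, assuming each sample line ends with its value and carries no trailing timestamp (the `metric_avg` helper, the `class` label, and the sample values are illustrative assumptions, not taken from a real run):

```shell
#!/usr/bin/env bash
# Sketch: average duration (ms) from Prometheus text-format metrics on stdin.
# metric_avg is a hypothetical helper; pass the metric base name without
# the _sum/_count suffix.
metric_avg() {
    awk -v base="$1" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        {
            name = $1
            sub(/[{].*/, "", name)         # drop any {label="..."} suffix
            if (name == base "_sum")   sum   += $NF
            if (name == base "_count") count += $NF
        }
        END {
            if (count > 0) printf "%.2f\n", sum / count
            else           print "n/a"
        }
    '
}

# Demo with made-up sample values (not real test results).
metric_avg nginx_gateway_fabric_nginx_reloads_milliseconds <<'EOF'
# TYPE nginx_gateway_fabric_nginx_reloads_milliseconds histogram
nginx_gateway_fabric_nginx_reloads_milliseconds_sum{class="nginx"} 300
nginx_gateway_fabric_nginx_reloads_milliseconds_count{class="nginx"} 2
EOF
# Against a live port-forward, something like this should work:
#   curl -s 127.0.0.1:9113/metrics | metric_avg nginx_gateway_fabric_event_batch_processing_milliseconds
```

The same helper covers both the reload and the event batch averages, because the two metrics differ only in their base name.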
diff --git a/tests/reconfig/scripts/delete-multiple.sh b/tests/reconfig/scripts/delete-multiple.sh
@@ -3,11 +3,13 @@
 num_namespaces=$1
 
 # Delete namespaces
+namespaces=""
 for ((i=1; i<=$num_namespaces; i++)); do
-    namespace_name="namespace$i"
-    kubectl delete namespace "$namespace_name"
+    namespaces+="namespace$i "
 done
 
+kubectl delete namespace $namespaces
+
 # Delete single instance resources
 kubectl delete -f gateway.yaml
 kubectl delete -f reference-grant.yaml
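
The change above replaces one `kubectl delete` call per namespace with a single call, presumably so the deletions are not waited on one at a time. The script builds a space-separated string and relies on unquoted word splitting. An alternative sketch that uses a bash array instead, shown as a dry run that only prints the command (the `build_namespace_list` function is hypothetical and is not in the repository):

```shell
#!/usr/bin/env bash
# Sketch: collect namespace names in a bash array, then pass them to a
# single kubectl call. build_namespace_list is a hypothetical helper.
namespaces=()

build_namespace_list() {
    local count=$1 i
    namespaces=()                      # reset before filling
    for ((i = 1; i <= count; i++)); do
        namespaces+=("namespace$i")
    done
}

build_namespace_list 3
# Dry run: echo the command instead of executing it.
echo kubectl delete namespace "${namespaces[@]}"
# prints: kubectl delete namespace namespace1 namespace2 namespace3
```

Quoting `"${namespaces[@]}"` expands to one argument per name without depending on word splitting; dropping the `echo` would run the deletion for real.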

diff --git a/tests/reconfig/setup.md b/tests/reconfig/setup.md
@@ -26,7 +26,7 @@
 
  The following cluster will be sufficient:
 
-- A Kubernetes cluster with 3 nodes on GKE
+- A Kubernetes cluster with 4 nodes on GKE
   - Node: e2-medium (2 vCPU, 4GB memory)
 
 ## Setup
@@ -43,7 +43,7 @@
 
    ```console
    helm install my-release oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric  --version 0.0.0-edge \
-      --create-namespace --wait -n nginx-gateway
+      --create-namespace --wait -n nginx-gateway --set nginxGateway.config.logging.level=debug
    ```
 
 4. Run tests:
@@ -58,13 +58,17 @@
       - Note: Clean up after each test run for isolated results. There's a script provided for removing all the test
         fixtures `scripts/delete-multiple.sh` which takes a number (needs to be the same number as what was used in the
         create script.)
-5. After each individual test run, grab logs of both NGF containers and grab metrics.
-   Note: You can expose metrics by running the below snippet and then navigating to `127.0.0.1:9113/metrics`:
-
-   ```console
-   GW_POD=$(k get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p')
-   kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
-   ```
+5. After each individual test:
+    - Describe the Gateway resource and make sure the status is correct.
+    - Check the logs of both NGF containers for errors.
+    - Parse the logs for TimeToReady numbers (see steps 6-7 below).
+    - Grab metrics.
+      Note: You can expose metrics by running the below snippet and then navigating to `127.0.0.1:9113/metrics`:
+
+       ```console
+       GW_POD=$(kubectl get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p')
+       kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
+       ```
 
 6. Measure NGINX Reloads and Time to Ready Results
    1. TimeToReadyTotal as described in each test - NGF logs.
@@ -75,11 +79,11 @@
       1. The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
          metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
 7. Measure Event Batch Processing Results
-   1. Event Batch Total - metrics.
+   1. Event Batch Total - `nginx_gateway_fabric_event_batch_processing_milliseconds_count` metric.
    2. Average Event Batch Processing duration - metrics.
-      1. The average event batch processing duraiton can be computed by taking the `nginx_gateway_fabric_event_batch_processing_milliseconds_sum`
+      1. The average event batch processing duration can be computed by taking the `nginx_gateway_fabric_event_batch_processing_milliseconds_sum`
          metric value and dividing it by the `nginx_gateway_fabric_event_batch_processing_milliseconds_count` metric value.
-8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomolies or outliers.
+8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomalies or outliers.
 
 ## Tests
 
@@ -90,7 +94,7 @@
       e.g. `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy backend apps and services, wait
       60 seconds for them to be ready, and deploy 1 Gateway, 1 RefGrant, 1 Secret, and HTTPRoutes.
    2. Deploy NGF
-   3. Measure TimeToReadyTotal as the time it takes from start-up -> config written and
+   3. Measure TimeToReadyTotal as the time it takes from start-up -> final config written and
       NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.
 
 ### Test 2: Start NGF, deploy Gateway, create many resources attached to GW