diff --git a/tests/reconfig/results/v1.0.0.md b/tests/reconfig/results/v1.0.0.md
index 65d915305d..b6493cbaf2 100644
--- a/tests/reconfig/results/v1.0.0.md
+++ b/tests/reconfig/results/v1.0.0.md
@@ -19,23 +19,34 @@ NGF deployment:
 
 ## Results Table
 
-| Test number | NumResources | TimeToReadyTotal | TimeToReadyAvgSingle | NGINX reloads (total) | NGINX reload avg time (ms) |
-| ----------- | ------------ | ---------------- | -------------------- | --------------------- | -------------------------- |
-| 1 | 30 | 5 | 5 | 1 (2) | 166 |
-| 1 | 150 | 7 | 7 | 1 (2) | 353 |
-| 2 | 30 | 21 | <1 | 29 (30) | 142 |
-| 2 | 150 | 123 | <1 | 45 (46) | 190 |
-| 3 | 30 | <1 | <1 | 92 (93) | 137 |
-| 3 | 150 | 1 | 1 | 452 (453) | 127 |
+| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) |
+| ----------- | ------------ | -------------------- | ------------------------ | ------------- | -------------------------- |
+| 1 | 30 | 5 | 5 | 2 | 166 |
+| 1 | 150 | 7 | 7 | 2 | 353 |
+| 2 | 30 | 21 | <1 | 30 | 142 |
+| 2 | 150 | 123 | <1 | 46 | 190 |
+| 3 | 30 | <1 | <1 | 93 | 137 |
+| 3 | 150 | 1 | 1 | 453 | 127 |
+
+## NumResources -> Total Resources
+| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources |
+| ------------ | -------- | ------- | --------------- | ---------- | ---------------- | -------------------- | ---------- | --------------- |
+| x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | |
+| 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 244 |
+| 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 1204 |
 
 ## Observations
 
 1. We are reloading after reconciling a ReferenceGrant even when there is no Gateway. This is because we treat every
-   upsert/delete of a ReferenceGrant as a change. This means we will regenerate nginx config every time a ReferenceGrant
+   upsert/delete of a ReferenceGrant as a change. This means we will regenerate NGINX config every time a ReferenceGrant
    is created, updated (generation must change), or deleted, even if it does not apply to the accepted Gateway.
+   Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1124
+
 2. We are reloading after reconciling a HTTPRoute even when there is no accepted Gateway and no config being generated.
+   Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1123
+
 3. All reloads were in the <500ms bucket. A slight increase in the reload time based on number of configured resources
    resulting in NGINX configuration changes was observed.
diff --git a/tests/reconfig/setup.md b/tests/reconfig/setup.md
index d2705e0ad3..684e46b051 100644
--- a/tests/reconfig/setup.md
+++ b/tests/reconfig/setup.md
@@ -1,5 +1,23 @@
 # Reconfig tests
 
+## Goals
+
+- Measure how long it takes NGF to reconfigure NGINX when a number of Gateway API and referenced core Kubernetes
+  resources are created at once.
+- Two runs of each test should be ran with differing numbers of resources. Each run will deploy:
+  - a single Gateway, Secret, and ReferenceGrant resources
+  - `x+1` number of namespaces
+  - `2x` number of backend apps and services
+  - `3x` number of HTTPRoutes.
+- Where x=30 OR x=150.
+
+## Test Environment
+
+ The following cluster will be sufficient:
+
+- A Kubernetes cluster with 3 nodes on GKE
+  - Node: e2-medium (2 vCPU, 4GB memory)
+
 ## Setup
 
 1. Create cloud cluster
@@ -16,8 +34,7 @@
      --create-namespace --wait -n nginx-gateway
    ```
 
-4. Optional: Add pod scrape if running in GKE (see [GKE Pod scrape config](#gke-pod-scrape-config)).
-5. Run tests:
+4. Run tests:
    1. There are 3 versions of the reconfiguration tests that need to be ran, with a low and high number of resources.
       Therefore, a full test suite includes 6 test runs.
    2. There are scripts to generate the required resources and config changes.
@@ -29,7 +46,7 @@
    - Note: Clean up after each test run for isolated results. There's a script provided for removing all the test
      fixtures `scripts/delete-multiple.sh` which takes a number (needs to be the same number as what was used in the
      create script.)
-6. After each individual test run, grab logs of both NGF containers and grab metrics.
+5. After each individual test run, grab logs of both NGF containers and grab metrics.
    Note: You can expose metrics by running the below snippet and then navigating to `127.0.0.1:9113/metrics`:
 
    ```bash
@@ -37,8 +54,10 @@
    kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
    ```
 
-7. Measure Time To Ready by as described in each test, get the reload count, and get the average NGINX reload duration.
-8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomolies/ outliers.
+6. Measure Time To Ready as described in each test, get the reload count, and get the average NGINX reload duration.
+   The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
+   metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
+7. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomolies or outliers.
 
 ## Tests
 
@@ -48,49 +67,28 @@
    1. Use either of the provided scripts with the required number of resources, e.g.
       `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy backend apps and services, wait
       60 seconds for them to be ready, and deploy 1 Gateway, 1 RefGrant, 1 Secret, and HTTPRoutes.
-   2. Deploy NFG
-   3. Check logs for time takes from start-up -> config written and NGINX reloaded. Get reload count and average reload
+   2. Deploy NGF
+   3. Check logs for time it takes from start-up -> config written and NGINX reloaded. Get reload count and average reload
       duration from metrics and logs.
 
 ### Test 2: Start NGF, deploy Gateway, create many resources attached to GW
 
-1. Deploy all Gateway resources, NFG running:
-   1. Deploy NFG
+1. Deploy all Gateway resources, NGF running:
+   1. Deploy NGF
    2. Run the provided script with the required number of resources, e.g.
       `cd scripts && bash create-resources-routes-last.sh 30`. The script will deploy backend apps and services, wait
       60 seconds for them to be ready, and deploy 1 Gateway, 1 Secret, 1 RefGrant, and HTTPRoutes at the same time.
-   3. Check logs for time takes from NFG receiving first resource update -> final config written, and NGINX's final
+   3. Check logs for time it takes from NGF receiving first resource update -> final config written, and NGINX's final
       reload. Check logs for average individual HTTPRoute TTR also. Get reload count and average reload duration from
       metrics and logs.
 
 ### Test 3: Start NGF, create many resources attached to a Gateway, deploy the Gateway
 
-1. Deploy HTTPRoute resources, NFG running, Gateway last:
-   1. Deploy NFG
+1. Deploy HTTPRoute resources, NGF running, Gateway last:
+   1. Deploy NGF
    2. Run the provided script with the required number of resources, e.g.
       `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy the namespaces, backend apps and services,
       1 Secret, 1 ReferenceGrant, and the HTTPRoutes; wait 60 seconds for the backend apps to be ready, and then deploy
       1 Gateway for all HTTPRoutes.
-   3. Check logs for time takes from NFG receiving gateway resource -> config written and NGINX reloaded. Get reload
+   3. Check logs for time it takes from NGF receiving gateway resource -> config written and NGINX reloaded. Get reload
       count and average reload duration from metrics and logs.
-
-## GKE Pod scrape config
-
-To create a Pod scrape config, you can run the following:
-
-  ```bash
-  cat <