Add event batch processing results and rerun reconfig test (#1186) #1188

Merged
merged 1 commit into from
Oct 24, 2023
78 changes: 78 additions & 0 deletions tests/reconfig/results/1.0.0/1.0.0.md
@@ -0,0 +1,78 @@
# Reconfiguration testing Results

<!-- TOC -->
- [Reconfiguration testing Results](#reconfiguration-testing-results)
- [Test environment](#test-environment)
- [Results Tables](#results-tables)
- [NGINX Reloads and Time to Ready](#nginx-reloads-and-time-to-ready)
- [Event Batch Processing](#event-batch-processing)
- [NumResources -> Total Resources](#numresources---total-resources)
- [Observations](#observations)
<!-- TOC -->

## Test environment

GKE cluster:

- Node count: 3
- Instance Type: e2-medium
- k8s version: 1.27.3-gke.100
- Zone: us-central1-c
- Total vCPUs: 6
- Total RAM: 12GB
- Max pods per node: 110

NGF deployment:

- NGF version: edge - git commit 29b45e38bacd7c4f22834938105e3cda4f29f6d1
- NGINX Version: 1.25.2

## Results Tables

### NGINX Reloads and Time to Ready

| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
| 1 | 30 | 1 | 1 | 2 | 191 | 100% | 100% |
| 1 | 150 | 2 | 2 | 2 | 440 | 50% | 100% |
| 2 | 30 | 50 | <1 | 93 | 162 | 100% | 100% |
| 2 | 150 | 208 | <1 | 396 | 281 | 96.46% | 100% |
| 3 | 30 | 1 | 1 | 93 | 129 | 100% | 100% |
| 3 | 150 | 1 | 1 | 453 | 130 | 100% | 100% |


### Event Batch Processing

| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms |
|-------------|--------------|-------------------|--------------------------------------|----------|-----------|
| 1 | 30 | 69 | 6.232 | 100% | 100% |
| 1 | 150 | 309 | 3.638 | 99.68% | 100% |
| 2 | 30 | 465 | 38.759 | 100% | 100% |
| 2 | 150 | 1941 | 68.539 | 98.51% | 100% |
| 3 | 30 | 374 | 36.834 | 99.73% | 99.73% |
| 3 | 150 | 1812 | 40.411 | 99.94% | 99.94% |


## NumResources -> Total Resources

| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources |
| ------------ | -------- | ------- | --------------- | ---------- | ---------------- | -------------------- | ---------- | --------------- |
| x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | 8x+4 |
| 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 244 |
| 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 1204 |
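
The totals in the table are the sum of the per-type counts, which simplifies to `8x + 4`. As a minimal sketch of that arithmetic (the function name `total_resources` is hypothetical and not part of the test scripts):

```python
def total_resources(x: int) -> dict[str, int]:
    """Return the per-type resource counts for a run with NumResources = x."""
    counts = {
        "Gateways": 1,
        "Secrets": 1,
        "ReferenceGrants": 1,
        "Namespaces": x + 1,
        "application Pods": 2 * x,
        "application Services": 2 * x,
        "HTTPRoutes": 3 * x,
    }
    # The right-hand side is evaluated before the new key is added,
    # so the total covers only the seven resource types above.
    counts["Total Resources"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for x in (30, 150):
        print(x, total_resources(x)["Total Resources"])
```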

## Observations

1. We are reloading after reconciling a ReferenceGrant even when there is no Gateway. This is because we treat every
upsert/delete of a ReferenceGrant as a change. This means we will regenerate NGINX config every time a ReferenceGrant
is created, updated (generation must change), or deleted, even if it does not apply to the accepted Gateway.

Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1124

2. We are reloading after reconciling an HTTPRoute even when there is no accepted Gateway and no config being generated.

Issue filed: https://github.com/nginxinc/nginx-gateway-fabric/issues/1123

3. The majority of NGINX reloads fell in the <= 500ms bucket, and all of them fell in the <= 1000ms bucket. Reload
time was observed to increase with the number of configured resources that result in NGINX configuration changes.

4. No errors (NGF or NGINX) were observed in any test run.
61 changes: 0 additions & 61 deletions tests/reconfig/results/v1.0.0.md

This file was deleted.

38 changes: 24 additions & 14 deletions tests/reconfig/setup.md
@@ -13,8 +13,8 @@

## Goals

- Measure how long it takes NGF to reconfigure NGINX when a number of Gateway API and referenced core Kubernetes
resources are created at once.
- Measure how long it takes NGF to reconfigure NGINX and update statuses when a number of Gateway API and
referenced core Kubernetes resources are created at once.
- Two runs of each test should be run with differing numbers of resources. Each run will deploy:
  - a single Gateway, Secret, and ReferenceGrant resource
  - `x+1` number of namespaces
@@ -38,7 +38,8 @@
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.8.1/standard-install.yaml
```

3. Deploy NGF from edge using Helm install (NOTE: For Test 1, deploy AFTER resources):
3. Deploy NGF from edge using Helm install and wait for the LoadBalancer Service to be ready
(NOTE: For Test 1, deploy AFTER resources):

```console
helm install my-release oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric --version 0.0.0-edge \
@@ -65,10 +66,20 @@
kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
```

6. Measure Time To Ready as described in each test, get the reload count, and get the average NGINX reload duration.
The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
7. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomalies or outliers.
6. Measure NGINX Reloads and Time to Ready Results
   1. TimeToReadyTotal as described in each test - NGF logs.
   2. TimeToReadyAvgSingle, which is the average time between updating any resource and the
      NGINX configuration being reloaded - NGF logs.
   3. NGINX Reload count - metrics.
   4. Average NGINX reload duration - metrics.
      1. The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
         metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
7. Measure Event Batch Processing Results
   1. Event Batch Total - metrics.
   2. Average Event Batch Processing duration - metrics.
      1. The average event batch processing duration can be computed by taking the `nginx_gateway_fabric_event_batch_processing_milliseconds_sum`
         metric value and dividing it by the `nginx_gateway_fabric_event_batch_processing_milliseconds_count` metric value.
8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomalies or outliers.
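
The `_sum` / `_count` division described in the steps above can be scripted. The following is a minimal sketch, assuming the metrics are available as Prometheus text-format output; the helper names `sum_metric` and `average_ms` are hypothetical, and the simple regex does not handle label values that contain `}`.

```python
import re

# Matches Prometheus text-format sample lines such as
#   metric_name{label="value"} 123.4
#   metric_name 123.4
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')


def sum_metric(metrics_text: str, name: str) -> float:
    """Add up every sample of `name`, across all label sets."""
    total = 0.0
    for line in metrics_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = _SAMPLE.match(line)
        if match and match.group(1) == name:
            total += float(match.group(2))
    return total


def average_ms(metrics_text: str, base: str) -> float:
    """Average duration in ms: <base>_sum / <base>_count (0.0 if no samples)."""
    count = sum_metric(metrics_text, base + "_count")
    return sum_metric(metrics_text, base + "_sum") / count if count else 0.0
```

With the port-forward to port 9113 active, the scraped text could be passed in as `average_ms(text, "nginx_gateway_fabric_nginx_reloads_milliseconds")` or `average_ms(text, "nginx_gateway_fabric_event_batch_processing_milliseconds")`. Fetching the text from `http://localhost:9113/metrics` is an assumption about the endpoint path.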

## Tests

@@ -79,8 +90,8 @@
e.g. `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy backend apps and services, wait
60 seconds for them to be ready, and deploy 1 Gateway, 1 RefGrant, 1 Secret, and HTTPRoutes.
2. Deploy NGF
3. Check logs for time it takes from start-up -> config written and NGINX reloaded. Get reload count and average reload
duration from metrics and logs.
3. Measure TimeToReadyTotal as the time it takes from start-up -> config written and
NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.

### Test 2: Start NGF, deploy Gateway, create many resources attached to GW

@@ -89,9 +100,8 @@
2. Run the provided script with the required number of resources,
e.g. `cd scripts && bash create-resources-routes-last.sh 30`. The script will deploy backend apps and services,
wait 60 seconds for them to be ready, and deploy 1 Gateway, 1 Secret, 1 RefGrant, and HTTPRoutes at the same time.
3. Check logs for time it takes from NGF receiving first resource update -> final config written, and NGINX's final
reload. Check logs for average individual HTTPRoute TTR also. Get reload count and average reload duration from
metrics and logs.
3. Measure TimeToReadyTotal as the time it takes from NGF receiving the first HTTPRoute resource update -> final
config written and NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.

### Test 3: Start NGF, create many resources attached to a Gateway, deploy the Gateway

@@ -101,5 +111,5 @@
e.g. `cd scripts && bash create-resources-gw-last.sh 30`.
The script will deploy the namespaces, backend apps and services, 1 Secret, 1 ReferenceGrant, and the HTTPRoutes;
wait 60 seconds for the backend apps to be ready, and then deploy 1 Gateway for all HTTPRoutes.
3. Check logs for time it takes from NGF receiving gateway resource -> config written and NGINX reloaded. Get reload
count and average reload duration from metrics and logs.
3. Measure TimeToReadyTotal as the time it takes from NGF receiving the Gateway resource -> config written and NGINX reloaded.
Measure the other results as described in steps 6-7 of the [Setup](#setup) section.
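
Once the relevant log timestamps have been picked out by hand, TimeToReadyTotal and TimeToReadyAvgSingle reduce to timestamp arithmetic. A minimal sketch, assuming the NGF log entries carry ISO-8601 timestamps (an assumption about the log format); the function names are hypothetical:

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def time_to_ready_total(start_ts: str, final_reload_ts: str) -> float:
    """Seconds from the starting event to the final NGINX reload."""
    return (parse_ts(final_reload_ts) - parse_ts(start_ts)).total_seconds()


def time_to_ready_avg_single(pairs: list[tuple[str, str]]) -> float:
    """Mean seconds between each resource update and the reload that applied it.

    `pairs` holds (update_timestamp, reload_timestamp) tuples.
    """
    deltas = [time_to_ready_total(update, reload) for update, reload in pairs]
    return sum(deltas) / len(deltas) if deltas else 0.0
```

The starting event differs per test: NGF start-up for Test 1, the first HTTPRoute update for Test 2, and receipt of the Gateway resource for Test 3.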