Add 1.3 reconfig test results #2114

Merged 1 commit on Jun 7, 2024

110 changes: 110 additions & 0 deletions tests/reconfig/results/1.3.0/1.3.0.md
@@ -0,0 +1,110 @@
# Reconfiguration testing Results

<!-- TOC -->
- [Reconfiguration testing Results](#reconfiguration-testing-results)
- [Summary](#summary)
- [Test environment](#test-environment)
- [Results Tables](#results-tables)
- [NGINX Reloads and Time to Ready](#nginx-reloads-and-time-to-ready)
- [Event Batch Processing](#event-batch-processing)
- [NumResources to Total Resources](#numresources-to-total-resources)
- [Observations](#observations)
- [Future Improvements](#future-improvements)
<!-- TOC -->

## Summary

- Due to the fix for https://github.com/nginxinc/nginx-gateway-fabric/issues/1107, time to ready, reload time, and event batch processing
time increased for all 150 resource tests.
- For all 30 resource tests, results were mostly consistent with prior results.

## Test environment

GKE cluster:

- Node count: 3
- Instance Type: e2-medium
- k8s version: 1.28.9-gke.1000000
- Zone: us-central1-c
- Total vCPUs: 6
- Total RAM: 12GB
- Max pods per node: 110

NGF deployment:

- NGF version: edge - git commit 7c9bf23ed89861c9ce7b725f2c1686f4c24ef2f9
- NGINX OSS Version: 1.27.0
- NGINX Plus Version: R32

## Results Tables

### NGINX Reloads and Time to Ready

#### OSS

| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
| 1 | 30 | 2 | <1 | 2 | 190 | 100% | 100% |
| 1 | 150 | 2 | <1 | 2 | 542 | 50% | 100% |
| 2 | 30 | 37 | <1 | 94 | 169 | 100% | 100% |
| 2 | 150 | 204 | <1 | 387 | 326 | 88% | 100% |
| 3 | 30 | <1 | <1 | 94 | 129 | 100% | 100% |
| 3 | 150 | <1 | <1 | 454 | 130 | 100% | 100% |

#### Plus

| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
| 1 | 30 | 1 | <1 | 2 | 220.5 | 100% | 100% |
| 1 | 150 | 1.5 | <1 | 2 | 528.5 | 50% | 100% |
| 2 | 30 | 41 | <1 | 94 | 176.8 | 100% | 100% |
| 2 | 150 | 199 | <1 | 391 | 320.56 | 94.1% | 100% |
| 3 | 30 | <1 | <1 | 94 | 128.5 | 100% | 100% |
| 3 | 150 | <1 | <1 | 454 | 129.2 | 100% | 100% |

### Event Batch Processing

#### OSS

| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms |
|-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|------------|
| 1 | 30 | 5 | 726.6 | 80% | 80% | 100% | 100% | 100% |
| 1 | 150 | 5 | 4457 | 40% | 80% | 80% | 80% | 100% |
| 2 | 30 | 371 | 59.5 | 99.7% | 100% | 100% | 100% | 100% |
| 2 | 150 | 1742 | 93.5 | 92.9% | 99.99% | 100% | 100% | 100% |
| 3 | 30 | 370 | 43.9 | 99.85% | 99.85% | 100% | 100% | 100% |
| 3 | 150 | 1810 | 44.8 | 99.99% | 99.99% | 99.99% | 100% | 100% |

#### Plus

| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms |
|-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|--------------|
| 1 | 30 | 6 | 84 | 100% | 100% | 100% | 100% | 100% |
| 1 | 150 | 5 | 4544.3 | 40% | 80% | 80% | 80% | 100% |
| 2 | 30 | 370 | 59.1 | 100% | 100% | 100% | 100% | 100% |
| 2 | 150 | 1747 | 93.2 | 94.1% | 99.99% | 100% | 100% | 100% |
| 3 | 30 | 370 | 41.33 | 99.99% | 99.99% | 100% | 100% | 100% |
| 3 | 150 | 1809 | 44.88 | 99.99% | 99.99% | 99.99% | 99.99% | 100% |

## NumResources to Total Resources

| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources |
|--------------|----------|---------|-----------------|------------|------------------|----------------------|------------|-----------------|
| x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | <total> |
| 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 244 |
| 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 1204 |
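
The `x` row of the table reduces to `3 + (x + 1) + 2x + 2x + 3x = 8x + 4`. As a quick sanity check of the 30 and 150 rows, here is a minimal shell sketch. The function name `total_resources` is made up for illustration and is not part of the test scripts:

```shell
# total_resources X: print the total resource count for a given NumResources value,
# using the per-resource multipliers from the table above.
total_resources() {
  x=$1
  gateways=1
  secrets=1
  reference_grants=1
  namespaces=$((x + 1))
  pods=$((2 * x))
  services=$((2 * x))
  httproutes=$((3 * x))
  echo $((gateways + secrets + reference_grants + namespaces + pods + services + httproutes))
}

total_resources 30    # prints 244
total_resources 150   # prints 1204
```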

## Observations

1. Reload time and time to ready have increased in the 150 resource tests. This is probably due, in part, to the fix for https://github.com/nginxinc/nginx-gateway-fabric/issues/1107: before the fix, the prior
test attached only 2x of the HTTPRoutes, while this test attaches all of them. In the 30 resource tests, results were mostly consistent with prior results.

2. Event batch processing time increased notably in the 150 resource tests, probably for the same reason mentioned in observation #1.
In the 30 resource tests, results were mostly consistent with prior results.

3. No errors in the logs.


## Future Improvements

None.
6 changes: 3 additions & 3 deletions tests/reconfig/setup.md
@@ -43,7 +43,7 @@ The following cluster will be sufficient:

```console
helm install my-release oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric --version 0.0.0-edge \
-    --create-namespace --wait -n nginx-gateway --set nginxGateway.config.logging.level=debug
+    --create-namespace --wait -n nginx-gateway --set nginxGateway.productTelemetry.enable=false
```

4. Run tests:
@@ -67,7 +67,7 @@ The following cluster will be sufficient:
Note: You can expose metrics by running the below snippet and then navigating to `127.0.0.1:9113/metrics`:

```console
-GW_POD=$(k get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p')
+GW_POD=$(kubectl get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p')
kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
```
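
To illustrate what the `sed` expression in the snippet above extracts, here is a small sketch that runs it against a canned sample instead of a live cluster. The sample output and the pod name in it are hypothetical; real `kubectl get pods` output will differ:

```shell
# Hypothetical sample of `kubectl get pods -n nginx-gateway` output (pod name is made up).
sample_output='NAME                                     READY   STATUS    RESTARTS   AGE
my-release-nginx-gateway-fabric-abc123   2/2     Running   0          5m'

# Line 1 is the header, so line 2 is the first pod row. The capture group keeps
# everything before the first whitespace character, i.e. the NAME column.
GW_POD=$(printf '%s\n' "$sample_output" | sed -n '2s/^\([^[:space:]]*\).*$/\1/p')
echo "$GW_POD"
```

This relies on the NGF pod being the first row of the listing. If other pods share the namespace, selecting the pod with `kubectl get pods -n nginx-gateway -o jsonpath='{.items[0].metadata.name}'` plus a label selector may be less fragile; which labels to use depends on the installation and is not specified here.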

@@ -105,7 +105,7 @@ The following cluster will be sufficient:
2. Run the provided script with the required number of resources,
e.g. `cd scripts && bash create-resources-routes-last.sh 30`. The script will deploy backend apps and services,
wait 60 seconds for them to be ready, and deploy 1 Gateway, 1 Secret, 1 RefGrant, and HTTPRoutes at the same time.
-3. Measure TimeToReadyTotal as the time it takes from NGF receiving the first HTTPRoute resource update -> final
+3. Measure TimeToReadyTotal as the time it takes from NGF receiving the first HTTPRoute resource update (logs will say "reconciling") -> final
config written and NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.

### Test 3: Start NGF, create many resources attached to a Gateway, deploy the Gateway