Commit 72a716f

Add back Webhook validation by NGF
1 parent 4a5d428 commit 72a716f

2 files changed

+72
-6
lines changed

design/resource-validation.md

+39-2
@@ -91,9 +91,36 @@ Design a validation mechanism for Gateway API resources.
9191

9292
## Design
9393

94+
We will introduce two validation methods to be run by NGF control plane:
95+
96+
1. Re-run of the Gateway API webhook validation
97+
2. NGF-specific field validation
98+
99+
### Re-run of Webhook Validation
100+
101+
Before processing a resource, NGF will validate it using the functions from
102+
the [validation package](https://github.com/kubernetes-sigs/gateway-api/tree/b241afc88e68c952cc0a59a5c72a51358dc2bada/apis/v1beta1/validation)
103+
from the Gateway API. This will ensure that the webhook validation cannot be bypassed (it can be bypassed if the webhook
104+
is not installed, misconfigured, or running a different version), and it will allow us to avoid repeating the same
105+
validation in our code.
106+
107+
If a resource is invalid:
108+
109+
- NGF will not process it -- it will treat it as if the resource didn't exist. This also means that if the resource was
110+
updated from a valid to an invalid state, NGF will also ignore any previous valid state. For example, it will remove
111+
the generated configuration for an HTTPRoute resource.
112+
- NGF will report the validation error as a
113+
Warning [Event](https://kubernetes.io/docs/reference/kubernetes-api/cluster-resources/event-v1/)
114+
for that resource. The Event message will describe the error and explain that the resource was ignored. We chose to
115+
report an Event instead of updating the status, because to update the status, NGF first needs to look inside the
116+
resource to determine whether it belongs to it or not. However, since the webhook validation applies to all parts of
117+
the spec of the resource, NGF would have to look inside the invalid resource and parse potentially invalid parts. To
118+
avoid that, NGF will report an Event. The owner of the resource will be able to see the Event.
119+
- NGF will also report the validation error in the NGF logs.
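
The handling of an invalid resource described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not NGF's actual code: `Resource`, `ControlPlane`, `Upsert`, and `rerunWebhookValidation` are hypothetical names, the single TCP hostname rule stands in for the functions of the Gateway API validation package, and Events are kept in a map instead of being sent to the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Resource is a hypothetical, simplified stand-in for a Gateway API resource.
type Resource struct {
	Name     string
	Protocol string // protocol of a single listener, for illustration only
	Hostname string // hostname of that listener
}

// rerunWebhookValidation is a hypothetical stand-in for the functions of the
// Gateway API validation package. It implements one illustrative rule only.
func rerunWebhookValidation(r Resource) error {
	if r.Protocol == "TCP" && r.Hostname != "" {
		return errors.New("hostname: Forbidden: should be empty for protocol TCP")
	}
	return nil
}

// ControlPlane tracks the resources that are treated as existing and the
// Warning Events that were reported (both are hypothetical simplifications).
type ControlPlane struct {
	known  map[string]Resource
	events map[string][]string
}

// NewControlPlane returns an empty ControlPlane.
func NewControlPlane() *ControlPlane {
	return &ControlPlane{
		known:  make(map[string]Resource),
		events: make(map[string][]string),
	}
}

// Upsert handles a created or updated resource and reports whether it was processed.
func (c *ControlPlane) Upsert(r Resource) bool {
	if err := rerunWebhookValidation(r); err != nil {
		// Treat the resource as if it did not exist, dropping any previous valid state.
		delete(c.known, r.Name)
		// Report the error as a Warning Event for the resource.
		msg := fmt.Sprintf("validation error: %v; the resource was ignored", err)
		c.events[r.Name] = append(c.events[r.Name], msg)
		// Report the error in the logs as well.
		log.Printf("resource %q failed webhook validation: %v", r.Name, err)
		return false
	}
	c.known[r.Name] = r
	return true
}

func main() {
	cp := NewControlPlane()
	fmt.Println(cp.Upsert(Resource{Name: "gw", Protocol: "HTTP", Hostname: "cafe.example.com"}))
	fmt.Println(cp.Upsert(Resource{Name: "gw", Protocol: "TCP", Hostname: "cafe.example.com"}))
	_, stillKnown := cp.known["gw"]
	fmt.Println(stillKnown, len(cp.events["gw"]))
}
```

In this sketch, the second `Upsert` call removes the previously valid state of `gw`, mirroring the behavior where an update from a valid to an invalid state causes the earlier configuration to be dropped.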
120+
94121
### NGF-specific validation
95122

96-
NGF will run NGF-specific validation written in go.
123+
After re-running the webhook validation, NGF will run NGF-specific validation written in Go.
97124

98125
NGF-specific validation will:
99126

@@ -105,7 +132,7 @@ NGF-specific validation will not include:
105132

106133
- *All* validation done by CRDs. NGF will only repeat the validation that addresses (1) and (2) in the list above with
107134
extra rules required by NGINX but missing in the CRDs. For example, NGF will not ensure the limits of field values.
108-
- The validation done by the webhook.
135+
- The validation done by the webhook (because it is done in the previous step).
109136

110137
If a resource is invalid, NGF will report the error in its status.
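
The ordering of the two control plane checks, and the different feedback channel each one uses, can be illustrated with the sketch below. Everything in it is an assumption made for illustration: `Route`, `Outcome`, `webhookRules`, `ngfRules`, and both example rules are hypothetical and are not taken from NGF or from the Gateway API validation package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Route is a hypothetical, simplified stand-in for an HTTPRoute.
type Route struct {
	Name        string
	HasRedirect bool
	HasRewrite  bool
	Path        string
}

// Outcome records where the feedback for a resource ends up.
type Outcome struct {
	Accepted    bool
	Event       string // set when the re-run of the webhook validation fails
	StatusError string // set when the NGF-specific validation fails
}

// webhookRules is a hypothetical stand-in for the re-run of the webhook validation.
func webhookRules(r Route) error {
	if r.HasRedirect && r.HasRewrite {
		return errors.New("redirect and rewrite filters cannot be combined")
	}
	return nil
}

// ngfRules is a hypothetical example of an extra rule required by NGINX.
func ngfRules(r Route) error {
	if strings.ContainsAny(r.Path, ";{}") {
		return fmt.Errorf("path %q contains characters not allowed in NGINX configuration", r.Path)
	}
	return nil
}

// validate runs the webhook rules first; NGF-specific rules run only if those pass.
func validate(r Route) Outcome {
	if err := webhookRules(r); err != nil {
		// The resource is ignored and the error is surfaced as an Event.
		return Outcome{Event: err.Error()}
	}
	if err := ngfRules(r); err != nil {
		// The error is surfaced in the status of the resource.
		return Outcome{StatusError: err.Error()}
	}
	return Outcome{Accepted: true}
}

func main() {
	routes := []Route{
		{Name: "ok", Path: "/coffee"},
		{Name: "bad-filters", HasRedirect: true, HasRewrite: true, Path: "/tea"},
		{Name: "bad-path", Path: "/tea;"},
	}
	for _, r := range routes {
		fmt.Printf("%s: %+v\n", r.Name, validate(r))
	}
}
```

Because the webhook rules run first, the NGF-specific step does not need to repeat them and only ever inspects resources that already passed the webhook validation.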
111138

@@ -119,6 +146,7 @@ following methods in order of their appearance in the table.
119146
| Name | Type | Component | Scope | Feedback loop for errors | Can be bypassed? |
120147
|------------------------------|----------------------------|-----------------------|-------------------------|----------------------------------------------------------------------------------|--------------------------------|
121148
| CRD validation | OpenAPI and CEL validation | Kubernetes API server | Structure, field values | Kubernetes API server returns any errors as a response for an API call. | Yes, if the CRDs are modified. |
149+
| Re-run of webhook validation | Go code | NGF control plane | Field values | Errors are reported as an Event for the resource. | No |
122150
| NGF-specific validation | Go code | NGF control plane | Field values | Errors are reported in the status of a resource after its creation/modification. | No |
123151

124152

@@ -128,6 +156,7 @@ following methods in order of their appearance in the table.
128156
|------------------------------|---------|-----------------------|-------------------------|----------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
129157
| CRD validation | OpenAPI | Kubernetes API server | Structure, field values | Kubernetes API server returns any errors as a response for an API call. | Yes, if the CRDs are modified. |
130158
| Webhook validation | Go code | Gateway API webhook | Field values | Kubernetes API server returns any errors as a response for an API call. | Yes, if the webhook is not installed, misconfigured, or running a different version. |
159+
| Re-run of webhook validation | Go code | NGF control plane | Field values | Errors are reported as an Event for the resource. | No |
131160
| NGF-specific validation | Go code | NGF control plane | Field values | Errors are reported in the status of a resource after its creation/modification. | No |
132161

133162

@@ -153,6 +182,14 @@ We will not introduce any NGF webhook in the cluster (it adds operational comple
153182
source of potential downtime -- a webhook failure disables CRUD operations on the relevant resources) unless we find
154183
good reasons for that.
155184

185+
### Upgrades
186+
187+
Since NGF will use the validation package from the Gateway API project, when a new release happens, we will need to
188+
upgrade the dependency and release a new version of NGF, provided that the validation code changed. However, if it did
189+
not change, we do not need to release a new version. Note: other things from a new Gateway API release might prompt us
190+
to release a new version, such as support for a new field. See also
191+
[GEP-922](https://gateway-api.sigs.k8s.io/geps/gep-922/#).
192+
156193
### Reliability
157194

158195
NGF processes two kinds of transactions:

docs/resource-validation.md

+33-4
@@ -19,13 +19,15 @@ A Gateway API resource (a new resource or an update for the existing one) is val
1919

2020
1. OpenAPI schema validation by the Kubernetes API server.
2121
2. CEL validation by the Kubernetes API server.
22-
3. Validation by NGF.
22+
3. Webhook validation by NGF.
23+
4. Validation by NGF.
2324

2425
### For Kubernetes 1.23 and 1.24
2526

2627
1. OpenAPI schema validation by the Kubernetes API server.
2728
2. Webhook validation by the Gateway API webhook.
28-
3. Validation by NGF.
29+
3. Webhook validation by NGF.
30+
4. Validation by NGF.
2931

3032
To confirm that a resource is valid and accepted by NGF, check that the `Accepted` condition in the resource status
3133
has the Status field set to `True`. For example, in the status of a valid HTTPRoute, if NGF accepts a parentRef,
@@ -69,7 +71,7 @@ The HTTPRoute "coffee" is invalid: spec.hostnames[0]: Invalid value: "cafe.!@#$%
6971
```
7072

7173
> While unlikely, bypassing this validation step is possible if the Gateway API CRDs are modified to remove the validation.
72-
> If this happens, Step 3 will reject any invalid values (from NGINX perspective).
74+
> If this happens, Step 4 will reject any invalid values (from NGINX perspective).
7375
7476
### Step 2 - For Kubernetes 1.25+ - CEL Validation by Kubernetes API Server
7577

@@ -104,7 +106,34 @@ kubectl apply -f some-gateway.yaml
104106
Error from server: error when creating "some-gateway.yaml": admission webhook "validate.gateway.networking.k8s.io" denied the request: spec.listeners[1].hostname: Forbidden: should be empty for protocol TCP
105107
```
106108

107-
### Step 3 - Validation by NGF
109+
> Bypassing this validation step is possible if the webhook is not running in the cluster.
110+
> If this happens, Step 3 will reject the invalid values.
111+
112+
### Step 3 - Webhook validation by NGF
113+
To ensure that the resources are validated with the webhook validation rules, even if the webhook is not running,
114+
NGF performs the same validation. However, NGF performs the validation *after* the Kubernetes API server accepts
115+
the resource.
116+
117+
Below is an example of how NGF rejects an invalid resource (a Gateway resource with a TCP listener that configures a
118+
hostname) with a Kubernetes event:
119+
120+
```shell
121+
kubectl describe gateway some-gateway
122+
```
123+
124+
```text
125+
. . .
126+
Events:
127+
Type Reason Age From Message
128+
---- ------ ---- ---- -------
129+
Warning Rejected 6s nginx-gateway-fabric-nginx the resource failed webhook validation, however the Gateway API webhook failed to reject it with the error; make sure the webhook is installed and running correctly; validation error: spec.listeners[1].hostname: Forbidden: should be empty for protocol TCP; NGF will delete any existing NGINX configuration that corresponds to the resource
130+
```
131+
132+
> This validation step always runs and cannot be bypassed.
133+
> NGF will ignore any resources that fail the webhook validation, like in the example above.
134+
> If the resource previously existed, NGF will remove any existing NGINX configuration for that resource.
135+
136+
### Step 4 - Validation by NGF
108137

109138
This step catches the following cases of invalid values:
110139
