Commit 5f19a33

ObservabilityPolicy Enhancement Proposal
Problem: We want a design for Observability-related configuration settings, such as tracing, to be applied at the HTTPRoute level. Solution: Add enhancement proposal introducing `ObservabilityPolicy`.
1 parent 03e24fe commit 5f19a33

1 file changed

+311
-0
lines changed

docs/proposals/observability.md

@@ -0,0 +1,311 @@
# Enhancement Proposal-1778: Observability Policy

- Issue: https://github.com/nginxinc/nginx-gateway-fabric/issues/1778
- Status: Implementable

## Summary

This Enhancement Proposal introduces the `ObservabilityPolicy` API, which allows Application Developers to define settings related to tracing, metrics, or logging at the HTTPRoute level.

## Goals

- Define the Observability policy.
- Define an API for the Observability policy.

## Non-Goals

- Provide implementation details for the Observability policy.

## Introduction

### Observability Policy

The Observability Policy contains settings to configure NGINX to expose information through tracing, metrics, and/or logging. This is a Direct Policy that is attached to an HTTPRoute or HTTPRoute Rule by an Application Developer. It works in conjunction with a [Gateway Settings](gateway-settings.md) configuration that contains the higher-level settings needed to enable Observability at this lower level. The [Gateway Settings](gateway-settings.md) configuration is managed by a Cluster Operator.

Since this policy is attached to an HTTPRoute or HTTPRoute rule, the Observability settings should apply only to the relevant `location` contexts of the NGINX config for that route or rule.

To begin, the Observability Policy will include the following NGINX directives (focusing on OpenTelemetry tracing):

- [`otel_trace`](https://nginx.org/en/docs/ngx_otel_module.html#otel_trace): enables tracing and sets the sampling rate.
- [`otel_trace_context`](https://nginx.org/en/docs/ngx_otel_module.html#otel_trace_context): sets how trace context headers are handled (`extract`, `inject`, `propagate`, or `ignore`).
- [`otel_span_name`](https://nginx.org/en/docs/ngx_otel_module.html#otel_span_name): sets the name of the span.
- [`otel_span_attr`](https://nginx.org/en/docs/ngx_otel_module.html#otel_span_attr): adds a custom attribute to the span.

Tracing will be disabled by default. The Application Developer will be able to use this Policy to enable and configure tracing for their routes. This Policy will only be applied if the OTel endpoint has been set by the Cluster Operator on the [Gateway Settings](gateway-settings.md).

Ratio and parent-based tracing should be supported as shown in the [nginx-otel examples](https://github.com/nginxinc/nginx-otel?tab=readme-ov-file#examples).

In the future, this config will be extended to support other functionality, such as that defined in the [NGINX Extensions Proposal](nginx-extensions.md#observability).

## API, Customer Driven Interfaces, and User Experience

The `ObservabilityPolicy` API is a CRD that is a part of the `gateway.nginx.org` Group. It is a namespaced resource that will reference an HTTPRoute or HTTPRoute Rule as its target.

### Go

Below is the Golang API for the `ObservabilityPolicy` API:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	gatewayv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
)

// ObservabilityPolicy is a Direct Policy that configures observability settings,
// such as tracing, for an HTTPRoute or HTTPRoute rule.
type ObservabilityPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec defines the desired state of the ObservabilityPolicy.
	Spec ObservabilityPolicySpec `json:"spec"`

	// Status defines the state of the ObservabilityPolicy.
	Status gatewayv1alpha2.PolicyStatus `json:"status,omitempty"`
}

// ObservabilityPolicySpec defines the desired state of the ObservabilityPolicy.
type ObservabilityPolicySpec struct {
	// TargetRef identifies an API object to apply the policy to.
	// Object must be in the same namespace as the policy.
	// Support: HTTPRoute and HTTPRoute rule
	TargetRef gatewayv1alpha2.PolicyTargetReferenceWithSectionName `json:"targetRef"`

	// Tracing allows for enabling and configuring tracing.
	//
	// +optional
	Tracing *Tracing `json:"tracing,omitempty"`
}

// Tracing allows for enabling and configuring OpenTelemetry tracing.
type Tracing struct {
	// Ratio is the percentage of traffic that should be sampled. Integer from 0 to 100.
	// By default, 100% of HTTP requests are traced. Not applicable for parent-based tracing.
	//
	// +optional
	Ratio *int32 `json:"ratio,omitempty"`

	// Context specifies how to propagate traceparent/tracestate headers. Defaults to 'ignore'.
	//
	// +optional
	Context *TraceContext `json:"context,omitempty"`

	// SpanName defines the name of the OTel span. Defaults to the name of the location for a request.
	//
	// +optional
	SpanName *string `json:"spanName,omitempty"`

	// SpanAttributes are custom key/value attributes that are added to each span.
	//
	// +optional
	SpanAttributes map[string]string `json:"spanAttributes,omitempty"`

	// Enable defines if tracing is enabled, disabled, or parent-based.
	Enable TraceType `json:"enable"`
}

// TraceType defines if tracing is enabled, disabled, or parent-based.
type TraceType string

const (
	// TraceTypeOn enables tracing.
	TraceTypeOn TraceType = "on"

	// TraceTypeOff disables tracing.
	TraceTypeOff TraceType = "off"

	// TraceTypeParent enables tracing and only records spans if the parent span was sampled.
	TraceTypeParent TraceType = "parent"
)

// TraceContext specifies how to propagate traceparent/tracestate headers.
type TraceContext string

const (
	// TraceContextExtract uses an existing trace context from the request, so that the identifiers
	// of a trace and the parent span are inherited from the incoming request.
	TraceContextExtract TraceContext = "extract"

	// TraceContextInject adds a new context to the request, overwriting existing headers, if any.
	TraceContextInject TraceContext = "inject"

	// TraceContextPropagate updates the existing context (combines extract and inject).
	TraceContextPropagate TraceContext = "propagate"

	// TraceContextIgnore skips context headers processing.
	TraceContextIgnore TraceContext = "ignore"
)
```

### YAML

Below is an example YAML version of an `ObservabilityPolicy`:

```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: ObservabilityPolicy
metadata:
  name: example-observability-policy
  namespace: default
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example-route
    sectionName: example-section
  tracing:
    ratio: 10
    context: inject
    spanName: example-span
    spanAttributes:
      attribute1: value1
      attribute2: value2
    enable: "on" # quoted so YAML 1.1 parsers do not read it as a boolean
status:
  ancestors:
  - ancestorRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: example-gateway
      namespace: default
    conditions:
    - type: Accepted
      status: "True"
      reason: Accepted
      message: Policy is accepted
```

and the HTTPRoute it is attached to:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route
  namespace: default
spec:
  parentRefs:
  - name: example-gateway
  hostnames:
  - "*.example.com"
  rules:
  - name: example-section
status:
  parents:
  - parentRef:
      name: example-gateway
    conditions:
    ...
    - type: gateway.nginx.org/ObservabilityPolicyAffected # new condition
      status: "True"
      reason: PolicyAffected
      message: Object affected by an ObservabilityPolicy.
```

### Status

#### CRD Label

According to the [Policy and Metaresources GEP](https://gateway-api.sigs.k8s.io/geps/gep-713/), the `ObservabilityPolicy` CRD must have the `gateway.networking.k8s.io/policy: direct` label to specify that it is a direct policy.
This label will help with discoverability and will be used by the planned Gateway API Policy [kubectl plugin](https://gateway-api.sigs.k8s.io/geps/gep-713/#kubectl-plugin-or-command-line-tool).

#### Conditions/Policy Ancestor Status

According to the [Policy and Metaresources GEP](https://gateway-api.sigs.k8s.io/geps/gep-713/), the `ObservabilityPolicy` CRD must include a `status` stanza with a slice of Conditions.

The `Accepted` Condition must be populated on the `ObservabilityPolicy` CRD using the reasons defined in the [PolicyCondition API](https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1alpha2/policy_types.go). If these reasons are not sufficient, we can add implementation-specific reasons.

The Condition stanza may need to be namespaced using the `controllerName` if more than one controller could reconcile the Policy.

In the updated version of the [Policy and Metaresources GEP](https://github.com/kubernetes-sigs/gateway-api/pull/2813/files), which is still under review, the `PolicyAncestorStatus` applies to Direct Policies.
[`PolicyAncestorStatus`](https://github.com/kubernetes-sigs/gateway-api/blob/f1758d1bc233d78a3e1e6cfba34336526655d03d/apis/v1alpha2/policy_types.go#L156) contains a list of ancestor resources (usually Gateways) that are associated with the policy, and the status of the policy for each ancestor.
This status provides a view of the resources the policy is affecting. It is beneficial for policies implemented by multiple controllers (e.g., BackendTLSPolicy) or that attach to resources with different capabilities.

#### Setting Status on Objects Affected by a Policy

The Policy and Metaresources GEP describes a [provisional status](https://gateway-api.sigs.k8s.io/geps/gep-713/#standard-status-condition-on-policy-affected-objects) that involves adding a Condition or annotation to all objects affected by a Policy.

This solution tells object owners that their object is affected by a policy, while minimizing status updates by limiting them to when the object starts or stops being affected by a policy.
Even though this status is provisional, implementing it now will help with discoverability and allow us to give feedback on the solution.

Implementing this involves defining a new Condition type and reason:

```go
package conditions

import (
	gatewayv1alpha2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
)

const (
	// ObservabilityPolicyAffected is set on objects affected by an ObservabilityPolicy.
	ObservabilityPolicyAffected gatewayv1alpha2.PolicyConditionType = "gateway.nginx.org/ObservabilityPolicyAffected"

	// PolicyAffectedReason is the reason used with the ObservabilityPolicyAffected Condition.
	PolicyAffectedReason gatewayv1alpha2.PolicyConditionReason = "PolicyAffected"
)
```

NGINX Gateway Fabric must set this Condition on all HTTPRoutes affected by an `ObservabilityPolicy`.
Below is an example of what this Condition may look like:

```yaml
Conditions:
  Type:                gateway.nginx.org/ObservabilityPolicyAffected
  Message:             Object affected by an ObservabilityPolicy.
  Observed Generation: 1
  Reason:              PolicyAffected
  Status:              True
```

Some additional rules:

- This Condition should be added when the affected object starts being affected by an `ObservabilityPolicy`.
- When the last `ObservabilityPolicy` affecting that object is removed, the Condition should be removed.
- The Observed Generation is the generation of the affected object, not the generation of the `ObservabilityPolicy`.

## Attachment

An `ObservabilityPolicy` can be attached to an HTTPRoute or an HTTPRoute rule (using a [sectionName](https://gateway-api.sigs.k8s.io/geps/gep-713/#apply-policies-to-sections-of-a-resource)).

The policy will only take effect if a [Gateway Settings](gateway-settings.md) configuration has been linked to the GatewayClass.

### Creating the Effective Policy in NGINX Config

To determine how to reliably and consistently create the effective policy in NGINX config, we need to apply the policies for each attachment scenario to the three NGINX mappings described [here](/docs/developer/mapping.md).

The following examples use the `ClientSettingsPolicy`, but the rules are the same for the `ObservabilityPolicy`.

A. Distinct Hostname:
![example-a2](/docs/images/client-settings/example-a2.png)

B. Same Hostname:
![example-b2](/docs/images/client-settings/example-b2.png)

C. Internal Redirect:
![example-c2](/docs/images/client-settings/example-c2.png)

For this attachment scenario, specifying the directives in the _final_ location blocks generated from the HTTPRoute with the policy attached achieves the effective policy. _Final_ means the location that ultimately handles the request.

## Use Cases

- As an Application Developer, I want to enable observability -- such as tracing -- for traffic flowing to my application, so I can easily debug issues or understand the use of my application.

## Testing

- Unit tests
- Functional tests that verify the attachment of the CRD to a Route or Route rule, and that NGINX behaves properly based on the configuration. This includes verifying tracing works as expected.

## Security Considerations

Validating all fields in the `ObservabilityPolicy` is critical to ensuring that the NGINX config generated by NGINX Gateway Fabric is correct and secure.

All fields in the `ObservabilityPolicy` will be validated with the OpenAPI schema. If the OpenAPI schema validation rules are not sufficient, we will use [CEL](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-rules).

RBAC via the Kubernetes API server will ensure that only authorized users can update the CRD containing Gateway Settings.

## Alternatives

- Combine with OTel settings in Gateway Settings for one OTel Policy: Rather than splitting tracing across two Policies, we could create a single tracing Policy. The issue with this approach is that some tracing settings -- such as exporter endpoint -- should be restricted to Cluster Operators, while settings like attributes should be available to Application Developers. If we combine these settings, RBAC will not be sufficient to restrict access across the settings. We will have to disallow certain fields based on the resource the Policy is attached to. This is a bad user experience.
- Inherited Policy: An Inherited Policy would be useful if there is a use case for the Cluster Operator enforcing or defaulting the OTel tracing settings included in this policy.

## References

- [NGINX Extensions Enhancement Proposal](nginx-extensions.md)
- [Policy and Metaresources GEP](https://gateway-api.sigs.k8s.io/geps/gep-713/)
- [Kubernetes API Conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md)
